Web Engineering

Masashi Toyoda (Institute of Industrial Science)
Information and Communication Engineering, 2007 Winter, Monday 14:45-16:15

Topics

10/01 Introduction
Lecture materials
Anna Patterson. Why Writing Your Own Search Engine is Hard. ACM Queue vol.2 no.2, 2004
10/15 The Anatomy of a Large-Scale Search Engine
Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW7, 1998
11/05 PageRank
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999
11/12 Hub and Authority Analysis
J. Kleinberg. Authoritative Sources in a Hyperlinked Environment, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). Also appears as IBM Research Report RJ 10076, May 1997.
Krishna Bharat and Monika R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment, SIGIR '98, 1998.
Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, Jon Kleinberg. Automatic resource list compilation by analyzing hyperlink structure and associated text, WWW7, 1998.
Jeffrey Dean, Monika R. Henzinger. Finding Related Pages in the World Wide Web, WWW8, 1999.
11/19 Link Database and Near-Mirror Detection
Keith H. Randall, Raymie Stata, Rajiv Wickremesinghe, Janet L. Wiener. The Link Database: Fast Access to Graphs of the Web, Compaq Systems Research Center, Tech Report: SRC-RR-175, 2001
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig. Syntactic Clustering of the Web, WWW6, 1997
11/26 Graph Structure of the Web
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins. Trawling the Web for Emerging Cyber-Communities. WWW8, 1999
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener. Graph structure in the web. WWW9, 2000
12/10 Parallel Crawlers
Junghoo Cho, Hector Garcia-Molina. Parallel crawlers. WWW2002, 2002
12/17 Evolution of Web Pages
Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener. A Large-Scale Study of the Evolution of Web Pages. WWW2003
Alexandros Ntoulas, Junghoo Cho, Christopher Olston. What's new on the web?: the evolution of the web from a search engine perspective. WWW2004
01/21
02/04

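Evaluation

Grading is based on two reports.

11/19 Report 1 announced (deadline 12/17)

Choose one interesting full paper related to the Web that was presented at a major international conference such as WWW, SIGIR, or SIGKDD, and summarize it in no more than 6 pages in total. Do not choose any of the papers covered in the lecture (listed above). Your report must include the following items.

Send the report in PDF format to the following e-mail address. The "Subject:" should begin with "[Web Engineering Report]".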

toyoda [@] tkl.iis.u-tokyo.ac.jp

12/17 Report 2 announced (deadline 2/4)

Choose one of the following topics and submit a report on it.

Send the report in PDF format to the following e-mail address. The "Subject:" should begin with "[Web Engineering Report 2]".

toyoda [@] tkl.iis.u-tokyo.ac.jp
