Abstract
In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly (over 270 million at the beginning of 2011) and there are currently more than 20 billion indexed pages. On the other hand, there are more than one billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, suggesting the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response time, high query throughput, high availability, and scalability, despite network latency and scattered data. In this tutorial we present the architecture of current search engines and explore the main challenges behind the design of all the processes of a distributed Web retrieval system: crawling, indexing, and query processing.