Refereed Papers
Track: Search: Corpus Characterization and Search Performance
Paper Title:
Performance of Compressed Inverted List Caching in Search Engines
Authors:
- Jiangong Zhang (Polytechnic University)
- Xiaohui Long (Microsoft Corporation)
- Torsten Suel (Polytechnic University)
Abstract:
Due to the rapid growth in the size of the web, web search engines are facing
enormous performance challenges. The larger engines in particular have to be
able to process tens of thousands of queries per second on tens of billions
of documents, making query throughput a critical issue. To satisfy this heavy
workload, search engines use a variety of performance optimizations including
index compression, caching, and early termination.
We focus on two techniques, inverted index compression and index caching,
which play a crucial role in web search engines as well as other
high-performance information retrieval systems. We perform a comparison and
evaluation of several inverted list compression algorithms, including new
variants of existing algorithms that have not been studied before. We then
evaluate different inverted list caching policies on large query traces, and
finally study the possible performance benefits of combining compression and
caching. The overall goal of this paper is to provide an updated discussion
and evaluation of these two techniques, and to show how to select the best
set of approaches and settings depending on parameters such as disk speed
and main memory cache size.