We compare the effectiveness of our implementation of the TREC ad-hoc
algorithm to four commercial search engines: Excite, Google, Lycos,
and AltaVista Raging. For a given query, if a page is not found in the
top ten ranks by a search engine, that engine gets no credit for that
query. The assumption here is that if a user cannot find a page on the
first page of results, the user will simply give up. This assumption
is strongly supported by the fact that almost 85% of users do not look
beyond the first screen of results for their query [18]. For every
system we count the number of queries for which it retrieves the
desired site at rank 1, up to rank 2, up to rank 3, and so on, and
plot the result (see Section 5). The higher the number of queries for
which an engine retrieves the desired site at or above a given rank,
the better the engine.
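To make the tally concrete, a minimal Python sketch of this per-rank
count might look as follows; the function name success_at_rank and the
sample ranks are illustrative assumptions, not taken from the paper.

    from collections import Counter

    def success_at_rank(ranks, max_rank=10):
        """Given the rank (1-10) at which an engine retrieved the
        desired site for each query (None if the site was not in the
        top ten), return the number of queries answered at rank <= k
        for k = 1, ..., max_rank."""
        found = Counter(r for r in ranks if r is not None)
        curve, total = [], 0
        for k in range(1, max_rank + 1):
            total += found[k]
            curve.append(total)
        return curve

    # Hypothetical per-query ranks for one engine; None means the
    # desired site was missing from the top ten, so no credit.
    ranks = [1, 3, None, 1, 7, None, 2]
    print(success_at_rank(ranks))  # [2, 3, 4, 4, 4, 4, 5, 5, 5, 5]

Plotting one such curve per engine yields the comparison graph
described above: a curve that rises faster and higher indicates a
better engine.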
Using the top ten pages per query also allowed us to manually judge
every run. Even though it is simple in principle to determine whether
two URLs lead to the same page, redirections (via the refresh HTML
meta-tag), pages generated by JavaScript, mirror sites, etc., make
this a non-trivial exercise in the current web environment. We
therefore check all the results by hand to find the ranks of the
relevant pages retrieved (as they may be retrieved under a completely
different URL).
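For illustration, a naive automated URL comparison might look like the
sketch below (the function same_page_naive is a hypothetical name). It
normalizes only cosmetic differences such as host case, trailing
slashes, and default index files; it cannot detect redirections,
JavaScript-generated pages, or mirrors, which is why the judging was
done by hand.

    from urllib.parse import urlsplit

    def same_page_naive(url_a, url_b):
        """Treat two URLs as equivalent if they differ only in host
        case, a trailing slash, or a default index file. This catches
        trivial aliases but not redirects or mirror sites."""
        def normalize(url):
            parts = urlsplit(url)
            path = parts.path.rstrip("/")
            for index in ("/index.html", "/index.htm"):
                if path.endswith(index):
                    path = path[: -len(index)]
            return ((parts.hostname or "").lower(), path, parts.query)
        return normalize(url_a) == normalize(url_b)

    print(same_page_naive("http://WWW.Example.com/index.html",
                          "http://www.example.com/"))  # True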