Methods for Measuring Search Engine Performance over Time
Judit Bar-Ilan
The Hebrew University of Jerusalem
School of Library, Archive and Information Studies
P.O. Box 1255, Jerusalem, 91904, Israel
judit@cc.huji.ac.il
ABSTRACT
This study introduces methods for evaluating search engine performance over a time period. Several measures are defined, which as a whole describe search engine functionality over time. The use of these measures is illustrated through a specific example.
Keywords
search engines, Web, performance, stability, measures
1. INTRODUCTION
The Web is extremely dynamic: new pages are published, old pages are removed, and the content of existing pages changes. Such behavior is not encountered in classical information systems. Web search engines are expected to handle these dynamic changes. The aim of the current study is to introduce methods for evaluating search engine performance during a time period.
2. DEFINING THE MEASURES
Any study monitoring search engine performance over time must examine its behavior periodically. The length of the period between consecutive searches has to be set. We call each search point a search round. In each search round, the same query or set of queries is presented to each of the search engines under examination. For each query, in each search round, all the results from all the search engines under examination must be collected.
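As an illustration only, the collection step of a single search round might be organized as in the following Python sketch. The fetch_results callback is a hypothetical stand-in for whatever interface (manual saving, scripted querying) is actually used to query each engine; it is not part of the method described in this paper.

def run_search_round(engines, queries, fetch_results):
    """Collect the full result set of every engine for every query
    in a single search round."""
    round_data = {}
    for engine in engines:
        for query in queries:
            # fetch_results is a hypothetical callback wrapping each
            # engine's query interface; it returns a list of result URLs
            round_data[(engine, query)] = set(fetch_results(engine, query))
    return round_data

Storing one such mapping per round preserves everything the measures below need: which URLs each engine returned for each query, in each round.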
A document is defined to be technically relevant if it fulfills all the conditions posed by the query: all search terms and phrases that are supposed to appear in the document do appear, and all terms and phrases that are supposed to be missing from the document (terms preceded by a minus sign or a NOT operator) do not appear in the document. Technical precision is defined as the percentage of technically relevant retrieved documents out of the total number of retrieved documents.
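This definition translates directly into a simple check. The sketch below assumes plain case-insensitive substring matching, which ignores tokenization details individual engines may apply; documents given as None mark unreachable URLs, which count as technically irrelevant.

def technically_relevant(text, required_terms, excluded_terms=()):
    """A document is technically relevant if every required term
    appears in it and no excluded (minus/NOT) term does."""
    text = text.lower()
    return (all(term.lower() in text for term in required_terms)
            and not any(term.lower() in text for term in excluded_terms))

def technical_precision(documents, required_terms, excluded_terms=()):
    """Percentage of technically relevant documents among all
    retrieved documents; None (unreachable URL or 404) counts as
    technically irrelevant."""
    if not documents:
        return 0.0
    relevant = sum(
        1 for doc in documents
        if doc is not None
        and technically_relevant(doc, required_terms, excluded_terms)
    )
    return 100.0 * relevant / len(documents)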
The set of well-handled URLs is defined as the URLs that are either continuously retrieved by the engine from the first time they were discovered by it, or, if the URLs are not retained by the search engine, it is because they either disappeared from the Web or ceased to be technically relevant.
The set of mishandled
URLs is the set of URLs that disappeared from the search results
during the search period (such URLs are counted with multiplicity one even
if they were "forgotten" several times), even though the URLs continued
to exist and continued to be technically relevant. The set of mishandled
URLs can be further partitioned into the set of mishandled reappeared
URLs (these are URLs which disappeared from the search results at one
stage, but reappeared later) and the set of mishandled disappeared URLs,
which are mishandled URLs that never reappeared at a later stage.
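To make the partition concrete, the following sketch (an illustrative reconstruction, not the procedure actually used in the study) classifies each URL from a per-round presence record. The still_valid oracle is a hypothetical stand-in for the manual check of whether a dropped URL still existed on the Web and remained technically relevant.

def classify_urls(presence, still_valid):
    """Partition an engine's URLs into well-handled,
    mishandled-reappeared, and mishandled-disappeared sets.

    presence[url]    -- list of booleans, one per search round: True if
                        the engine returned the URL in that round; each
                        URL is assumed to be present in at least one round.
    still_valid(url) -- hypothetical oracle: True if the URL still
                        existed and stayed technically relevant while
                        the engine failed to return it.
    """
    well_handled, reappeared, disappeared = set(), set(), set()
    for url, rounds in presence.items():
        first = rounds.index(True)        # round the URL was discovered
        gap_seen = came_back = False
        for present in rounds[first:]:
            if not present:
                gap_seen = True           # URL dropped from the results
            elif gap_seen:
                came_back = True          # ...and retrieved again later
        if not gap_seen or not still_valid(url):
            # continuously retrieved since discovery, or legitimately
            # dropped (page gone from the Web / no longer relevant)
            well_handled.add(url)
        elif came_back:
            reappeared.add(url)           # mishandled but recovered
        else:
            disappeared.add(url)          # mishandled, never returned
    return well_handled, reappeared, disappeared

Note that each URL lands in exactly one set, which realizes the multiplicity-one counting: a URL "forgotten" several times is still counted once.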
3. CALCULATING THE MEASURES FOR THE SPECIFIC EXAMPLE
As our example, we picked a single-word query that has no stems or extensions (to avoid problems with search engines handling multiple-word queries and stemming/extension in a different manner). Our choice was "aporocactus". The searches were carried out once a month over a period of ten months, between January and October 2000. The six largest search engines as of January 2000 were queried in each search round: AltaVista, Excite, Fast, Google, Hotbot and NorthernLight. Figure 1 displays the number of URLs each search engine retrieved in each search round.
Figure 1: URLs per search round and search engine
The technical precision of the search engines was rather high, ranging between 76.3% and 99.5%. Excite and Hotbot had the highest technical precision. Note that besides pages not containing the search term, unreachable URLs and documents not found (404 errors) were also categorized as technically irrelevant.
Figure 2 depicts the
extent to which each search engine mishandled URLs during the search period.
Figure 2: Well-handled and mishandled URLs over the search period
4. CONCLUSIONS
The example illustrates
that there is a need to study search engine stability
(or rather instability) over time.