![]() |
(7) |
where
is the number of unique terms in the document collection;
is the frequency of a term
in the query
;
and
is the document weight:
![]() |
![]() |
(8) |
where
is the term frequency of a term
in the document
;
is the total number of documents in the collection;
is the number of documents in the collection that contain the query term
;
is the length of the document (in bytes); and
is the average document length in the collection (in bytes).
For reasons similar to the VSM method, the Okapi similarity
measurement cannot be applied directly in evaluating the precision of search
engines [20]. We need values for
and
.
In our research, we estimate the values of
and
in the way described in the last section for VSM. In addition, the
average length of a Web document (
)
is estimated as to be 10,939 bytes after removing all the HTML tags and
Java scripts.