Next:Three-Level Scoring Method (TLS)Up:Combining the HITS-based AlgorithmsPrevious:Okapi Similarity Measurement (Okapi)

Cover Density Ranking (CDR)

Whereas methods such as VSM compute relevance from the occurrences of individual terms, other methods, including CDR, are based on the occurrence of phrases. CDR was developed to better meet user expectations: a document containing most or all of the query terms should be ranked higher than a document containing fewer terms, regardless of how frequently the terms occur [25]. In CDR, the results of phrase queries are ranked in the following two steps [7]:
  1. Documents containing one or more query terms are ranked by coordination level, i.e., a document with a larger number of distinct query terms ranks higher. The documents are thus sorted into groups according to the number of distinct query terms each contains, with the initial ranking given to each document based on the group in which it appears.
  2. The documents at each coordination level are then ranked by score to produce the overall ranking. The score of a cover set $\omega=\{(p_1, q_1), (p_2, q_2), \ldots,(p_n, q_n)\}$ is calculated as follows:
    \begin{displaymath}S(\omega) = \sum_{j=1}^{n} I(p_j, q_j)\qquad \mbox{and}\end{displaymath}
    \begin{displaymath}I(p_j, q_j) = \left \{ \begin{array}{ll}\frac{\lambda}{q_j - p_j + 1} & \mbox{if } q_j - p_j + 1 > \lambda \\ 1 & \mbox{otherwise}\end{array} \right .\end{displaymath} (9)
    where $(p_j, q_j)$ is an ordered pair over a document, called a cover, specifying the shortest interval containing two distinct query terms in the document [7]. $p_j$ is the position of one term and $q_j$ the position of the other, with $q_j > p_j$. $\lambda$ is a constant, set to 16 in our experiments because this value has been shown to produce good results [6]. Covers of length $\lambda$ or shorter, where the length of $(p_j, q_j)$ is $q_j - p_j + 1$, receive a score of 1; longer covers receive scores less than 1, in proportion to the inverse of their lengths.
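The two ranking steps above can be sketched in Python. This is a minimal sketch under several assumptions, not a definitive implementation: documents are pre-tokenized into lists of lowercase words, term positions are 1-based, and a cover is taken to be a minimal interval containing every distinct query term the document matches (for two terms this is the shortest interval spanning both, as described above). The helper names `find_covers`, `cover_score`, `cdr_score` and `rank` are hypothetical; `LAMBDA` plays the role of $\lambda$ in Equation (9).

```python
# Minimal sketch of CDR ranking (hypothetical helper names, assumed tokenization).
LAMBDA = 16  # the constant lambda; 16 as in the experiments described above


def find_covers(tokens, terms):
    """Return the minimal intervals (p, q), 1-based, that contain every term in `terms`."""
    hits = [(pos, tok) for pos, tok in enumerate(tokens, start=1) if tok in terms]
    counts = {}  # occurrences of each term inside the current window
    covers = []
    left = 0
    last_left = None
    for right, (_, tok) in enumerate(hits):
        counts[tok] = counts.get(tok, 0) + 1
        if len(counts) < len(terms):
            continue  # window does not yet contain every term
        # Drop occurrences from the left while the window still holds every term.
        while counts[hits[left][1]] > 1:
            counts[hits[left][1]] -= 1
            left += 1
        # For a given left end, only the first (shortest) window is minimal.
        if left != last_left:
            covers.append((hits[left][0], hits[right][0]))
            last_left = left
    return covers


def cover_score(p, q, lam=LAMBDA):
    """I(p, q) from Equation (9): 1 for short covers, lam / length for long ones."""
    length = q - p + 1
    return lam / length if length > lam else 1.0


def cdr_score(tokens, query_terms, lam=LAMBDA):
    """Return (coordination level, S(omega)) for one document."""
    matched = set(query_terms) & set(tokens)
    total = sum(cover_score(p, q, lam) for p, q in find_covers(tokens, matched))
    return len(matched), total


def rank(docs, query_terms):
    """Rank documents by coordination level first, then by cover-set score."""
    scored = {name: cdr_score(toks, query_terms) for name, toks in docs.items()}
    matching = [name for name, (level, _) in scored.items() if level > 0]
    return sorted(matching, key=lambda name: scored[name], reverse=True)


docs = {
    "d1": "a x b a".split(),  # both terms, covers (1, 3) and (3, 4)
    "d2": "a a a a".split(),  # one term, four single-position covers
    "d3": "b y".split(),      # one term, one cover
}
print(rank(docs, ["a", "b"]))  # ['d1', 'd2', 'd3']
```

Sorting on the pair (coordination level, $S(\omega)$) carries out both steps at once: the level dominates the comparison, and the cover-set score only breaks ties between documents at the same level. In the example, `d2` has a higher cover-set score than `d1` but still ranks below it because it matches only one distinct query term.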
To adapt CDR to the Web, we need to determine how many distinct query terms each document contains and rank documents with more distinct terms higher. Our version of the CDR method computes the relevance scores of documents in two steps:
  1. Documents are scored according to the regular CDR method. Each document belongs to a coordination level group and has a score within that group.
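  2. The scores are normalized to the range (0, 1] for documents containing only one distinct query term, to the range (1, 2] for documents containing two distinct query terms, and so on; in general, a document containing $k$ distinct query terms receives a score in $(k-1, k]$.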
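The normalization step can be sketched as follows. Only the target ranges are fixed by the description above, so the mapping used here is an assumption: within each coordination level $k$, a raw score is divided by the largest raw score at that level and then shifted by $k-1$, which sends every positive score into $(k-1, k]$. The input is assumed to be a dictionary of (coordination level, raw CDR score) pairs with positive scores, as produced in step 1; the function name `normalize_by_level` is hypothetical.

```python
def normalize_by_level(scored):
    """Map {doc: (level, raw)} to {doc: level - 1 + raw / max_raw_at_level}.

    Hypothetical normalization. Assumes level >= 1 and raw > 0 for every
    document, so each result lies in (level - 1, level] and a document at a
    higher coordination level always outscores one at a lower level.
    """
    best = {}  # largest raw score seen at each coordination level
    for level, raw in scored.values():
        best[level] = max(best.get(level, 0.0), raw)
    return {
        doc: (level - 1) + raw / best[level]
        for doc, (level, raw) in scored.items()
    }


scored = {"d1": (2, 2.0), "d4": (2, 0.5), "d2": (1, 4.0), "d3": (1, 1.0)}
print(normalize_by_level(scored))
# {'d1': 2.0, 'd4': 1.25, 'd2': 1.0, 'd3': 0.25}
```

Dividing by the per-level maximum, rather than applying min-max scaling, keeps the lowest score in each level strictly above $k-1$, so the ranges for different coordination levels never overlap.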
The benefit of our CDR variant is that it considers not only the number of distinct query terms in a document but also how those terms appear in it, for example how close together they are.


2002-02-18