Improvement of HITS-based Algorithms on Web Documents
Longzhuang Li, Yi Shang, and Wei Zhang
Department of Computer Engineering and Computer Science
University of Missouri-Columbia
Columbia, MO 65211, USA
Copyright is held by the author/owner(s).
WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA.
ACM 1-58113-449-5/02/0005.
Abstract:
In this paper, we present two ways to improve the precision of HITS-based
algorithms on Web documents. First, by analyzing the limitations of current
HITS-based algorithms, we propose a new weighted HITS-based method that
assigns appropriate weights to in-links of root documents. Then, we combine
content analysis with HITS-based algorithms and study the effects of four
representative relevance scoring methods, VSM, Okapi, TLS, and CDR,
using a set of broad topic queries. Our experimental results show that
our weighted HITS-based method performs significantly better than Bharat's
improved HITS algorithm. When we combine our weighted HITS-based method
or Bharat's HITS algorithm with any of the four relevance scoring methods,
the combined methods are only marginally better than our weighted HITS-based
method. Among the four relevance scoring methods, there is no significant
difference in quality when they are combined with a HITS-based algorithm.
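
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard HITS hub/authority iteration that HITS-based algorithms build on. It is not the weighted method proposed in this paper, nor Bharat's variant: the function names and the per-link `weight` field are hypothetical placeholders that only show where a link-weighting scheme could plug in. Setting every weight to 1.0 gives plain, unweighted HITS.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a directed link graph.

    edges: iterable of (source, target, weight) triples. The weight is a
    hypothetical per-link weight (an assumption for illustration); use 1.0
    for every link to get unweighted HITS.
    """
    edges = list(edges)
    nodes = {n for s, t, _ in edges for n in (s, t)}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {n: 0.0 for n in nodes}
        for s, t, w in edges:
            new_auth[t] += w * hub[s]
        # Hub update: sum the updated authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for s, t, w in edges:
            new_hub[s] += w * new_auth[t]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Toy graph: pages a and b both link to c; a also links to d.
    toy_edges = [("a", "c", 1.0), ("b", "c", 1.0), ("a", "d", 1.0)]
    authority, hub_scores = hits(toy_edges)
    print(sorted(authority, key=authority.get, reverse=True))
    print(sorted(hub_scores, key=hub_scores.get, reverse=True))
```

In the toy graph, page c receives links from both a and b, so it ends up with the highest authority score, while page a, which links to both c and d, ends up with the highest hub score.
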
Categories and Subject Descriptors: H.3 [Information Systems]:
Information Storage and Retrieval; G.3 [Mathematics of Computing]:
Probability and Statistics; D.2 [Software]: Software Engineering
General Terms: Algorithms, Measurement, Performance
Keywords: HITS-based algorithms, relevance scoring methods, information
retrieval
2002-02-18