nextupprevious
Next:Combining the HITS-based AlgorithmsUp:Improvement of HITS-based AlgorithmsPrevious:Current HITS-based Algorithms and

A New Weighted HITS-based Algorithm

Without content analysis, both BHITS and HITS can not solve the above problem. BHITS only gives the links on the same host/document small weights, whose sum equals to 1. While in the above problem, most of the out-links of a small-in-large-out link are in different domains. HITS usually makes the problem worse because it gives the same weight to all the documents in the base set. In this section, without resorting to content analysis, user feedback, or Web log records, to prevent the BHITS algorithm and HITS algorithm from converging to such small-in-large-out links, we use the link information in the base set, and add more weights to the in-links of root set links if a small-in-large-out link exists. In the new method, the equations (1) and (2) are modified as follows:
  1. For all $i' \in I$ which points to $i$,
  2.  
    \begin{displaymath}a_i = \sum_{i'} w_{a_{i'}} \cdot h_{i'}\end{displaymath} (3)
  3. For all $i' \in I$ which is pointed to by $i$,
  4.  
    \begin{displaymath}h_i = \sum_{i'} w_{h_{i'}} \cdot a_{i'}\end{displaymath} (4)
In equation (3), the value of $h_{i'}$ can be HITS hub weight, BHITS hub weight, or the hub weight of other HITS-based algorithms. Similarly, this applies to the value of $a_{i'}$ in equation (4).

In our current implementation, all the $w_{h_{i'}}$ values are set to 1. The setting of $w_{a_{i'}}$ consists of two parts:

  1. Before starting a HITS-based algorithm, if there exists a root link whose in-degree is among the three smallest ones and whose out-degree is among the three largest ones, then set $w_{a_{i'}}$ to 4 for in-links of all the root links.
  2. Otherwise, set all $w_{a_{i'}}$ to 1. Run the HITS-based algorithm for one iteration without normalization. If there exists a root link whose authority value is among the three smallest ones and whose hub value is among the three largest ones, set $w_{a_{i'}}$ to 4 for in-links of all the root links.
In the above two steps, usually the in-degree of a small-in-large-out link is as small as 0, 1, or 2, while the out-degree can be more than several hundred. Intuitively, in most cases, it is hard to believe that a root link with no or few in-links can point to many highly relevant documents. Even if it points to many good documents, due to the large number of documents in the base set, there may be some duplicates between the out-links of the small-in-large-out link and the neighborhood of other links, and these good duplicated documents still have the chance to top the hub set or the authority set. The method of setting in-link weights are very simple and can be further improved by adaptively changing the weights of both in- and out-links of a small-in-large-out link.

Let's use WBHITS to represent the weighted BHITS algorithm. The bottom half of Table 1 shows the top 10 authorities and top 10 hits from WBHITS, which are much better than those from BHITS.


nextupprevious
Next:Combining the HITS-based AlgorithmsUp:Improvement of HITS-based AlgorithmsPrevious:Current HITS-based Algorithms and
2002-02-18