Refereed Papers
Track: Search: Corpus Characterization and Search Performance
Paper Title:
A Graph-Theoretic Approach to Webpage Segmentation
Authors:
- Deepayan Chakrabarti (Yahoo! Research)
- Ravi Kumar (Yahoo! Research)
- Kunal Punera (Yahoo! Research)
Abstract:
We consider the problem of segmenting a webpage into visually and
semantically cohesive pieces. Our approach is based on formulating an
appropriate optimization problem on weighted graphs, where the weights
capture whether two nodes in the DOM tree should be placed together or
apart in the segmentation; we present a learning framework to learn
these weights from manually labeled data in a principled manner. Our
work is a significant departure from previous heuristic and rule-based
solutions to the segmentation problem. The results of our empirical
analysis bring out interesting aspects of our framework, including
variants of the optimization problem and the role of learning.
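
As a rough illustration of the kind of objective the abstract describes, the sketch below scores candidate segmentations of a toy set of DOM nodes against pairwise weights and picks the cheapest one by brute force. Everything in it is an assumption made for illustration: the node names, the weight values, the helper names `segmentation_cost` and `all_labelings`, and the exhaustive search. The abstract does not give the exact formulation, the solver, or how the weights are learned, so this should not be read as the authors' method.

```python
def segmentation_cost(labels, together, apart):
    """Total penalty of a segmentation under pairwise weights.

    labels:   dict mapping node -> segment id
    together: dict mapping (u, v) -> penalty paid if u and v are split
    apart:    dict mapping (u, v) -> penalty paid if u and v are merged
    (Hypothetical weight layout; in the paper the weights are learned.)
    """
    cost = 0.0
    for (u, v), w in together.items():
        if labels[u] != labels[v]:
            cost += w
    for (u, v), w in apart.items():
        if labels[u] == labels[v]:
            cost += w
    return cost


def all_labelings(nodes):
    """Yield every partition of `nodes` as a dict node -> segment id."""
    def extend(i, labels, used):
        if i == len(nodes):
            yield dict(labels)
            return
        # A node may join any existing segment or open exactly one new one.
        for seg in range(used + 1):
            labels[nodes[i]] = seg
            yield from extend(i + 1, labels, max(used, seg + 1))
        del labels[nodes[i]]
    yield from extend(0, {}, 0)


# Toy DOM nodes and hand-picked weights (assumed values, not learned).
nodes = ["nav_list", "nav_item", "article_title", "article_body"]
together = {("nav_list", "nav_item"): 2.0,
            ("article_title", "article_body"): 3.0}
apart = {("nav_item", "article_title"): 1.5,
         ("nav_list", "article_body"): 1.0}

# Exhaustive search is only feasible for a handful of nodes.
best = min(all_labelings(nodes),
           key=lambda labels: segmentation_cost(labels, together, apart))
print(best, segmentation_cost(best, together, apart))
```

On this toy input the cheapest segmentation keeps the two navigation nodes in one segment and the two article nodes in another. Exhaustive enumeration grows far too quickly for real pages, which is one reason a formulation with an efficient optimization procedure matters.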
Inquiries can be sent to: