Focused Crawling: Experiences in a Real World Project
Track: Posters

In this paper, we describe our experience building a focused web crawler, that is, a web crawler that retrieves only pages about a given topic. We review some of the problems encountered, roughly dividing them into practical or engineering issues (related to the lack of standards and control on the web, and not addressed in most research) and conceptual issues (related to the task at hand, determining whether a given page is about a given topic, on which considerable research has been done). We then give an overview of the system we designed and built, and provide some preliminary evidence of its performance. We conclude with some observations and suggestions for further research.