Track: Posters
Paper Title:
Representing a Web Page as Sets of Named Entities of Multiple Types - A Model and Some Preliminary Applications
Authors:
- Nan Di (Peking University)
- Conglei Yao (Peking University)
- Mengcheng Duan (Peking University)
- Jonathan J. H. Zhu (City University of Hong Kong)
- Xiaoming Li (Peking University)
Abstract:
In contrast to the "bag of words" document representation used in most
information retrieval applications, we propose a model that represents
a web page as sets of named entities of multiple types. Specifically,
four types of named entities are extracted: person, geographic
location, organization, and time. The relations among these entities
are also extracted, weighted, classified, and labeled. On top of this
model, we demonstrate some preliminary applications. In particular, we
introduce the notion of a person-activity, which comprises four
elements: person, location, time, and activity. Using this notion and a
reasonably large set of web pages, we show how one person's activities
can be characterized by time and location, which gives a good picture
of the mobility of the person in question.
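
The model described in the abstract can be pictured with a minimal Python sketch. The abstract does not specify any data structures, so every name here (`PageEntities`, `Relation`, `PersonActivity`, `mobility_trace`) is hypothetical, as are the sample people, places, and dates. Times are assumed to be ISO-formatted date strings so that sorting them as text gives chronological order. This is an illustration of the idea, not the authors' implementation.

```python
from dataclasses import dataclass, field

# The four entity types named in the abstract.
ENTITY_TYPES = ("person", "location", "organization", "time")


@dataclass
class PageEntities:
    """A web page represented as one set of entity names per type (hypothetical)."""
    url: str
    entities: dict = field(
        default_factory=lambda: {t: set() for t in ENTITY_TYPES}
    )

    def add(self, entity_type: str, name: str) -> None:
        if entity_type not in self.entities:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[entity_type].add(name)


@dataclass(frozen=True)
class Relation:
    """A weighted, labeled relation between two entities (hypothetical)."""
    source: str
    target: str
    weight: float
    label: str


@dataclass(frozen=True)
class PersonActivity:
    """The four elements of a person-activity: person, location, time, activity."""
    person: str
    location: str
    time: str  # assumed ISO date string, e.g. "2007-05-01"
    activity: str


def mobility_trace(activities, person):
    """Return (time, location) pairs for one person, in chronological order."""
    return sorted(
        (a.time, a.location) for a in activities if a.person == person
    )


# Made-up sample data, for illustration only.
sample = [
    PersonActivity("A. Example", "Beijing", "2007-05-01", "gave a talk"),
    PersonActivity("A. Example", "Hong Kong", "2007-03-15", "attended a meeting"),
    PersonActivity("B. Sample", "Shanghai", "2007-04-01", "visited a lab"),
]
trace = mobility_trace(sample, "A. Example")
```

Here `trace` lists the locations of the hypothetical person "A. Example" ordered by date, which is one simple way to read off a person's mobility from a collection of person-activity records.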