WebAssist: a user profile specific information retrieval assistant

Christian Kurzke, Michael Galle and Manfred Bathelt

University of Erlangen, Erlangen, Germany
cnkurzke@cip.informatik.uni-erlangen.de, endc01@rrze.uni-erlangen.de and
mdbathel@cip.informatik.uni-erlangen.de

Abstract: Although the amount of information available on the Internet keeps growing, finding accurate information on a specific topic is becoming more and more difficult. This paper describes the concept of a proxy-based information classification and filtering utility named WebAssist. On behalf of its users, it generates a private view of the WWW based on a previously determined profile. This profile is created by monitoring user (and group) activities while browsing WWW pages. Additional features allow for easy interoperability in workgroups with similar project interests, maintain personal and common hotlists with automatic modification checks, and provide a sophisticated search engine front-end. The proxy architecture allows easy configuration through standard Web browsers and eliminates the need to install additional platform-specific software.
Keywords: Information retrieval; Group profiles; Monitoring; Proxy

Summary

The focus of the WebAssist project is to pre-classify and filter information based on a database of user profiles, which is created automatically by tracing user activities while browsing the Web.

When a user joins the WebAssist system by using the WebAssist proxy, the keywords (determined from META statements within the HTML code) of all Web pages read are collected in the database. Other information about Web documents, such as request and keyword frequencies, time of last visit and modification time, is also stored in the database and evaluated over time.
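The collection step described above can be illustrated with a minimal sketch. It assumes that keywords appear as a comma-separated list in a META tag named "keywords" and that a profile is a plain in-memory dictionary; the names MetaKeywordParser, extract_keywords and record_visit are hypothetical and do not come from the WebAssist implementation.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "keywords":
            for term in attr.get("content", "").split(","):
                term = term.strip().lower()
                if term:
                    self.keywords.append(term)


def extract_keywords(html):
    """Return the META keywords of one HTML document, normalized to lower case."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


def record_visit(profile, url, html, visit_time):
    """Update a (hypothetical) profile dictionary with one page view."""
    keywords = extract_keywords(html)
    profile["keyword_counts"].update(keywords)  # keyword frequencies
    entry = profile["urls"].setdefault(
        url, {"requests": 0, "keywords": [], "last_visit": None}
    )
    entry["requests"] += 1            # request frequency
    entry["keywords"] = keywords
    entry["last_visit"] = visit_time  # time of last visit
    return entry
```

A profile would start as {"keyword_counts": Counter(), "urls": {}} and be updated by record_visit for every page passing through the proxy.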

After this startup phase, a large set of keywords and URLs has been collected and can assist the user in information retrieval. In this phase, the user can also benefit from the profiles of other users working on the same project. The URLs resulting from a retrieval request are classified, compared to the user profiles and added to the database.
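How candidate URLs might be compared to a profile can be sketched as follows. The scoring rule is an assumption made for illustration only, since the measure used by WebAssist is not specified here: a page is scored by the share of the profile's keyword frequency that its keywords cover. The names merge_profiles, relevance and classify, as well as the threshold value, are hypothetical.

```python
from collections import Counter


def merge_profiles(*profiles):
    """Combine the keyword frequencies of several users into one group profile."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)
    return merged


def relevance(page_keywords, profile):
    """Share of the profile's keyword mass covered by the page (0.0 to 1.0)."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(profile[k] for k in set(page_keywords)) / total


def classify(urls_with_keywords, profile, threshold=0.2):
    """Return (url, score) pairs at or above the threshold, best match first."""
    scored = [(relevance(kws, profile), url) for url, kws in urls_with_keywords.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(url, score) for score, url in scored if score >= threshold]
```

Merging the profiles of project colleagues before scoring is one simple way to let a user benefit from keywords that only other group members have collected so far.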

Besides supporting the user in information retrieval tasks, WebAssist also helps to maintain personal and group-wide hotlists of relevant URLs. New URLs can be made available to all users, depending on the assigned permissions. This greatly simplifies communication between project members.
Based on this database, automatic tasks such as checking for modifications can be run periodically, depending on flags associated with each URL.
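A periodic modification check of this kind could look like the sketch below. It assumes that each hotlist entry carries a boolean flag (here called "check") and the last known value of the HTTP Last-Modified header; both field names are hypothetical. The function that obtains the current header value is passed in as a parameter, so no particular network API is implied.

```python
from email.utils import parsedate_to_datetime


def find_modified(hotlist, fetch_last_modified):
    """Return the flagged URLs whose Last-Modified date is newer than the stored one.

    hotlist maps URL -> {"check": bool, "last_modified": HTTP-date string or None}.
    fetch_last_modified(url) returns the current Last-Modified header or None.
    """
    changed = []
    for url, entry in hotlist.items():
        if not entry.get("check"):
            continue  # the flag disables checking for this URL
        header = fetch_last_modified(url)
        if header is None:
            continue  # the server sent no modification date
        known = entry.get("last_modified")
        if known is None or parsedate_to_datetime(header) > parsedate_to_datetime(known):
            changed.append(url)
            entry["last_modified"] = header  # remember the newer date
    return changed
```

In a real deployment, fetch_last_modified would typically issue an HTTP HEAD request; here it can be any callable, which also makes the check easy to exercise without network access.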

Completely automatic information retrieval based on the set of acquired keywords is also conceivable. The resulting database updates could be carried out after business hours or during idle times.

Another approach to increasing the accuracy of the database would be to use more sophisticated mechanisms for generating relevant keywords, extracting them not only from META statements but also from the document text, which other studies have shown to be possible.
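As a rough illustration of keyword extraction from document text, the sketch below ranks words by raw frequency after removing a small stop-word list. This is a deliberately simple stand-in, not the mechanism of the studies referred to above; the function name text_keywords, the stop-word list and the parameters are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "on", "with", "that", "this", "it", "as", "by", "be",
})


def text_keywords(text, top_n=5, min_length=3):
    """Return the top_n most frequent non-stop words of a plain-text document."""
    # Only ASCII letters are matched, so this sketch suits English text only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length and w not in STOP_WORDS)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]
```

Keywords obtained this way could be stored alongside the META keywords, so that pages without META statements still contribute to a profile.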

Data protection and user privacy require close attention. Since WebAssist scans all of a user's Web traffic, the databases offer deep insight into each user's interests.
A way has to be found to protect the users' privacy, even from the system administrator. This could be achieved by encrypting the database entries.

Even with the current concept, WebAssist can increase the productivity of working groups collaborating on projects.

Fig. 1. Screenshot of a WebAssist classified page.

For more detailed information and references, see:

http://wwwcip.informatik.uni-erlangen.de/user/cnkurzke/WebAssist/