Abstract
We have developed a system that addresses two needs of Web publishers: obtaining demographic and other relevant information about their users, and actively communicating with Internet users in a non-intrusive and acceptable fashion. There are three key elements to our approach: 1) a system for extracting as much valuable information as possible from Web server log files; 2) a free personalized Web page for users, which companies and institutions can use to communicate with those interested in their information; and 3) a mechanism for Web publishers to obtain the profiles of users who visit their sites while protecting the users' privacy and sparing them any hassle or inconvenience. Our model is based on giving users something valuable in exchange for their information and allowing them to determine who can obtain it.
Another limitation of the current WWW model is its passive nature. There are currently no effective mechanisms for WWW publishers to notify interested Internet users of ongoing developments on their servers. Generic "what's new" servers such as NCSA's What's New can be monitored by users to learn about new servers or major changes to existing ones, but they are too broad for learning about day-to-day developments. Furthermore, the number of new WWW servers is growing so fast that generic "what's new" servers are becoming inefficient even for that purpose. Each publisher can maintain its own "what's new" page, but then the problem is the opposite: such pages are too narrow, and monitoring many of them is too much of a hassle for users.
This paper describes a system that allows WWW publishers to learn more about their users and to communicate actively and effectively with those who are interested in their information. The system combines a comprehensive analysis of log files and related databases, a personalized Web page for each user that is constantly updated with information matching the user's interests, and a mechanism for Web publishers to obtain information about the users who visit their sites without inconveniencing those users with multiple forms or invading their privacy.
Existing log analysis programs break down accesses by time period, file, and domain, but much more valuable information can be obtained from the logs. The following are the most significant additional items of information that our system provides:
The Personalized Web Page is a free service for Internet users that serves two purposes in our system. First, it allows Web publishers to actively reach users who have indicated an interest in their information. This is extremely important because the Web is such a passive mechanism and, up until now, no system has been implemented that satisfies the desire of companies and institutions to keep users informed about developments that might interest them without resorting to mass mailings that reach mostly uninterested users, or to mailing lists that are usually either too broad in scope or cater to a small niche of users. Second, the Personalized Web Page, which in addition to commercial information contains relevant information from user groups, mailing lists, and other sources, is a valuable service that gives users something back for providing information about themselves. Instead of receiving junk mail and telemarketing calls, users receive a valuable service that also protects their privacy.
To specify their interests, users will be able to choose which sources of information they are interested in (e.g., USENET, material submitted directly by companies, news services), what their particular interests are (e.g., "Internet ISDN rates California"), and how much information they want and how often. Companies and institutions will be able to submit their product and service information to the system, to be filtered and forwarded to those users who have indicated an interest in it. Users will be able to "test run" their queries and will be warned if the queries are too broad or too narrow.
Users will receive an identification code associated with their profile which they will be able to use to update their interests, and to identify themselves when entering other Net servers. The use of this code will guarantee their privacy and provide additional benefits (see next section).
We have developed a system which, we believe, addresses these issues, thus creating value for both Web publishers and users. Instead of having to provide their information over and over, users can give publishers their Personalized Web Page identification code. Web publishers will then use standard software to request each user's code and add it to every entry in the log file. Finally, companies and institutions can use the log analysis services described above to obtain information about the characteristics of their users. To protect their privacy, users will be able to designate which information (e.g., name, e-mail address) is to remain confidential. Such information will not be disclosed.
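As a rough illustration of this flow, the sketch below tags log entries with an identification code and summarizes user characteristics while withholding the fields each user has marked confidential. The profile store `PROFILES`, the codes, the field names, and the choice to append the code as the last field of a log entry are all hypothetical; the paper does not specify a log format or storage scheme.

```python
from collections import Counter

# Hypothetical profile store keyed by identification code.
PROFILES = {
    "u1001": {"name": "A. Reader", "email": "reader@example.org",
              "occupation": "engineer", "region": "California",
              "confidential": {"name", "email"}},
    "u1002": {"name": "B. Visitor", "email": "visitor@example.org",
              "occupation": "student", "region": "Ohio",
              "confidential": {"name", "email", "region"}},
}


def tag_entry(log_line: str, user_code: str) -> str:
    """Append the user's code to a log entry (assumed: as the last field)."""
    return f"{log_line} {user_code}"


def disclosed(profile: dict) -> dict:
    """Return only the fields the user has not marked confidential."""
    hidden = profile["confidential"]
    return {key: value for key, value in profile.items()
            if key != "confidential" and key not in hidden}


def summarize(tagged_lines, field: str) -> Counter:
    """Count requests by one profile field, skipping confidential values."""
    counts = Counter()
    for line in tagged_lines:
        code = line.rsplit(" ", 1)[-1]
        profile = PROFILES.get(code)
        if profile is None:          # untagged entry or unknown code
            continue
        value = disclosed(profile).get(field)
        if value is not None:
            counts[value] += 1
    return counts
```

Note that this counts requests rather than distinct users; counting each code once per site would be an equally plausible reading of the text.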
Users who are not using the Personalized Web Page service will be able to obtain an identification code instantly by providing their profile. They will also be able to join the Personalized Web Page service at any time, if they so desire. Web publishers benefit from the information about their users and, more significantly, users benefit by receiving a share of the value of their information in the form of free services and other valuable things. The Personalized Web Page service will be one such service. Others will be correlated to the number of servers to which users provide their identification, and will be provided both by participating companies and institutions and by us.
There are still many issues to resolve, including the distribution of the value of information and the degrees of privacy and confidentiality that are appropriate. Nevertheless, we believe the system described in this paper is a good step towards a more efficient exchange of information between Web users and Web publishers.