Improving WWW Marketing through User Information and Non-Intrusive Communications

Ariel Poler
Internet Profiles Corporation (I/PRO)
520 California Avenue #6
Santa Monica, CA 90403

Abstract

We have developed a system that addresses two needs of Web publishers: obtaining demographic and other relevant information about their users, and actively communicating with Internet users in a non-intrusive and acceptable fashion. There are three key elements to our approach: 1) a system for extracting as much valuable information as possible from Web server log files, 2) a free personalized Web page for users, which companies and institutions can use to communicate with those interested in their information, and 3) a mechanism for Web publishers to obtain the profile of users who visit their site while protecting the privacy of the users and eliminating any hassle or inconvenience on their part. Our model is based on giving users something valuable in exchange for their information, and allowing them to determine who can obtain such information.

Introduction

The World Wide Web is a great mechanism for disseminating information [1]. It makes it extremely easy for anyone on the Internet to obtain information from any publisher. The Web could also be a great mechanism for publishers to learn about those who access their information. However, the latter capability is not being exploited. Log files record every information request of Web servers, but they provide very limited information about the specific user behind each transaction. Furthermore, the information that they do contain is not being analyzed and interpreted to its full potential.

Another limitation of the current WWW model is its passive nature. There are currently no effective mechanisms in place for WWW publishers to notify interested Internet users of ongoing developments on their servers. Generic "what's new" servers such as NCSA's What's New can be monitored by users to learn about new servers or major changes to existing ones, but they are too broad for learning about day-to-day developments. Furthermore, the number of new WWW servers is growing so fast that generic "what's new" servers are becoming inefficient even for that purpose. Each publisher can have its own what's new page, but then the problem is the opposite: such pages are too narrow and too much of a hassle for users to monitor.

This paper describes a system which allows WWW publishers to learn more about their users and to communicate actively and effectively with those who are interested in their information. The system combines three components: a comprehensive analysis of log files and related databases, a personalized Web page for each user that is constantly updated with information that matches the user's interests, and a mechanism for Web publishers to obtain information about the users that visit their site without inconveniencing users with multiple forms or invading their privacy.

A Hidden Marketing Treasure: Analysis of Log Files

Before trying to obtain additional information about each user of a Web server, we decided to make the most out of the information that Web servers already have. Every time that information is requested from a WWW server, the log file records the name of the file requested, the time, the client address, and other parameters. With firewalls, dynamic IP addressing, and gateways, it is difficult to determine the individual that makes each information request. However, we can determine the domain from which the connection was made.
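As an illustration only, the fields mentioned above could be extracted from a log entry as follows. The sketch assumes the NCSA Common Log Format, which this paper does not specify, and the names parse_log_line and domain_of are hypothetical. Keeping the last two labels of a host name is a rough approximation of "the domain" and would need refinement for country-code domains.

```python
import re
from datetime import datetime

# Assumption: entries follow the NCSA Common Log Format, e.g.
# host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return host, time, path and status for one entry, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": m.group("path"),
        "status": int(m.group("status")),
    }

def domain_of(host, levels=2):
    """Keep the last `levels` labels of a host name; return None for bare IP addresses."""
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        return None  # an unresolved address carries no domain name
    labels = host.lower().split(".")
    return ".".join(labels[-levels:])
```

A request arriving through a gateway such as gw.example.com would thus be attributed to example.com, even though the individual behind it remains unknown.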

Existing log analysis programs break down accesses by various time periods, files, and domains. But much more valuable information can be obtained from the logs. The most significant additional capabilities that our system provides are the following:

Break down accesses by domain characteristics. For example, for information requests from commercial domains, the system determines the SIC code, geographic location, revenue and employee size, etc. This is accomplished by using the WHOIS database to obtain the full name for each domain, and commercial databases of corporations and institutions to complement the information.
Allow companies to analyze the log data in a multi-dimensional fashion, correlating as many variables as their analysis requires, and using a Web based interface. For example, a company could determine the most popular files for accesses from a given location in a particular interval of time, or, the daily usage pattern for a specific SIC code from a certain geographic region.
Estimate additional usage variables such as time spent by users on each screen, navigational patterns of users, and the amount of information actually obtained by users (as opposed to files or bytes). To estimate this information, the system takes into consideration each Web site's structure, including the links that are located in each page. Proxy servers, local caches, and other factors limit the accuracy of these estimates - and of any log analysis in general. However, the information obtained is still extremely useful, and clearly better than a simple count of bytes and information requests.
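The multi-dimensional breakdown and the time estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of our implementation: records are assumed to be dictionaries with a numeric "time" in seconds, the client host stands in for the individual user (which proxies and gateways make imprecise, as noted), and the 30-minute session gap is an arbitrary choice. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def cross_tab(records, dimensions):
    """Count requests for every combination of the given record fields."""
    return Counter(tuple(rec[d] for d in dimensions) for rec in records)

def estimate_dwell_seconds(records, session_gap=1800):
    """Estimate seconds spent on each page from the gap to the same host's next request.

    Gaps longer than `session_gap` are treated as the start of a new visit and
    ignored; the last request of a visit has no successor and adds nothing.
    """
    by_host = defaultdict(list)
    for rec in records:
        by_host[rec["host"]].append(rec)

    totals = defaultdict(float)
    for visits in by_host.values():
        visits.sort(key=lambda r: r["time"])
        for current, following in zip(visits, visits[1:]):
            gap = following["time"] - current["time"]
            if gap <= session_gap:
                totals[current["path"]] += gap
    return dict(totals)
```

Calling cross_tab(records, ["region", "path"]), for instance, gives the popularity of each file per geographic region, and further fields can be added to the list as the analysis requires.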

Giving something back to Users: A Personalized Web Page

We have combined the concept of an information filter [2] with the WWW and developed a system that creates Personalized Web Pages (and URLs) for Internet users. Every day, or as often as each user desires, a Web page will be created for each user with links to, and summary descriptions of, the documents that match the interests that the user specified. This method combines the convenience of automatically filtering a vast amount of information for whatever is most relevant to the user with the hyperlinked and user-friendly nature of the Web.
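A page of this kind might be generated along the following lines. The sketch is illustrative only: the function name render_personal_page and the (title, url, summary) tuple layout are assumptions, and the actual page layout is not described in this paper.

```python
from html import escape

def render_personal_page(user_name, matches):
    """Build a minimal HTML page from (title, url, summary) tuples."""
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a>: {escape(summary)}</li>'
        for title, url, summary in matches
    )
    return (
        f"<html><head><title>Personalized page for {escape(user_name)}</title></head>\n"
        f"<body><ul>\n{items}\n</ul></body></html>"
    )
```

Escaping every field keeps text submitted by publishers from breaking the generated markup.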

The Personalized Web Page service is free for Internet users and serves two purposes in our system. First, it allows Web publishers to actively reach users who have indicated an interest in their information. This is extremely important because the Web is such a passive mechanism and, up until now, no system has been implemented that satisfies the desire of companies and institutions to keep users informed about developments that might be of interest to them - without resorting to mass mailings that reach mostly non-interested users, or mailing lists that are usually too broad in scope or cater to a small niche of users. Second, the Personalized Web Pages, which in addition to commercial information contain relevant information from user groups, mailing lists, and other sources, are a valuable service for users that gives them something back for providing information about themselves. Instead of receiving junk mail and telemarketing calls, users receive a valuable service that also protects their privacy.

To specify their interests, users will be able to determine what sources of information they are interested in (e.g., USENET, material submitted directly by companies, news services), what their particular interests are (e.g., "Internet ISDN rates California"), as well as how much information they desire and how frequently. Companies and institutions will be able to submit their product and service information to the system, to be filtered and forwarded to those users who have indicated an interest in their information. Users will be able to "test run" their queries and will be warned if they are too broad or too narrow.
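The "test run" idea can be illustrated with a deliberately simple filter. The sketch below uses plain conjunctive keyword matching, which is an assumption made for brevity and not the filtering method of [2]. The thresholds min_hits and max_fraction and the function names are likewise hypothetical.

```python
import re

def matches_profile(profile_terms, document_text):
    """True when every profile term occurs as a word in the document (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9]+", document_text.lower()))
    return all(term.lower() in words for term in profile_terms)

def trial_run(profile_terms, sample_documents, min_hits=1, max_fraction=0.5):
    """Run a profile against sample documents and flag it as too narrow or too broad."""
    hits = [doc for doc in sample_documents if matches_profile(profile_terms, doc)]
    if len(hits) < min_hits:
        verdict = "too narrow"
    elif len(hits) > max_fraction * len(sample_documents):
        verdict = "too broad"
    else:
        verdict = "ok"
    return verdict, hits
```

A profile consisting only of "Internet" would match most documents in a typical sample and be reported as too broad, prompting the user to add terms before subscribing.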

Users will receive an identification code associated with their profile which they will be able to use to update their interests, and to identify themselves when entering other Net servers. The use of this code will guarantee their privacy and provide additional benefits (see next section).

Obtaining Additional Information about the Users who visit Web Sites

The Web allows servers to request additional information from users through the use of forms and scripts - some companies are already doing so [3]. However, this approach, in which each server collects information independently, has two problems:

It makes it extremely inconvenient for users to have to fill out a questionnaire every time they go to a different server.
It raises serious privacy issues regarding what companies and other institutions might do with such information - thus some users might not be willing to provide it.

Additionally, although requesting user information is simple, tracking and analyzing each user's accesses is significantly more complicated. Yet not collecting any information about users is a waste of resources, both for the publishers and for the users. For the former, because such information can help them improve their server to better address their users, and for the latter, because theirs is valuable information from which they can benefit directly.

We have developed a system which, we believe, addresses these issues, thus creating value for both Web publishers and users. Instead of having to provide their information over and over, users can provide publishers with their Personalized Web Page identification code. Web publishers will then use standard software to request each user's code and add it to every entry in the log file. Finally, companies and institutions can use the log analysis services described above to obtain information about the characteristics of their users. To protect their privacy, users will be able to specify which information they wish to keep confidential (e.g., name, e-mail). Such information will not be disclosed.
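The combination of identification codes and user-controlled confidentiality can be sketched as follows. This is an illustration under assumptions, not our actual software: the field name user_code, the dictionary layouts, and the function names are all hypothetical.

```python
def disclosed_profile(profile, confidential_fields):
    """Return a copy of the profile without the fields the user marked confidential."""
    return {k: v for k, v in profile.items() if k not in confidential_fields}

def annotate_log(entries, profiles, privacy):
    """Attach each visitor's disclosable profile to the log entries carrying their code.

    entries:  list of dicts, each with an optional "user_code" key
    profiles: user code -> full profile dict
    privacy:  user code -> set of field names the user keeps confidential
    """
    annotated = []
    for entry in entries:
        code = entry.get("user_code")
        profile = profiles.get(code)
        visible = disclosed_profile(profile, privacy.get(code, set())) if profile else {}
        annotated.append({**entry, "profile": visible})
    return annotated
```

Because confidential fields are removed before the profile is joined to the log, the publisher's analysis sees only what each user has agreed to disclose.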

Users who are not using the Personalized Web Page service will be able to obtain an identification code instantly by providing their profile. They will also be able to join the Personalized Web Page service at any time, if they so desire. The Web publishers benefit from the information about their users and, more significantly, the users benefit by receiving a share of the value of their information in the form of free services and other benefits. The Personalized Web Page service will be one such service. Others will be tied to the number of servers to which users provide their identification code, and will be provided both by participating companies and institutions and by us.

Conclusions

As the number of companies and institutions publishing information on the Web and the number of Web users grow, it becomes more important to develop systems that help both publishers and users obtain information about the other. Companies and institutions will need to learn about their users in order to provide the right information, and users will find that they can benefit by disclosing information about themselves in a secure and practical fashion.

There are still many issues to resolve, including the distribution of the value of information and the degrees of privacy and confidentiality that are appropriate. Nevertheless, we believe the system described in this paper is a good step towards a more efficient exchange of information between Web users and Web publishers.

Acknowledgments

Thanks to Tak W. Yan, of Stanford University, for his ideas and work on the Personalized Web Pages and on information filtering systems.

References

[1] Ed Krol. The Whole Internet User's Guide & Catalog. O'Reilly & Associates, Inc. 1994.

[2] Tak W. Yan and Hector Garcia-Molina. SIFT - A Tool for Wide-Area Information Dissemination. 1994.

[3] Global Network Navigator user registration.