An architecture to support personalized Web applications

- A User Profile Management Proxy -

Laurent Frelechoux and Tomonari Kamba
C&C Research Labs, NEC Corporation
4-1-1 Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216, Japan
laurent@mmp.cl.nec.co.jp, kamba@mmp.cl.nec.co.jp

Abstract

To present relevant information to users, more and more applications on the World Wide Web are using personalization techniques. Such methods require Web providers to manage and store user profiles. The most common architectures for personalized Web applications are to store the user profiles either at the server or at the client (browser). Both architectures have advantages and disadvantages, but neither can at the same time support the "collaborative filtering" personalization technique and the sharing of user profiles between different applications. Furthermore, programmers do not get the benefit of a standard environment for developing personalized applications. We therefore introduce the concept of a user profile management agent and locate it at a Web proxy. Our proxy manages and stores the user profiles and mediates the communications between the users and the Web servers. This not only relieves the personalized servers from the tasks of user profile management but also makes it possible to provide support for advanced personalization techniques such as collaborative filtering. The effectiveness of the proposed architecture has been demonstrated by a prototype of the proxy and by two implemented applications.

Keywords: personalization, user profile, proxy, agent, HTTP

Introduction

Although the World Wide Web is without doubt the widest source of electronic information, it lacks effective tools for retrieving, filtering, and displaying the information that each user needs. As the Internet keeps growing, retrieving useful information becomes increasingly difficult and time-consuming.

To meet the growing demand for search tools, several Web sites already offer general search engines which, given a combination of keywords, provide a list of hypertext links potentially matching the user's interest (Lycos [1], Yahoo [2], AltaVista [3], etc.). Web applications that extract relevant information can be built on top of these search engines, but such applications can only produce satisfying results if the targeted information can be characterized by a combination of keywords.

Fig. 1. Personalization architecture for Web applications.

We think that many Web applications will, as a next step, adopt a personalization architecture (Fig. 1) and provide services that suit the interests of the user as an individual or as a member of a group. The Krakatoa Chronicle [4] is a prime example of what can be achieved when using the "content-based filtering" personalization technique. The layout of this electronic newspaper is personalized (the most interesting articles are displayed first) according to a user profile that reflects the user's interests. By scoring articles, the user provides feedback that enables a learning engine to regularly update the user profile to reflect changes in the user's interests. The popularity of other personalized Web applications such as Crayon [5], PowerStart [6], and PointCast Network [7] suggests that personalization techniques have a bright future.

In this paper we propose an architecture that provides support for personalized Web applications. The proposed user profile management agent relieves the Web applications from the tasks of managing the user profiles and facilitates the use of advanced personalization techniques. To demonstrate the effectiveness of this architecture, we have implemented the agent as a user profile management proxy and have built two applications using its services.

User profiles

Although there is no standard definition of a personalized application, for the purposes of this paper we consider a personalized application to be one that requires data specific to an individual or a group. A personalized application uses a user profile to offer a service that is not identical for all users, but that is tailored to suit the individual user's preferences or interests.

Techniques using profiles

The most common way to use a profile is to store information that enables personalization on an individual basis (Fig. 2A). Included in this category is content-based filtering which, applied to a textual document, evaluates the document's relevance by matching the keywords contained in a user profile with the keywords extracted from the text [8]. On the Web, the most efficient location for storing this type of information is at the server so the user profiles do not have to be transmitted through the network.
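The keyword matching described above can be illustrated with a minimal sketch. It assumes that a profile is a mapping from keywords to numeric weights and that keyword extraction is a simple word split; both are simplifications chosen for illustration, and the names extract_keywords and relevance are hypothetical.

```python
import re

def extract_keywords(text):
    """Lowercase the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))

def relevance(profile, text):
    """Sum the profile weights of the keywords that occur in the text."""
    words = extract_keywords(text)
    return sum(weight for keyword, weight in profile.items() if keyword in words)

# Hypothetical profile: positive weights mark interests, negative ones disinterest.
profile = {"proxy": 2.0, "cookie": 1.0, "football": -1.0}
doc = "A proxy can store each cookie on behalf of the browser."
score = relevance(profile, doc)
```

A real filter would normally add stemming and stop-word removal; the sketch only shows the matching step.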

Social or collaborative filtering [9] is another effective way to take advantage of user profiles. This method collects the user profiles of a group of people and generates recommendations based on the similarities of the profiles (Fig. 2B). To implement collaborative filtering, the profiles of all users must be compared, and therefore the best approach is again to centralize them at the server.
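As a rough illustration of why all profiles must be available in one place, the sketch below compares one user's ratings with those of other users and recommends unseen items. Cosine similarity and the similarity-weighted sum are assumptions made for this example, not the method of [9]; the user names and ratings are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(target, others):
    """Rank items the target has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for ratings in others:
        sim = cosine(target, ratings)
        for item, rating in ratings.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

alice = {"a1": 5, "a2": 1}
bob = {"a1": 5, "a2": 1, "a3": 4}    # tastes close to alice's
carol = {"a1": 1, "a2": 5, "a4": 5}  # tastes far from alice's
ranking = recommend(alice, [bob, carol])
```

Because bob's ratings resemble alice's more closely than carol's do, bob's item is ranked ahead of carol's.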

Fig. 2. Different uses of profiles

A user profile can also be shared between different personalized applications that require the same profile content (Fig. 2C). For example, two providers of personalized newspapers can collaboratively contribute to the building of a user profile that reflects the interests of the reader. This collaboration enables both applications to gain a deeper knowledge of the user's interests. Because all the personalized Web applications (even on different servers) need to have access to the complete set of profiles for a specific user, the best place to store a user profile is at the browser.

The common Web architectures for managing user profiles

Several architectures are used for personalized services on the Web and they differ mainly in the locations of the management and storage functions.

Fig. 3. Common architectures for managing user profiles.

The most common architecture (adopted for the Krakatoa Chronicle) is the server-based architecture, in which the user profiles are both stored and managed at the server (Fig. 3A). Since the profiles of all users are centralized, the server needs to identify the user in order to extract the right user profile. This is done by using an authentication mechanism (ID + password). This architecture is efficient in that the user profiles do not transit through the network. The centralization of all the user profiles enables the use of both content-based and collaborative filtering but prevents user profiles from being shared between applications on different servers. With this architecture, the service provider has to supply both hardware and software for the management and storage of the users' profiles. For a worldwide service, those profiles may represent a large amount of data.

The second architecture (adopted for PowerStart) stores the user profiles on the client side and manages them on the server side (Fig. 3B). This architecture enables the use of content-based filtering and profile sharing but not of collaborative filtering. The browser must provide a mechanism for permanently storing data on the user's computer, and this is a sensitive issue because most browsers, for security reasons, do not allow a Web application (e.g., a Java applet [10]) to permanently store any information on the terminal. The "Cookie" mechanism introduced by Netscape [11] [12] is an exception to this rule. By setting a cookie, an application can get data permanently stored by the browser and automatically sent back when the user accesses the application again. The main advantage of this second architecture is the distributed nature of the storage, which frees the service provider from supplying software and disk space for the database, but the transmission of the user profile between its storage location (client) and the management location (server) increases the response delay.
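The cookie round trip described above can be sketched with Python's standard http.cookies module. The cookie name "profile", its value, and the path are hypothetical; the sketch only shows a server emitting a Set-Cookie header and later reading the value that the browser sends back.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header carrying a (hypothetical) profile value.
outgoing = SimpleCookie()
outgoing["profile"] = "proxy-2.cookie-1"
outgoing["profile"]["path"] = "/news"
set_cookie_header = outgoing["profile"].OutputString()

# Later request: the browser returns the stored value in a Cookie header.
incoming = SimpleCookie()
incoming.load("profile=proxy-2.cookie-1")
stored = incoming["profile"].value
```

The profile travels with every request to the application, which is the source of the response delay mentioned above.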

The third architecture (adopted by PointCast Network) manages and stores the user profiles on the client side (Fig. 3C). In PointCast Network the personalization is done by the browser, and the architecture is therefore not a client-server architecture anymore (at least with respect to the personalization). This architecture enables the use of content-based filtering and user profile sharing but not of collaborative filtering.

Although all these architectures enable the use of content-based filtering, none of them can at the same time support collaborative filtering and the sharing of user profiles among different applications. Furthermore, no standard such as the Common Gateway Interface (CGI) [13] has been defined for the management and storage of the user profiles on the server side. Each personalized Web application that uses the server-based architecture has to interface individually with the database that contains the user profiles.

User profile management agent

To solve the problems described in the previous section, we introduce the concept of a user profile management agent. One can define an agent as "anyone who acts on behalf or in the interest of somebody else" [14]. Accordingly, our agent is one responsible for managing and storing the user profiles and thus for providing personalization support to Web applications. The tasks of this agent are storing and retrieving user profiles, enabling these profiles to be shared, and providing support for collaborative filtering.

One way to implement the user profile management agent would be to use an RPC (Remote Procedure Call) architecture and put the agent on a separate server. The main drawback of this model is that it requires a network connection to be made each time that a Web application needs a service from the agent. This adds a significant delay to the overall execution time of the application.

A proxy-based user profile management agent

Nowadays many companies use one or more proxies between their intranet and the Internet. A proxy acts as a firewall, prohibiting unauthorized visitors from accessing the intranet and denying accesses from the intranet to certain Web sites (e.g., pornographic sites).

Fig. 4. User profile management proxy.

Our proposal takes advantage of this existing bottleneck and locates the user profile management agent there (Fig. 4). We refer to this extended proxy as a user profile management proxy.

The proposed architecture has the following advantages:

It makes it possible to build an agent that provides support for content-based filtering, for user profile sharing, and for collaborative filtering between the users accessing the Internet via the same proxy.
It reduces service delays: the user profile management agent acts as a stream transducer and therefore does not require an extra IP connection for transferring the user profile between the agent and the personalized application.
It relieves the Web service providers from the user profile management tasks. This simplifies and standardizes the development of personalized Web applications, and it transfers the cost of storage to the client side.
It enables the user to access a personalized Web application from different computers as long as the access is done through the same proxy.
It is completely transparent to the browser and therefore is compatible with common browsers.

The use of a user profile management proxy is not restricted to companies that own a firewall; a Web provider can use it to centralize the authentication of its clients, to store the user profiles, and to share them between the services it provides.

Personalization services and behavior

The user profile management proxy acts as a regular proxy for any URL not known to be one providing a personalized service. To indicate the presence of a user profile management agent along the communication path, the agent adds a flag to each HTTP (Hypertext Transfer Protocol [15]) request before forwarding it.

Fig. 5. Communication flow for a regular access to a personalized application.

To be able to retrieve the user profiles owned by the user that issued an HTTP request and to prevent any abuse from inside the firewall, access to the proxy is protected by a username-password pair (as it is when a user is accessing a restricted WWW server).

Whenever the proxy receives an HTTP request, it checks to see if the pair of the target URL and the username is registered (Fig. 5). If it is, and if the corresponding user profile exists, the profile is retrieved from the database and added to the HTTP request, which is then forwarded to the Web server. The application uses the user profile for personalization purposes and then, if the user profile needs to be updated, includes a new profile within the HTTP response. As this response goes through the proxy, the profile is extracted, stored into the database, and removed from the HTTP response. This assures complete transparency with respect to the browser. Every time a user profile is stored, the agent updates the list of URLs known to be providing a personalized service.
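The request and response handling described above can be sketched as follows. This is a simplified model under stated assumptions: headers are plain dictionaries, and the header name "X-User-Profile" is a hypothetical stand-in for whatever field actually carries the profile. The class and method names are also invented for the example.

```python
class ProfileProxy:
    """Toy model of the profile handling performed by the proxy."""

    def __init__(self):
        self.profiles = {}         # (url, username) -> stored profile
        self.personalized = set()  # URLs known to provide a personalized service

    def on_request(self, url, username, headers):
        """Attach the stored profile, if any, before forwarding the request."""
        headers = dict(headers)
        if url in self.personalized and (url, username) in self.profiles:
            headers["X-User-Profile"] = self.profiles[(url, username)]
        return headers

    def on_response(self, url, username, headers):
        """Extract, store, and strip an updated profile from the response."""
        headers = dict(headers)
        new_profile = headers.pop("X-User-Profile", None)
        if new_profile is not None:
            self.profiles[(url, username)] = new_profile
            self.personalized.add(url)  # the URL is now a known personalized service
        return headers
```

Since on_response removes the profile before the response reaches the browser, the browser never sees it, which is the transparency property stated above.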

Support for collaborative filtering is provided by managing individual user profiles that contribute to a group profile. We refer to an element of a group profile as a group profile component. For example, to create a personalized Web newspaper, each reader can contribute to a group profile that reflects the most popular articles. In this case, each group profile component is associated with a different user and contains a list of the best articles that user selected. The storage of a group profile component is similar to the storage of a regular user profile, but the retrieval is different because the user profile management agent has to transfer to the Web application all the group profile components of the group to which the user that issued the HTTP request belongs.
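The difference between storing one component and retrieving a whole group can be sketched as below. The data layout (a mapping from group names to member sets, and from users to article lists) and the function names are assumptions made for illustration.

```python
from collections import Counter

def components_for(username, groups, components):
    """Collect the group profile components of everyone in the user's group."""
    members = next((m for m in groups.values() if username in m), set())
    return {user: components[user] for user in sorted(members) if user in components}

def popular_articles(group_components):
    """List articles by how many members selected them, most popular first."""
    counts = Counter(a for articles in group_components.values() for a in articles)
    return [article for article, _ in counts.most_common()]

# Hypothetical groups and components (each component lists a user's best articles).
groups = {"lab": {"ann", "ben"}, "ops": {"cy"}}
components = {"ann": ["a1", "a2"], "ben": ["a2"], "cy": ["a9"]}
lab_view = components_for("ann", groups, components)
```

A request from "ann" returns both her component and "ben"'s, but not "cy"'s, since "cy" belongs to another group.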

To provide user profile sharing, the user profile management agent generates a catalogue that enumerates the sharable profiles. Each entry of this catalogue contains not only the name of a user profile and its creator (name of the application, URL, etc.) but also a description of the profile's content. This description includes such information as a list of keywords, a textual summary of the content, and a formal description of the syntax (e.g., profile type and position of fields). To free the user profile management agent from the tasks associated with managing the descriptions and to avoid the problems of duplicate storage in a distributed environment, we decided to create a separate text file for each user profile's description and to locate it at the same Web site as the Web application. The agent thus must store only the URL of the description text file associated with each user profile.

Fig. 6. Communication flow for the user profile sharing service.

The communication flow for sharing a user profile is divided into two steps (Fig. 6). The application first gets a catalogue of all the user profiles that belong to the user who issued the HTTP request. The application can then get more details by downloading the description text files of the candidate user profiles. Once the selection has been made, the application indicates to the user profile management agent which user profiles it wants to share. As a response to this request, the agent sends the content of the shared user profiles to the application. The shared user profiles are subsequently accessed via the regular "get" and "set" operations. Therefore, the process of selecting a user profile occurs only at the first access (when no user profile is available).
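The two-step flow can be sketched from the application's point of view as follows. All names here (CatalogueEntry, Agent, select_and_share) are hypothetical, the selection rule (a keyword appearing in the description text) is an assumption, and fetch_description is an injected callable standing in for the HTTP download of a description file.

```python
from dataclasses import dataclass

@dataclass
class CatalogueEntry:
    name: str             # name of the user profile
    creator: str          # application that created the profile
    description_url: str  # the agent stores only this URL, not the description

class Agent:
    """Toy stand-in for the catalogue and sharing services of the agent."""

    def __init__(self, catalogue, contents):
        self.catalogue = catalogue  # username -> list of CatalogueEntry
        self.contents = contents    # (username, profile name) -> profile content

    def get_catalogue(self, username):
        return list(self.catalogue.get(username, []))

    def share(self, username, names):
        return {n: self.contents[(username, n)] for n in names}

def select_and_share(agent, username, fetch_description, wanted_keyword):
    """Step 1: read the catalogue and descriptions. Step 2: request sharing."""
    chosen = [entry.name for entry in agent.get_catalogue(username)
              if wanted_keyword in fetch_description(entry.description_url)]
    return agent.share(username, chosen)
```

After this exchange the application would use the regular "get" and "set" operations, so the selection cost is paid only once.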

Prototyping the user profile management proxy

Our user profile management proxy prototype provides all the services mentioned in the previous section: storage and retrieval of user profiles, sharing user profiles, and management of group profile components.

The prototype has been implemented on a Unix platform by inserting a user profile management agent into the publicly available CERN proxy-server [16]. The proxy authentication scheme implemented follows the specification of the proposed standard HTTP 1.1 [17]. Since common browsers cache proxy authentication credentials, the user has to be authenticated only once per browser session.
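For illustration, the sketch below shows how a proxy could recover the username from a Proxy-Authorization request header. It assumes the Basic authentication scheme, which the text above does not state explicitly, and the function names are hypothetical.

```python
import base64

def proxy_authorization(username, password):
    """Build the value of a Proxy-Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def username_from_header(value):
    """Return the username carried by a Basic credential, or None otherwise."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        return None
    user, _, _ = base64.b64decode(token).decode("utf-8").partition(":")
    return user
```

The recovered username is what allows the proxy to look up the profiles owned by the user who issued the request.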

Fig. 7. Examples of communication with the extended Cookie syntax.

The prototype mediates the communications under HTTP and uses an extended version of the Cookie syntax to transmit the user profiles. As shown in Fig. 7, extra fields have been added to transfer the information necessary for extended services, but our syntax extension is upward compatible with Netscape's Cookie specifications [11]. In this implementation, we decided to extend the Cookie syntax instead of defining new HTTP headers, so that the user profile management agent stores not only the user profiles, but also all the cookies that would normally be handled by the browser.

The proxy always inserts the flag "UP/version" in the "via" header to indicate the presence and the version of a user profile agent. Given that information, the application can optimize the requests for services (plain cookies, catalogue or sharing services, etc.).
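An application could detect the flag along the following lines. The exact position of the "UP/version" token inside the header value is not specified here, so the sketch simply searches the whole value; the sample header strings in the usage lines are invented.

```python
import re

def profile_agent_version(via_header):
    """Return the version following an 'UP/' token in a Via header, or None."""
    match = re.search(r"(?<![\w/])UP/([\w.]+)", via_header or "")
    return match.group(1) if match else None

# Hypothetical header values, with and without the agent's flag.
with_agent = profile_agent_version("1.0 proxy.example.com (UP/1.0)")
without_agent = profile_agent_version("1.0 proxy.example.com")
```

The lookbehind keeps the pattern from matching inside longer tokens that merely end in "UP".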

A program library has been built to ease and standardize the management of user profiles for CGI scripts written in Perl. This library provides high level routines to store and retrieve user profiles, filter and search a catalogue of user profiles, download description files, etc. The library optimizes the requests for the user profile services according to the version of the user profile management agent's flag. For example, if no flag is specified, the library stores user profiles as regular Cookies, and a request to get the catalogue of the existing user profiles is ignored.
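The library itself is written in Perl; the fallback behavior described above is sketched here in Python for illustration only. The function names and the extra "UP-Type" field are hypothetical and do not reproduce the extended syntax of Fig. 7, and returning an empty list is one possible reading of "ignored".

```python
def store_profile(agent_version, name, value):
    """Build the header that stores a profile, degrading to a plain cookie."""
    if agent_version is None:
        # No agent on the path: store the profile as a regular cookie.
        return f"Set-Cookie: {name}={value}"
    # Agent present: add a (hypothetical) extended field for the agent.
    return f"Set-Cookie: {name}={value}; UP-Type=profile"

def get_catalogue(agent_version, request_catalogue):
    """Ask the agent for the catalogue, or ignore the request without an agent."""
    if agent_version is None:
        return []
    return request_catalogue()
```

With this kind of fallback, the same application code can run whether or not a user profile management agent is on the communication path.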

Fig. 8. A screen of our personalized technical journal.

Personalized news on the World Wide Web

To demonstrate the effectiveness of our user profile management proxy and evaluate its performance, we have created two personalized applications that use the full extent of the support it provides.

The first application is a personalized newspaper very similar to the Krakatoa Chronicle. This application presents news articles gathered from a newspaper publisher, and its personalization is the result of content-based learning and scoring engines. The scoring engine uses a user profile (list of keywords with weights) to rate daily articles. The articles which get the highest scores are considered the most relevant to the user and are presented first. Whenever the user has read an article, he can indicate the level of interest and the learning engine of the newspaper can update the user profile to reflect the new information.
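The learning rule used by the newspaper is not detailed here, so the following sketch uses a simple additive update as an assumption: each keyword of the article that was read moves by a fixed rate times the interest level the user indicated. The function name, the rate parameter, and the interest scale from -1 to 1 are all hypothetical.

```python
def update_profile(profile, article_keywords, interest, rate=0.1):
    """Return a new profile whose weights are shifted by the user's feedback."""
    updated = dict(profile)
    for keyword in article_keywords:
        updated[keyword] = updated.get(keyword, 0.0) + rate * interest
    return updated

# Hypothetical feedback: the user liked an article about proxies and agents.
before = {"proxy": 1.0}
after = update_profile(before, {"proxy", "agent"}, interest=1.0, rate=0.5)
```

Keywords that are new to the profile start from a weight of zero, so repeated positive feedback gradually brings them into play for scoring.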

The second application is a personalized technical journal. This application offers to scientists a journal containing abstracts of scientific papers gathered on the Internet. The scientific articles are gathered by a robot [18] that visits several Web sites known for hosting scientific papers. We use the public domain robot called Harvest [19] to download the technical papers, extract the abstracts, and store them in a database. Whenever the user accesses the technical journal, the engine retrieves the abstracts from the database, scores them to match the user's interest, and presents them in a way similar to the way our personalized newspaper presents articles (Fig. 8). Each abstract is linked to the original document, and the user can read the entire paper if the abstract seems interesting. This application also makes use of content-based learning and scoring engines in order to anticipate the relevance of articles for each user.

Scientists who are working on the same project usually have similar interests in technical papers, and scientists often mention a URL to a co-worker who is looking for information about a specific topic. We have therefore integrated the collaborative filtering technique into the learning and scoring engines of the technical journal. The learning engine generates not only the user profile containing the keywords, but also a group profile component which contains the references to the best-rated (by the user, not the engine) abstracts along with their scores. The scoring engine then implements collaborative filtering by first rating the articles according to the user profile containing the keywords and then by adjusting the scores according to the recommendations of the co-workers.
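The second stage of this scoring can be sketched as below. How the adjustment is computed is not specified here, so adding a weighted mean of the co-workers' scores is an assumption, as are the function name and the weight parameter.

```python
def adjust_scores(content_scores, coworker_components, weight=0.5):
    """Add the weighted mean co-worker score to each content-based score."""
    adjusted = dict(content_scores)
    for article in adjusted:
        ratings = [c[article] for c in coworker_components if article in c]
        if ratings:
            adjusted[article] += weight * sum(ratings) / len(ratings)
    return adjusted

# Hypothetical content-based scores and two co-workers' best-rated abstracts.
content_scores = {"a": 1.0, "b": 2.0}
coworkers = [{"a": 4.0}, {"a": 2.0}]
final = adjust_scores(content_scores, coworkers)
```

Abstracts that no co-worker rated keep their content-based score unchanged.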

Since both applications require a user profile that reflects the user's interests, this profile is shared between the two applications. Whenever one of the applications is accessed and no user profile is associated with the HTTP request, the application queries the user profile management proxy about the existing user profiles. If the query reveals that the user profile associated with the other application exists, the application asks the user profile management proxy to share the existing user profile between the two applications.

Discussion

Related Work

Brooks, Maze, Meeks, and Miller [20] propose to extend proxies by using specialized HTTP transducers (called OreO). A transducer is a specialized processing module that gets an input stream, performs some operations on it, and generates an output stream. The architectures of the OreO system and the user profile management proxy look very similar, but they differ in where the personalization is performed. In the OreO system, documents are personalized by installing a personalization-oriented transducer at an OreO-extended proxy, and a Web application has no control over the personalization performed by the transducer. In our architecture, the user profile management proxy provides support by managing the necessary user profiles but does not itself perform any personalization. This task is performed by the Web application, which can therefore keep full control over the type and quality of personalization performed.

Performance evaluation

To estimate the performance of the user profile management proxy, we measured the time elapsed between the reception of an HTTP request at the proxy and the forwarding of the corresponding HTTP response. The measurements were taken while the user sent feedback to two different versions of the personalized newspaper presented earlier. Both versions have a similar personalization engine, but in the first case the storage of the user profile is done by the user profile management proxy and in the second case the application stores the profiles at the server. Our measurements show that the proxy-based architecture is about 20-30% slower than the server-based architecture. This result should be considered a "worst case" because, in order to have identical performance for storage, the user profile management proxy and both newspaper applications were running on the same machine, which implies that the network latency between the proxy and the server was minimal. From the point of view of the user, the delay added by the user profile management agent is imperceptible because it is in the range of half a second, much smaller than the delay due to the transfer of the HTTP request and HTTP response through the network.

No measurements were taken for the case in which a personalized Web application uses the catalogue/share services. However, the user can notice some extra delay, especially when the application has to download several user profile description files. These services, though, are designed to be used only in special circumstances; for example, when a user accesses a personalized application for the first time.

Future work

Our current aim is to gain experience in the design and implementation of personalized Web applications that are supported by our user profile management agent. One ongoing project is the improvement of the personalization of the newspaper and the technical journal by combining content-based filtering and collaborative filtering more effectively.

Another goal is to design a system in which user profile management proxies cooperate. The current model does not enforce storage consistency when user profile management proxies are cascaded, as within a company (group, department, and company proxies). Cascaded proxies can also be used to provide hierarchical collaborative filtering: between members of a group, members of a department, etc.

We are also considering the use of a distributed database for the storage of the user profiles. Some firewalls have several proxies that connect, in parallel, the intranet to the Internet. In this situation, the user should have transparent access to his user profile regardless of the proxy he is using.

Conclusion

We have pointed out the limitations of the current user profile management architectures and have proposed the concept of a user profile management agent providing standardized support for personalized World Wide Web applications. An architecture in which this agent is located at a proxy instead of at the browsers or servers enables the use of advanced personalization techniques (including user profile sharing and collaborative filtering), transfers the storage of user profiles to the client side, and is efficient in terms of service latency and network load.

Several issues, such as the dynamic configuration of groups for collaborative filtering support and how user profile management proxies can cooperate, have not yet been resolved. Nevertheless, the prototype and the personalized Web applications implemented have already demonstrated that support for personalized Web applications can be efficiently implemented as a user profile management proxy.

Personalization is an important technology for presenting relevant information to people. We think our user profile management agent can help Web service providers manage personalized services more easily and can thus encourage the development of personalized applications on the World Wide Web.