Tomonari Kamba
Krishna Bharat
Michael C. Albers
Although non-commercial usage has long dominated the Internet, commercial applications have recently begun to emerge at an increasing rate. According to Outing [1], "more than 230 supplemental online services are operated or under development by newspapers worldwide, an increase of about 130% since the end of 1994". Of these, online newspapers are particularly well suited to the WWW, since the web readily supports information retrieval, presentation, and, to some extent, layout. However, many challenges must be met before online newspapers become as pervasive as their hard-copy counterparts; some are social and some are technical. While printed newspapers tend to be more portable and easier to handle, online newspapers have a powerful argument in their favor: personalization.
Bogart notes that personalization has a strong appeal to newspaper readers [2], but practical considerations have prevented personalized newspapers from being realized in conventional hard-copy publishing. Web newspapers are not subject to the constraints of printed matter: their reach is just as large, and electronic dissemination allows the newsfeed to be custom-tailored for individual users. The presentation can be personalized in terms of content, layout, media (text, graphics, video, etc.), advertisements, and so on. Newspapers have two important social functions to perform: education and entertainment. Personalization may seem to enhance the latter at the expense of the former. Hence a mechanism is needed to mix news items with either great popular appeal or high intrinsic value (in the editor's opinion) into the set of articles that match the user's interests. To allow multiple perspectives within the same newsfeed, the user should be able to dynamically adjust the way in which the two kinds of articles are combined. This would allow the newspaper to keep pace with changes in reading style over time. For instance, a user may want headlines and matters of public interest in the morning, and gradually move to a newspaper composed of articles with personal appeal by evening.
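This section does not specify how the two kinds of articles are combined. One simple possibility, sketched here purely as an assumption, is a linear blend controlled by a hypothetical user slider value `mix` in [0, 1]: 0 ranks purely by community/editorial importance, 1 purely by personal interest. The class, record, and method names are illustrative only, and the code uses modern Java (records, Java 16+) rather than the release the authors used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: blend personal and community scores with a user-controlled slider. */
final class ScoreBlend {

    /** An article with a personal score (match to the user's profile) and a
     *  community score (popular appeal or editorial value), both assumed to lie in [0, 1]. */
    record Article(String id, double personal, double community) {}

    /** Linear interpolation: mix = 0 gives a pure community ranking, mix = 1 a pure personal one. */
    static double combined(Article a, double mix) {
        if (mix < 0.0 || mix > 1.0) {
            throw new IllegalArgumentException("mix must lie in [0, 1]");
        }
        return mix * a.personal() + (1.0 - mix) * a.community();
    }

    /** Returns a copy of the articles ordered by combined score, highest first. */
    static List<Article> rank(List<Article> articles, double mix) {
        List<Article> sorted = new ArrayList<>(articles);
        sorted.sort(Comparator.comparingDouble((Article a) -> combined(a, mix)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Article> today = List.of(
                new Article("election", 0.1, 0.9),
                new Article("chess", 0.9, 0.2));
        System.out.println(rank(today, 0.2).get(0).id()); // prints election (morning: public interest)
        System.out.println(rank(today, 0.8).get(0).id()); // prints chess (evening: personal appeal)
    }
}
```

Moving the slider from 0.2 to 0.8 over the day reproduces the morning-to-evening shift described above without recomputing either underlying score.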
In this paper, we describe an experimental system that implements an interactive, personalized newspaper on the WWW. Some of the parameters for personalization are computed at the server end, based on user profiles and the composition of the newsfeed. Personalized layout happens at the client end, based on other parameters under user control. There are already many online newspapers on the web, such as those accessible from the list of "Dailies" [3], some of which provide personalization (e.g. CRAYON [4], Fishwrap [5]). However, these have been subject to the limitations of HTML [6]. The precise formatting and multi-column layout one is accustomed to in printed newspapers are hard to support. Interactivity is restricted to point-and-click interaction, and changes take a long time to occur because of the high client-server round-trip latency and the restrictive update model of standard HTTP [7]. Since HTML is not dynamically extensible and tends to evolve slowly, it is unlikely that the custom widgets needed to build a newspaper will ever be supported. Hence we turned to a new paradigm: the embedded Java application (or "applet") feature available in Java-compliant browsers such as HotJava [8] and (future versions of) Netscape Navigator [9].
Java [8] is an object-oriented programming language that can be compiled to architecture-neutral byte-code for safe execution within a Java virtual machine. A Java applet is a Java program designed specifically to be embedded in HTML documents. Java applets can implement arbitrary user interfaces and can communicate with other entities over the network. A Java-aware browser is a WWW browser that embeds a Java virtual machine and can handle applet tags. Since the downloaded code runs locally on the client, fairly involved computation and interactive, custom-designed user interfaces can be supported. In addition, Java has a library for handling TCP/IP protocols and can easily access remote objects via URLs [10], which allows continuous bi-directional communication between server and client.
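To illustrate how little code URL-based access requires, here is a minimal sketch that reads a text resource, such as an article file, through `java.net.URL.openStream()`. It is written against today's `java.net` API, not the 1.0 Alpha3 release the system was built on, and the class name `ArticleFetcher` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Minimal sketch: read a text resource (e.g. an article) through java.net.URL. */
final class ArticleFetcher {

    /** Opens the URL, reads the whole body as UTF-8 text line by line, and closes the stream. */
    static String fetch(URL url) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n'); // readLine strips terminators, so restore them
            }
        }
        return body.toString();
    }
}
```

The same call works for `http:` and `file:` URLs; a hypothetical article location would be passed as `ArticleFetcher.fetch(URI.create("http://example.com/articles/1.txt").toURL())`.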
The Krakatoa Chronicle provides some interactive features that other online newspapers do not:
Figure 1 shows the system architecture of the Krakatoa Chronicle. The system consists of two parts: the server end and the client end.
Figure 2 shows the user's view of a session with the personalized newspaper, and Figure 3 shows the communication that occurs between the server and the client during this process.
Selecting the "Change Profiles" button is not mandatory, but it lets the user bootstrap his/her profile before the first newspaper is created. This allows users to maintain part of their profile explicitly, by specifying topics of interest or disinterest as keywords. These "explicitly specified" keywords differ from "implicitly extracted" (inferred) keywords in that they have either the maximum or the minimum possible weight, and their weights are never modified by implicit feedback or otherwise changed automatically over time. Even if the user provides no explicit feedback, the system can automatically create and modify personal profiles during the reading process by observing the user's actions.
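The rule that explicit keywords sit at the extreme weights and are immune to implicit feedback can be captured in a small data structure. The sketch below is an assumption, not the system's actual profile format: the weight range of -1.0 to 1.0, the additive update, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a user profile mixing explicit and implicit keyword weights. */
final class UserProfile {
    // Assumed weight range; the text only says explicit keywords take the extreme values.
    static final double MAX_WEIGHT = 1.0;
    static final double MIN_WEIGHT = -1.0;

    private final Map<String, Double> explicitWeights = new HashMap<>();
    private final Map<String, Double> implicitWeights = new HashMap<>();

    /** Explicit interest: pinned at the maximum weight. */
    void like(String keyword) { explicitWeights.put(keyword, MAX_WEIGHT); }

    /** Explicit disinterest: pinned at the minimum weight. */
    void dislike(String keyword) { explicitWeights.put(keyword, MIN_WEIGHT); }

    /** Implicit feedback adjusts only inferred keywords, clamped to the allowed range. */
    void applyImplicitFeedback(String keyword, double delta) {
        if (explicitWeights.containsKey(keyword)) {
            return; // explicitly specified keywords are never changed automatically
        }
        double updated = implicitWeights.getOrDefault(keyword, 0.0) + delta;
        implicitWeights.put(keyword, Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, updated)));
    }

    /** Explicit weights take precedence over inferred ones; unknown keywords weigh 0. */
    double weight(String keyword) {
        Double explicit = explicitWeights.get(keyword);
        return explicit != null ? explicit : implicitWeights.getOrDefault(keyword, 0.0);
    }

    public static void main(String[] args) {
        UserProfile p = new UserProfile();
        p.dislike("baseball");
        p.applyImplicitFeedback("baseball", 0.5); // ignored: explicit keyword
        p.applyImplicitFeedback("hurricane", 0.3);
        System.out.println(p.weight("baseball"));  // prints -1.0
        System.out.println(p.weight("hurricane")); // prints 0.3
    }
}
```

Keeping the two maps separate makes the precedence rule explicit and lets a later "Change Profiles" edit remove a pinned keyword without losing any inferred weight for it.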
When the user chooses the "Create Today's Newspaper" button, each article's personal and community weights are computed, and a Java newspaper applet is composed by the CGI script and sent to the client. The applet computes the layout using each article's weight and other factors, and displays the composite newspaper after fetching formatted pieces of text from the server. When the user chooses the "Read Today's Newspaper" button, the last newspaper the user viewed is sent instead. This allows users to return to the same newspaper even though their profiles have since changed.
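The text does not say how an article's personal weight is derived from the profile. Since articles are indexed with the SMART vector-space system, one plausible choice, shown here only as an assumption, is the cosine similarity between a sparse profile vector and an article's term vector. The class and method names are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch: personal weight as the cosine similarity of two sparse term vectors. */
final class PersonalScore {

    /** Cosine of the angle between the vectors; 0 if either vector is empty or all-zero. */
    static double cosine(Map<String, Double> profile, Map<String, Double> article) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : profile.entrySet()) {
            Double w = article.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w; // only shared terms contribute to the dot product
            }
        }
        double norms = norm(profile) * norm(article);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sumSquares = 0.0;
        for (double x : v.values()) {
            sumSquares += x * x;
        }
        return Math.sqrt(sumSquares);
    }
}
```

For a profile {hurricane: 1, weather: 1} and an article {hurricane: 1, coast: 1}, the score is approximately 0.5: one shared term out of two in each vector.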
Figure 4 shows a typical screen of the Krakatoa Chronicle. It is the first web newspaper to attempt a realistic, multi-column rendering that resembles a printed newspaper and to use embedded widgets for convenient browsing. The newspaper is divided into a set of pages, with articles of greater importance appearing earlier in the sequence. A page scrollbar is available for browsing at the page level. Each page is divided into a set of article widgets, separated by 'bars' that can be dragged. An article widget holds the contents of an article and supports various browsing techniques.
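The ordering rule above, with more important articles appearing on earlier pages, can be sketched as a sort followed by chunking. Using a fixed number of articles per page is a simplifying assumption: the real layout presumably depends on available space and on the slider settings. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: split scored articles into pages, most important first. */
final class Paginator {

    record Scored(String id, double score) {}

    /** Sorts by descending score, then cuts the sequence into pages of at most perPage articles. */
    static List<List<Scored>> paginate(List<Scored> articles, int perPage) {
        if (perPage <= 0) {
            throw new IllegalArgumentException("perPage must be positive");
        }
        List<Scored> sorted = new ArrayList<>(articles);
        sorted.sort(Comparator.comparingDouble(Scored::score).reversed());
        List<List<Scored>> pages = new ArrayList<>();
        for (int start = 0; start < sorted.size(); start += perPage) {
            int end = Math.min(start + perPage, sorted.size());
            pages.add(new ArrayList<>(sorted.subList(start, end))); // copy so pages are independent
        }
        return pages;
    }
}
```

With eight articles and six per page (the page size used in the performance measurements), this yields a first page of the six highest-scoring articles and a second page holding the remaining two.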
In the Krakatoa Chronicle, this flexible layout control was implemented by having the applet code on the client fetch articles from the server side (see Figure 6). A client-side cache allows for pre-fetching and significantly reduces the cost of browsing and re-layout. Whenever the user changes the setting of layout parameters (via the sliders), a new layout is computed. Article files are cached, and the agent fetches new articles from the server end only if needed.
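The cache-then-fetch behavior can be sketched with `Map.computeIfAbsent`, so the server is contacted only on a miss. The fetch function is injected as a hypothetical parameter (it might wrap a URL read), and pre-fetching is shown synchronously for simplicity, whereas a real applet would likely pre-fetch in a background thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a client-side article cache that fetches only on a miss. */
final class ArticleCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> fetcher; // assumed: maps an article id to its text
    private int fetches = 0;

    ArticleCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the article text, calling the fetcher only if the id is not cached yet. */
    String get(String articleId) {
        return cache.computeIfAbsent(articleId, id -> {
            fetches++;
            return fetcher.apply(id);
        });
    }

    /** Pre-fetching warms the cache so that a later re-layout needs no server round trips. */
    void prefetch(Iterable<String> articleIds) {
        for (String id : articleIds) {
            get(id);
        }
    }

    /** Number of times the fetcher has actually been called. */
    int fetchCount() {
        return fetches;
    }
}
```

Because a slider change only triggers a new layout computation, repeated re-layouts over the same articles cost no extra fetches once the cache is warm.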
The Krakatoa Chronicle provides interaction techniques to support browsing in ways a hard-copy newspaper cannot.
A set of server-end programs fetches each day's articles, translates them into plain text, indexes them using SMART, and helps users register and log on. These are implemented as CGI scripts written in Perl. The script that maintains the database and creates the newspaper is about 500 lines of code, and the one that handles the login sequence, creates new accounts, and interacts with the user is about 2000 lines. A set of client-end programs manages the newspaper layout and user interaction. They are written in Java Release 1.0 Alpha3, in about 3000 lines of code.
We receive about 100 articles a day, each ranging from 10 to 200 Kbytes without images. The batch process that indexes all the articles takes about half an hour and requires the system to be shut down, so it is run when demand is low. On a Sun SPARCstation 10 in our Ethernet LAN environment, it takes about 8 seconds to compute each article's score, and approximately 8 seconds to retrieve an article from the server to the client as a single-threaded process. On an initial visit to a newspaper page with six articles, about fifty seconds are required to compute the newspaper, bring it over, and display the entire page's contents (see Figure 8).
Of the many ways to build a user profile, the most common and simplest is to ask the user to type in keywords that he/she is interested in [17]. With this approach, it is difficult to track changes in a user's interests over time, or to capture topics of interest the user is not conscious of. An indirect way is to ask the user to score the articles he/she reads, and to compute keyword weights from those scores [18][19]. This method also requires the user's conscious involvement, which is often annoying. A more subtle way is to consider the time the user spends on each article, as in prior work on USENET News [20], where it seems to have worked fairly well. However, that study required subjects to devote their entire concentration to reading the article and not take breaks, which is not very realistic. The Krakatoa Chronicle includes some of these methods, namely typing explicit keywords and/or adding a score to each article, but the system works even without explicit user involvement. Since anticipated interest values are displayed on the score bars, a user who notices that the profile has produced an erroneous value can give explicit feedback to correct it.
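A reading-time signal of the kind discussed above can be turned into an interest value in several ways. The sketch below is one hypothetical choice, not the system's method. It divides time spent by the time a full read should take, using an assumed reading speed of 4 words per second, and caps the result at 1 so that an interrupted session (the "breaks" problem) cannot count as more than a complete read. An explicit rating, when given, overrides the implicit estimate.

```java
/** Hypothetical sketch: derive an implicit interest value in [0, 1] from reading time. */
final class ImplicitFeedback {
    // Assumed reading speed used to estimate how long a complete read should take.
    static final double WORDS_PER_SECOND = 4.0;

    /**
     * Fraction of the estimated full-read time the user actually spent, capped to [0, 1].
     * An explicit rating in [0, 1], if present, takes precedence over the time-based estimate.
     */
    static double interest(int articleWords, double secondsSpent, Double explicitRating) {
        if (explicitRating != null) {
            return Math.max(0.0, Math.min(1.0, explicitRating));
        }
        if (articleWords <= 0) {
            return 0.0; // nothing to read, so no evidence of interest
        }
        double expectedSeconds = articleWords / WORDS_PER_SECOND;
        return Math.max(0.0, Math.min(1.0, secondsSpent / expectedSeconds));
    }
}
```

For a 400-word article, 50 seconds of reading gives 0.5, while leaving the article open for 500 seconds still gives only 1.0.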
We plan to have about thirty subjects read the Krakatoa Chronicle daily. After the experiment has run for several weeks, we will reassess the manner in which user actions affect user profiles; currently, the importance of operations such as "Peek at an Article" is decided arbitrarily. We also plan to gather statistics on how people browse the newspaper, given the choice of interaction techniques available to them. In future work, we plan to include dynamic components in the newspaper framework, such as billboards, live maps, crossword puzzles, shared whiteboards, and animated comic strips. These should be straightforward to implement and could use a similar scoring/personalization mechanism.
We have developed a highly interactive, personalized newspaper on the WWW. It is implemented as an active agent that runs within the web-browser as the newspaper is being displayed. Its main features are realistic rendering, dynamic layout control, interactivity and implicit feedback leading to personalization without conscious user involvement. Information personalization is very important on the WWW, especially for commercial services seeking to provide a value-added service for a set of registered users. Our approach to personalization has applicability to other multimedia services on the web as well.
We wish to thank the 'News and Observer' for permitting us to use their news articles for our experiment.
1. MediaInfo Interactive e-newspapers main menu, MediaInfo Interactive, 1995. <URL:http://www.mediainfo.com/edpub/e-papers.home.page.html>
2. Leo Bogart, Press and Public: who reads what, when, where, and why in American newspapers, Lawrence Erlbaum Associates Publishers, 1989.
3. Dailies: U.S. Newspaper Services on the Internet, <URL:http://marketplace.com/e-papers.list.www/e-papers.us.dailies.html>
4. CRAYON, <URL:http://sun.bucknell.edu/~boulter/crayon/>
5. Fishwrap, <URL:http://fishwrap.mit.edu/>
6. HyperText Markup Language (HTML): Working and Background Materials, <URL:http://www.w3.org/hypertext/WWW/Protocols/Overview.html>
8. HotJava Home Page, <URL:http://java.sun.com>
9. Introducing Netscape Navigator 2.0 and Netscape Navigator Gold 2.0, <URL:http://www.ncsa.uiuc.edu/demoweb/url-primer.html>
11. The NandO Times, <URL:http://www.nando.net/newsroom/nt/nando.html>
12. SMART, <URL:ftp://ftp.cs.cornell.edu/pub/smart>
13. Community-Based Navigation, <URL:http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/HCI/hill/home-page.html>
14. The Webhound WWW Document Filtering System, <URL:http://webhound.www.media.mit.edu/projects/webhound/doc/>
15. Upendra Shardanand and Patti Maes, Social Information Filtering: Algorithms for Automating "Word of Mouth", Proceedings of CHI'95, 1995. <URL:http://agents.www.media.mit.edu/groups/agents/papers/ringo/chi-95-paper.ps>
16. Overview of CGI, <URL:http://www.w3.org/hypertext/WWW/CGI/Overview.html>
17. Tak W. Yan and Hector Garcia-Molina, SIFT - A Tool for Wide-Area Information Dissemination, USENIX Technical Conference, 1995, pp. 177-186. <URL:ftp://db.stanford.edu/pub/sift/sift.ps>
18. Ken Lang, NewsWeeder: Learning to Filter Netnews, ML95, <URL:http://anther.learning.cs.cmu.edu/ml95.ps>
19. Kenrick J. Mock and V. Rao Vemuri, Adaptive User Models for Intelligent Information Filtering, <URL:http://www.glue.umd.edu/enee/medlab/filter/gwic.ps>
20. Masahiro Morita and Yoichi Shinoda, Information Filtering Based on User Behavior Analysis and Best Match Text Retrieval, SIGIR'94, 1994
Tomonari Kamba
Graphics, Visualization, & Usability Center
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
email: tomo@cc.gatech.edu
Tomonari Kamba received his B.E. and M.E. in Electronics
from the University of Tokyo in 1984 and 1986 respectively,
and joined NEC Corporation. Currently, he is a visiting scientist
at the Graphics, Visualization & Usability Center at the
College of Computing, Georgia Institute of Technology.
His research interests include multimedia user interfaces,
mobile computing and online information services.
Krishna Bharat
Graphics, Visualization, & Usability Center
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
email: kb@cc.gatech.edu
Krishna Bharat received his B.Tech. in Computer Science from
the Indian Institute of Technology, Madras in 1991, and his M.S.
from the Georgia Institute of Technology in 1993. He is currently
pursuing his doctoral research in distributed user interfaces
at the College of Computing, Georgia Tech.
Michael C. Albers
School of Industrial & Systems Engineering
Center for Human-Machine Systems Research
Georgia Institute of Technology
Atlanta, GA 30332-0205
email: malber@isye.gatech.edu
Michael C. Albers is a master's student at the
School of Industrial & Systems Engineering - Center for
Human-Machine Systems Research, Georgia Institute
of Technology.