The Krakatoa Chronicle - An Interactive, Personalized, Newspaper on the Web

Tomonari Kamba
Krishna Bharat
Michael C. Albers

Abstract: This paper describes The Krakatoa Chronicle, a highly interactive, personalized newspaper on the World Wide Web (WWW). It is intended for Java-savvy WWW browsers such as HotJava, and is architecturally quite different from conventional web-based newspapers. Its high interactivity and powerful personalization are the result of sending an interactive agent along with the text of the newspaper, to operate within the user's web browser. The agent keeps a network connection open to the WWW server site to fetch resources dynamically and to update the user's personal profile as it garners relevance feedback. Users are provided a variety of article browsing options such as scrolling, maximizing, resizing and the ability to "peek". These operations not only enhance the reading experience in ways conventional newspapers cannot, but also transparently provide information to our agent about the user's locus of interest. Ours is the first web newspaper to attempt a realistic rendering of a newspaper, with a multi-column format and embedded, custom widgets for easy browsing. We provide dynamic layout control mechanisms that let the user specify how personal and community interests should be interpreted by the agent in designing the layout.
Keywords: online newspaper, personalization, information retrieval, interactive agent, active documents, automatic layout.

Introduction

Although non-commercial usage has dominated the Internet for a long time, commercial applications have recently been emerging at an increasing rate. According to Outing [1], "more than 230 supplemental online services are operated or under development by newspapers worldwide, an increase of about 130% since the end of 1994". Of these, online newspapers are particularly well suited to the WWW, since the web readily facilitates information retrieval, presentation, and to some extent, layout. However, many challenges must be met before they become as pervasive as their hard-copy counterparts. Some of the issues are social and some are technical. While printed newspapers tend to be more portable and easier to manipulate, online newspapers have a powerful argument in their favor: personalization.

Bogart notes that personalization has a strong appeal to newspaper readers [2]. Practical considerations have prevented it from being realized under the conventional hard-copy publishing setup. Web newspapers are not subject to the constraints of printed matter. Their reach is equally large, and electronic dissemination allows the newsfeed to be custom-tailored for individual users. The presentation can be personalized in terms of contents, layout, media (text, graphics, video etc.), advertisement and so on. Newspapers have two important social functions to perform: education and entertainment. Personalization may seem to enhance the latter at the expense of the former. Hence we need a mechanism to mix news items with either great popular appeal or high intrinsic value (in the editor's opinion) into the set of articles that match the user's interests. To allow multiple perspectives on the same newsfeed, the user should be able to dynamically affect the way in which the two kinds of articles are combined. This functionality would allow the newspaper to keep pace with changes in reading style as time passes. For instance, a user may want headlines and matters of public interest in the morning, and gradually move to a newspaper composed of articles with personal appeal by evening.

In this paper, we describe an experimental system which implements an interactive, personalized newspaper on the WWW. Some of the parameters for personalization are computed at the server end, based on user profiles and the composition of the newsfeed. Personalized layout happens at the client end, based on other parameters under user control. There are already many online newspapers on the web, such as those accessible from the list of "Dailies" [3], some of which provide personalization (e.g. CRAYON [4], Fishwrap [5]). However, these have been subject to the limitations of HTML [6]. The precise formatting and multi-column layout one is accustomed to in printed newspapers are hard to support. Interactivity is restricted to point-and-click interaction, and changes take a long time to occur, due to the high client-server round-trip latency and the restrictive update model in standard HTTP [7]. Since HTML is not dynamically extensible and tends to evolve slowly, it is unlikely that the custom widgets needed to program a newspaper will ever be supported. Hence we turned to a new paradigm, the embedded Java application (or "applet") feature available in Java-compliant browsers such as HotJava [8] and (future versions of) Netscape [9].

Java [8] is an object-oriented programming language which can be compiled to architecture-neutral byte-code for safe execution within a Java virtual machine. A Java applet is a Java program designed specifically to be embedded in HTML documents. Java applets can implement arbitrary user interfaces, and can communicate with other entities over the network. A Java-aware browser is a WWW browser that embeds a Java virtual machine and can handle applet tags. Since the downloaded code runs locally on the client, fairly involved computation and interactive, custom-designed user interfaces can be supported. In addition, Java has a library for handling TCP/IP protocols and can easily access remote objects via URLs [10], which allows us to have continuous bi-directional communication between server and client.

The Krakatoa Chronicle provides some interactive features that other online newspapers do not:

Flexible Layout Control. Users can change the syntactic/semantic layout strategy of the newspaper while browsing. For example, they can change the number of articles on the screen with immediate re-layout. They can also change how their personal interests and other people's interests are combined to decide the layout; personal interests are represented as a user profile, which is a mapping from a set of keywords to weights.
Implicit and Immediate Reflection of User Interests. A user's profile is modified by the explicit feedback provided by the user on the relevance of various articles, and when this is unavailable, by implicit feedback derived from observations made by the embedded Java agent. The agent observes the manner in which the user interacts with the articles in the document, and based on the time spent and the interaction techniques used (e.g. scrolling, peeking at, maximizing, resizing), it tries to estimate the user's interest and modifies the user's profile accordingly. Users can provide explicit feedback about an article using the score bar attached to that article. They can also maintain part of their profile explicitly by managing a set of keywords.

In the following sections, we will describe our system architecture and implementation, and also how the personalization strategy adopted in the Krakatoa Chronicle can be applied to other interactive applications on the WWW.

System Architecture

Figure 1 shows the system architecture of the Krakatoa Chronicle. The system consists of two parts: the server end and the client end.

Figure 1. System architecture of the Krakatoa Chronicle

The server end

The server-side software does user authentication, manages user-profiles, collects and processes the articles from the news source, builds index databases, and computes some of the parameters needed to do the layout of the newspaper at the client end.

Management of everyday articles. First, articles are to be collected daily from several news sites and converted into plain text for content analysis and re-formatting. Currently we get articles from a single site, the News and Observer [11], with their permission. Then, a document vector is computed for each article. Since our research focus is not on indexing, we use a well-known indexing engine, SMART [12], which converts each article into a representative term-vector via the TFIDF (term frequency times inverse document frequency) metric. A term-vector is a set of keywords and associated weights. The weight of a term shows how well it represents the article within the current set.
Management of profiles. The server end maintains user profiles. Each user profile has almost the same format as a document vector. The weight of each keyword represents the system's estimate of the user's interest in that keyword, and is recomputed whenever feedback is given. Feedback provides a score for the whole article, which is used to compute scores for the individual keywords in its document vector; these keyword scores are then integrated into the user's profile. If a keyword receives both positive and negative feedback, it will cease to be a good indicator of the user's interest after a period of time, as its cumulative score will go to zero by a process of averaging. If instead it consistently receives either a positive or a negative score, it will become a good indicator by virtue of the significant cumulative score it accumulates. To compute a community score for each article, the scores of individual users are averaged over the community. A community is a group of users with similar interests who would like to benefit from each other's preferences [13][14][15], but in the simplest case it could include all the users of the system.
As interaction proceeds, the user's browser provides relevance feedback to the server end, and the user profile is modified.
Computing each article's weight. The server computes each article's weight for a specific user based on how well the article's document vector and the user's profile match. Once the document set has been indexed, we use the personal profile as a query to the SMART engine to compute the weight.
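
The indexing and matching steps above can be illustrated with a minimal sketch. It is not the SMART engine: the whitespace tokenizer, the plain tf × log(N/df) weighting, the cosine match, and all function names are assumptions made for illustration, and SMART's actual weighting and similarity measure may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of articles that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-article newsfeed and a hand-written profile.
articles = [
    "volcano erupts near java island",
    "stock market rallies as tech shares climb",
    "java island tourism recovers after volcano",
]
vectors = tfidf_vectors(articles)
profile = {"volcano": 1.0, "tourism": 0.5, "stock": -1.0}

# The profile is used as the query; each article receives a personal weight.
weights = [cosine(profile, v) for v in vectors]
```

With this profile, the two volcano articles receive positive weights and the stock-market article a negative one, so the latter would be placed last.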

It is worth mentioning that there is no live process at the server end save the HTTP daemon, which spawns cgi-scripts [16] written in Perl when necessary. Indexing is done off-line and in batch mode whenever articles are added.

The client end

The client end manages user interaction and newspaper layout within the browser. The code needed to drive the presentation and interaction is downloaded when the user accesses the newspaper's web page. Subsequently, the code runs locally and may periodically contact the server site to fetch documents or provide feedback. Since the layout is computed at the client on the fly, the user can change the layout strategy flexibly. The user can scroll, peek at, maximize, resize or save an article to a scrapbook. Each of these operations is taken as an indication of the user's interest in the article and is reflected in the user profile. The user can also explicitly specify his/her personal interest in each article by selecting a score on a feedback bar. Details of layout control and interaction will be described later.

Creating/Reading a Newspaper

Figure 2 shows the user's view of a session with the personalized newspaper, and Figure 3 shows the communication that occurs between the server and the client during this process.

Figure 2. Reading a newspaper (User's View)

Figure 3. Communication between the server and the client

First, the user types in their user ID and password within a login form. After authentication by the cgi-script on the server, the user can select either "Change Profiles", "Create Today's Newspaper", or "Read Today's Newspaper". The "Read Today's Newspaper" option is available only if at least one version of the day's newspaper is present (having been created earlier).

Selecting the "Change Profiles" button is not mandatory, but with this, the user can bootstrap his/her profile before the first newspaper is created. This allows users to maintain a part of their profile explicitly, by specifying topics of interest/disinterest using keywords. These "explicitly specified" keywords differ from "implicitly extracted" (inferred) keywords in that they have either the maximum or minimum possible weight and are not subject to automatic change over time. The weights for explicitly specified keywords are not modified by implicit feedback. Even if the user doesn't provide feedback explicitly, the system can automatically create and modify personal profiles during the user's reading process by observing the user's actions.
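
The distinction between explicitly specified and inferred keywords can be sketched as follows. This is only an illustration under stated assumptions: the weight bounds of +1 and -1, the running-average update, and the names `Profile`, `pin` and `implicit_feedback` are hypothetical and are not taken from the Krakatoa Chronicle implementation.

```python
MAX_WEIGHT, MIN_WEIGHT = 1.0, -1.0  # assumed bounds for explicit keywords


class Profile:
    """Keyword -> weight mapping with explicitly pinned keywords."""

    def __init__(self):
        self.weights = {}    # keyword -> current weight
        self.counts = {}     # keyword -> number of feedback events folded in
        self.pinned = set()  # explicitly specified keywords

    def pin(self, keyword, interested):
        """Explicit keyword: fixed at the maximum or minimum weight."""
        self.weights[keyword] = MAX_WEIGHT if interested else MIN_WEIGHT
        self.pinned.add(keyword)

    def implicit_feedback(self, doc_vector, article_score):
        """Fold an article score in [-1, 1] into the inferred keywords."""
        for term, term_weight in doc_vector.items():
            if term in self.pinned:
                continue  # explicit keywords are not changed automatically
            contribution = article_score * term_weight
            n = self.counts.get(term, 0)
            old = self.weights.get(term, 0.0)
            # Running average: mixed feedback drifts toward zero.
            self.weights[term] = (old * n + contribution) / (n + 1)
            self.counts[term] = n + 1


p = Profile()
p.pin("sports", interested=False)
p.implicit_feedback({"volcano": 0.8, "sports": 0.5}, article_score=1.0)
p.implicit_feedback({"volcano": 0.8}, article_score=-1.0)
```

After one positive and one negative observation, the inferred weight for "volcano" averages out to zero, while the pinned keyword "sports" keeps its minimum weight.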

When the user chooses the "Create Today's Newspaper" button, each article's personal and community weights are computed, and a Java newspaper applet is composed by the cgi-script and sent to the client. The applet computes the layout using each article's weight and other factors, and displays the composite newspaper after getting formatted pieces of text from the server. When the user chooses the "Read Today's Newspaper" button, the last newspaper viewed by the user is sent over. This allows users to return to the same newspaper even though their profiles have changed.

Layout Control

Figure 4 shows a typical screen of The Krakatoa Chronicle. It is the first web newspaper to attempt a realistic rendering of a newspaper, with a multi-column format and embedded widgets for convenient browsing. The newspaper is divided into a set of pages, with articles of greater importance appearing earlier in the sequence. A page-scrollbar is available for browsing at the page level. Each page is divided into a set of article widgets, separated by 'bars' which can be dragged. An article widget holds the contents of an article and supports various browsing techniques.

Figure 4. Example of the Krakatoa Chronicle screen

We will not discuss the layout process in detail. We make use of the following criteria:

Articles should be laid out in their entirety whenever possible.
The title should be placed in a single line preferably. The size of the title is a factor in deciding how many columns to allocate a given article.
Rectangular tessellations of the page are adequate.
Pictures may need to be scaled down to fit in the space available and linked to the original.
Single-column articles should be scrolled vertically, and multi-column articles should be scrolled horizontally.

An important feature that differentiates the Krakatoa Chronicle from other WWW-based newspapers is its flexible layout control. Layout is a function of several parameters: the score that each article receives based on the user's profile (the user score), the average score received by each article over the community of users (the community score), and also the size and composition of each article (e.g. title length, the number of pictures). Since there are many ways in which these parameters may be combined to decide the final layout, we give users a set of controls which they can manipulate to dynamically change the significance of various layout parameters. Specifically the following parameters can be controlled (see Figure 5):

User score vs. community score. This decides the ratio in which the scores are combined in deciding the layout. The order of articles (left-top is the most important and right-bottom is the least) for a user is decided by each article's score, and the score is a function of the user's score and the community score:
final_score = personal_score × r + community_score × (1-r)
where 0 <= r <= 1
This factor is important from a social point of view as no two people will want the same mix of personal and community content, and a given user may want a different mix at different times.
Sensitivity factor. This decides how significant the scores are in deciding the space allocated to each article. If this variable is large, important articles will have much more space than articles of lesser importance, but if this variable is small, articles will enjoy approximately equal portions of screen real-estate, although an article's importance will still affect its position.
Density of articles per page. This decides how many articles will be shown on the page. If this variable is large, all the articles will be shrunk while keeping to the inter-article ratios dictated by the sensitivity factor.
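
The first two controls can be illustrated with a short sketch. The score combination follows the formula given above; the space-allocation rule (share proportional to the score raised to the sensitivity factor) is only an assumed stand-in, since the exact rule is not specified here, and the article names and scores are made up.

```python
def final_score(personal, community, r):
    """Combine scores as personal * r + community * (1 - r), 0 <= r <= 1."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return personal * r + community * (1.0 - r)


def space_shares(scores, sensitivity):
    """Assumed rule: share of the page proportional to score ** sensitivity."""
    raised = [max(s, 0.0) ** sensitivity for s in scores]
    total = sum(raised)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [x / total for x in raised]


def rank(articles, r):
    """Order article names from most to least important for a given r."""
    return sorted(
        articles,
        key=lambda name: final_score(*articles[name], r),
        reverse=True,
    )


# Hypothetical (personal_score, community_score) pairs.
articles = {"A": (0.9, 0.2), "B": (0.3, 0.8), "C": (0.5, 0.5)}

personal_view = rank(articles, r=0.75)   # leans on the user's own profile
community_view = rank(articles, r=0.25)  # leans on the community

scores = [final_score(*articles[name], 0.75) for name in personal_view]
equal = space_shares(scores, sensitivity=0.0)   # equal portions
skewed = space_shares(scores, sensitivity=2.0)  # top article dominates
```

Moving r from 0.75 to 0.25 reverses the positions of articles A and B, and a sensitivity of zero gives every article the same share while leaving the ordering untouched.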

Figure 5. Layout parameters

We hope to let users arrive at a comfortable setting for these controls, over the course of the experiment, and probably learn something in the process. The ability to change layout settings flexibly will allow users to obtain multiple views of the newspaper. We believe these multiple views show users what the community is interested in while allowing for custom views based on (previous) user feedback.

In the Krakatoa Chronicle, this flexible layout control was implemented by having the applet code on the client fetch articles from the server side (see Figure 6). A client-side cache allows for pre-fetching and significantly reduces the cost of browsing and re-layout. Whenever the user changes the setting of layout parameters (via the sliders), a new layout is computed. Article files are cached, and the agent fetches new articles from the server end only if needed.
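
A minimal sketch of such a client-side cache is given below. The class name `ArticleCache`, the `fetch` callback, and the fake server function are hypothetical; the real applet fetches formatted text over the network, which is replaced here by a local function.

```python
class ArticleCache:
    """Fetch each article from the server at most once."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: article_id -> article text
        self._store = {}      # article_id -> cached text
        self.fetch_count = 0  # number of server round trips made

    def get(self, article_id):
        if article_id not in self._store:
            self._store[article_id] = self._fetch(article_id)
            self.fetch_count += 1
        return self._store[article_id]

    def prefetch(self, article_ids):
        """Warm the cache for articles likely to be needed soon."""
        for article_id in article_ids:
            self.get(article_id)


def fake_server(article_id):
    # Stand-in for a network request to the server end.
    return f"text of article {article_id}"


cache = ArticleCache(fake_server)
first_layout = [cache.get(i) for i in (1, 2, 3)]
# A slider change produces a new layout that reuses two cached articles.
second_layout = [cache.get(i) for i in (2, 3, 4)]
```

The two layouts touch six article slots but trigger only four fetches, which is why re-layout after a slider change is cheap once the articles are cached.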

Figure 6. Layout control mechanism

Interaction

The Krakatoa Chronicle provides interaction techniques to support browsing in ways a hard-copy newspaper cannot.

Scroll/Automatic scroll. This slides the article vertically if it is single column, or horizontally if it is multi-column to prevent discontinuities in the flow of text.
Peek. This displays the article over the entire screen as long as the mouse is held down over the peek button in the top left corner.
Maximize/Revert. This is like "Peek" but requires double clicking on the body of the article to switch states.
Resize. The user resizes the article by dragging a bounding box, which permanently affects the layout.
Save to a scrapbook. This saves the article to a scrapbook, which is a page with links to all the articles that the user saved.

From the system's point of view, all these interactions give feedback about the relevance of the article to various degrees (see Figure 7). When the user scrolls, peeks at, maximizes, resizes, or saves an article to a scrapbook, the Krakatoa Chronicle increments the user's interest in the article by a corresponding amount, and subsequently changes the personal profile. There is a score bar beside each article which shows both personal score and community score at the same time, and the user can see changes in the score immediately. The score bar ranges from "Not Relevant" to "Very Relevant" through "No Comment". Initially, this score shows the user's predicted interest, estimated from the user profile and the article's document vector.

Figure 7. User operation and feedback

Once explicit feedback is given by dragging the attached score bar, no further implicit feedback is recorded for the article. The user's feedback is taken to be the final word. The community score is the average of the scores of all the users in the group that the user belongs to, and it cannot be controlled directly by the user. This score shows which articles other people are interested in, and helps the user to identify "generally important articles". This score is important in the absence of human editors to assign intrinsic scores to articles.
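
The feedback rules described in this section can be sketched as follows. The increment values and the score range of -1 to 1 are placeholders chosen for illustration (they are not the values of Figure 7), and the class and function names are hypothetical.

```python
# Assumed per-operation increments; placeholders, not the values in Figure 7.
IMPLICIT_INCREMENT = {
    "scroll": 0.05,
    "peek": 0.1,
    "maximize": 0.15,
    "resize": 0.2,
    "save": 0.3,
}


class ArticleInterest:
    """Tracks one user's interest score for one article, in [-1, 1]."""

    def __init__(self, predicted):
        self.score = predicted  # initial value predicted from the profile
        self.explicit = False   # set once the user drags the score bar

    def observe(self, action):
        """Implicit feedback: ignored after explicit feedback is given."""
        if self.explicit:
            return
        self.score = min(1.0, self.score + IMPLICIT_INCREMENT[action])

    def set_explicit(self, score):
        """Explicit feedback is taken to be the final word."""
        self.score = score
        self.explicit = True


def community_score(user_scores):
    """Average of the scores given by all users in the community."""
    return sum(user_scores) / len(user_scores)


interest = ArticleInterest(predicted=0.0)
interest.observe("peek")
interest.observe("save")
after_implicit = interest.score

interest.set_explicit(-0.5)
interest.observe("scroll")  # has no effect once explicit feedback exists
after_explicit = interest.score

group = community_score([0.4, -0.5, 0.7])
```

The user cannot set the community score directly; it changes only as the individual scores that feed the average change.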

Discussion and Future Work

A set of server-end programs fetches each day's articles, translates them into plain text, indexes them using SMART, and helps users register and log on. These are implemented as cgi-scripts written in Perl. The script that maintains the database and creates the newspaper is about 500 lines long, and the one that handles the login sequence, creates new accounts and interacts with the user is about 2000 lines. A set of client-end programs manages the layout of the newspaper and user interaction. They are written in Java Release 1.0 Alpha3 and total about 3000 lines.

We get about 100 articles a day, and the size of each article ranges from 10 to 200 Kbytes without images. The batch process to index all the articles takes about half an hour and requires the system to be shut down, so it is run at a time when demand is low. On a Sun SPARCstation 10 in our Ethernet LAN environment, it takes about 8 seconds to compute each article's score, and approximately 8 seconds to retrieve an article from the server to the client as a single-threaded process. On an initial visit to a newspaper page with six articles, about fifty seconds are required to compute the newspaper, bring it over and display the entire page's contents (see Figure 8).

Figure 8. Time to show a newspaper

We have also implemented a version of our personalized online newspaper for conventional (non-Java-compliant) browsers such as Mosaic. Its architecture is similar to other personalizable newspapers on the WWW like Fishwrap [5]. The indexing and user profile/document vector management functions are handled by the same system that implements the Krakatoa Chronicle's backend. However, instead of sending over an applet, it follows the traditional strategy of generating HTML pages for each view into the newspaper, with links to other views. There is no way to dynamically change the layout or get implicit feedback. The newspaper has a listing of articles ordered and font-size coded by relevance, with links to individual articles. The user can show explicit interest in individual articles by clicking on a color-coded score bar attached to the article, which modifies the user profile.

Of the many ways to build a user profile, the most common and simplest way is to ask the user to type in keywords that he/she is interested in [17]. With this approach, it is difficult to track changes in a user's interests over time, or topics of interest that the user is not conscious of. An indirect way is to ask the user to provide a score for articles he/she reads, and to compute the weights of keywords from the score [18][19]. This method also requires the user's conscious involvement, which is often annoying. A more subtle way is to consider the time that the user spends on each article, as in prior work on USENET News [20], where it seems to have worked fairly well. Unfortunately, that study required subjects to devote their entire concentration to the task of reading the article and not take breaks, which is not a very realistic setting. In the Krakatoa Chronicle, we included some of these methods, namely typing explicit keywords and/or adding a score to each article. However, the system works even without explicit user involvement. Since anticipated interest values are shown explicitly on the score bars, a user who notices an erroneous value can give explicit feedback, which corrects the profile.

We plan to have about thirty subjects reading the Krakatoa Chronicle newspaper daily. After the experiment runs for several weeks, we will reassess the manner in which user actions affect user profiles. Currently, the importance of operations such as "Peek at an Article" is arbitrarily decided. We are also planning to gather statistics on the way people browse the newspaper, given the choice of interaction techniques they have. In future work, we plan to include dynamic components in the newspaper framework. These would include billboards, live maps, crossword puzzles, shared whiteboards, animated comic strips etc. These are easily implemented and can use a similar scoring/personalization mechanism.

Conclusion

We have developed a highly interactive, personalized newspaper on the WWW. It is implemented as an active agent that runs within the web-browser as the newspaper is being displayed. Its main features are realistic rendering, dynamic layout control, interactivity and implicit feedback leading to personalization without conscious user involvement. Information personalization is very important on the WWW, especially for commercial services seeking to provide a value-added service for a set of registered users. Our approach to personalization has applicability to other multimedia services on the web as well.

Acknowledgment

We wish to thank the 'News and Observer' for permitting us to use their news articles for our experiment.

Notes

The Krakatoa Chronicle is named after the active volcano, Krakatoa, which lies in the Sunda Strait just off the island of Java and is thus literally a piece of 'Hot Java'.

References

1. MediaInfo Interactive e-newspapers main menu, MediaInfo Interactive, 1995. <URL:http://www.mediainfo.com/edpub/e-papers.home.page.html>

2. Leo Bogart, Press and Public: who reads what, when, where, and why in American newspapers, Lawrence Erlbaum Associates Publishers, 1989.

3. Dailies: U.S. Newspaper Services on the Internet, <URL:http://marketplace.com/e-papers.list.www/e-papers.us.dailies.html>

4. CRAYON, <URL:http://sun.bucknell.edu/~boulter/crayon/>

5. Fishwrap, <URL:http://fishwrap.mit.edu/>

6. HyperText Markup Language (HTML): Working and Background Materials, <URL:http://www.w3.org/hypertext/WWW/Protocols/Overview.html>

8. HotJava Home Page, <URL:http://java.sun.com>

9. Introducing Netscape Navigator 2.0 and Netscape Navigator Gold 2.0, <URL:http://www.ncsa.uiuc.edu/demoweb/url-primer.html>

11. The NandO Times, <URL:http://www.nando.net/newsroom/nt/nando.html>

12. SMART, <URL:ftp://ftp.cs.cornell.edu/pub/smart>

13. Community-Based Navigation, <URL:http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/HCI/hill/home-page.html>

14. The Webhound WWW Document Filtering System, <URL:http://webhound.www.media.mit.edu/projects/webhound/doc/>

15. Upendra Shardanand and Pattie Maes, Social Information Filtering: Algorithms for Automating "Word of Mouth", Proceedings of CHI'95, 1995. <URL:http://agents.www.media.mit.edu/groups/agents/papers/ringo/chi-95-paper.ps>

16. Overview of CGI, <URL:http://www.w3.org/hypertext/WWW/CGI/Overview.html>

17. Tak W. Yan and Hector Garcia-Molina, SIFT - A Tool for Wide-Area Information Dissemination, USENIX Technical Conference, 1995, pp. 177-186. <URL:ftp://db.stanford.edu/pub/sift/sift.ps>

18. Ken Lang, NewsWeeder: Learning to Filter Netnews, ML95, <URL:http://anther.learning.cs.cmu.edu/ml95.ps>

19. Kenrick J. Mock and V. Rao Vemuri, Adaptive User Models for Intelligent Information Filtering, <URL:http://www.glue.umd.edu/enee/medlab/filter/gwic.ps>

20. Masahiro Morita and Yoichi Shinoda, Information Filtering Based on User Behavior Analysis and Best Match Text Retrieval, SIGIR'94, 1994.

About the Authors

Tomonari Kamba
Graphics, Visualization, & Usability Center
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
email: tomo@cc.gatech.edu
Tomonari Kamba received his B.E. and M.E. in Electronics from the University of Tokyo in 1984 and 1986 respectively, and then joined NEC Corporation. Currently, he is a visiting scientist at the Graphics, Visualization & Usability Center at the College of Computing, Georgia Institute of Technology. His research interests include multimedia user interfaces, mobile computing and online information services.

Krishna Bharat
Graphics, Visualization, & Usability Center
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
email: kb@cc.gatech.edu
Krishna Bharat received his B.Tech. in Computer Science from Indian Institute of Technology, Madras in 1991, and M.S. from Georgia Institute of Technology in 1993. He is currently pursuing his doctoral research in distributed user interfaces at the College of Computing, Georgia Tech.

Michael C. Albers
School of Industrial & Systems Engineering
Center for Human-Machine Systems Research
Georgia Institute of Technology
Atlanta, GA 30332-0205
email: malber@isye.gatech.edu
Michael C. Albers is a master's student at the School of Industrial & Systems Engineering - Center for Human-Machine Systems Research, Georgia Institute of Technology.