K.A. Oostendorp, W.F. Punch, and R.W. Wiggins
Intelligent Systems Lab, Michigan State University
E. Lansing, MI, oostend2@cps.msu.edu, punch@cps.msu.edu
Computer Center, Michigan State University
E. Lansing, MI., rwwmaint@msu.edu
Our approach, therefore, is to create a navigation tool which copes with Internet complexity at the individual, rather than the organizational, level. This tool, PAINT (Personalized, Adaptive Internet Navigation Tool), allows the user to impose a hierarchical organization on Internet sites and documents of interest by creating categories under which to group sites. Such categorization can be used not only by an individual user, but also can be shared among groups of users with similar interests. PAINT will also provide local automatic classification based on user parameters and user behavior. That is, PAINT will record visited locations and categorize them according to past use. The user is then free to examine the automated organization, modify it, and make it a personalized view of the Internet. In our report, we will describe the PAINT tool, its use, and some preliminary investigations of local, automatic categorization.
The loose conglomeration of network-connected computer sites known as the ``Internet'' is showing growth that is nothing short of astounding, comparable only to the growth in computer chip capabilities/size-reductions seen in the previous decade. In the decade from 1984 to 1994, the Internet grew from approximately 1000 computer hosts to over 2,000,000, a 2000-fold increase! It is conservatively estimated that each of these hosts gives access to at least ten users on average, giving some 20,000,000 users direct access to the Internet. Moreover, this growth is continuing, with the number of hosts increasing by ten percent a month (pp. 9-10, [Wiggins 1995]). Although recent research has shown these numbers to be an overestimate, the growth rate is still phenomenal and likely to continue.
This incredible growth has created a situation comparable to the one in which many administrators and users of large databases find themselves: the problem of data-mining. In databases, the database is a large resource that must be examined or ``mined'' for a specific ``nugget'' of information hidden in the plethora of irrelevant ``rocks.'' The Internet has a similar problem, in this instance termed ``netmining.'' Users of the Internet are faced with the ever increasing problem of finding a desired file on some host, that is, the ``nugget'' in the network ``mine.'' When faced with a ten percent growth rate every month, how can the user's job of mining the net be made easier, or at least feasible?
Further problems are developing based on the evolution of the Internet organization and access. For example, the World Wide Web (WWW or web) is an information organization of internet sites based on a hypertext model, navigable by clients like NCSA Mosaic. WWW and its client navigators have made the net much more accessible, but also intensify the netmining problem by convoluting the organization of sites.
The many approaches being developed to aid netminers can be roughly divided into three categories: standardizing approaches to making the Internet intrinsically more accessible, navigation tools that make mining the net easier, and personalizing access routes to the Internet/WWW.
A number of groups are working on Internet standards, such as Standards for Universal Resource Identifiers (URIs) in the WWW [RFC-1630] and Standards for Interchange of USENET Messages [RFC-1036]. There are also other working groups concentrating on standards to allow better access to and broader distribution of Internet and WWW resources. Without such standards, the growth of the Internet and WWW will soon outstrip any user's ability to effectively navigate and absorb information simply because of incompatibilities.
Many Internet navigation tools have been developed over the years. Some of these include ARCHIE (to gather information on FTP sites), Veronica (to gather information on Gopher sites), and other resource tools such as WAIS (Wide Area Information Servers). These tools attempt to discover the pertinent resources available on the net and categorize them, thus serving as a ``network index'' for users. This type of approach is starting to be applied to the WWW, for example in GENVL [McBryan 1994] and Aliweb [Koster 1994].
Finally, there are approaches that allow the user to personalize their access to the Internet/WWW. With these tools, users can catalog and organize sites according to their preferences. Furthermore, when it seems to be useful, they can make those organizations available to others. Such tools include the Prospero file system [Neuman 1992b][Neuman 1992a] and the Alex file system [Cate 1992].
All three of the above approaches address important problems associated with better, easier to use, net access. However, in this paper, we address the third kind of approach, that of personalizing net access. Personalized access has the following aspects that make it a more likely candidate for further exploration.
The rest of this paper will accomplish the following tasks.
As mentioned above, one of the systems addressing the personalized access approach to the net is Prospero [Neuman 1992b][Neuman 1992a]. Neuman's thesis stated that existing approaches, based on a centralized server, were inappropriate for organizing access to an intrinsically distributed, large scale system. As such he developed the Virtual System Model, an approach that presented four principal features, listed below.
In the construction of a name space - a mapping of a personal name and a network address - Neuman describes personalized access. However, the goals of Prospero and the Virtual System Model are much more than just personalized access. Prospero provides tools for: directory filtering, formal models of naming and other aspects of a distributed system, authentication and authorization, processes and processors and a host of other features.
Our main goals are in the same vein as those provided by Prospero and the Virtual System Model. However, we wish to focus on providing personalized access in as simple a form as possible, thereby making the implementation less cumbersome. Therefore, we will simply consider the mappings between names and addresses and their organization. By concentrating solely on personalized access, we can provide more features that serve personalized access. In this way, our work can be viewed as specializing a part of the work being done by Prospero and other systems that currently organize distributed systems.
In our work, we will focus on the following goals.
PAINT (Personalized, Adaptive Internet Navigation Tool) is our first realization of the five goals delineated in Section 2.2.
We addressed Goal 4 first, namely how can we have the largest impact on the WWW community? Clearly, the most usable and most available tool for navigating the web at present is Mosaic. Furthermore, two other things are obvious based on a survey of web users provided by Pitkow [Pitkow and Recker 1994]. First, people are largely very satisfied with Mosaic as a tool. On a scale of 1 (horrid) to 9 (excellent), 1081 respondents gave a mean rating of 8.086. Adding capability to a tool people already seem pleased with will increase the chances of wide acceptance. Second, of 1079 respondents, 1006 (93.25%) were using a Unix version of Mosaic. Therefore, we based our first implementation on the latest Unix implementation of Mosaic (version 2.4).
Presently, Mosaic 2.4 does have a simple name-to-address mapping available in the hotlist. However, the hotlist provides simply a linear list of names gathered by saving the presently viewed page to the list. The user can then change the name by which the address is known.
Our approach was to replace the hotlist in Mosaic with the PAINT program. PAINT would be developed as a separate program, but interact through the hooks used by Mosaic's hotlist facility. This would minimize interactions between the two programs and allow for quicker development.
We then addressed Goals 1 and 2, namely developing a simple scheme for mapping name-to-address sites that also maintained some organization for those sites. The scheme we chose was a tree organization for the name-to-address map that would store arbitrary text descriptions of the node along with a name and an address. If a node in the tree was a leaf, that node represented a network address. If the node was not a leaf, it was considered an organizational node, or the parent of a set of leaves. As such, organizational nodes will never have associated addresses.
An example is shown in Figure 1. This figure is a text-file representation of an organization of WWW URL's. The first node in the file, the Root node, is the basis of the entire tree. All other nodes are labeled by four fields: address (for leaf nodes), name, parent node, text description (optional). The use of a text file allows the user (or another program) to easily manipulate the organization of the map. Furthermore, this text representation of the map could easily be shared between groups.
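The four-field node record described above can be sketched as follows. This is only an illustration, not the actual layout of Figure 1: the tab separator, the field order on each line, the use of node names as identifiers (which assumes names are unique), and the sample description text are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

SEP = "\t"  # assumed field separator; the real Figure 1 layout may differ


@dataclass
class Node:
    name: str
    parent: Optional[str]           # parent's name; None only for the Root node
    address: Optional[str] = None   # URL for leaf nodes, None for organizational nodes
    description: str = ""           # optional free-text description (single line)


def dump(nodes: List[Node]) -> str:
    """Write one node per line as: address, name, parent, description."""
    rows = [SEP.join([n.address or "", n.name, n.parent or "", n.description])
            for n in nodes]
    return "\n".join(rows) + "\n"


def load(text: str) -> List[Node]:
    """Parse the text produced by dump() back into Node records."""
    nodes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        address, name, parent, description = line.split(SEP, 3)
        nodes.append(Node(name, parent or None, address or None, description))
    return nodes


def children(nodes: List[Node], parent_name: str) -> List[str]:
    """Names of the nodes filed directly under parent_name."""
    return [n.name for n in nodes if n.parent == parent_name]


tree = [
    Node("Root", None),
    # The description below is invented for the example.
    Node("Web Resources", "Root", description="Pages about the web itself"),
    Node("NCSA Mosaic Home Page", "Web Resources",
         "http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html"),
]
restored = load(dump(tree))
```

Because every node carries the name of its parent, the tree can be rebuilt from the flat file in any line order, which is one reason a plain text file is easy to edit by hand or to pass between users.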
Meeting Goal 3, constructing a GUI that would provide easy interaction with such a map, was the next step. The actual PAINT tool that manipulates the underlying text scheme of Figure 1 is shown in Figure 2.
The four consecutive panes in the top of PAINT represent a directory structure viewer. This viewer is the same type of viewer used in the NeXTStep and Smalltalk environments. Each pane represents the nodes contained in a particular level of the tree. The nodes displayed in a pane are those nodes which are the children of the particular node (which is darkened) selected in the pane to its direct left. Furthermore, the selected node in each pane is displayed above the pane. In Figure 2, the leftmost pane displays the Root node, the pane to the right of it displays the children of the Root node, and the next pane to the right displays the children of the Web Resources node. If the selected node is a leaf node, then no children will be displayed.
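The pane rule just described (each pane lists the children of the node selected in the pane to its left) can be sketched as a small function. This is a minimal illustration rather than PAINT's Motif code: the function name, the dictionary representation of the tree, and the ``Local Sites'' sibling are hypothetical.

```python
from typing import Dict, List


def pane_contents(children: Dict[str, List[str]], selected: List[str],
                  root: str = "Root", panes: int = 4) -> List[List[str]]:
    """Return the node names listed in each pane, left to right.

    `selected` is the chain of selected nodes, starting at the root.
    """
    result = [[root]]                      # the leftmost pane shows only the root
    for name in selected[:panes - 1]:
        # each later pane lists the children of the selection to its left;
        # a leaf has no entry in `children`, so its pane stays empty
        result.append(children.get(name, []))
    while len(result) < panes:
        result.append([])                  # panes beyond the selection are blank
    return result


# "Local Sites" is a made-up sibling used for illustration only.
tree_children = {
    "Root": ["Web Resources", "Local Sites"],
    "Web Resources": ["NCSA Mosaic Home Page"],
}
view = pane_contents(
    tree_children, ["Root", "Web Resources", "NCSA Mosaic Home Page"])
```

With the selection chain Root, Web Resources, NCSA Mosaic Home Page, the fourth pane is empty because the selected node is a leaf.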
Directly below the directory structure viewer are two text fields and another viewer pane. These display information about the presently selected node. Continuing with Figure 2, the NCSA Mosaic Home Page is the most recently selected node (the deepest in the tree). Associated with it are its URL (http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html), a user-selected and modifiable name (NCSA Mosaic Home Page), and an arbitrary text description entered by the user. If the selected node is not a leaf node, then the address is not active and cannot be filled by the user, though the name and description can be modified. Any modifications made in PAINT are updated in the associated text file when the user exits the program. In accordance with Mosaic's hotlist, this file is called .mosaic-paint. If no such file exists, PAINT will read the current .mosaic-hotlist file and create a .mosaic-paint file which has all the hotlist entries as children of a default Root node.
In keeping with the theme of simplicity, there are only five commands available to the user, represented as buttons at the bottom of the PAINT tool. Add Node allows the user to create a new node, which is added as a child to the presently selected node. Nodes cannot be added to a leaf node (a node containing an address). Remove Node removes the presently selected node. Nodes with children cannot be removed - this is a safety feature in the tool. Send URL sends the URL to Mosaic so that Mosaic may access the network site. Quit dismisses PAINT.
Finally, the Move Node button allows the user to move a node, and its associated sub tree (if any) to other parts of the structure. The intent of Move Node is to make the presently selected node a child of some other part of the directory structure. Thus, when Move Node is selected, a list of all available parent nodes is presented to the user. The user selects a node from the list and the presently selected node becomes a child of that node. The directory structure viewer then redisplays the new structure.
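The constraints on Add Node, Remove Node, and Move Node can be sketched as follows. This is a hedged illustration, not PAINT's implementation: the class and method names are hypothetical, node names are assumed to be unique, and excluding a node's own descendants from the list of available parents is an assumption made here so that a move cannot create a cycle. Send URL and Quit are not modeled, and ``Browsers'' is an invented category.

```python
from typing import Dict, List, Optional, Set


class PaintTree:
    """Name-to-address tree; a node with an address is a leaf."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {"Root": None}
        self.address: Dict[str, Optional[str]] = {"Root": None}

    def children(self, name: str) -> List[str]:
        return [n for n, p in self.parent.items() if p == name]

    def add_node(self, name: str, parent: str,
                 address: Optional[str] = None) -> None:
        if self.address[parent] is not None:
            raise ValueError("nodes cannot be added to a leaf node")
        if name in self.parent:
            raise ValueError("node names are assumed to be unique")
        self.parent[name] = parent
        self.address[name] = address

    def remove_node(self, name: str) -> None:
        if self.children(name):
            raise ValueError("nodes with children cannot be removed")
        del self.parent[name]
        del self.address[name]

    def descendants(self, name: str) -> Set[str]:
        found: Set[str] = set()
        stack = [name]
        while stack:
            for child in self.children(stack.pop()):
                found.add(child)
                stack.append(child)
        return found

    def candidate_parents(self, name: str) -> List[str]:
        """Organizational nodes that `name` could be moved under."""
        blocked = self.descendants(name) | {name}
        return [n for n in self.parent
                if self.address[n] is None and n not in blocked]

    def move_node(self, name: str, new_parent: str) -> None:
        if new_parent not in self.candidate_parents(name):
            raise ValueError("not an available parent node")
        self.parent[name] = new_parent   # the subtree moves with the node


tree = PaintTree()
tree.add_node("Web Resources", "Root")
tree.add_node("Browsers", "Root")        # invented category for the example
tree.add_node("NCSA Mosaic Home Page", "Web Resources",
              "http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html")
tree.move_node("NCSA Mosaic Home Page", "Browsers")
```

Storing only a parent pointer per node means that moving a node carries its whole subtree along without touching any of the descendants.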
PAINT was developed on a Sparc20 under Solaris, interacting with Mosaic 2.4. PAINT was developed as a separate program using iXBUILD, a Motif development package. It was then linked with Mosaic, to make the final product.
Work in the area of adaptive or intelligent interfaces has been a research focus in the human-computer interaction area for some time. However, in recent years some of this work has become the focus of computer science researchers as well [Sullivan and Tyler 1991][Schneider-Hufschmidt et al. 1993]. Defining adaptive interfaces is difficult, especially given the interdisciplinary nature of the field, but Schneider-Hufschmidt et al. [Schneider-Hufschmidt et al. 1993] note four general goals that adaptive interface researchers address: adapting a system being used by users with different requirements, adapting a system being used by a user with changing requirements, adapting a system to a user working in a changing environment, and adapting a system to a user working in different system environments.
For our purposes, we are focusing on the second approach: adapting an interface to a user with changing requirements. From the point of view of creating an adaptive interface for Web/Internet browsing, we see two general approaches.
We have considered both options for the PAINT tool but have focused mostly on the latter, adapting the organization of the tree of internet/web nodes. In fact, our original plan was to work on the former approach (actually changing the interface itself), but there are a number of problems with this approach that make it more difficult. Some of these problems include controlling the rate of change, such that the user sees some effect but is not swamped by a constantly changing interface, and integrating user preferences into the interface changes.
Given that our focus, at least for the short term, is on changing the tree organization, we return to the questions of the kinds of modifications to impose on the tree and how the user triggers these changes. Our first cut at this, not yet implemented in the PAINT tool itself, is to use the frequency and sequence of visitation to determine the categories under which leaf nodes - representing actual internet/web sites - are placed. For example, an upper level tree node, representing the highest level of division of leaf nodes, could be created such that it separates often used, lower-level nodes from less used nodes. Furthermore, nodes that are often ``co-visited,'' that is nodes that are typically visited within a small period of web visitations, would be candidates for being placed in the same category.
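Since this scheme is not yet implemented in PAINT, the following is only one possible sketch of the two signals mentioned above: visit frequency and co-visitation. The function names, the fixed-size window measured in consecutive visits, the threshold, and the single-letter site names are all assumptions made for illustration. The functions only report candidates and leave any tree unchanged.

```python
from collections import Counter
from typing import List, Tuple


def split_by_frequency(log: List[str],
                       threshold: int) -> Tuple[List[str], List[str]]:
    """Separate often-used sites from less-used ones by visit count."""
    counts = Counter(log)
    often = sorted(s for s, c in counts.items() if c >= threshold)
    seldom = sorted(s for s, c in counts.items() if c < threshold)
    return often, seldom


def covisit_counts(log: List[str], window: int = 2) -> Counter:
    """Count how often two different sites fall within `window` consecutive visits."""
    pairs: Counter = Counter()
    for i, site in enumerate(log):
        # look ahead at the next (window - 1) visits after position i
        for other in log[i + 1:i + window]:
            if other != site:
                pairs[tuple(sorted((site, other)))] += 1
    return pairs


# A made-up visit log; each entry stands for one visited site.
log = ["a", "b", "a", "b", "c", "d"]
often, seldom = split_by_frequency(log, threshold=2)
pairs = covisit_counts(log, window=2)
```

In this log, sites a and b are both visited at least twice and are adjacent three times, so they would be the strongest candidates for sharing a category.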
This process, while simple, might suggest to the user some new organizations of the tree of internet/web nodes that would make relocating important sites easier. Suggesting is the key word, since we cannot allow the interface to usurp control from the user. Thus PAINT will not modify directly the user's organization, but propose a new organization and allow the user to select those parts, if any, s/he wishes to incorporate into the tree.
PAINT is a tool that allows a user to organize a personal view of the internet/web. Our first implementations of PAINT are incorporated into Unix-Mosaic as a replacement for the hotlist facility. We are presently working on expanding the capabilities of PAINT to include techniques of adaptive user interfaces, giving the user some outside support in analyzing how they use the net and making recommendations on how to do so more efficiently.
Karen A. Oostendorp
Karen Oostendorp is currently a graduate student at Michigan State University, working towards a Masters Degree in Computer Science. She received her B.S. in Computer Science from Calvin College, in Grand Rapids, Michigan. Her interests include the development of interfaces to assist in searching the Internet.
William F. Punch III
Bill Punch is presently an Assistant Professor in the Intelligent Systems Lab of Michigan State University. He received his M.S. and Ph.D. in Computer Science and his B.S. in Biochemistry from The Ohio State University. His interests have been in the area of knowledge-based systems, especially as applied to ecology, such as designing waste-treatment systems and evaluating biodegradability of chemicals based on their structures. He also maintains a very active research group in genetic algorithms, working on both fundamental GA research (parallelization, representations) and application of GAs to design. Finally, he has been active in multimedia applications, especially tools to aid computer music composers and adaptive interfaces for network ``netminers.''
Richard W. Wiggins
Richard Wiggins manages the Central Systems Service group in the Computer Laboratory at Michigan State University. He coordinates the deployment of MSU's campus-wide information system (CWIS) using Gopher and World-Wide Web. He has been active in the Gopher and WWW communities since early 1992, and organized the first Gopher Workshop in August 1992. He began working with computer networks in 1979 as a consultant helping users of the Merit network. Previously, he has contributed to a book on the VM/CMS operating system and to Computer Language magazine. He moderates the Usenet newsgroup comp.infosystems.announce.
Questions or comments should be directed to Bill Punch.