This paper presents the results of a three-week study conducted at the Georgia Institute of Technology that captured client-side user events of NCSA's XMosaic. Specifically, the paper first reviews the related hypertext browsing and searching literature and its relation to the Web, and then describes the study's methodology. An analysis of user navigation patterns follows. Lastly, a discussion and recommendations for document design are presented.
Intuitively, it would seem that browsing and searching are not mutually exclusive activities. In Bates's [Bates, 1989] work on berrypicking, a user's search strategy is constantly evolving through browsing. Users often move back and forth between strategies. Similarly, Bieber and Wan [Bieber & Wan, 1994] discuss the use of backtracking within a multi-windowed hypertext environment. They introduce the concept of "task-based backtracking," in which a user backtracks to compare information from different sources for the same task or to operate two tasks simultaneously. A similar technique, in a Web environment, would be backtracking to review previously retrieved pages.
All of these studies were performed on closed, single-author systems. The WWW, however, is an open, collaborative and exceedingly dynamic hypermedia system. These previous findings provide the basis and structure for describing the ways a user population behaves in a dynamic information ecology like the WWW.
Given that we expect to find the same kinds of strategies used in the WWW, supporting both the browser and the searcher in designing WWW pages and servers is necessary, although difficult. Furthermore, supporting the kind of task switching described by Bates and by Bieber and Wan adds another level of complexity, because the work implies that a user should be able to switch strategies at any time.
It has long been recognized that methods for supporting directed searching are needed. As a response to this, certain WWW servers are completely searchable and there are World-Wide Web search engines available.
Supporting browsing, though, may be a more difficult task. Both Laurel [Laurel, 1991] and Bernstein approach the topic of how to assess and design hypertexts for the browsing user. Laurel considers interactivity to be the primary goal. She defines a continuum for interactivity along three variables: frequency (frequency of choices), range (number of possible choices) and significance (implication of choices). Laurel contends that users will pay the price "often enthusiastically -- in order to gain a kind of lifelikeness, including the possibility of surprise and delight." Bernstein takes a slightly different approach with his "volatile hypertexts" [Bernstein, 1991]. He argues that the value of hypertext lies in its ability to create serendipitous connections between unexpected ideas.
There is a tension between designing for a browser and designing for a searcher. The logical hierarchy of a file structure or a searchable database may work fine for a closed-task, goal-oriented user. But a user looking for the unexpected element or a serendipitous connection may be frustrated by the precision required by these methods. The first step in balancing this problem is to determine what strategies are being used by the population. In order to do this, we collected log files of users interacting with the Web.
Equally important was infusing a meaningful representation into the data of user events. This allows not only a clear understanding of the extent and functionality of the interface, but also a clear extraction of task-specific data during analysis. Accordingly, we recorded events according to the User Interface Design Environment (UIDE) [Sukaviriya et al., 1993] guidelines for task representation. This permits all actions to be viewed on three levels: an Application Action (high-level task, e.g. Open File), an Interface Action (mid-level task, e.g. select item from pull-down menu), and an Interface Technique (low-level action, e.g. Mouse Click). In the example below, a user clicked on a hyperlink in the document window that pointed to http://www.somewhere/. The user is identified as participant number 123, and the event was generated from machine foo.cc.gatech.edu on August 3rd, 1994 at 12:21:10 a.m.
Aug 3 00:21:10 foo.cc.gatech.edu uel: 775887872 123 1 Mouse Navigate Anchor:: http://www.somewhere/
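As an illustration only, a log line of this shape could be split into its UIDE-style fields as sketched below. The field layout is inferred from the single example above and is an assumption: the text does not say what the long number (which resembles a Unix timestamp) or the field after the participant number represents, so the names `epoch` and `unknown_field`, like `UserEvent` and `parse_event`, are hypothetical and not part of the study's software.

```python
import re
from typing import NamedTuple, Optional


class UserEvent(NamedTuple):
    host: str           # machine that generated the event
    epoch: int          # long number after "uel:"; assumed to be a Unix timestamp
    user_id: int        # participant number
    unknown_field: str  # meaning not stated in the text
    technique: str      # Interface Technique, e.g. Mouse
    category: str       # application category, e.g. Navigate
    action: str         # Application Action, e.g. Anchor
    argument: str       # text after "::", e.g. the target URL


# Layout inferred from the one example line; other event types may differ.
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+"      # syslog-style date and time
    r"(?P<host>\S+)\s+uel:\s+"
    r"(?P<epoch>\d+)\s+(?P<user>\d+)\s+(?P<unknown>\S+)\s+"
    r"(?P<technique>\S+)\s+(?P<category>\S+)\s+"
    r"(?P<action>.+?)::\s*(?P<argument>.*)$"
)


def parse_event(line: str) -> Optional[UserEvent]:
    """Return the parsed event, or None if the line does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return UserEvent(
        host=m.group("host"),
        epoch=int(m.group("epoch")),
        user_id=int(m.group("user")),
        unknown_field=m.group("unknown"),
        technique=m.group("technique"),
        category=m.group("category"),
        action=m.group("action"),
        argument=m.group("argument"),
    )
```

Applied to the example line, this yields participant 123, technique Mouse, category Navigate, action Anchor and the target URL as the argument.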
The study was conducted for a three-week period that commenced August 3, 1994. Participation was solicited through a consent window that informed users of the experimental procedures employed as well as of their rights as human subjects. The consent window was intended both to inform users and to minimize the "Big Brother" effect [Nielsen, 1993]. This window appeared the first time XMosaic was executed by each user during the sampling period. One hundred and seven users, or sixty-three percent, chose to participate in the study.
The selection of XMosaic was made for several reasons. According to some estimates at the time [Koster, 1994], XMosaic accounted for roughly 53% of all WWW-related accesses to HTTP servers. Furthermore, XMosaic was one of the few UNIX-based GUI browsers available. Still, since the computing environment studied also included several other platforms that supported non-logging WWW browsers, certain portions of the computing population were not able to participate. Another confound of the experimental design is that users could compute on multiple platforms during the sampling period, and so may have run the specialized SunOS version of XMosaic in tandem with other, non-logging WWW browsers.
Table 1. Occurrence of XMosaic user events mapped to a UIDE-like representation, where M = mouse click and K = keyboard entry (after Sukaviriya et al., 1993).
----------------------------------------------------------------------------------------------------
Application        Interface   Instances   Percentage   Category   Description of Action
Action             Technique   of Action
----------------------------------------------------------------------------------------------------
Anchor             M           16140       51.9         Navigate   Selection of Hyperlink in Document
Back               M K         12633       40.6         Navigate   Go Back One Document
Open URL           M K         707         2            Navigate   Open File via a URL
Hotlist - Go To    M           636         2            Navigate   Go to Document via Hotlist
Forward            M K         537         2            Navigate   Go Forward One Document
Open Local         M K         221         0.7          Navigate   Open Local File
Home Document      M K         179         0.5          Navigate   Go to the Home Document
Window History     M K         39          0.1          Navigate   Go to Document via Window History
----------------------------------------------------------------------------------------------------
Since users will often leave XMosaic running for extended periods of time without interacting with it, session boundaries had to be determined artificially. To identify these boundaries, the time between successive events was calculated for all events across users. The mean time between user interface events was 9.3 minutes. Any lapse of more than 25.5 minutes between events was treated as the start of a new session. This means that most statistically significant events occurred within 1-1/2 standard deviations (25.5 minutes) of the mean. Thus, a new log file was derived that indicated sessions for each user. Interestingly, a consistent third quartile was observed across all users, though we note no clear explanation for this effect.
Users averaged 9.4 sessions each, or approximately one session every other day. For subsequent analyses, navigation-related events were extracted(2), which brought the total number of events to 31,134, representing 73% of all generated events.
Document requests were distinguished by protocol. Eighty percent of the document requests were of type http (i.e. requests for a document from a WWW server). Four percent of these were generated by "cgi" scripts. Files accounted for 8%, followed by ftp and gopher, both at 4%. All other accesses combined (including news, wais, telnet, etc.) totalled 4%.
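The session rule described above can be sketched as follows. This is a minimal illustration, not the software used in the study; it assumes each user's events are available as timestamps in seconds, and the names `split_sessions` and `SESSION_GAP_SECONDS` are hypothetical.

```python
from typing import List, Sequence

# Cutoff reported in the text: a lapse of more than 25.5 minutes starts a new session.
SESSION_GAP_SECONDS = 25.5 * 60


def split_sessions(timestamps: Sequence[float],
                   gap: float = SESSION_GAP_SECONDS) -> List[List[float]]:
    """Group one user's event times (in seconds) into sessions."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)   # within the cutoff: same session
        else:
            sessions.append([t])     # first event, or lapse exceeded the cutoff
    return sessions
```

Running this once per participant and counting the resulting lists gives the number of sessions per user.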
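One way such a breakdown by protocol might be tallied is sketched below, using the scheme of each requested URL. Treating a request without a scheme as a local file is an assumption made here for illustration, and the function name `protocol_shares` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlparse


def protocol_shares(urls: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of requests per URL scheme (http, ftp, gopher, ...)."""
    # Assumption: a request with no scheme is counted as a local file.
    counts = Counter(urlparse(u).scheme.lower() or "file" for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scheme: 100.0 * n / total for scheme, n in counts.items()}
```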
Methods of Interaction
Hyperlinks were by far the preferred method of traversal, accounting for 52% of all document requests. Second, accounting for about 41%, was the "Back" command. Following in order of popularity were "Open URL," "Hotlist," "Forward," "Open Local," "Home Document" and "Window History" (see Table 1). This indicates that users typically did not know the location of documents a priori, or relied on other heuristics to navigate to a specific document. Furthermore, most users did not select items in the hotlist and window history. It seems that they either preferred using "Go_To" or did not know how to employ this interface technique.
While all menu items have corresponding keyboard equivalents, only 4272 events were instantiated via the keyboard. This may be due to the lack of display of keyboard equivalents next to menu items, as is done in Macintosh applications. Finally, 486 (1%) interrupts or asynchronous aborts (hitting the spinning globe) occurred during file transfer. This indicates that the population as a whole was insensitive to retrieval latency, although there may be a difference for users on modems or slower connections.
Within Site Navigation
The average number of successive document requests(3) within a single site across all users was 12.64. Outlier removal resulted in a mean of 10.31 (min=1, max=403) with a standard deviation of 28.56.
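A count of successive requests within a single site can be sketched as run lengths over the host part of each requested URL. This is an illustration under the assumption that a "site" corresponds to a URL's host; the outlier-removal step is not specified in the text and is therefore not reproduced. The name `site_run_lengths` is hypothetical.

```python
from itertools import groupby
from typing import Iterable, List
from urllib.parse import urlparse


def site_run_lengths(urls: Iterable[str]) -> List[int]:
    """Lengths of maximal runs of consecutive requests to the same host."""
    hosts = (urlparse(u).netloc.lower() for u in urls)
    # groupby only merges adjacent equal hosts, so a return visit starts a new run.
    return [sum(1 for _ in run) for _, run in groupby(hosts)]
```

Averaging the returned lengths (for example with `statistics.mean`) gives a per-site figure comparable in kind to the one reported above.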
Popularity of Sites
The five most popular sites were:
1222 sites outside of Georgia Tech were accessed by College of Computing users. A modified version of the Pattern Detection Module (PDM) algorithm [Crow & Smith, 1991] identified the frequency of repeating sequences of site and document accesses. Specifically, the program tallied the number of occurrences of sequences of accesses, or paths. Paths of length two through fifty were computed. For example, if a user went from www.gatech.edu to www.ici.edu to www.ncsa.uiuc.edu a total of seven times throughout the study, the PDM would identify a path of length three (three sites) with a frequency of seven (repeated seven times). Stated differently, the length of a path is the number of successive document requests, which are to be viewed as user navigation.
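The tallying of paths can be illustrated with a plain count of contiguous subsequences. This sketch is not the PDM algorithm of Crow and Smith, nor the modified version used in the study; it is a simple stand-in that reproduces the worked example, and the name `count_paths` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def count_paths(sequences: Iterable[Sequence[str]],
                min_len: int = 2,
                max_len: int = 50) -> Counter:
    """Count every contiguous path of min_len..max_len accesses in each sequence."""
    counts: Counter = Counter()
    for seq in sequences:
        longest = min(max_len, len(seq))
        for n in range(min_len, longest + 1):
            for start in range(len(seq) - n + 1):
                path: Tuple[str, ...] = tuple(seq[start:start + n])
                counts[path] += 1
    return counts
```

Passing one sequence per session corresponds to the per-session view; concatenating all of a user's sessions into one sequence before counting corresponds to the per-user view.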
The PDM analysis revealed long sequences of between-site access patterns on a per-session and a per-user basis. By "per-session" we refer to patterns within a session by a single user. Likewise, by "per-user" we refer to all sessions by a user, thus allowing for the identification of between-session patterns. For the per-session analysis, paths including seven different sites occurred with a frequency of five times. On a per-user basis, the PDM algorithm identified sequences of length eight with a frequency of nine. Furthermore, numerous shorter sequences were discovered with higher frequencies with a maximum frequency of seventeen [Pitkow and Recker, 1994b].
Table 2. Characterization of sites based on frequency and path length relations.
----------------------------------------------------------
               High Frequency         Low Frequency
----------------------------------------------------------
Short          home pages             sporadic visits
Path Length    orientation pages      dead ends
               meta indexes           un-useful pages
----------------------------------------------------------
Long           source of reference    one shot resources
Path Length    sites, like            directed searching
               NCSA or CERN
----------------------------------------------------------
In addition, an analysis of the length of paths within each site visited per user was performed. Figure 1 shows the average frequency per path length. This corresponds to the mean path of length x, for all x between 2 and 50. Exploratory data analysis revealed a slightly negative linear relationship between frequency and path length, with the slope across all users equalling -0.24. Thus,
Within Site Navigation
Overall, users tended to operate in one small area within a particular site. Because of the frequent use of backtracking, this pattern resembles a spoke and hub structure. Backtracking occurs when a user issues the "Back" command to exit a server via the path used for entry. This "leave as you've entered" strategy was heavily used by all users. In contrast, the looping back strategy occurs when users return to the original point of entry after a path traversal by utilizing the history feature or by selecting a "Return to Home/Entry Page" link. Both navigation strategies can be visualized as a kind of spoke and hub structure. In the example below, the user oriented with http://www.cc.gatech.edu/people and http://www.cc.gatech.edu/people/People.Faculty.html as hubs.
Other Navigation Techniques
One supplemental navigation method often observed was the use of home pages as indexes to interesting places. For instance, a typical session begins with the "College of Computing Home Page" followed by a traversal to a user's personal home page. Once there, jumps to other sites, or to other parts of the local database, ensue. While providing similar functionality to "Hotlist" commands, the use of personal home pages as indexes allows for better layout control and customization and therefore is a natural, yet crafty, adaptation to an impaired interface.
What's Worth Saving?
Surprisingly, only 2% of retrieved documents were either saved to file or printed. Furthermore, "Window History" and "Hotlist" based document accesses accounted for less than 3% of all accesses. The minimal use of such archival interface commands may be indicative of one or more of the following: the quality of Web documents, the temporal nature of certain documents, the design of these archival interfaces, or reliance on other navigation techniques like personal home pages.
This also implies that there is minimal potential for copyright infringement by this population. If material retrieved by users was printed or saved to disk, unauthorized local copies of information could potentially violate certain copyright restrictions, although legal precedent remains to be set.
Directions for Design
Since users accessed on average 10 pages per server, this would indicate that "must see" information must be accessible within two to three jumps of the initial home page (two/three navigations in, two/three out, performed three/two times). However, the placement of numerous links on one page can lead to increased search time by users to find relevant information as well as a cluttered screen layout. As such, information-dense interface tactics that preserve screen space, such as using image maps, may be a more successful strategy for page design.
For rich information ecologies, the use of indexes throughout the document space supports the observed hub and spoke usage patterns. Additionally, these pages help orient users, minimizing the "lost in hypertext" phenomenon. Since most users explored small regions at a time, this design recommendation can increase the exploration of clusters of related information.
Document designers need to be cognizant of the classification of expected visitors as serendipitous browser, general browser, or searcher. Granted, within a server, collections of documents need to be targeted toward different users. Just the same, authors aware of the three classes of users can tailor documents to suit the intended use of the documents. When more than one class of visitor is expected, a separate document can be created for each class(4), thus providing customized, alternative views of the information. Note that this already occurs with the stratification of users into graphics-based and text-based users as well as forms-compliant and non-forms-compliant Web clients.
In designing for all strategies and behaviors, there exists a tension between "volatile hypertexts" and efficiency (between the browser and the searcher) in all of these recommendations. However, as Sproull and Kiesler [Sproull & Kiesler, 1993] found in their study of the uses of electronic mail, efficiency may not always be the appropriate metric for system evaluation. User satisfaction may provide a more accurate measure of the success of an interface.
In the future, servers may use the user classification to offer a "usual" view of a database. Additionally, servers could also offer a guided tour of a server based on the paths most travelled, or more excitingly, alter page design on the fly based on accesses by users.
Future Analysis
Recent studies that correlate reading time with document relevancy for USENET news articles suggest that a similar correlation may exist with Web information spaces as well. That is, we hypothesize that browsers spend less time on pages and within sites than searchers.
Users who access a large number of documents in a fixed period of time will have higher y-intercepts in their individual frequency to path length plots. These users may well be prime candidates for macro suggestion. Furthermore, it would be interesting to run a correlation analysis on the y-intercepts and the total number of sites visited.
Finally, a cost function for browsing can be developed based on analysis of expected value to the user of particular information and the expected time to retrieve that information.
[2] Berners-Lee, T., R. Cailliau, J.F. Groff and B. Pollermann. "World-Wide Web: The Information Universe." Electronic Networking: Research, Applications and Policy. 1992.
[3] Bernstein, M., J.D. Bolter, M. Joyce and E. Mylonas (1991), "Architectures for Volatile Hypertext," Hypertext'91: Third ACM Conference on Hypertext, ACM, 243-260.
[4] Bieber, Michael, and Jiangling Wan. "Backtracking in a Multiple-window Hypertext Environment." ACM European Conference on Hypermedia Technology, 1994. pp. 158-166.
[5] Carmel, Erran, Stephen Crawford and Hsinchun Chen. "Browsing in Hypertext: A Cognitive Study." IEEE Transactions on Systems, Man, and Cybernetics, Vol. 22, No. 5, Sept.-Oct. 1992. pp. 865-883.
[6] Cove, J.F. and B.C. Walsh. "Online text retrieval via browsing," Information Processing and Management, Vol. 24, No. 1, 1988. pp. 31-37.
[7] Crow, D. and B. Smith, in eds. Beale, R. & Finley, J. Neural Networks and Pattern Recognition in Human Computer Interaction. 1992.
[8] Koster, M., 1994. Personal communication.
[9] Laurel, Brenda. Computers as Theatre. Reading, MA. Addison-Wesley Publishing Co., 1991.
[10] Lucarella, Dario. "A Model for Hypertext-Based Information Retrieval," Proceedings of the ECHT '90 European Conference on Hypertext. Cambridge University Press, 1990. pp. 81-94.
[11] Marchionini, G. "Information-seeking strategies of novices using a full-text electronic encyclopedia," Journal Amer. Soc. Inform. Sci., Vol. 40, No. 1, 1989. pp. 54-66.
[12] Mukherjea, Sougata, James D. Foley, Scott E. Hudson. "Interactive Clustering for Navigating in Hypermedia Systems." ACM European Conference on Hypermedia Technology, 1994. pp. 136-145.
[13] Nielsen, Jakob. Usability Engineering. Boston, MA. Academic Press, 1993.
[14] Pitkow, James E. and Margaret M. Recker. "Integrating Bottom-Up and Top-Down Analysis for Intelligent Hypertext." Conference on Intelligent Knowledge Management. Intelligent Hypertext Workshop, Dec. 12, 1994, National Institute of Standards and Technology.
[15] Pitkow, J. and Recker, M. "Results from the First World-Wide Web Survey." Special issue of Journal of Computer Networks and ISDN systems, 1994 Vol. 27, no. 2.
[16] Pitkow, J. and Recker, M. "Using the Web as a Survey Tool: Results from the Second WWW User Survey." Unpublished work, 1994.
[17] Sproull, Lee and Sara Kiesler. Connections: New Ways of Working in the Networked Organization. Cambridge, MA: MIT Press, 1993.
[18] Sukaviriya, Piyawadee "Noi", James D. Foley and Todd Griffith, "A Second Generation User Interface Design Environment: The Model and the Runtime Architecture," Proceedings of the ACM INTERCHI '93 Conference on Human Factors in Computing Systems, 1993.
JAMES PITKOW received his B.A. in Computer Science Applications in Psychology from the University of Colorado Boulder in 1993. He is a Graphics, Visualization, & Usability graduate student in the College of Computing at Georgia Institute of Technology. His research interests include user modelling, adaptive interfaces, and usability.
Table 3. List of salient user interface events with number of occurrences and a brief description. Note that the numbers of occurrences in this table differ slightly from Table 1. This discrepancy results from differences in tabulation methods. Specifically, in this table, all subtasks related to the application action were included. For example, Table 3 reports 203 occurrences for Window History. This number includes all events from the Window History subwindow, e.g., Help, Mail To and Go_To. The Interface Technique is added to clarify certain events. For example, a menu item exists for Add Current to Hotlist as well as a button in the Hotlist subwindow. Table 3 reports the former.
-----------------------------------------------------------------------------------------------------
Application Action           Number of     Interface   Application   Description of Action
                             Occurrences   Technique   Category
-----------------------------------------------------------------------------------------------------
Add Current                  207           M           Navigate      Add Current File to Hotlist
Anchor                       16176         M           Navigate      Hyperlink in Document
Annotate                     44            M K         Annotate      Spawn Annotate Window
Audio Annotate               4             M           Annotate      Spawn Audio Annotation
Back                         12632         M K I       Navigate      Navigate Back
Binary Transfer Mode Off     62            M           Options       Load to Disk
Binary Transfer Mode On      68            M           Options       Don't Load to Disk
Clear Global History         6             M           Options       Clear Global History
Clone Window                 25            M K I       File          Clone the Window
Close Window                 481           M K I       File          Close the Window
Delay Image Loading Off      9             M           Options       Delay Loading of Images
Delay Image Loading On       12            M           Options       Don't Delay Loading of Images
Exit Program                 840           M           File          Exit Mosaic
Fancy Selections Off         5             M           Options       Disable Fancy Selections
Fancy Selections On          14            M           Options       Enable Fancy Selections
Find In Current              235           M K         File          Search Current File
Flush Image Cache            2             M           Options       Flush the Image Cache
Forward                      537           M K I       Navigate      Navigate Forward
Home Document                179           M I         Navigate      Navigate to Home Document
Hotlist                      2336          M K         Navigate      Spawn Hotlist
Interrupt                    464           I           File          Abort Loading of File
Load Images in Current       2             M           Options       Load Images in Current File
Mail To                      49            M K         File          Mail a File
New Window                   30            M K I       File          Open a New Window
Open Local                   487           M K         File          Open Local File
Open URL                     1753          M K I       File          Open File via a URL
Print                        350           M K         File          Print File
Refresh Current              14            M K         File          Redraw Current File
Reload Configuration Files   3             M           Options       Reset Configuration File
Reload Current               1507          M K I       File          Reload Current File
Reload Images                14            M           File          Reload Current File's Images
Save As                      340           M I         File          Save Current File
Source Document              631           M K         File          View Source
Window History               203           M K         Navigate      Spawn Window History
-----------------------------------------------------------------------------------------------------