WebScout is a prototype application that creates a complete archive of Web pages accessed by the user and a rich record of the user navigation, including the user and system annotations of seen or previewed pages. In addition, WebScout enhances the Browser with (1) the natural language processing of text and image analysis, (2) indexing and search capabilities for text and images, and (3) a module for creating visual representations of structured data. Based on this fundamental layer of the WebScout we have built three features: LinkInspector, SessionNavigator, and HistoryExplorer.
The LinkInspector allows the user to preview the hyperlinked content within a zoomed-out Browser placed next to the inspected link. The content of the previewed page can be highlighted with respect to the recent search query or other active context. The SessionNavigator provides easy access to pages seen during a browsing session. It organizes the pages into WebTrails and provides a linear view of the navigation by transforming the navigation hierarchy into a sequence of ‘branches’. The HistoryExplorer provides a textual search over the contents and annotations of the pages and a color search over page thumbnails. It enables the user to browse the archive using graphical representations of past navigation patterns, e.g., WebTrails or visited Web sites.
Navigation history, personal Web archive, link preview, Web history search, navigation aid, Web browser
Capabilities of Web Browsers have evolved to accommodate users’ increased demand for access to a variety of contents and services on the Web. They have also changed in accordance with our improved understanding of how users communicate via the Internet. We expect that the recent increase in affordability of local storage and high computer processing power will mark further development of the Browser. Similarly, we expect that the encouraging prospects for wide availability of fast Internet access will play an important role. With this in mind, we explored ways to address the following three problems:
Providing support for informed attention and context management while navigating the Web
Enabling easy and reliable access to the pages the user has seen during a navigation session
Creating a full personal archive of the seen Web content, including the annotations and a rich record of user navigation, and providing effective access to the archived information.
We have built the WebScout, a prototype application that extends the capabilities of the Microsoft Internet Explorer (IE) with three features: LinkInspector, SessionNavigator, and HistoryExplorer. In the following sections we briefly describe the main features of the WebScout v.1.0 and refer to the related research work. Our future research will involve extensive evaluation of the underlying concepts and ideas.
An essential function of the WebScout is its mechanism for capturing and storing locally:
Information on the user’s navigation
Archive of the Web pages seen or previewed by the user (via the WebScout LinkInspector feature)
Annotations on the viewed pages such as search queries or user assigned labels
Thumbnail images of archived pages.
In addition to the rich data storage, WebScout incorporates a natural language processing capability and a searchable index of both textual and image data. In the current implementation, the pages are archived independently of the standard IE Cache store. Since the primary purpose of the IE Cache is to aid page loading, it is not suitable for a permanent archive without a significant redesign.
Figure 1: LinkInspector showing the content of a linked document, highlighted with respect to the query terms
Technical details of WebScout data archiving and search capabilities will be presented in future publications. In the following three sections we describe the basic concepts behind the three features built upon the foundation of WebScout: LinkInspector, SessionNavigator, and HistoryExplorer.
During Web browsing, users continuously perform a risk assessment, deciding whether or not to follow a hyperlink on a page. In order to help the users with this task, researchers have devised intelligent agents that proactively search the Web and inform the users of the relevance of immediate and further removed information (see [1], [2], [3]). We fully recognize the benefit of this approach. However, we also note that a significant gain can be achieved by enabling the user to preview the hyperlinked content without having the view of the current page obstructed.
We implemented LinkInspector, a Browser feature that presents a hyperlinked page in a zoomed-out browser of a specified thumbnail size (see Figure 1), visible within the current page next to the inspected link. In this fashion the user can inspect the content and the format of the linked page and decide whether to commit to the full Browser view of the page. In the current implementation, the LinkInspector downloads the page on user demand (either when the user hovers over the link with the mouse or clicks on a tool-tip icon that appears above the link).
Similarly to [3] and [4], the WebScout enhances the Browser with the natural language processing capability and thus enables the user to preview links on the search result page with highlights that correspond to the matches of query terms.
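To illustrate the idea of query-term highlighting in a previewed page, the following is a minimal sketch based on plain case-insensitive term matching. It is not WebScout's natural language processing module; the function name `highlight` and the `<mark>` tags are assumptions made for illustration only.

```python
import re


def highlight(text, query, open_tag="<mark>", close_tag="</mark>"):
    """Wrap whole-word, case-insensitive matches of the query terms in tags.

    Hypothetical helper: simple term matching stands in for the
    language processing that the paper mentions but does not describe.
    """
    terms = [re.escape(t) for t in query.split() if t]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    # Keep the original casing of the matched text inside the tags.
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


preview = highlight("Web history search tools", "history Search")
```

In a link preview, the same substitution would be applied to the text of the downloaded page before it is rendered at thumbnail size.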
One dominant feature of Web navigation is the user’s ‘linear’ experience of navigation as determined by the time of access to a Web page. However, in order to be effective the user has to keep a mental note of the hierarchical structure and access sequence of the Web pages. This mental overload is to a large extent due to the type of navigation support provided in commercially available Browsers. In our attempt to address this issue we followed the strategy taken in previous works (see [5],[6]): we capture the user’s navigation events and use thumbnails as visual representations of pages. However, we also explore two novel ideas.
Figure 2: SessionNavigator Toolbar showing the WebTrails (indicated by black and red arrows) and the graph view of the current WebTrail
First, we partition the user’s navigation into logical units, referred to as WebTrails. Our hypothesis is that the navigation session can be automatically divided into sequences of page visits that form groupings meaningful to the users. In the current implementation, a WebTrail is a sequence of pages that begins with a user request for a page by specifying its URL (either explicitly by typing a URL or implicitly by activating a link from the Bookmarks). Alternatively, one can choose finer-grained trails, allowing each search query to mark the beginning of a trail. Each WebTrail is marked by the title of the initiating URL or a search query.
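The partitioning rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WebScout implementation: the event kinds (`TYPED_URL`, `BOOKMARK`, `LINK`, `SEARCH`) and the `Visit` and `WebTrail` records are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical event kinds; the paper does not name them.
TYPED_URL = "typed_url"
BOOKMARK = "bookmark"
LINK = "link"
SEARCH = "search"


@dataclass
class Visit:
    url: str
    title: str
    kind: str        # one of the constants above
    query: str = ""  # search query text when kind == SEARCH


@dataclass
class WebTrail:
    label: str
    visits: list = field(default_factory=list)


def partition_into_trails(visits, split_on_search=False):
    """Split a session into trails.

    A trail starts at a typed URL or an activated bookmark and,
    with split_on_search=True, also at each search query.
    """
    starters = {TYPED_URL, BOOKMARK}
    if split_on_search:
        starters.add(SEARCH)
    trails = []
    for v in visits:
        # The first visit always opens a trail, whatever its kind.
        if v.kind in starters or not trails:
            label = v.query if (v.kind == SEARCH and v.query) else v.title
            trails.append(WebTrail(label=label))
        trails[-1].visits.append(v)
    return trails


session = [
    Visit("http://a.example", "A", TYPED_URL),
    Visit("http://a.example/1", "A1", LINK),
    Visit("http://s.example/?q=x", "Search", SEARCH, "x"),
    Visit("http://b.example", "B", BOOKMARK),
]
coarse = partition_into_trails(session)
fine = partition_into_trails(session, split_on_search=True)
```

With the default setting the sample session yields two trails labeled by page title; with `split_on_search=True` the search query opens a third trail labeled by the query text.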
Second, we provide a linear view of navigation by ‘flattening’ the navigation hierarchy into a sequence of ‘branches’. We repeat a branching point whenever showing the new branch, thus enforcing the user’s linear experience of the navigation and providing easy access to pages that serve as ‘hubs’.
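One possible reading of this ‘flattening’ is sketched below. It assumes that each visit is recorded together with the page it was reached from; the function name and the `(page, parent)` representation are assumptions for illustration, and the actual WebScout data model may differ.

```python
def flatten_to_branches(visits):
    """Turn (page, parent) pairs, given in access order, into branches.

    A visit that continues from the last page of the current branch
    extends that branch. Any other visit opens a new branch that
    begins by repeating the branching point (the parent page).
    """
    branches = []
    for page, parent in visits:
        if branches and branches[-1][-1] == parent:
            branches[-1].append(page)
        elif parent is None:
            # A page with no parent starts a fresh branch on its own.
            branches.append([page])
        else:
            branches.append([parent, page])
    return branches


# A -> B -> C, back to B -> D, back to A -> E
visits = [("A", None), ("B", "A"), ("C", "B"), ("D", "B"), ("E", "A")]
branches = flatten_to_branches(visits)
```

For the sample session the result is the three branches A, B, C; then B, D; then A, E. The hub pages B and A reappear at the start of the branches that leave them, so the sequence stays in access order.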
The basic view of the navigation is facilitated by the SessionNavigator Toolbar that shows a sequence of thumbnails in the order of page access with a clear demarcation of WebTrails. As the user navigates the Web the thumbnail images are appended to the current WebTrail. The user can also choose to view a graphical representation of individual WebTrails (see Figure 2).
One significant drawback of relying on the Web as a source of information is the transient nature of Web page contents. Thus the ability to store and access the true representation of the content seen by the user has many benefits. While an obvious one is the ability to re-examine information at later dates, the archive can also enable a more reliable user profiling than the navigation patterns (see [7]) or search topics alone (see [3],[4]).
In order to illustrate the possible uses of the WebScout archive we implemented a search facility that enables the user to filter by date, pose text queries over the content of the Web pages and stored search queries, and specify a predominant color on the page.
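The date and text filters can be sketched as a simple scan over archived records. This is an illustrative assumption rather than a description of the WebScout index: the `ArchivedPage` record, its fields, and the substring matching are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedPage:
    url: str
    visited: date
    text: str           # page content
    queries: tuple = () # search queries stored as annotations


def search_archive(pages, terms, start=None, end=None):
    """Return URLs of pages visited within [start, end] that contain
    every term in their content or in their stored search queries."""
    wanted = [t.lower() for t in terms]
    hits = []
    for p in pages:
        if start is not None and p.visited < start:
            continue
        if end is not None and p.visited > end:
            continue
        haystack = (p.text + " " + " ".join(p.queries)).lower()
        if all(t in haystack for t in wanted):
            hits.append(p.url)
    return hits


archive = [
    ArchivedPage("http://a.example", date(2003, 1, 5), "Browser history tools"),
    ArchivedPage("http://b.example", date(2003, 2, 9), "Link preview", ("history search",)),
    ArchivedPage("http://c.example", date(2003, 3, 1), "Unrelated page"),
]
hits = search_archive(archive, ["history"], start=date(2003, 2, 1))
```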
The color search is based on the analysis of thumbnail images, while color query specification is facilitated by a predefined color palette. We also support browsing based on site organization or WebTrail organization of pages.
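As an illustration of a palette-based color search, the sketch below maps each thumbnail pixel to its nearest palette color and treats the most frequent one as the predominant color of the page. The palette entries, the distance measure, and the function names are assumptions; the paper does not specify how the thumbnail analysis is performed.

```python
from collections import Counter

# Hypothetical palette; the actual WebScout palette is not given in the paper.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def nearest_palette_color(rgb):
    """Name of the palette color closest to rgb (squared Euclidean distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )


def dominant_color(pixels):
    """Most frequent palette color among the thumbnail's pixels."""
    counts = Counter(nearest_palette_color(p) for p in pixels)
    return counts.most_common(1)[0][0]


def search_by_color(thumbnails, color):
    """URLs whose thumbnail has the given predominant palette color."""
    return [url for url, px in thumbnails.items() if dominant_color(px) == color]


# Thumbnails are modeled as flat lists of (R, G, B) pixels.
thumbnails = {
    "http://a.example": [(250, 5, 5)] * 3 + [(0, 0, 0)],
    "http://b.example": [(10, 10, 240)] * 2,
}
red_pages = search_by_color(thumbnails, "red")
```

Working on thumbnails rather than full-size renderings keeps the number of pixels small, which is one reason such an analysis can be run over the whole archive.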
Our future work will include the evaluation of these search strategies for retrieving archived information.