New ways of using Web annotations

Laurent Denoue, Laurence Vignollet
Syscom, University of Savoie, FRANCE
{Laurent.Denoue, Laurence.Vignollet}
 

Introduction

When browsing the Web, users still rely on bookmarks to mark interesting pages. But bookmark collections can become very large: their size grows linearly with time [1]. Web users report that organizing bookmarks is the third biggest problem they face when using the Internet [1]. Moreover, users sometimes have difficulty remembering why they bookmarked a page. They also have no information about the context of that page, e.g. the path that led to it. Current bookmarks thus raise three problems: they are hard to organize, they do not record why a page was kept, and they give no context for it. An annotation tool working with a Web browser may help to overcome these problems. Firstly, annotations help users understand why a page was bookmarked: by highlighting specific text, the user can remember what was of interest in the page. Secondly, by keeping the parent URL of each highlighted document, the tool provides a context for the bookmarked pages. Thirdly, preliminary results suggest that annotations can improve personalized document clustering.

Annotation Tools and Yawas

Web annotation systems are not new, and their weaknesses have been presented in [7]. Most of them were primarily designed so that anyone can leave comments on the Web. Their design therefore focused mainly on providing a platform- and browser-independent solution with a remote server to store annotations. A personal annotation system calls for different design choices. When browsing the Internet, users should be able to highlight text quickly, as they do on paper. A remote server, by contrast, can introduce delays, and the system forces the browser to reload the entire document in order to display a new annotation. Furthermore, the server raises important privacy issues. We think that annotations should be stored locally, while still allowing users to send them by email or publish them on their Web page, as they do with bookmarks.

After carefully considering existing systems, we implemented Yawas, a lightweight annotation tool for the Web. For speed and privacy, annotations are stored locally, and Yawas uses the Document Object Model Level 2 and Dynamic HTML [3] to modify the document dynamically without reloading it. Although this approach is platform- and browser-dependent, recent commercial annotation tools that use this technology (see ThirdVoice and iMarkup) have shown its advantages.

Instead of bookmarking a document, Yawas users simply highlight interesting text. When creating an annotation, they may also add a comment and other information such as the document topic or document type. Yawas then stores the document URL, title, parent URL, highlighted words and any extra information provided by the user. Yawas has been used extensively by the authors, a team of 7 researchers and 18 master's students. Users understand why they kept a page and where it came from. They also reported that an annotation tool like Yawas should be included in Web browsers.
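The record kept for each highlighted document can be pictured as a small data structure. The Python sketch below is illustrative only: the class name `Annotation`, its field names, the JSON file format and the `export_html` helper are our assumptions, not Yawas's actual implementation (Yawas works inside the browser through DOM Level 2 and Dynamic HTML). It shows local storage with no remote server, plus an export that could be mailed or published like a bookmark file.

```python
import json
from dataclasses import asdict, dataclass, field
from html import escape
from pathlib import Path
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical record for one highlighted document."""
    url: str
    title: str
    parent_url: Optional[str]  # page the user came from, if known
    highlights: List[str] = field(default_factory=list)
    comment: str = ""  # optional extra information entered by the user
    topic: str = ""
    doc_type: str = ""


def save(annotations: List[Annotation], path: str) -> None:
    """Write all annotations to a local JSON file; nothing is sent to a server."""
    data = [asdict(a) for a in annotations]
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def load(path: str) -> List[Annotation]:
    """Read annotations back; a missing file means no annotations yet."""
    p = Path(path)
    if not p.exists():
        return []
    return [Annotation(**d) for d in json.loads(p.read_text(encoding="utf-8"))]


def export_html(annotations: List[Annotation]) -> str:
    """Render annotations as a nested HTML list, e.g. to publish on a Web page."""
    items = []
    for a in annotations:
        quotes = "".join(f"<li>{escape(h)}</li>" for h in a.highlights)
        items.append(
            f'<li><a href="{escape(a.url)}">{escape(a.title)}</a><ul>{quotes}</ul></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Keeping the parent URL in every record is what later lets a tool show where a page came from without storing the full browsing history.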

New ways of using Web annotations

We asked a human classifier to manually cluster a set of 350 documents highlighted by another person. The classifier clustered the documents twice: once using the original documents, and once using only the highlighted texts. Of the 350 documents, 32 URLs were no longer valid. Since these documents could not be retrieved, the classifier could not classify them from the original documents, but was still able to classify them from the highlighted texts. During clustering, the classifier created cluster names. Interestingly, the number of clusters was statistically the same in both experiments (about 37), but the names created from the highlighted texts alone were much more precise. The clustering produced by the human classifier was better when highlighted texts were used. This is not surprising, since many Web pages are merely lists of resources of which only a few lines are of interest. This experiment suggests that annotations can improve the automatic clustering of Web pages, and thus help users to classify their bookmarks in a personalized way.
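The experiment above was done by hand. As a hedged illustration of how its finding might carry over to automatic clustering, the sketch below represents each document only by its highlighted text and groups documents with a simple single-pass (leader) algorithm over cosine similarity. The algorithm choice, the `threshold` value and the tiny stopword list are our assumptions; the experiment does not prescribe any clustering method.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "with"}


def terms(text: str) -> Counter:
    """Lower-cased word counts, ignoring the stopwords above."""
    return Counter(w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[t] for t, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def single_pass_cluster(docs: Dict[str, str], threshold: float = 0.3) -> List[List[str]]:
    """docs maps each URL to its highlighted text (not the full page).
    Each document joins the most similar cluster if similarity reaches
    `threshold` (hypothetical value), otherwise it starts a new cluster."""
    clusters = []  # list of (centroid term counts, member URLs)
    for url, text in docs.items():
        vec = terms(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append((Counter(vec), [url]))
        else:
            # Centroid kept as summed counts; cosine ignores the scale factor.
            best[0].update(vec)
            best[1].append(url)
    return [urls for _, urls in clusters]
```

Using highlights instead of full pages also lets documents with dead URLs, like the 32 above, still be clustered.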

Future work and related work

We are currently studying how annotations could help to classify a new document into an existing classification. This situation arises when a new document is highlighted and must be inserted into the current bookmark hierarchy. When a document is highlighted for the first time, the automatic classifier has little information about it. We suggest supplementing this information with the words already highlighted in the classified documents.
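One possible reading of this suggestion, sketched under our own assumptions: each category of the existing hierarchy gets a term profile built from the words highlighted in the documents already filed there, and a newly highlighted document is assigned to the category whose profile is most similar (nearest-centroid classification with cosine similarity). The function names and the choice to return `None` when no word overlaps are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional


def terms(text: str) -> Counter:
    """Lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[t] for t, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def build_profiles(classified: Dict[str, List[str]]) -> Dict[str, Counter]:
    """classified maps each category to the highlighted texts of the documents
    already filed in it; a profile is the sum of their word counts."""
    return {cat: sum((terms(h) for h in texts), Counter()) for cat, texts in classified.items()}


def classify(new_highlights: str, profiles: Dict[str, Counter]) -> Optional[str]:
    """Return the category whose profile is most similar to the few words
    highlighted in the new document, or None if no word overlaps."""
    vec = terms(new_highlights)
    scores = {cat: cosine(vec, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```

Because each profile pools the highlights of many documents, even a new document with only a few highlighted words has a reasonable chance of overlapping with the right category.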

Research in automatic document summarization [6] showed the benefits of using query terms to produce user-directed summaries in an information retrieval task. We have used the highlighted texts in the same way to build user-directed summaries. We will measure users' speed and accuracy in classifying documents when they use these annotation-biased summaries versus more typical summaries such as the title and first few sentences of the document.
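A minimal sketch of an annotation-biased summary, assuming that sentences are scored simply by how many distinct highlighted words they contain. This is a simplified stand-in for the query-biased scoring of [6], not its exact formula. The lead baseline (title plus first sentences) is included for the comparison described above; the naive sentence splitter and all names are illustrative assumptions.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (illustrative only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def annotation_biased_summary(text: str, highlights: List[str], k: int = 2) -> str:
    """Pick the k sentences sharing the most distinct words with the highlighted
    texts, and return them in their original document order."""
    hl: Set[str] = set()
    for h in highlights:
        hl |= words(h)
    sentences = split_sentences(text)
    # Stable sort: ties keep document order even with reverse=True.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(words(sentences[i]) & hl),
                    reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


def lead_summary(title: str, text: str, n: int = 2) -> str:
    """Baseline summary: the title followed by the first n sentences."""
    return " ".join([title] + split_sentences(text)[:n])
```

On a page that is mostly a list of links, the lead summary tends to pick boilerplate, while the annotation-biased one jumps to the few lines the user actually marked.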

Annotations could also simplify history maps, which tend to be very complicated [2]. Instead of displaying the whole navigational path, history tools could display only the highlighted documents and show the complete path on request. In its current implementation, Yawas does not store the whole navigational path; a local proxy could be used to overcome this limitation.
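A sketch of such a condensed view, assuming a complete navigation log is available (for instance from a local proxy, which Yawas does not currently provide). Highlighted pages are kept, and each run of other pages collapses into a placeholder that a history tool could expand on request. The placeholder format and the function name are hypothetical.

```python
from typing import Iterable, List, Set


def condense_path(path: Iterable[str], highlighted: Set[str]) -> List[str]:
    """Keep highlighted pages in visit order; replace each run of other
    pages by a placeholder counting how many pages it hides."""
    out: List[str] = []
    hidden = 0
    for url in path:
        if url in highlighted:
            if hidden:
                out.append(f"[{hidden} hidden]")
                hidden = 0
            out.append(url)
        else:
            hidden += 1
    if hidden:  # trailing run after the last highlighted page
        out.append(f"[{hidden} hidden]")
    return out
```

For a path `a, b, c, d, e` where only `c` and `e` were highlighted, the view becomes `[2 hidden], c, [1 hidden], e`.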

Recently, [5] introduced robust hyperlinks to deal with the problem of broken URLs. A signature consisting of 5 words is computed for a document and used to retrieve the document when its URL has changed. Highlighted words help users understand the content of a page, but more investigation is required to see to what extent they could be used to compute robust hyperlinks. Other research [4] has also shown how annotations improve document retrieval through automatic query expansion: queries are automatically expanded using previously highlighted words.
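Two hedged sketches of how highlighted words might feed these techniques. `signature_from_highlights` ranks highlighted words with a TF-IDF-style score, so that rare but repeated words come first, in the spirit of the 5-word lexical signatures of [5]. Restricting the candidates to highlighted words is our hypothetical adaptation, and the document-frequency table is assumed to come from some reference collection. `expand_query` appends the most frequent previously highlighted words to a query, a simplification of the query expansion studied in [4]. All names and parameters are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def signature_from_highlights(highlights: List[str], doc_freq: Dict[str, int],
                              n_docs: int, size: int = 5) -> List[str]:
    """Rank highlighted words by tf * log(n_docs / (1 + df)) and keep the
    `size` best; doc_freq maps a word to the number of documents containing
    it in an assumed reference collection. Ties are broken alphabetically."""
    tf = Counter(t for h in highlights for t in tokens(h))

    def score(word: str) -> float:
        return tf[word] * math.log(n_docs / (1 + doc_freq.get(word, 0)))

    return sorted(tf, key=lambda w: (-score(w), w))[:size]


def expand_query(query: str, highlights: List[str], extra: int = 3) -> List[str]:
    """Append up to `extra` of the most frequent previously highlighted words
    that are not already in the query (ties keep first-seen order)."""
    q = tokens(query)
    counts = Counter(t for h in highlights for t in tokens(h) if t not in q)
    return q + [w for w, _ in counts.most_common(extra)]
```

Whether a signature drawn only from highlights is distinctive enough to relocate a moved page is exactly the open question raised above; the sketch only shows how such a signature could be computed.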

References

  1. Abrams, D (1998), Information Archiving with Bookmarks: Personal Web Space Construction and Organization, ACM SIGCHI 1998 Conference, Los Angeles, USA.
  2. Cockburn, A (1999), Issues of Page Representation and Organisation in Web Browser’s Revisitation Tools, OZCHI'99 Australian Conference on Human Computer Interaction, Wagga, Australia.
  3. World Wide Web Consortium (1999), Document Object Model Level 2 Specifications, http://www.w3.org/DOM/
  4. Golovchinsky, G (1998), Emphasis on the Relevant: Free-form Digital Ink as a Mechanism for Relevance Feedback, ACM SIGIR 1998, Melbourne, Australia.
  5. Phelps, T, Wilensky, R (2000), Robust Hyperlinks Cost Just Five Words Each, UC Berkeley Computer Science Technical Report UCB//CSD-00-1091, Berkeley, USA.
  6. Tombros, A, Sanderson, M (1998), Using and Evaluating User Directed Summaries to Improve Information Access, ACM SIGIR 1998 Conference, Melbourne, Australia.
  7. Vasudevan, V, Palmer, M (1999), On Web Annotations: Promises and Pitfalls of Current Web Infrastructure, 32nd Hawaii International Conference on Systems Sciences, Maui, Hawaii.