Rendezvous

A WWW Synchronization System


Author:

Dragomir R. Radev
Computer Science Department
Columbia University
New York, NY 10027

Abstract:

With the proliferation of WWW-based documents, and especially of those that users are likely to consult often, the concept of hotlists was created. By using hotlists, users are able to specify which WWW documents they are likely to revisit in the future. It is often a problem, however, that users are not aware that a certain document on their hotlist has been modified since they last accessed it. Users therefore need to be able to receive update information on selected WWW documents in order to stay synchronized with the latest changes to them.

We propose a system which lets users manage their favourite "hotlists" and notifies them when a particular WWW-accessible document has been modified. The users are then given the possibility to view all or some of the modified documents. The system allows the users to set up parameters such as the frequency of synchronization, the priorities of different documents, the form of notification (e-mail based, or on demand), etc. The user interface of the system makes it easy for updates to be made.

The proposed system, Rendezvous, is written in Perl and works under various flavors of Unix. The system is flexible from the user's point of view. An extension to Rendezvous is planned, which will allow WWW servers to send automated update notifications to sites subscribed to one or more of the WWW documents available through these servers.

Paper:

Problem

The fast growth of the Internet and of the World-Wide Web has created various problems for their users. A problem which we have tried to address is the one of large hotlists for use with World-Wide Web browsers.

A hotlist is a list of URL's selected by the user, indicating WWW resources that the user is likely to access again in the near future.

It is common, however, that hotlists grow so much that they get out of control. One of the obvious ways to solve this problem is the creation of user-defined hierarchical search systems which make browsing through the hotlist simpler and more intuitive. Such a system (SIMON) was created by Mark Johnson [6].

Another way of easing the user's task of browsing through hundreds of URL's is to notify him of any changes to the documents behind the URL's in his hotlist.

Our goal was to design a tool which allows automated notification of such changes according to a set-up made by the user. Such an approach turns WWW documents into more dynamic sources of information.

Technical Characteristics

For the moment, Rendezvous is designed to work in a Mosaic environment. That is, it needs to be able to access the Mosaic default hotlist.

Rendezvous is written in Larry Wall's Perl, version 4. The interpreted source can be executed on various UNIX platforms.

Rendezvous supports http: and ftp: URL's. For the moment, it cannot handle news: or gopher: URL's.

Implementation

The HTTP protocol allows a client to obtain information from the HTTP server about the creation and modification dates of a given file that is available through HTTP. Such information can be obtained using the HEAD method of the HTTP protocol [3].
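
The HEAD-based check described above can be illustrated with a minimal sketch. It is written in Python rather than in the Perl of Rendezvous itself, and it assumes that the server reports the modification date in a Last-Modified response header. The function names (parse_last_modified, fetch_last_modified, changed_since) are hypothetical and are not taken from Rendezvous.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Turn a Last-Modified header value into a datetime.

    Returns None when the header is missing or cannot be parsed.
    """
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url, timeout=10):
    """Issue a HEAD request and return the reported modification date.

    Only the response headers are transferred, not the document body.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))


def changed_since(last_modified, last_seen):
    """Report whether a document changed after the user last saw it.

    A document with no known modification date is treated as unchanged
    (an assumption of this sketch); a document never seen before counts
    as changed.
    """
    if last_modified is None:
        return False
    return last_seen is None or last_modified > last_seen
```

A monitoring loop would call fetch_last_modified for each URL and compare the result with the date stored at the previous run. Using HEAD keeps the check cheap, since the document itself is never downloaded.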

The user creates a list of URL's that he wishes to monitor using Rendezvous. Such a list can be created automatically from the Mosaic default hotlist, for example. The user also specifies how often he wishes to receive update notification. All such set-up information is maintained in a regular ASCII file and can thus be modified easily by the user.

The user specifies the e-mail address at which he wishes to receive the reports. An alternative to e-mail notification is for Rendezvous to modify the default hotlist file automatically, duplicating every entry whose URL has been modified recently. This allows for an easy visual presentation.

A third alternative is the automated removal of such URL's from the global history file. This lets the user keep an HTML page that contains his hotlist, in which recently modified URL's are displayed with the same layout as URL's that have never been visited.
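
The three notification methods described above can be sketched as follows. This is an illustration in Python under simplifying assumptions, not the Rendezvous implementation: the hotlist is modelled as a list of (URL, title) pairs and the history as a list of URL's, whereas the actual Mosaic file formats are not reproduced here. All function names are hypothetical.

```python
from email.message import EmailMessage


def build_report(address, modified_urls):
    """Compose (but do not send) an e-mail report of modified URL's."""
    message = EmailMessage()
    message["To"] = address
    message["Subject"] = f"Rendezvous: {len(modified_urls)} document(s) modified"
    message.set_content("\n".join(modified_urls) + "\n")
    return message


def duplicate_modified(hotlist, modified_urls):
    """Return a hotlist in which each modified entry appears twice.

    `hotlist` is a list of (url, title) pairs; the doubled entries make
    the modified documents stand out when the hotlist is displayed.
    """
    modified = set(modified_urls)
    result = []
    for url, title in hotlist:
        result.append((url, title))
        if url in modified:
            result.append((url, title))
    return result


def forget_modified(history, modified_urls):
    """Return the history without the modified URL's.

    A browser that no longer finds a URL in its history shows the link
    as unvisited, which marks the document as worth revisiting.
    """
    modified = set(modified_urls)
    return [url for url in history if url not in modified]
```

The second and third functions return new lists instead of editing files in place, so the caller decides when to write the hotlist or history back to disk.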

System configuration

When executed, Rendezvous reads a system file called .rendezvous from the user's home directory.

The information contained in .rendezvous is as follows:


     .rendezvous            ::=     (< COMMAND > | < URL > ) *

     < COMMAND >            ::=     mail < ADDRESS >  |
                                    frequency < DAYS >  |
                                    history < FILENAME >  |
                                    hotlist < FILENAME >  |
                                    follow < FILENAME > 

     < ADDRESS >            ::=     < some valid e-mail address > 

     < URL >                ::=     < some valid URL > 

   
All other lines are ignored.

Three of the commands described above (mail, history, and hotlist) correspond to the different notification methods. The remaining two commands (frequency and follow) are used to set up system parameters.

If the "mail" notification method is used, Rendezvous has to run in the background.
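
A reader for a file of this shape might look like the following minimal sketch. It is written in Python for illustration and makes several assumptions that the grammar above does not state: each command or URL occupies one line, fields are separated by whitespace, only http: and ftp: URL's are accepted (the schemes Rendezvous supports), and command values are kept as uninterpreted strings. The function name parse_rendezvous is hypothetical.

```python
COMMANDS = {"mail", "frequency", "history", "hotlist", "follow"}
URL_SCHEMES = ("http:", "ftp:")


def parse_rendezvous(text):
    """Split the contents of a .rendezvous file into settings and URL's.

    Returns a (settings, urls) pair: `settings` maps each command name
    to its argument, `urls` lists the monitored URL's in file order.
    Lines matching neither form are ignored.
    """
    settings = {}
    urls = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in COMMANDS:
            # A later occurrence of a command overrides an earlier one.
            settings[fields[0]] = fields[1]
        elif len(fields) == 1 and fields[0].startswith(URL_SCHEMES):
            urls.append(fields[0])
    return settings, urls
```

Given a file containing the lines "mail user@example.edu", "frequency 7" and "http://www.example.org/index.html", the sketch returns the two settings in a dictionary and the single URL in a list. The address and the URL here are made-up examples.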

Future work

One possible extension is to let HTTP clients "subscribe" to certain commonly used URL's on remote machines. Such a subscription can be registered with the remote HTTP server, which will send automated notification to all subscribed clients or servers whenever one of these URL's is modified.

It can also prove helpful to allow the user to specify variable notification frequencies for different URL's. For example, URL's containing newswire can be checked for modifications every hour, whereas URL's of less frequently updated documents can be checked on a weekly or monthly basis.
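
Per-URL frequencies of this kind could be handled along the following lines. This is a sketch of the proposed extension in Python, not of existing Rendezvous behaviour; the function name due_urls and both dictionaries are hypothetical.

```python
from datetime import datetime, timedelta


def due_urls(intervals, last_checked, now):
    """Return the URL's whose check interval has elapsed.

    `intervals` maps each URL to a timedelta between checks, and
    `last_checked` maps a URL to the datetime of its previous check.
    A URL that has never been checked is always due.
    """
    due = []
    for url, interval in intervals.items():
        previous = last_checked.get(url)
        if previous is None or now - previous >= interval:
            due.append(url)
    return due
```

With an interval of one hour for a newswire URL and one week for a rarely updated document, a run at any given moment checks only those URL's whose own interval has passed since their last check.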

Acknowledgments

I want to express my gratitude to Tim Berners-Lee and the developers of WWW and Mosaic at both CERN in Geneva and the NCSA in Urbana-Champaign for their great work. I would also like to express my respect to Larry Wall for the excellent Perl.

References

  1. Berners-Lee, T., et al. "The World Wide Web Initiative", in Proceedings of INET'93, Internet Society, 1993.

  2. Berners-Lee, T. "About NCSA Mosaic for the X Window System", <http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/help-about.html>, 1994.

  3. Berners-Lee, T. "HTTP: A Protocol for Networked Information", <http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html>, 1993.

  4. Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft, September 1993.

  5. Crocker, D., "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, UDEL, August 1982.

  6. Johnson, M. "Simon, an experimental system for resource discovery and navigation on the internet", Queen Mary and Westfield College, University of London, <http://www.elec.qmw.ac.uk/simon/all-about-simon.html>, 1993.

  7. Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522, University of Tennessee, September 1993.

  8. Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC 1340, USC/Information Sciences Institute, July 1992.

Biography:

Dragomir R. Radev was born in Bulgaria. He received his undergraduate education in Computer Science with a concentration in Linguistics at Sofia Technical University in Bulgaria and the University of Maine. He is currently a Ph.D. student at the Department of Computer Science at Columbia University. His research interests are in Natural Language Generation, Computational Morphology, and Machine Translation. In his spare time he likes to be an Internet hacker. He was a participant at the ACM International Collegiate Programming Contest Finals in Indianapolis in 1993. He has studied (to varying extents) 8 human languages.

Contact address

radev@cs.columbia.edu