K. Maly, J. French, A. Selman, E. Fox
Old Dominion University
University of Virginia
SUNY Buffalo
Virginia Tech
April 30, 1994
Kurt Maly
(804)683-4817
Fax: (804) 683-4900
Computer Science Department
Old Dominion University
Norfolk, VA 23529-0162
Internet: <maly@cs.odu.edu>
WATERS provides a distributed database of technical reports and similar documents that are produced by academic departments of computer science and by research laboratories.
The WATERS project aims to speed up and increase sharing of information in the computer science field and to encourage technology transfer. By providing browse and search capability, as well as online access to technical reports, WATERS gives anyone with access to the Internet immediate access to the latest research results in the discipline. This technology enables departments with technical report series to disseminate them while simultaneously lowering costs, increasing efficiency, and increasing access. Departments can provide abstract and bibliographic information and, if desired, the reports themselves online. Now that the majority of new Ph.D.'s in computer science are no longer being employed at traditional research universities, this service can provide important access to young researchers who would otherwise be isolated from the global research community.
The current implementation assumes that a user has a client for the World Wide Web (WWW) such as Mosaic for UNIX workstations. With such a client, users can browse lists of contributing sites, and then browse through those sites' lists of technical reports. Users can also search the global bibliographic database of reports and obtain abstract and bibliographic information for each report they select. Finally, users can request and obtain an online copy of a desired report for local viewing and printing. If the user's WWW client is set up to launch viewers such as ghostview for PostScript files, xdvi for DVI files, and xtiff for TIFF page images, reports can be read directly online.
Using Wide Area Information Servers (WAIS), users may search for reports using keywords. Every word that appears in the ASCII bibliographic files is a keyword. Thus, for example, users might search on subject names, author names, or even Computing Reviews categories.
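The idea that every word of a bibliographic file acts as a keyword can be illustrated with a small inverted index. The following is a minimal sketch and not the WAIS implementation: the record layout, the report identifiers, and the function names `tokenize`, `build_index`, and `search` are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map every word of every record to the ids of the reports containing it."""
    index = defaultdict(set)
    for report_id, text in records.items():
        for word in tokenize(text):
            index[word].add(report_id)
    return index


def search(index, query):
    """Return the ids of reports whose record contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical ASCII bibliographic records (ids and fields are made up).
RECORDS = {
    "siteA-TR-1": "Author: K. Maly. Title: Network protocols. Category: C.2.2",
    "siteB-TR-1": "Author: J. French. Title: Retrieval in scientific databases.",
}
INDEX = build_index(RECORDS)
```

With this sketch, `search(INDEX, "maly")` matches on an author name and `search(INDEX, "retrieval databases")` matches on title words, because no distinction is made between fields: any word in the record is searchable.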
WATERS is maintained via a central server that provides routing and index information. When a user at site A wishes to access a report at site B, the user communicates via WWW with the central server. The latter communicates identifying information to site A that can subsequently be used by site A to fetch documents directly from site B, without central server intervention. The central server never has to contact a contributing site; that communication will always occur directly between sites A and B.
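The division of labor described above can be sketched in a few lines. This is an in-process illustration under stated assumptions, not the WATERS code: the class names `CentralServer` and `ContributingSite`, the `register` step, and the report identifiers are hypothetical, and plain method calls stand in for network requests.

```python
class ContributingSite:
    """A site that stores its own reports locally (plays the role of site B)."""

    def __init__(self, name, reports):
        self.name = name
        self._reports = dict(reports)
        self.requests_served = 0

    def fetch(self, report_id):
        """Serve a document directly to whoever asks for it."""
        self.requests_served += 1
        return self._reports[report_id]


class CentralServer:
    """Holds routing and index information only, never the documents."""

    def __init__(self):
        self._routes = {}

    def register(self, site_name, report_ids):
        # Assumption: sites announce their holdings to the central server,
        # so the central server never needs to contact a site itself.
        for report_id in report_ids:
            self._routes[report_id] = site_name

    def locate(self, report_id):
        """Return identifying information: here, just the holding site's name."""
        return self._routes[report_id]


def retrieve(central, sites_by_name, report_id):
    """What a user at site A does: ask the central server, then fetch from B."""
    site_name = central.locate(report_id)             # step 1: routing lookup
    return sites_by_name[site_name].fetch(report_id)  # step 2: direct fetch
```

The point of the design is visible in `retrieve`: the central server answers one small lookup, and the document transfer itself happens only between the requesting side and the holding site.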
Any computer science department or other group that publishes and maintains a series of technical reports is eligible to become a contributing site. All that is necessary is to be willing to provide online access to technical reports, to install appropriate software tools, and to have a designated librarian use those tools to maintain a local technical report archive. The WATERS team provides the necessary software with a full and straightforward installation package.
We should stress that use of WATERS is simple and self-explanatory, and that maintenance of local databases of technical reports is equally self-explanatory and easy once the appropriate software tools have been installed.
The WATERS project originated as a result of a workshop on maintenance of department publications that took place at the 1992 Snowbird Conference for Computer Science Department Heads. Shortly before Snowbird '92 a simple request for information on electronic publication of technical reports generated an enormous amount of net traffic. It appeared that there were at least thirty different efforts going on, and at least that many departments expressed interest in electronic dissemination but were not involved at the time and did not know how to proceed. Also, it became clear that an increasing number of high-quality reports were being produced at more and more research organizations. This, coupled with decreases in department operating budgets, makes it increasingly difficult for departments to keep their faculty informed about research at other universities. Charging for reports would merely escalate costs across the research community. Thus, a system that would enable researchers to have access from their workstations to the technical reports that are available within their own department and elsewhere has obvious appeal.
Given the large interest, Kurt Maly and Alan Selman led a workshop at Snowbird '92 with the primary charge of developing a recommendation of what, if anything, the computing research community as a group should do. We quickly realized that having several different systems in use is not necessarily better than having no solution at all. In short, we decided to implement a solution with existing technology in which all computer science departments that are connected to the Internet could participate. WATERS is the result of that decision.
In the future we will seek further support of the WATERS project in order to realize the following objectives: recruit new contributors; provide maintenance of WATERS and provide assistance to users and contributors; improve the user interface and expand the platforms that are supported; incorporate multiple domains; improve search capabilities; improve performance by using distributed servers; tie searches to other information services; analyze social and educational consequences.
The current version of WATERS, which combines WAIS and the World Wide Web, was released in early January 1994 with five contributing sites and 621 technical reports. The contributors are the sites of the original authors (ODU, UVA, VPI, and SUNY Buffalo) and NASA Langley, which worked closely with us in developing the system. Since then the system has run without any serious bugs being reported and with only a few hours of downtime, caused by routing problems in the network connection in Virginia and server breakdowns at ODU and VPI. Between Jan. 16 and Jan. 21, 1994, 392 distinct hosts accessed WATERS, although we had not yet announced it to the World Wide Web community. We sent recruiting letters only to about 30 sites, including the five sites working under a DARPA grant to provide electronic publication means for these sites. Our strategy has been to recruit contributors first and then make the system public to the Internet community.
It would be best if the readers of this report try out WATERS for themselves to see what the system can do for the research community: first, help them manage their local papers and second, make them accessible to a wide audience instantaneously. To do so, add the following URL:
http://www.cs.odu.edu/WATERS/WATERS-GS.html
to your hotlist and go to it. If we are right, everything should be self-explanatory. If you do not have Mosaic installed:
Mosaic is available on the Macintosh, and WATERS can be used to access all the browse, search, abstract, and information pages. Online retrieval of technical reports requires specialized uncompress software. A similar level of service is available through Lynx for terminal access and through Mosaic for Windows.
The WATERS project has progressed to the point of a working prototype system. Its basic architecture is adequate for initial use, but there is still work to be done to ensure that the system will scale with both number of documents in the system and number of users accessing the system.
Although we have the architecture in place, we have had to proceed slowly with expansion due to inadequate funding. We are now in a position to recruit many contributors, but we must have the resources to help them over any problems that occur with their installation. At the same time, we must evolve our system architecture to accommodate the increasing load. It will also be necessary to monitor the system's users so that we can react to their needs with respect to system functionality. Finally, we must provide for ongoing system maintenance and hardware needs. These four categories are discussed in more detail below.
The current WATERS prototype is sufficiently robust for widespread deployment. We are ready to begin a full-scale effort to recruit more CS departments and other related departments as contributors to the system. We have a plan to help quickly develop a critical mass of users, but this is bound to be labor-intensive. We have a list of sites already maintaining their technical reports online. We will examine those sites one by one to determine the steps necessary to incorporate their holdings into WATERS.
In the process we will develop procedures for bulk loading a site's backfile of technical reports. Eventually we expect to be able to reuse the procedures to convert sites with similar characteristics. It should be noted that it is not necessary for a site to load its backfile before it can participate. New sites can begin capturing their technical reports immediately and deal with their older reports as time and desire permit.
The current WATERS architecture is adequate for the near term, but its capacity and robustness must be increased as the number of documents and users grows. For all its simplicity, the current system is surprisingly robust, but the following are several general areas in which work must be done.
NSF is sponsoring a major program to develop a digital library over the network. We expect a number of results from these research efforts and will adapt WATERS to novel search mechanisms and information structuring as they become available. In particular, we are interested in expanding WATERS to be able to handle queries in multiple domains which may be served by different organizations with different methods of storing their papers and reports. We already include NASA Langley as one of the sites accessible through WATERS although it uses a different mechanism for storing and managing technical reports. The main concept under this task is to develop an engine which will take a search query and direct it to the appropriate domain server.
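One way to picture such an engine is as a dispatcher that hides each domain's storage method behind a common search function. The sketch below is only an illustration of that idea under stated assumptions: the class `QueryDispatcher`, the domain names, and the two toy back ends are hypothetical and do not describe any planned WATERS component.

```python
from typing import Callable, Dict, List

# Each domain server is modeled as a function from a query to matching ids.
SearchFn = Callable[[str], List[str]]


class QueryDispatcher:
    """Directs a search query to the server registered for its domain."""

    def __init__(self) -> None:
        self._servers: Dict[str, SearchFn] = {}

    def register(self, domain: str, search_fn: SearchFn) -> None:
        self._servers[domain] = search_fn

    def dispatch(self, domain: str, query: str) -> List[str]:
        if domain not in self._servers:
            raise KeyError(f"no server registered for domain {domain!r}")
        return self._servers[domain](query)


# Two hypothetical back ends that store their holdings differently.
CS_TITLES = ["network protocols", "complexity theory"]   # list of titles
AERO_REPORTS = {"R-1": "wing design", "R-2": "network of sensors"}  # id -> title


def search_cs(query: str) -> List[str]:
    """Search the list-based store; ids are synthesized from list positions."""
    return [f"cs-{i}" for i, title in enumerate(CS_TITLES) if query in title]


def search_aero(query: str) -> List[str]:
    """Search the dictionary-based store, which already carries its own ids."""
    return [rid for rid, title in AERO_REPORTS.items() if query in title]


dispatcher = QueryDispatcher()
dispatcher.register("computer-science", search_cs)
dispatcher.register("aerospace", search_aero)
```

Because callers see only `dispatch`, a site with a different storage mechanism can join by supplying its own search function, which matches the way a differently organized archive is already reachable through WATERS.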
We have built WATERS from off-the-shelf components readily available to users; an additional software component, techrep, to help manage local site holdings; and indexing server software in support of the WATERS search function. Maintenance as defined for this task consists of two separate components: a user support system and support of the software and hardware. The user support system will exist almost exclusively to help new and existing contributors with problems in managing their technical reports (start-up, backloading, crashes, help, etc.). A small part will be to help users of WATERS; we do not expect many problems in this area because the user deals only with Mosaic, a well-understood system. The main problem for the software support group will be to manage releases and maintain constant up-time of the index servers as well as to resolve problems with contributor sites not being up.
We hope to recruit one hundred Ph.D.-granting departments to use WATERS to manage their technical reports (about 5,000 to 10,000) and make them available on the Internet. That implies we will have to make the system efficient enough to handle many simultaneous accesses and retrievals of compressed PostScript files. From our experience with NASA Langley it is clear that the WATERS approach allows for integration with similar efforts by other organizations, and we will consider ourselves successful if we can get the National Laboratories to become members of this system. The true benefit to the community as a whole will be evident only once a critical mass of technical reports is present and kept current, because the main utilization of technical reports is within six months of their appearance.
A number of government agencies (including NSF) and universities have been pushing to get the most recent results not only into the graduate curriculum but also into undergraduate classes. Already, at Old Dominion University, graduate students use WATERS routinely to find thesis topics, and we use WATERS in a number of project-oriented courses in the classroom. We expect to cultivate and promote this mode of teaching across the CS community.
Kurt Maly received the Dipl. Ing. degree from the Technical University of Vienna, Austria, and the M.S. and Ph.D. degrees from the Courant Institute of Mathematical Sciences, New York University. Dr. Maly is a member of the IEEE Computer Society and the ACM.
He is Kaufman Professor and Chair of Computer Science at Old Dominion University. Before that, he was at the University of Minnesota, both as faculty member and Chair. He also is Visiting Professor at Chengdu University of Science and Technology, and is Honorary Professor at Hefei University of Technology. He was a member of the Board of the Microelectronic and Information Sciences Center. His research interests include modeling and simulation, very high-performance network protocols, reliability, interactive multimedia remote instruction, Internet resource access, and software maintenance. His research has been supported by DARPA, NSF, NASA, CIT and the U.S. Navy.
J. French is a Research Assistant Professor in the Department of Computer Science at the University of Virginia. He is also affiliated with the Institute for Parallel Computation there. His current research focus is information storage and retrieval in widely distributed information systems, especially scientific database systems. He is the general chair and North American program chair for the 7th International Conference on Scientific and Statistical Database Management, scheduled for the fall of 1994.
Alan Selman is Professor and Chairman of the Department of Computer Science at SUNY at Buffalo. He received the Ph.D. at Penn. State in 1970 and has held prior positions at Florida State, Iowa State, and Northeastern Univ. His principal research area is structural complexity theory. He founded the IEEE Computer Society Structure in Complexity Theory Conference and is a member of the editorial boards of JCSS and MST.
E. Fox is an Assoc. Professor in the Department of Computer Science at Virginia Tech. His research interest is in the area of digital libraries and electronic publication of journals. In the latter area he is actively working with ACM and IEEE. He is currently funded by NSF in this area and is chair of SIG IR.