Second International WWW Conference '94
Paul D. Boyer
SAIC provides innovative technical solutions to information challenges within Government. During the last 10 months, SAIC's Information Discovery and Distillation Systems Division has been tasked by various governmental agencies to:
To provide for the efficient flow of information in this environment, we integrated numerous public domain and COTS software products so that users can have information distilled based upon their needs and can also have a means of discovering information which can supplement their current sources.
Figure 1: Message Handling Home Page
WWW serves as the conduit for integrating Topic, News, and a Web server for information distillation and WAIS, Sybase, and the same Web server for information discovery. In the distributed, client/server environment which has been created for this customer, these technologies eliminate the costly loss of information and slow information flow within this area.
In order to integrate the aforementioned technologies in an environment with homogeneous servers but heterogeneous clients, custom gateways and flow-control scripts were created to allow for the management of real-time data flow and the presentation of information in a fashion which was consistent across all clients. Flow-control scripts have been written in C and UNIX shell and gateways have been created using Perl and HTML Forms (with associated C routines to manage form inputs). Through this integration, multiple technologies can be applied via WWW/Mosaic to service the broad needs in the Department of Defense.
Figure 2: Message Handling and Retrieval System Architecture
The first time a user starts up Mosaic, they are presented with a form that asks them what organization they wish to be their default. They simply select one of the organizations and press the submit button. This immediately takes them to the home page, which is dynamically generated to have hypertext links to the proper newsgroups and archives for their organization. From then on when the user starts up Mosaic, the home page appears with their default choice based upon their IP address. A hypertext link is provided which allows the user to change their default organization if they should move to another computer.
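The default-organization behavior described above can be pictured with a minimal sketch. It assumes the server keeps a simple table keyed by client IP address; the names `defaults` and `choose_page` and the returned page labels are hypothetical and are not taken from the actual system.

```python
# Hypothetical table: client IP address -> default organization name.
defaults = {}

def choose_page(client_ip, selected_org=None):
    """Decide which page to serve to a client.

    selected_org is set when the user submits the selection form,
    either on first use or when changing the default later.
    """
    if selected_org is not None:
        defaults[client_ip] = selected_org
    org = defaults.get(client_ip)
    if org is None:
        # No default recorded yet for this address: show the selection form.
        return "selection-form"
    # Otherwise serve the dynamically generated home page for that organization.
    return "home:" + org
```

Because the key is the IP address rather than a user account, a user who moves to another computer sees the form again or follows the "change default" link, which matches the behavior described.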
The user is given the ability to browse current messages by using the Mosaic software in conjunction with the NCSA httpd 1.3 web server and a custom news gateway that we have developed. In the beginning, we used the inherent news reading capabilities built into Mosaic. However, the users complained about having to step back and forth between the subject headers and the text in order to read the messages. What they wanted was the ability to have "Next" and "Previous" buttons embedded within each message. We felt it was very important to use the same interface for all message reading activity, so we rejected the idea of installing a dedicated news reader such as "xvnews" or "xrn." We wanted Mosaic to be the sole interface for all message browsing.
Our custom news gateway reads the .overview file for the particular newsgroup to create a dynamic set of HTML files that include "Next," "Previous," and "Back" buttons to allow more rapid browsing through all messages. The user is first presented with a subject list of messages that were posted since the previous evening. When the user selects the first message, a gateway program is used to generate the HTML that includes the next and previous links and a link back to the subject list. We have not developed a threaded news reading capability since our users do not get threaded messages.
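As an illustration of the idea, the sketch below parses overview lines and builds the navigation links for one message. It assumes the common tab-separated overview layout in which the article number is the first field and the subject is the second; the URL paths and function names are hypothetical, and the real gateway was written in Perl rather than Python.

```python
def parse_overview(text):
    """Return (article_number, subject) pairs from overview text."""
    articles = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        articles.append((int(fields[0]), fields[1]))
    return articles

def nav_links(articles, index, group):
    """Build Previous / Next / Back links for the message at `index`."""
    def link(label, number):
        # Hypothetical gateway URL layout.
        return '<A HREF="/cgi-bin/article/%s/%d">%s</A>' % (group, number, label)

    parts = []
    if index > 0:
        parts.append(link("Previous", articles[index - 1][0]))
    if index < len(articles) - 1:
        parts.append(link("Next", articles[index + 1][0]))
    # The Back link returns to the subject list for the newsgroup.
    parts.append('<A HREF="/cgi-bin/subjects/%s">Back</A>' % group)
    return " | ".join(parts)
```

The first and last messages simply omit the link that would point past the end of the list, so the reader never follows a dead link.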
Once messages are processed by the news daemon, they are sent to a custom archive program we call "dispatcher" that performs three tasks. First, it creates the archival symbolic links from the message file located in the office directory to an archive directory. These symbolic links allow us to store the message only once, but for many purposes. Then the dispatcher calls "sybase_send," which parses the fields from the message and inserts the message header fields into a Sybase database with one field being the path and filename of the archived message. Finally, dispatcher calls "append2day" which calls waisindex to add the message to the daily index so that current messages can be found by keyword.
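The three dispatcher tasks can be sketched as follows. This is an illustrative stand-in only: the real dispatcher, "sybase_send," and "append2day" are separate programs, and here the database insert is replaced by returning a record and the waisindex call by appending to a list. All function and field names are hypothetical.

```python
import os
from email.parser import HeaderParser

def archive_link(message_path, archive_dir):
    """Task 1: symlink the stored message into the archive directory."""
    os.makedirs(archive_dir, exist_ok=True)
    link_path = os.path.join(archive_dir, os.path.basename(message_path))
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(message_path), link_path)
    return link_path

def header_record(message_path, link_path):
    """Task 2 (stand-in for sybase_send): header fields plus archive path."""
    with open(message_path, "r", encoding="latin-1") as fh:
        headers = HeaderParser().parse(fh)
    return {
        "subject": headers.get("Subject", ""),
        "sender": headers.get("From", ""),
        "date": headers.get("Date", ""),
        "path": link_path,  # lets a database hit lead back to the message text
    }

def dispatch(message_path, archive_dir, daily_queue):
    """Run the three tasks in order for one incoming message."""
    link_path = archive_link(message_path, archive_dir)
    record = header_record(message_path, link_path)
    daily_queue.append(link_path)  # Task 3 stand-in for adding to the daily index
    return record
```

Storing the archive path in the record is what allows the fielded database and the keyword index to refer to the same single copy of each message.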
The WAIS keyword indexes are built for each organization and are kept for 60 days. One challenge that had to be overcome was that the customer needed messages to be added to the index in real-time. Performance limitations and locked access did not allow us to simply add the new files to the 60 day archive. Instead, we created a separate, smaller index that just holds today's incoming messages. This smaller index is deleted nightly and the 60 day archive is rebuilt after removing the files older than 60 days. At that point a new daily index is created to hold the incoming messages.
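The nightly rollover can be sketched as below. The sketch models each index as a plain mapping from file path to the date the message arrived, and it assumes that the day's messages are folded into the 60 day archive when that archive is rebuilt; the function name and data layout are hypothetical and do not reflect how waisindex stores its data.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60

def nightly_rollover(archive, daily, today):
    """Return (new_archive, new_daily) after the nightly rebuild.

    archive, daily: dicts mapping message path -> date received.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    merged = dict(archive)
    merged.update(daily)  # assumption: today's messages join the archive
    # Drop files older than the retention window, then rebuild from the rest.
    kept = {path: day for path, day in merged.items() if day >= cutoff}
    # A fresh, empty daily index is started for the next day's messages.
    return kept, {}
```

During the day, a keyword search would need to consult both the large archive index and the small daily index to cover every message.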
To give users the ability to find specific messages, either current or archived, we created a custom gateway which allows combined fielded and free-text searching. Our gateway hides from the user whether the query is being posted to Sybase or WAIS. As a result, the user can enter a fielded query, such as a date range, together with a free-text query, such as a few keywords, and the results from both Sybase and WAIS are correlated and presented to the user as a unified result.
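One plausible way to correlate the two result sets is to join them on the archived message path, since the database stores that path with each header record. The sketch below assumes the fielded query yields a collection of paths and the free-text query yields (path, score) pairs; this interface and the ranking rule are assumptions, not a description of the actual gateway.

```python
def correlate(fielded_paths, ranked_text_hits):
    """Combine fielded and free-text results into one ordered list of paths.

    fielded_paths: iterable of paths, or None if no fielded query was entered.
    ranked_text_hits: iterable of (path, score), or None if no keywords given.
    """
    if fielded_paths is None and ranked_text_hits is None:
        return []
    if ranked_text_hits is None:
        return sorted(fielded_paths)
    # Highest keyword score first.
    ranked = sorted(ranked_text_hits, key=lambda hit: -hit[1])
    if fielded_paths is None:
        return [path for path, _ in ranked]
    # Both queries present: keep only messages that satisfy both.
    allowed = set(fielded_paths)
    return [path for path, _ in ranked if path in allowed]
```

When only one kind of query is entered, the other source is simply skipped, which is one way the gateway could hide from the user which back end answered the request.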
Compared with the customer's previous message handling system, which was paper, the MHRS has been a resounding success. Not only can users get current data faster, but they now have a way to search vast stores of archived messages, enabling them to prepare reports with more depth. And it doesn't matter what kind of computer they use.
Government analysts need a database management system that they can configure and use without requiring system administration support. A recurring comment is "I can do more on my home PC than I can on these SPARCstations." Now that the analysts have ample processing capacity, they need the productivity tools that make their jobs easier. Perhaps at home they use a program like Claris's FileMaker Pro to keep address lists or recipes. Why can't they have an easy-to-use database like this at work to get things done? Perhaps they want to keep a record of the activities of certain entities. Or maybe they just want a record of the people in their organization and what skills each of them has. These types of databases could all be set up now by using an RDBMS with the help of the database and system administrators. But for these simple tasks, couldn't the users set up and administer their own databases as they do at home? We think so. That is why we would like to describe our concept of a "Personal Data Server," which will be to databases what PCs are to mainframes.
Just like the commercial products available for the PC, the Personal Data Server (PDS) must be extremely simple to set up and administer, with a graphical user interface that makes data entry and data query as easy as possible. It should be nowhere near as complex as Sybase or Oracle to set up. There must be several interfaces to the database, ranging from simple commands typed in a UNIX shell to a custom X-Windows interface which displays tabular data to the use of existing tools which query the database and display maps, timelines, network diagrams, tables, graphs, and a whole host of other possibilities. If the PDS were an engine, it would be a lawnmower engine that could be used for a go-cart or a power generator or many other small tasks, but certainly not a locomotive engine or jet engine.
The PDS should also allow the data to be viewed by selected others on the network, allowing the user to share the information they have accumulated. Living up to the "S" in its name, the PDS will serve data not only to the user that maintains it but also to other designated members of the team or even the entire organization, whatever the user wants to do. Eventually, a network of Personal Data Servers could be set up so that the results of a whole team of analysts can be automatically merged together to create a daily "analysis newspaper" custom designed by each individual, complete with graphics, images, and text summaries relevant to the team's analytic area.
SAIC has spent several years developing a database browser tool, called Screenwork, that is simple to use and provides a great deal of functionality without requiring Sybase or other software licenses. Through our dealings with analysts we have continually been asked if there is a way that they can save the information they are browsing for later review or accumulate data over a long term for analysis or simply keep a list of relationships between entities of their choosing. Until now, there has not been.
The Personal Data Server had some guiding principles for its development. The PDS must:
Work on the preliminary version of the PDS is complete. Currently, only UNIX platforms are supported directly. The undercarriage of the PDS utilizes a set of Perl scripts called RDB [1]. These scripts provide filters, like "row" and "column," which serve to extract data from tab-separated database files. Our interfaces to these scripts provide an easy way for the users to manipulate their databases without having to learn the syntax of these scripts.
The PDS home page provides the user with the options of creating a new database, querying a database, modifying an existing database, and deleting a database. When creating a new database, the user can enter the name of the database, the number of fields, and the width of each field. The PDS then creates a form which allows them to enter information into the database based upon their design. Once information has been entered, the query form gives the user a list of all fields available in the database and the option of deselecting certain ones. The form also allows the user to specify the sort order of the result and the search match criteria for the data. Once the user presses the submit button, the database query is issued. Actually, the query consists of a number of RDB scripts strung together properly. For example, the "row" command which specifies the search matches is piped to the "column" command which cuts out the user's selected columns. This is in turn piped to the "sorttbl" command and then the "ptbl" command which formats the result nicely with headers and tab separations adjusted for the field widths.
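The chain of filters described above can be illustrated with a small Python analogue. This is not RDB itself and does not reproduce RDB's command syntax or file format; it only mirrors the order of operations (select rows, cut columns, sort, print) on a simplified tab-separated table with a single header line. The sample data and helper names are hypothetical.

```python
def parse(text):
    """Read tab-separated text: first line is the header, the rest are rows."""
    lines = [line.split("\t") for line in text.splitlines() if line]
    return lines[0], lines[1:]

def row(table, predicate):
    """Keep only the rows for which predicate(record_dict) is true."""
    header, rows = table
    return header, [r for r in rows if predicate(dict(zip(header, r)))]

def column(table, names):
    """Keep only the named columns, in the order given."""
    header, rows = table
    idx = [header.index(name) for name in names]
    return list(names), [[r[i] for i in idx] for r in rows]

def sorttbl(table, key):
    """Sort rows by the named column."""
    header, rows = table
    i = header.index(key)
    return header, sorted(rows, key=lambda r: r[i])

def ptbl(table):
    """Format the table with columns padded to a common width."""
    header, rows = table
    widths = [max(len(v) for v in col) for col in zip(header, *rows)]
    lines = []
    for r in [header] + rows:
        lines.append("  ".join(v.ljust(w) for v, w in zip(r, widths)).rstrip())
    return "\n".join(lines)

data = "name\tskill\toffice\nLee\tPerl\tB2\nAda\tC\tA1\nKim\tPerl\tA3\n"
matches = row(parse(data), lambda rec: rec["skill"] == "Perl")
result = sorttbl(column(matches, ["name", "office"]), "name")
print(ptbl(result))
```

Each stage takes a table and returns a table, which is the same property that lets the real scripts be strung together with UNIX pipes.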
The PDS is opening up new capabilities for the users. It promises to provide a simple way of setting up databases customized by the individual. By incorporating the PDS into the Web, users have a simple means of sharing their data with others in their team. Because it is operated through a Web browser, the PDS requires very little training on new software.
In January 1994 we embarked on an effort to convert the entire user documentation and on-line help into HTML format suitable for use in Mosaic. The previous on-line help system was limited to plain hypertext, without the graphics or fonts that HTML makes available.
Figure 5: Context-Sensitive, Hypertext Help
The software already had a hypertext help system using a widget that provides the hypertext links. Every widget in the application had a help callback assigned to it which pointed to a page of hypertext. When we decided to move to using HTML and Mosaic for the hypertext help, we used the exact same callbacks, but instead of every widget calling a different page, we lumped most widgets together into a page for the entire window in the application. Thus, instead of getting help on the "Add" button, the user now gets help on the whole window, of which the "Add" button is a component. Also, if the user keeps Mosaic open and presses the help key again, the software signals Mosaic to point to the new page, making use of Mosaic's remote control feature.
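The callback change can be sketched as a lookup from widget to window to help page. The widget names, window names, and base URL below are hypothetical. The command text follows the remote-control convention documented for NCSA Mosaic for X as best it can be recalled here (a small file containing "goto" and the URL, after which the running Mosaic process is signaled); treat that detail as an assumption and check it against the Mosaic documentation before relying on it.

```python
# Hypothetical mapping: many widgets share one help page per window.
WIDGET_WINDOW = {
    "add_button": "record_editor",
    "delete_button": "record_editor",
    "query_field": "search",
}

HELP_BASE = "http://localhost/help/"  # hypothetical location of the HTML help

def help_url(widget_name):
    """Return the help page for the window that contains the widget."""
    window = WIDGET_WINDOW.get(widget_name, "index")
    return HELP_BASE + window + ".html"

def remote_control_command(url):
    """Text of a 'go to this page' request for an already running browser.

    Assumed format: the word 'goto' on one line, the URL on the next.
    """
    return "goto\n" + url + "\n"
```

With this mapping, pressing the help key on either the "Add" or the "Delete" button leads to the same page for the whole window.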
Our users found that the new HTML-based hypertext help system provided them with a much better description of how to use the software. By using Mosaic, we made use of an interface that they were already accustomed to, eliminating the need for training.
Now that the WWW technologies have matured, the imagery server has been redesigned to allow HTML-capable browsers to query the same databases, browse the same matching images, and select the specific images of interest for further examination. This allows any user, running on PC/MS-Windows or Macintosh platforms in addition to UNIX workstations, to perform the same analytic tasks that previously required custom-built software. This saved the Government thousands of dollars in costs associated with porting the imagery client software to the other platforms and maintaining three separate versions of the software.
Mr. Boyer is an expert in the development of analytic software tools which assist users in the summarization, recognition, extraction, and exploitation of data. His responsibility is to help the national defense community manage the discovery and distillation of information in electronic environments. He is an expert in the use of the World Wide Web, WAIS, News, Tcl/Tk, and Perl and provides these solutions to his clients. He holds a BSEE from West Virginia Tech and an MSEE, specializing in Software Engineering, from The Johns Hopkins University.
pdboyer@c3i.saic.com