A set of tools written in Tcl has been developed to assist the geographically dispersed consortium members in maintaining and distributing the collection. These tools allow the members to coordinate their resource discovery efforts while retaining complete control over their local presentations.
The individual listings were all based on the same dataset, but each person chose somewhat different presentation formats and structures. At NRAO, the resources were listed by Category, while at STScI they were listed by protocol, i.e., http, gopher, etc. Both these sites included the resource Description in their presentations, while at ESO/ST-ECF the Description was omitted.
The need for resource `validation' was identified very early. In the volatile world of the Internet, URLs can vanish or change as easily as they are created. Thus one of the first Astroweb tools was one that went through the resource listing and checked whether each URL still worked.
It was obvious from the start that the merged resource listing would have to be put into a WAIS database. This would provide users with the ability to find resources independent of the local formatting and structuring. Since the database was at one site and the search tool was at another site, some tools were needed to bridge the gap.
The unified central resource listing combined with the different formatting needs of the individual sites required the creation of a tool which could provide a generalized reformatting capability.
This last ability is the equivalent of the yet-to-be-implemented POST method. It allows the user to send an entire file to an HTTP server via HTML forms. Since the merged resource listing can be represented in ASCII characters, it can be put into an HTML TEXTAREA region after escaping any "<", ">", and "&" characters. The server script can extract the TEXTAREA information, unescape the characters, and then save the information in a file. The method even works for uuencoded binary files.
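The escape and unescape round trip described above can be sketched as follows. This is an illustrative Python sketch, not the Tcl code used by the tool; the function names `to_textarea` and `from_textarea` and the sample entry are assumptions made for the example.

```python
import html

def to_textarea(listing: str) -> str:
    """Escape '&', '<' and '>' so the text can sit inside a TEXTAREA."""
    # quote=False leaves quotation marks alone; only &, < and > are escaped.
    return html.escape(listing, quote=False)

def from_textarea(field: str) -> str:
    """Reverse the escaping on the server side before saving the file."""
    return html.unescape(field)

# Hypothetical listing entry used only for illustration.
sample = '<A HREF="http://example.org/">R&D archive</A>'
escaped = to_textarea(sample)
restored = from_textarea(escaped)
```

Because every "&" in the original is escaped first, text that already contains entity-like strings should survive the round trip unchanged.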
The tool uses the Concurrent Versions System (CVS) to store the merged resource listing. CVS allows several people to edit the merged listing simultaneously and does not require a user to `checkout' or `lock' the file. Using CVS also has the advantage of tracking the version which each user fetched, which prevents a user who fetched an earlier version from undoing the changes made by a user who fetched a later version. This version tracking ability would be difficult to implement if the information were stored in a conventional database.
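The version tracking idea can be illustrated with a toy in-memory model. This Python sketch is not CVS and does not attempt CVS's merging; it only shows, under that simplifying assumption, how recording the fetched revision lets a store reject a submission based on an outdated copy. `ListingStore` and `StaleRevision` are hypothetical names.

```python
class StaleRevision(Exception):
    """Raised when a submission is based on an outdated revision."""

class ListingStore:
    """Toy stand-in for a versioned file; holds one text and its revision."""

    def __init__(self, text: str) -> None:
        self.revision = 1
        self.text = text

    def fetch(self):
        # The caller must remember the revision number it was given.
        return self.revision, self.text

    def submit(self, base_revision: int, new_text: str) -> int:
        # Refuse edits made against anything but the current revision,
        # so a later change cannot be silently overwritten.
        if base_revision != self.revision:
            raise StaleRevision(
                "based on revision %d, current is %d"
                % (base_revision, self.revision)
            )
        self.text = new_text
        self.revision += 1
        return self.revision
```

A user whose submission is rejected would fetch the current revision, reapply the change, and submit again.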
The tool itself is implemented as a CGI script using Tcl. Tcl is an interpreted scripting language with a high-level, easy-to-learn syntax and with extensive facilities for operating on strings, interacting with UNIX processes, and communicating over UNIX sockets. To eliminate the need to parse the merged resource listing file each time it was changed, the resource listing file was converted from the original HTML syntax to Tcl command syntax, so that evaluating the file automatically loads all the information into Tcl associative arrays. A simple Tcl procedure is used to create an HTML version from the Tcl version whenever the listing is changed.
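The idea of storing the listing as commands in the scripting language itself, so that evaluating the file loads the data, can be sketched as follows. This is a Python analogue, not the Tcl format the tool uses; the `resource` command, the function names, and the two entries are hypothetical.

```python
import html

# The listing is stored as source code: one call per resource (made-up entries).
LISTING_SOURCE = '''
resource(
    url="http://example.org/archive/",
    shorttitle="Example Archive",
    longtitle="Example Data Archive",
    category="Data",
    description="Hypothetical entry used for illustration.",
)
resource(
    url="gopher://example.org/",
    shorttitle="Example Gopher",
    longtitle="Example Gopher Server",
    category="Observatories",
    description="Second hypothetical entry.",
)
'''

def load_listing(source: str) -> dict:
    """Evaluate the listing; each resource() call fills the dictionary."""
    resources = {}

    def resource(url, **fields):
        resources[url] = fields

    # No parser is needed: the interpreter does the work.
    # Only safe for files the maintainers themselves control.
    exec(source, {"resource": resource})
    return resources

def to_html(resources: dict) -> str:
    """Regenerate an HTML list from the loaded data."""
    items = [
        '<LI><A HREF="%s">%s</A> %s'
        % (html.escape(url), html.escape(f["longtitle"]), html.escape(f["description"]))
        for url, f in resources.items()
    ]
    return "<UL>\n" + "\n".join(items) + "\n</UL>"

listing = load_listing(LISTING_SOURCE)
```

The trade-off is that the file must be trusted, since loading it executes whatever it contains.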
The CGI script provides security by demanding a username and password and allowing access from only a specific set of sites. It also verifies that the required fields were populated and that only legal Categories were used.
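The field and Category checks might look roughly like the following Python sketch. Which fields are actually required and which Categories are legal are not stated here, so `REQUIRED_FIELDS` and `LEGAL_CATEGORIES` below are assumptions chosen for illustration.

```python
# Assumption: the set of mandatory fields is not specified in the text.
REQUIRED_FIELDS = ("URL", "Shorttitle", "Longtitle", "Category")
# Hypothetical Category names; the real list is maintained by the consortium.
LEGAL_CATEGORIES = {"Observatories", "Data", "Software"}

def validate(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is accepted."""
    errors = []
    for name in REQUIRED_FIELDS:
        # A field counts as missing if absent or blank.
        if not entry.get(name, "").strip():
            errors.append("missing field: " + name)
    category = entry.get("Category", "").strip()
    if category and category not in LEGAL_CATEGORIES:
        errors.append("illegal Category: " + category)
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the form report all corrections in a single round trip.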
The tool obtains the Tcl version of the merged database via Lynx and produces a list of URLs from the resources' own URLs and from any URLs contained in the resource Descriptions. WWW and Gopher URLs are tested using the Tcl `server_open' function. WAIS and FTP resources are tested using the `waissearch' and `ftp' clients. Earlier versions attempted to test TELNET resources, but the tool hung too often while checking them to be really useful. Since TELNET resources are so expensive to set up, they are less likely to be taken down or moved than other types of resources.
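The collection step and the per-protocol dispatch can be sketched as follows. This Python sketch only gathers the URLs and decides which kind of check applies; it does not contact any server. The regular expression, the function names, and the sample resources are assumptions.

```python
import re
from urllib.parse import urlsplit

# Assumption: a simple pattern is enough to find URLs inside Descriptions.
URL_PATTERN = re.compile(
    r'(?:https?|gopher|ftp|wais|telnet)://[^\s"<>]+', re.IGNORECASE
)

def collect_urls(resources):
    """Gather each resource URL plus any URLs embedded in its Description."""
    found = []
    for res in resources:
        candidates = [res["URL"]] + URL_PATTERN.findall(res.get("Description", ""))
        for url in candidates:
            url = url.rstrip(".,;)")  # drop trailing sentence punctuation
            if url not in found:
                found.append(url)
    return found

def group_by_checker(urls):
    """Decide which kind of test applies to each URL."""
    groups = {"socket": [], "waissearch": [], "ftp": [], "skipped": []}
    for url in urls:
        scheme = urlsplit(url).scheme.lower()
        if scheme in ("http", "gopher"):
            groups["socket"].append(url)      # direct connection test
        elif scheme == "wais":
            groups["waissearch"].append(url)  # external client
        elif scheme == "ftp":
            groups["ftp"].append(url)         # external client
        else:
            groups["skipped"].append(url)     # e.g. TELNET, not tested
    return groups

sample_resources = [
    {"URL": "http://example.org/",
     "Description": "Mirror at ftp://ftp.example.org/pub/, see also gopher://example.org/."},
    {"URL": "telnet://example.org", "Description": ""},
    {"URL": "wais://example.org/astro", "Description": ""},
]
groups = group_by_checker(collect_urls(sample_resources))
```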
The results of the last 20 tries are stored in a file along with the string returned by the most recent failure. This information is available in separate HTML reports on the "dead" and "unreliable" URLs, and as an "Inactive???" label in the listings sorted by protocol and by category.
This information has been invaluable in detecting changed and inactive URLs. Any static list of URLs must be `validated' or else it will become populated with pointers to nowhere.
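A rolling history of this kind can be sketched in Python as shown below. The window of 20 results comes from the text; the rules used here to label a URL "dead" (every recorded try failed) or "unreliable" (some tries failed) are assumptions, since the actual thresholds are not given.

```python
from collections import deque

HISTORY_LENGTH = 20  # the last 20 tries, as stated in the text

class UrlHistory:
    """Keeps the most recent test results for one URL."""

    def __init__(self) -> None:
        # A bounded deque discards the oldest result automatically.
        self.results = deque(maxlen=HISTORY_LENGTH)
        self.last_error = ""

    def record(self, ok: bool, message: str = "") -> None:
        self.results.append(ok)
        if not ok:
            self.last_error = message  # string from the most recent failure

    def status(self) -> str:
        # Assumed classification rules, for illustration only.
        if not self.results:
            return "untested"
        failures = self.results.count(False)
        if failures == len(self.results):
            return "dead"
        if failures > 0:
            return "unreliable"
        return "ok"
```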
The tool obtains the Tcl version of the merged database via Lynx and produces an HTML version with each resource separated by a line of dashes. This HTML version is indexed by WAIS using the `-t dash' option. The user queries the WAIS index via a CGI script which runs `waissearch' directly on the WAIS files and removes the lines of dashes.
Because the entire HTML file is naively indexed, the user can search on the Categories, which are hidden in HTML comments, or on portions of the URL, as well as on any text appearing in the resource Longtitle or Description.
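The effect of indexing the raw, dash-separated HTML can be imitated with a plain substring search. The Python sketch below is a stand-in for WAIS, not a model of how WAIS works internally; the separator length, the comment format, and the sample records are assumptions.

```python
SEPARATOR = "-" * 40  # assumption: the actual number of dashes is not stated

def build_document(records):
    """Join the HTML records with a line of dashes between them."""
    return ("\n" + SEPARATOR + "\n").join(records) + "\n"

def search(document, term):
    """Return the records whose raw text, comments included, contains term."""
    records = [chunk.strip("\n") for chunk in document.split(SEPARATOR)]
    term = term.lower()
    # Matching on the raw record means hidden comments and URLs are searchable.
    return [r for r in records if term in r.lower()]

# Hypothetical records; the Category is carried in an HTML comment.
sample_records = [
    '<!-- Category: Data -->\n<A HREF="http://example.org/archive/">Example Archive</A>',
    '<!-- Category: Software -->\n<A HREF="ftp://ftp.example.org/pub/">Example Tools</A>',
]
document = build_document(sample_records)
```

Searching for a Category name or a fragment of a host name returns the matching record with the dash lines already removed.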
The searchable index also provides rapid access to specific resources where the user knows part of the Longtitle or the Shorttitle. Without this tool, the user would be forced to search the individual HTML pages to find the desired resource.
The tool obtains the Tcl version of the merged database and the datafile saved by ValidateMaster and produces HTML files for resources in each protocol and in each Category. Other sites can modify their local copies of the tool to fit their own presentation structure and format.
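The grouping and page generation can be sketched as follows. This Python sketch is not the tool's Tcl code; the page layout, the function names, and the way inactive URLs are passed in (a plain set) are assumptions. Only the "Inactive???" label is taken from the text.

```python
import html
from collections import defaultdict
from urllib.parse import urlsplit

def group_resources(resources):
    """Index the same resources two ways: by protocol and by Category."""
    by_protocol = defaultdict(list)
    by_category = defaultdict(list)
    for res in resources:
        by_protocol[urlsplit(res["URL"]).scheme.lower()].append(res)
        by_category[res["Category"]].append(res)
    return by_protocol, by_category

def render_page(title, resources, inactive):
    """Produce one HTML page; URLs in `inactive` get the warning label."""
    lines = ["<TITLE>%s</TITLE>" % html.escape(title), "<UL>"]
    for res in sorted(resources, key=lambda r: r["Longtitle"].lower()):
        flag = " Inactive???" if res["URL"] in inactive else ""
        lines.append(
            '<LI><A HREF="%s">%s</A>%s'
            % (html.escape(res["URL"]), html.escape(res["Longtitle"]), flag)
        )
    lines.append("</UL>")
    return "\n".join(lines)

# Hypothetical resources for illustration.
sample = [
    {"URL": "http://example.org/b/", "Longtitle": "Beta Archive", "Category": "Data"},
    {"URL": "gopher://example.org/", "Longtitle": "Alpha Server", "Category": "Data"},
]
by_protocol, by_category = group_resources(sample)
page = render_page("Data", by_category["Data"], {"gopher://example.org/"})
```

A site wanting a different presentation would change only `render_page`, leaving the grouping untouched.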