An exponential increase in the volume of data from planetary exploration has rendered traditional methods of data distribution inadequate. The Magellan mission to Venus, which produced more data than all previous missions combined, was the first to use CD-ROMs for data distribution to scientists, yet even that medium is impractical for distributing many copies of large datasets. Electronic delivery on demand is the most efficient method for distributing large datasets such as the 300-gigabyte Magellan full-resolution basic image data record (F-BIDR) dataset. The NASA Planetary Data System Geosciences Node at Washington University has established the Magellan Standard Products Catalog for locating and ordering Magellan data. The Catalog provides search capability, image display, product ordering, and electronic delivery.
Electronic delivery is practical only if access is simple, stable, and fast. The Internet provides stability and reasonable speed, but simplicity has been difficult to achieve. To that end we are converting our present DBMS forms-based interface to HTML pages accessible on the Web. The advantages of this approach are that HTML provides an intuitive interface, is easy to develop and maintain, runs on multiple platforms, is free, and is compatible with but not dependent on our commercial DBMS.
In the early years of the exploration of the solar system, science data were rarely, if ever, collected in digital format; the Apollo missions of the late 1960s and early 1970s concentrated on lunar sample return, with the addition of several hundred photographs from a hand-held camera. Sets of prints were distributed to scientists for use in their research. Early unmanned missions collected digital data but sometimes discarded it after the photographic prints were made. By the time of the Viking mission to Mars in the late 1970s, the volume of data collected had increased considerably, and digital data collection was the norm, but the data were still converted to photographic prints, which were the primary data source for most interested scientists. A small number of tape copies of the digital data were distributed to a few facilities, mostly government institutions, that had the computers and software to analyze the data. The digital data stored on those tapes are all but unreadable today, and documentation and ancillary data important for scientific analysis were often poorly preserved.
The Magellan Mission to Venus [Saunders et al., 1990], launched in 1989 and just now coming to an end in October 1994, collected over 300 gigabytes of digital data, more than all previous U.S. planetary missions combined. Magellan was the first mission to use digital rather than hardcopy media as the primary method of data distribution to scientists. Data products of greatest interest to the science community were published on CD-ROMs and sent directly from the vendor to scientists who requested to receive them. About 150 CD-ROM volumes of Magellan data have been published and distributed. CD-ROMs were chosen not only for the convenience of distribution but for their stability as a long-term archive and their relatively low cost compared to magnetic tape.
As an archival medium, CD-ROM and its companion, write-once compact disk, are superior to any other technology currently available; they are cheaper and more stable than tapes, and the hardware required to use them is available and affordable for a wide range of computer platforms. For data distribution purposes CD-ROMs still have drawbacks, however, cost being a significant one. Even though the replication cost per copy is low, about a dollar or two per disk, the total cost for publishing a complete dataset can still be prohibitive, depending on the cost to master a volume, the number of volumes in the dataset, and the number of copies to be made. In the case of a very large dataset, electronic transfer is a more desirable method of data distribution.
The Magellan Full Resolution Basic Image Data Record (F-BIDR) dataset is a case in point. The dataset consists of some six thousand radar images of the surface of Venus, one for each orbital track from the north pole to the south. A single F-BIDR product, consisting of the image and nineteen supporting files, can vary in size from 20 to 150 megabytes. The volume of the entire F-BIDR dataset is on the order of 300 gigabytes, which would be about 500 CD-ROMs. Special software is required to read F-BIDR images, and a good understanding of the radar instrument is required to analyze them. These are not data products intended for the casual observer. They are, however, crucial image data for scientists studying Venus. It was decided that the cost of distributing this dataset on CD-ROMs was too great, yet something had to be done to get the data to the scientists, as well as to preserve the data in a stable archive.
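The disc count quoted above can be checked with a short calculation. The following Python sketch is illustrative only: the 650-megabyte disc capacity and the use of decimal units are assumptions not stated in the text, and the function names are hypothetical. The per-disc replication price is the "dollar or two" mentioned earlier; mastering costs are deliberately left out because no figure is given for them.

```python
import math

DATASET_GB = 300        # from the text: total F-BIDR volume, order of magnitude
DISC_CAPACITY_MB = 650  # assumption: nominal CD-ROM capacity (not in the text)
MB_PER_GB = 1000        # assumption: decimal units


def discs_needed(dataset_gb, disc_capacity_mb=DISC_CAPACITY_MB):
    """Minimum number of discs if every disc could be filled completely."""
    return math.ceil(dataset_gb * MB_PER_GB / disc_capacity_mb)


def replication_cost(discs, copies, cost_per_disc):
    """Replication cost only; mastering cost per volume is NOT included."""
    return discs * copies * cost_per_disc


discs = discs_needed(DATASET_GB)
print("discs per full set:", discs)
for price in (1.0, 2.0):  # "about a dollar or two per disk"
    print("replication cost of one set at $%.0f/disc:" % price,
          replication_cost(discs, 1, price))
```

Under these assumptions the lower bound is 462 discs per complete set. That is consistent with the estimate of about 500, since products of 20 to 150 megabytes would not fill every disc exactly.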
NASA's Planetary Data System (PDS) Geosciences Node at Washington University [Arvidson and Dueck, 1994] is currently involved in a project to transfer the entire F-BIDR dataset from its original 9-track tapes to write-once compact disks, and to make the data available for electronic transfer upon request. The project includes validation of the data files and creation of additional documentation and index files. Requests for electronic transfer can be made through the Magellan Standard Products Catalog, a database of information about Magellan products maintained by the Geosciences Node. A set of three CD jukeboxes, each with the capacity to hold 100 disks, helps automate the delivery process. A requested product is copied from its write-once CD in the jukebox to a staging area on hard disk, and the user is notified by electronic mail with instructions for transferring the file via FTP. The product remains on disk for a few days to give the user time to copy it. This system is now the primary means of delivery of Magellan F-BIDRs to the planetary science community.
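The delivery steps described above (copy to a staging area, notify the user, remove the product after a few days) can be sketched in Python as follows. This is a minimal illustration and not the Geosciences Node's actual software: the function names, the three-day retention period, the directory layout, and the FTP host name passed in are all assumptions made for the example, and the notice is only composed, not sent.

```python
import os
import shutil
import time
from email.message import EmailMessage
from pathlib import Path

RETENTION_DAYS = 3  # assumption: the text says only "a few days"


def stage_product(source, staging_dir):
    """Copy one product directory from the jukebox mount to the staging area."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = staging_dir / source.name
    shutil.copytree(source, target)
    # copytree preserves the source timestamps, so reset the directory's
    # modification time to "now" to mark when the product was staged.
    os.utime(target)
    return target


def build_notice(user_email, staged, ftp_host):
    """Compose (but do not send) the e-mail telling the user how to fetch it."""
    msg = EmailMessage()
    msg["To"] = user_email
    msg["Subject"] = "Product %s is ready for FTP transfer" % staged.name
    msg.set_content(
        "Connect to %s by FTP and retrieve the directory %s.\n"
        "It will remain available for about %d days.\n"
        % (ftp_host, staged.name, RETENTION_DAYS)
    )
    return msg


def purge_expired(staging_dir, now, retention_days=RETENTION_DAYS):
    """Remove staged products older than the retention period."""
    cutoff = now - retention_days * 86400
    removed = []
    for entry in sorted(staging_dir.iterdir()):
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

A periodic job could call `purge_expired(staging_dir, time.time())` to free disk space; treating each product as a directory reflects the fact that an F-BIDR consists of an image plus supporting files.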
Electronic transfer of such large and complex products is feasible only if access is simple, stable, and fast from the user's point of view. We have been able to demonstrate the stability of our method by experiments with colleagues at Stanford University and at Rand Corporation. They have copied over two thousand F-BIDR products using FTP, without errors or other difficulties. Transfer of a 120-megabyte product takes about 30 minutes during the day and roughly half that time at night. This speed is adequate, especially considering that alternative delivery methods (e.g., mailing a tape) would take days or weeks, not minutes.
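The transfer figures above imply an effective throughput that can be worked out directly. The Python sketch below does the arithmetic; the function names are hypothetical, decimal units (1 megabyte = 1000 kilobytes) are an assumption, and the night-time rate is simply taken as twice the daytime rate, as the text describes.

```python
def throughput_kb_per_s(size_mb, minutes):
    """Effective rate in kilobytes per second (decimal units assumed)."""
    return size_mb * 1000 / (minutes * 60)


def transfer_minutes(size_mb, rate_kb_per_s):
    """Estimated transfer time in minutes at a given effective rate."""
    return size_mb * 1000 / rate_kb_per_s / 60


day_rate = throughput_kb_per_s(120, 30)  # 120 MB in 30 minutes, from the text
night_rate = 2 * day_rate                # "roughly twice as fast at night"

# Product sizes range from 20 to 150 megabytes according to the text.
for size_mb in (20, 120, 150):
    print("%d MB: about %.1f min by day, %.1f min at night" % (
        size_mb,
        transfer_minutes(size_mb, day_rate),
        transfer_minutes(size_mb, night_rate),
    ))
```

The daytime figure works out to roughly 67 kilobytes per second, so even the largest products should transfer in well under an hour if the observed rate holds.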
Our current focus is on improving the ease of access to the F-BIDRs and other Magellan products, by improving the user interface to our catalog. Currently the database for the Magellan Standard Products Catalog is implemented in the commercial DBMS Sybase. The catalog interface is based on Sybase APT, a forms-based interface for X-window and text terminals. This software was chosen because it provided straightforward, flexible access to the database, and because it could be used on both X and regular terminals. A disadvantage of the APT package is that the resulting set of forms and procedures is large and complicated, making development and maintenance difficult. Another disadvantage is that the forms interface is not always intuitive to the user; it can be perceived as awkward and difficult to learn.
We are exploring the feasibility of using an HTML interface to the catalog, incorporating access to the underlying Sybase DBMS with GSQL, an HTTP server gateway to SQL databases. The advantages of using HTML and a client package such as Mosaic are many. Development and maintenance are considerably simpler than for the APT package. The system is compatible with, but not locked into, our choice of commercial database management software. From the user's point of view, the interface is simple and intuitive, and the necessary software is free and readily available for various computer platforms. One disadvantage of building an HTML interface is that we would have to continue to maintain the existing APT system for some time in order to serve those users who do not yet have HTML client software.
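The general idea of a form-to-SQL gateway can be illustrated with a short Python sketch. It does not reproduce GSQL's configuration syntax or behavior, and it uses the sqlite3 module as a stand-in for Sybase so that the example is self-contained. The table name `products`, its columns, and the form fields `orbit_min` and `orbit_max` are hypothetical and are not taken from the Magellan Standard Products Catalog.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs

# Hypothetical form fields mapped to fixed SQL fragments. Only these fields
# are accepted, and user values are passed as bound parameters, never
# pasted into the SQL text.
ALLOWED_FIELDS = {
    "orbit_min": "orbit >= ?",
    "orbit_max": "orbit <= ?",
}


def build_query(form_data):
    """Turn an encoded form submission into (sql, params)."""
    fields = parse_qs(form_data)
    clauses, params = [], []
    for name, clause in ALLOWED_FIELDS.items():
        if name in fields:
            clauses.append(clause)
            params.append(int(fields[name][0]))  # raises ValueError if not a number
    sql = "SELECT product_id, orbit, size_mb FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY orbit"
    return sql, params


def render_rows(rows):
    """Format query results as a plain HTML table."""
    body = "".join(
        "<tr>" + "".join("<td>%s</td>" % escape(str(v)) for v in row) + "</tr>"
        for row in rows
    )
    return "<table>%s</table>" % body


def search(conn, form_data):
    """Run the query for one form submission and return the HTML result."""
    sql, params = build_query(form_data)
    return render_rows(conn.execute(sql, params).fetchall())
```

Because the HTML page only submits field values and receives HTML back, the database behind the gateway could be replaced without changing what the user sees, which is the sense in which such an interface is compatible with, but not locked into, a particular DBMS.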
The management of gigabyte- to terabyte-sized datasets will continue to be an issue for current and upcoming planetary missions. The Clementine mission to the Earth's Moon in early 1994 will result in about 150 CD-ROM volumes of standard data products. Mars Global Surveyor, the replacement for the failed Mars Observer mission, will generate at least as much data as Magellan did. This mission, along with other U.S. and Russian missions to Mars planned for the next several years, will generate increased interest in the planet and a greater demand for Mars data, old and new. Plans for an International Mars Data Base are underway to serve this anticipated need. The Magellan Standard Products Catalog and the electronic delivery system are expected to serve as models for future systems of data distribution for these projects.
Arvidson, R.E., and S. Dueck, The Planetary Data System, Remote Sensing Reviews, 9, 255-269, 1994.
Saunders, R.S., G.H. Pettengill, R.E. Arvidson, W.L. Sjogren, W.T.K. Johnson, and L. Pieri, The Magellan Venus radar mapping mission, J. Geophys. Res., 95, 8339-8355, 1990.
Susan Slavney received a B.S. in Computer Science from Washington University in 1984. She is a systems analyst/programmer at Washington University's Earth and Planetary Remote Sensing Laboratory. She was a key participant in the design of the Magellan Standard Products Catalog and is responsible for its implementation and maintenance. She is currently involved in several other data archiving projects, including a catalog of lunar data from the recent Clementine mission, archive product generation for the Laser Altimeter on the Mars Global Surveyor Mission, scheduled for launch in 1996, and archive volume design and data validation for the two German cameras on the Russian Mars 94/96 Mission, also scheduled for launch in 1996.
Thomas C. Stein received a B.A. in Earth and Planetary Sciences from Washington University in 1987. He is Computer Systems Coordinator at Washington University's Earth and Planetary Remote Sensing Laboratory, where he is responsible for day-to-day computer operations. He was instrumental in the development of the Laboratory's new Sun-based computer system. He was previously Computer Specialist for the Smithsonian Institution's Global Volcanism Program. He has headed several software projects, including the computer system for a Smithsonian exhibition. He has authored numerous technical documents and was co-editor of a book on the world's volcanic eruptions from 1975 to 1985.