Climate Research and the Web: Integrating World Wide Web and Mosaic Capabilities into the Climate Diagnostics Center

Julia A. Collins
Professional Research Assistant
Cooperative Institute for Research in Environmental Sciences
University of Colorado
Boulder, Colorado 80309

The Climate Diagnostics Center (CDC) has historically processed and stored climatological data for its resident scientists as well as a relatively limited population of scientists outside the data center. Use of the World Wide Web (WWW) improves the way in which CDC advertises and distributes data to its users. This paper discusses the transition of CDC from a traditional data storage and distribution paradigm to a site that emphasizes the use of WWW capabilities.

CDC currently archives 17 sets of climatological data. Each dataset comprises between one and approximately 240 physical files, for a total of more than 1300 files that must be described and made available for distribution. By taking advantage of the hypertext interface supported by WWW browsers, our data presentation now allows scientists to evaluate the data only to the degree of detail that they require, from viewing the summary metadata to looking at the data themselves. We are currently automating our hypertext file management so that scientists are assured of finding all relevant links to their data of interest.

Our transition to a World Wide Web paradigm increased our data distribution options. WWW servers allow customized software to be executed when the user explores a particular link. Presently, we are implementing prototype custom software and hypertext forms which allow the user to electronically request data, even to the point of extracting a subset of the data.


1. INTRODUCTION

Historically, the Climate Diagnostics Center (CDC) has operated a "center of data" (i.e., a small NOAA data center) to process and store climatological data for its resident scientists. In addition, CDC has distributed these local data (or subsets of the data) to a relatively limited population of scientists outside the data center itself. This case-by-case distribution required a fairly large amount of human intervention. Data were advertised by word of mouth as well as in printed publications. Although File Transfer Protocol (FTP) allowed electronic transfer of data via the Internet, data were frequently copied to magnetic tape for delivery.

More recently, use of the World Wide Web (WWW) (Berners-Lee, 1994) has dramatically changed data advertisement, presentation and distribution at CDC. The World Wide Web is a system of distributed hypermedia, i.e., media that contain pointers to other (possibly different types of) media (Boutell, 1994). This paper discusses the transition of CDC from a traditional data storage and distribution paradigm to a site that emphasizes the use of WWW capabilities. We review the modifications we made to data description and advertisement, as well as the resulting changes in the way scientists request and receive data. Joining the WWW community involves both benefits and risks, and these are also discussed.

2. CLIMATE DATA ON THE WEB

The World Wide Web employs a client-server architecture to efficiently retrieve data from sites around the world. In order to join the WWW community, CDC first established a HyperText Transfer Protocol (HTTP) server (Berners-Lee, 1994) and provided browser software such as Mosaic (NCSA, 1994) for CDC developers and scientists. Both client and server software is currently available in the public domain, and only an Internet connection is required to successfully explore the WWW (Chang, 1993; Hughes, 1994). Thus, participating in the Web was economically feasible. Once the CDC HTTP server was active, we could create a home page and serve it to the Web (see Uniform Resource Locator (URL) http://www.cdc.noaa.gov/). By setting up a WWW server and its accompanying home page, CDC was able to immediately reach any scientist who has software to browse the WWW. CDC could now advertise its data throughout the globe, not just to the small population of researchers who have personal connections to CDC. This data advertisement was realized on several different levels: a basic presentation of metadata stored externally to the file, the ability to browse metadata contained in the data file, and the ability to perform a keyword search on the metadata.

2.1 Basic Data Presentation

Although WWW browsers can interpret a variety of protocols, such as FTP, gopher and Wide Area Information Servers (WAIS) (Berners-Lee, 1994), WWW text documents are primarily constructed using HyperText Markup Language (HTML) (NCSA, 1994). HTML provides a simple means to construct hyperlinks which point to related media, in this case additional information about CDC data holdings. From our home page, we provide a high-level interface to CDC data holdings, as illustrated in the excerpt below:
The CDC Data Directory
The CDC Data Directory consists of a collection of metadata which describe the CDC climatological data holdings. These metadata are in turn linked to the datasets themselves. The CDC data is archived in netCDF format. We have established some additional conventions for CDC netCDF files. The metadata may be searched by name, by a variable of interest, a statistic of interest, or by data level. Additionally, you may perform keyword searches on the metadata.

A quick reference of all of the CDC datasets is also available.

This WWW page provides hyperlinks to keywords which are familiar to scientists as entry points to our data, i.e., the terms they typically use when querying our database. We also include links to references about the data format. Effective use of the WWW browser interface requires that the information to be distributed be in HTML; thus, CDC converted the metadata for its data holdings from ASCII text to HTML-tagged text. This required designing the hypertext link architecture, as well as additional maintenance to ensure that these links remain current. From the initial data interface, scientists can traverse links which provide increasingly detailed information about our data holdings. If a user follows the link to dataset name and then selects the Comprehensive Ocean-Atmosphere Data Set (COADS), detailed metadata are presented which include the following:
Archive parameters: File names are composed of variable abbreviations and statistic:
(variable).(statistic).nc

"Observed" variables                      Name    Units       Precision
  Air temperature                         air     deg C       0.01
  Sea level pressure                      slp     mb          0.01
  Sea surface temperature                 sst     deg C       0.01
  u-wind                                  uwnd    m/s         0.01
  v-wind                                  vwnd    m/s         0.01
  Scalar wind                             wspd    m/s         0.01
  Cloudiness                              cldc    okta        0.1
  Specific humidity                       shum    g/kg        0.01

Derived variables
  Relative humidity                       rhum    %           0.1
  Sensible heat parameter
    (sst - air) * wspd                    sflx    degC*m/s    0.1
  Latent heat parameter
    ((saturation shum at sst) - shum)
    * wspd                                lflx    g/kg        0.01
  u-wind stress (wspd * uwnd)             ustr    m^2/s^2     0.1
  v-wind stress (wspd * vwnd)             vstr    m^2/s^2     0.1

Statistic                                 Abbreviation
  Mean                                    mean
  Number of Observations                  nobs
  Long Term Mean                          ltm
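
The file naming convention in the excerpt above, (variable).(statistic).nc, can be illustrated with a minimal Python sketch. The abbreviations are copied from the excerpt; the function name coads_filename and the validation against fixed sets are assumptions made for illustration and are not part of the CDC software.

```python
# Abbreviations copied from the COADS metadata excerpt.
VARIABLES = {"air", "slp", "sst", "uwnd", "vwnd", "wspd", "cldc", "shum",
             "rhum", "sflx", "lflx", "ustr", "vstr"}
STATISTICS = {"mean", "nobs", "ltm"}


def coads_filename(variable: str, statistic: str) -> str:
    """Build a file name of the form (variable).(statistic).nc.

    Hypothetical helper: it rejects abbreviations that do not appear
    in the metadata excerpt instead of guessing a file name.
    """
    if variable not in VARIABLES:
        raise ValueError(f"unknown variable abbreviation: {variable!r}")
    if statistic not in STATISTICS:
        raise ValueError(f"unknown statistic abbreviation: {statistic!r}")
    return f"{variable}.{statistic}.nc"


print(coads_filename("sst", "ltm"))  # prints sst.ltm.nc
```

Under this convention, a request for the long term mean of sea surface temperature maps to a single file name without consulting a lookup table.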
Should a scientist follow the link to e.g. Air temperature, he or she would be presented with a list of COADS files containing the air temperature variable. At this point, the user could directly browse the netCDF file metadata contents by exploring the hypertext link to the file. Because the Mosaic WWW browser has an interface to HDF/netCDF, no additional code is necessary to display this metadata, in a form similar to the following (edited for brevity):
Scientific Data Brows-o-rama
Datasets
There are 4 datasets and 4 global attributes in this file.

Available datasets:

[etc.]
Rather than rely on the Mosaic HDF interface, we could also store or generate this display of metadata using a local CGI script. Should the user decide to transfer the file to their local computer rather than browse its contents, the Mosaic Load to Local Disk option (or its counterpart in other WWW browsers) would be selected, allowing the user to transparently initiate an FTP of the file.

2.2 Search Capability

In many cases, a scientist may not want to manually traverse a series of hypertext links to reach the data of interest. The ability to search by keywords may be provided by integrating a search engine such as Wide Area Information Servers (WAIS) with an HTTP server. CDC provides such a search capability on its WWW pages. The WAIS index is constructed from the HTML-tagged metadata files associated with each data file. Below is an example of a search on the string air temp.

air temp

Index cdc_data contains the following 9 items relevant to 'air temp'. The first figure for each entry is its relative score, the second the number of lines in the item.
  • 1000 103 data.doe.html
  • 1000 89 data.ssmi.html
  • 834 115 data.nmc.marine.html
  • 751 124 data.nmc.html
  • 703 122 data.monterey.marine.html
  • 561 147 data.coads.coads1.html
  • 478 177 data.coads.interim.html
  • 448 192 data.coads.coads1a.std.html
  • 448 192 data.coads.coads1a.enh.html
A list of matching files is returned. Each of the entries in the list is a hyperlink to the metadata matching the search term.
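
The idea of a ranked keyword search over metadata files can be sketched as follows. This is not the WAIS ranking algorithm: the scoring rule (prefix matches divided by document length), the scaling of the best match to 1000, and the names tokenize and search are all assumptions chosen to produce output shaped like the listing above (relative score, line count, file name).

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(documents, query):
    """Return (score, line_count, name) tuples, best match first.

    documents maps a file name to its text. A query term matches any
    token that starts with it, so 'temp' also matches 'temperature'.
    Scores are scaled so that the best match receives 1000.
    """
    terms = tokenize(query)
    raw = {}
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        hits = sum(count for token, count in counts.items()
                   if any(token.startswith(term) for term in terms))
        if hits:
            # Divide by document length so long files are not favored.
            raw[name] = hits / sum(counts.values())
    if not raw:
        return []
    best = max(raw.values())
    results = [(round(1000 * score / best),
                len(documents[name].splitlines()),
                name)
               for name, score in raw.items()]
    # Highest score first; ties are broken by file name.
    return sorted(results, key=lambda item: (-item[0], item[2]))
```

Each returned tuple could then be rendered as a hyperlink to the corresponding metadata file.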

    2.3 Hypermedia Management

    The CDC data archives are dynamic: new data are regularly appended to existing data, and datasets are removed and added as researcher needs and data availability dictate. CDC currently archives 17 sets of climatological data. Each dataset comprises between one and approximately 240 physical files, for a total of more than 1300 files that must be described and made available for distribution. Manual generation and verification of HTML-tagged metadata and the hypertext links to the datasets rapidly becomes overwhelming and error-prone. To ensure link reliability, software developers at CDC have written code which examines the hypertext links contained in our WWW documents. To limit the scale of the effort, only local links are examined and verified.
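
A check of local links along the lines described above might look like the following Python sketch. It is not the CDC link-checking code, which the paper does not show; the names LinkCollector and broken_local_links are hypothetical, and the sketch assumes that local links are relative paths resolved against the directory of the page being checked.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_local_links(html_path):
    """Return the relative hrefs in html_path whose targets do not exist.

    Links with a scheme or host (http:, ftp:, mailto:, ...) are treated
    as remote and skipped, as are links that are only a #fragment.
    Server-root paths ("/dir/file.html") would have to be resolved
    against the server's document root, which is not handled here.
    """
    collector = LinkCollector()
    with open(html_path, encoding="utf-8") as handle:
        collector.feed(handle.read())
    base = os.path.dirname(os.path.abspath(html_path))
    broken = []
    for href in collector.links:
        parts = urlparse(href)
        if parts.scheme or parts.netloc or not parts.path:
            continue
        target = os.path.normpath(os.path.join(base, parts.path))
        if not os.path.exists(target):
            broken.append(href)
    return broken
```

Running such a check over every HTML file in the document tree after each archive update would flag links to datasets that have been removed or renamed.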

    Since CDC data is archived in netCDF format, the file metadata is available, at its source, in electronic form. We are in the process of making use of the netCDF metadata to generate HTML-tagged metadata via software processes rather than manually.
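
Generating HTML-tagged metadata by software can be sketched as below. The sketch assumes the netCDF header has already been read into plain Python dictionaries (reading the file itself is not shown); the function name metadata_to_html, the page layout, and the convention of linking each variable to a page named after it are illustrative assumptions, not the CDC implementation.

```python
from html import escape


def metadata_to_html(title, global_attrs, variables):
    """Render metadata dictionaries as a simple HTML fragment.

    global_attrs maps attribute names to values; variables maps each
    variable name to a dictionary of its attributes. All text is
    escaped so that characters such as < and & display correctly.
    """
    lines = [f"<title>{escape(title)}</title>",
             f"<h1>{escape(title)}</h1>",
             "<h2>Global attributes</h2>",
             "<dl>"]
    for name, value in global_attrs.items():
        lines.append(f"<dt>{escape(name)}</dt><dd>{escape(str(value))}</dd>")
    lines.append("</dl>")
    lines.append("<h2>Variables</h2>")
    lines.append("<ul>")
    for var, attrs in variables.items():
        described = ", ".join(f"{escape(key)}: {escape(str(val))}"
                              for key, val in attrs.items())
        # Hypothetical convention: each variable links to <name>.html.
        lines.append(f'<li><a href="{escape(var, quote=True)}.html">'
                     f"{escape(var)}</a> ({described})</li>")
    lines.append("</ul>")
    return "\n".join(lines)


page = metadata_to_html(
    "COADS sea surface temperature",
    {"title": "COADS", "note": "values < 0 are possible"},
    {"sst": {"units": "deg C", "precision": 0.01}},
)
print(page)
```

Because the page is derived from the file's own metadata, regenerating it whenever a file changes keeps the hypertext description and the data in step.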

    3. DATA DISTRIBUTION

    3.1 Raw Data

    Section 2.1 discusses the ability to transfer an entire file using WWW browser functionality. In many cases, scientists only need a subset of a given dataset, not the entire file. This subset can be in time, space, or both. Software such as CRDtools (Messenger and Mock, 1993) gives local scientists the ability to extract and save such data subsets. The CRDtools Extract application illustrates the type of customization offered to local users.

    [Figure: the CRDtools Extract application]
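
Extracting a subset in time, space, or both can be illustrated with a small Python sketch. It does not reproduce CRDtools; the function names index_range and extract_subset are hypothetical, the data are assumed to be a nested list indexed as data[time][lat][lon], and each coordinate list is assumed to be sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def index_range(coords, low, high):
    """Return a slice selecting the coordinates within [low, high].

    coords must be sorted in ascending order.
    """
    return slice(bisect_left(coords, low), bisect_right(coords, high))


def extract_subset(data, times, lats, lons,
                   time_range, lat_range, lon_range):
    """Cut a (time, lat, lon) box out of data[time][lat][lon].

    Returns the selected time, latitude and longitude coordinates
    together with the matching block of data values.
    """
    t = index_range(times, *time_range)
    y = index_range(lats, *lat_range)
    x = index_range(lons, *lon_range)
    block = [[row[x] for row in grid[y]] for grid in data[t]]
    return times[t], lats[y], lons[x], block
```

To subset in space only, the caller passes a time range that spans the whole time axis, and likewise for a subset in time only.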

    Remote users must request custom data from CDC support personnel, who then generate the data file and deliver it by anonymous FTP or on magnetic tape. The HTTP server software provides for execution of custom software when the user explores a link. This capability allows CDC to provide an electronic interface to custom data requests. Such automation removes the bottleneck of human intervention, but has some associated security risks. The ability to execute scripts via a hypertext link also gives client software the opportunity to use the script for unexpected, and destructive, purposes (McCool, 1994). Keeping these risks in mind, CDC is exploring the use of HTML forms which provide the same functionality as the Extract application to fill specific data requests via the WWW. Additionally, CRDtools itself can be executed as the result of exploring a hypertext link.
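
One common way to limit the risk of server-side scripts is to validate every form field before acting on it. The following Python sketch shows the idea under stated assumptions: the field names (variable, statistic, lat_min, lat_max), the allowed values, and the function parse_request are hypothetical and do not describe the CDC forms.

```python
from urllib.parse import parse_qs

# Hypothetical whitelists: only these values are ever accepted.
ALLOWED_VARIABLES = {"air", "slp", "sst"}
ALLOWED_STATISTICS = {"mean", "nobs", "ltm"}


def parse_request(query_string):
    """Validate a form submission and return a safe request description.

    The file name is built only from whitelisted abbreviations, so no
    user-supplied text is ever used as a path or passed to a shell.
    Raises ValueError for any missing, repeated, or invalid field.
    """
    fields = parse_qs(query_string)

    def single(name):
        values = fields.get(name, [])
        if len(values) != 1:
            raise ValueError(f"expected exactly one value for {name!r}")
        return values[0]

    variable = single("variable")
    statistic = single("statistic")
    if variable not in ALLOWED_VARIABLES:
        raise ValueError("variable not available")
    if statistic not in ALLOWED_STATISTICS:
        raise ValueError("statistic not available")

    # float() raises ValueError for text that is not a number.
    lat_min = float(single("lat_min"))
    lat_max = float(single("lat_max"))
    if not -90.0 <= lat_min <= lat_max <= 90.0:
        raise ValueError("latitude range out of bounds")

    return {"file": f"{variable}.{statistic}.nc",
            "lat_range": (lat_min, lat_max)}
```

A request that fails validation is rejected before any extraction software runs, which narrows what a malicious client can make the script do.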

    3.2 Data Products

    Although the focus at CDC is on acquiring, archiving and distributing raw data, we are also using the WWW to present some of our analysis products. For example, the CDC map room page (URL http://www.cdc.noaa.gov/~ldm4/text/cdc_maproom.html) illustrates some of the climate and weather product work in progress at CDC.
    [Figure: CDC Map Room Weather Products]

    The CDC map room page is constructed of small in-line graphics as shown above. These graphics are dynamically updated as the products are generated. Both the figure and descriptive text are linked to a larger version of the image. WWW browsers allow the client to specify how the larger image should be viewed when the link is explored. In the same fashion, CDC can also provide links to animations of climate products as we generate them.

    4. CONCLUSION

    Future plans at CDC include automating the generation of as many of the hypertext links to our metadata and data files as possible. We are also exploring the possibility of constructing some of the HTML forms of our metadata dynamically. Although the transition to a WWW-based data archive facility has involved fairly large start-up time investments in terms of generating and managing our hypermedia, the long-term payoff is an increase in the efficiency of our data presentation, the ability to allow researchers throughout the globe to explore our data, and a flexible means of data distribution.

    5. REFERENCES


    Author Biography

    As a Professional Research Assistant with the Cooperative Institute for Research in Environmental Sciences (CIRES) in Boulder, Colorado, Julia Collins coordinates the preparation and maintenance of the Climate Diagnostics Center World Wide Web documents, supervises the data management support staff and activities, and provides general software support for Climate Research Data tools (CRDtools), an in-house scientific visualization toolkit. Previously, she was employed at TRW, Inc. as a system engineer and project manager for the development of a proof-of-concept expert system. In addition to her computer science interests, she is also involved in sport psychology research.
    Corresponding author: Julia Collins (jcollins@cdc.noaa.gov)