Find it, Plot it, Grab it: Distributing Climate Data Via the Web
Julia Collins
Roland Schweitzer
NOAA-CIRES Climate Diagnostics Center
Boulder, Colorado 80309, USA
jac@cdc.noaa.gov
rhs@cdc.noaa.gov
Abstract
Scattered archive locations and the sheer volume of data
make it difficult for climate researchers to locate desired information.
The goal of the NOAA-CIRES Climate Diagnostics Center (CDC) data providers is to
make the process of locating, acquiring, and sharing climate data
one which enables and encourages researchers' individual and
collaborative activities. This poster describes our approach to
providing Web access to our climate data holdings. We describe
our use of the Web -- in combination with file metadata and
established visualization tools -- to make the process of finding,
evaluating, and retrieving climate data a straightforward and
user-friendly one.
Introduction
A vast amount of climate data exists at
repositories around the globe,
including the NOAA-CIRES Climate Diagnostics Center (CDC).
The scattered archive locations and
sheer volume of data make it
difficult for researchers to locate
desired information. This situation
is compounded by the fact that
many of these data sets are
stored in unfamiliar formats
or are simply not well described,
and are therefore difficult to access and use.
Collaborative opportunities may be lost
when a scientist generates a specialized
data set but has no means of
advertising this resource to the rest of
the research community.
As data providers at CDC, our goal is to make
the process of locating, acquiring, and sharing
climate data one which
enables and encourages
researchers' participation.
The Web allows us to address many
of the difficulties inherent in advertising
and distributing data to a worldwide
research community.
The nature of hypertext, combined
with the familiar Web browser interface, allows
members of this research community, most of
whom are not computer scientists by training, to
navigate our site easily
from any location on the Internet
and retrieve information relevant to their work.
Because the Web provides efficient access to data
and lets us present current information promptly,
it also allows researchers to share information
more readily and strengthen their collaborative efforts.
This poster describes our approach to
providing Web access to our climate data.
It builds upon our previous work on search
techniques (Collins and Schweitzer, 1995)
and extends it into the areas of data browsing
and retrieval. In general, we describe how
using the Web as a data distribution
platform allows us to:
- make it easy to find the information,
- make it easy to evaluate the data before acquiring it, and
- make it easy to retrieve the desired data.
Our current Web site
is the result of a series of decisions
about our data storage mechanisms,
the tools used to manipulate the data,
and the CGI processes we wrote to integrate
our HTTP server, the stored data, and the
data manipulation processes.
Data Storage
An important part of our data access scheme is the
consistent structure of our climate data sets.
Our formally maintained data sets are all stored in
netCDF (Network Common Data Form; Rew, 1997),
a machine-independent, self-describing data format. The metadata in
these files conform to the conventions
adopted by participants in the Cooperative
Ocean-Atmosphere Research Data Service (COARDS; Hankin, 1997).
These metadata include information such as the names and units
of all of the data's physical coordinates,
geographic extents,
the time range covered by the data, and the minimum and
maximum data values. While most of the
data at CDC are maintained by software engineers in
a data management role,
local scientists frequently produce their own small
data sets which they wish to share with
collaborators. To encourage the use of netCDF in these efforts, we have
developed a Web-based form and supporting software tools
which allow researchers unfamiliar with the details of netCDF
to produce netCDF files easily.
This ensures that those of us in a data management role
can apply our standard
advertising, searching, visualization, and retrieval mechanisms to these
customized data sets as well.
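One way such a form-to-netCDF tool could work is sketched below, under stated assumptions: the form is assumed to collect a variable name, units, coordinate values, and data values, and the tool is assumed to write a CDL text description (the text form of netCDF) that the standard netCDF utility `ncgen` converts into a binary file. The function name `form_to_cdl`, the `actual_range` attribute used to record the minimum and maximum data values, and all example values are hypothetical; the coordinate units follow the Cooperative Ocean-Atmosphere Research Data Service (COARDS) style.

```python
def _fmt(values):
    """Format numbers as a comma-separated CDL data list."""
    return ", ".join(repr(float(v)) for v in values)


def form_to_cdl(name, var, units, lons, lats, times, time_units, data):
    """Build a COARDS-style CDL description from (hypothetical) form fields.

    `data` is a nested list indexed [time][lat][lon], matching the
    COARDS dimension order (time, lat, lon).
    """
    flat = [v for plane in data for row in plane for v in row]
    lo, hi = float(min(flat)), float(max(flat))
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\tlon = {len(lons)} ;",
        f"\tlat = {len(lats)} ;",
        f"\ttime = UNLIMITED ; // ({len(times)} currently)",
        "variables:",
        "\tfloat lon(lon) ;",
        '\t\tlon:units = "degrees_east" ;',
        "\tfloat lat(lat) ;",
        '\t\tlat:units = "degrees_north" ;',
        "\tdouble time(time) ;",
        f'\t\ttime:units = "{time_units}" ;',
        f"\tfloat {var}(time, lat, lon) ;",
        f'\t\t{var}:units = "{units}" ;',
        # Record the data range so search indices and plot forms can use it
        # (attribute name is an assumption for this sketch).
        f"\t\t{var}:actual_range = {lo!r}f, {hi!r}f ;",
        "data:",
        f" lon = {_fmt(lons)} ;",
        f" lat = {_fmt(lats)} ;",
        f" time = {_fmt(times)} ;",
        f" {var} = {_fmt(flat)} ;",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

For example, the returned text could be saved as `demo.cdl` and converted with `ncgen -o demo.nc demo.cdl`. Generating CDL rather than writing binary netCDF directly keeps the intermediate step human-readable, which is convenient when checking what a researcher actually submitted.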
Data Manipulation
Many climate researchers are familiar with the
Grid
Analysis and Display System (GrADS), a toolkit for
manipulating and plotting scientific data. Given
the community's acceptance of GrADS, and
recent enhancements enabling it to
read netCDF files, we chose it as our
mechanism for visualizing data via the Web.
Because GrADS can run in batch mode, it
is well suited to being invoked from a CGI
process.
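A minimal sketch of how a CGI process might drive GrADS in batch mode, assuming a GrADS installation on the server that provides `sdfopen` for opening COARDS-conforming netCDF files and `printim` for writing GIF images (`printim` is available in later GrADS releases; older installations used other routes to GIF output, and image options vary by version). The function names, file paths, and variable names are hypothetical. In a GrADS script, each quoted string is passed to GrADS as a command.

```python
import subprocess
from pathlib import Path


def build_grads_script(nc_path, var, lon, lat, time, gif_path, lev=None):
    """Return GrADS script text that plots one field and writes a GIF.

    lon and lat are (min, max) pairs; time uses GrADS syntax, e.g. 1jan1990.
    Paths containing single quotes would break the quoting used here.
    """
    cmds = [
        f"sdfopen {nc_path}",
        f"set lon {lon[0]} {lon[1]}",
        f"set lat {lat[0]} {lat[1]}",
        f"set time {time}",
    ]
    if lev is not None:
        cmds.append(f"set lev {lev}")
    cmds += [f"d {var}", f"printim {gif_path} gif", "quit"]
    # Quote each command so the GrADS script interpreter passes it to GrADS.
    return "\n".join(f"'{c}'" for c in cmds) + "\n"


def run_grads(script_text, workdir):
    """Write the script to workdir and run GrADS without a display window."""
    script = Path(workdir) / "plot.gs"
    script.write_text(script_text)
    # -b: batch mode, -l: landscape page, -c: command to execute at startup
    subprocess.run(["grads", "-blc", f"run {script}"], check=True, cwd=workdir)
```

Separating script construction from execution makes the first step testable without GrADS installed; only `run_grads` needs the real program.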
Putting the Pieces Together in the Web Environment
Finding It
We have described our strategy for searching our data holdings
in previous publications (e.g., Collins and
Schweitzer, 1995). We use a modified version of the
Harvest Information Discovery System to index and search our data.
The search indices are constructed from the consistent
metadata inherent in the netCDF files (described above).
This allows us to construct a
search
interface which guides users in their choice
of keywords and helps to ensure the success of the search effort.
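Harvest gatherers describe each indexed object in the Summary Object Interchange Format (SOIF): a record opened by `@TEMPLATE-TYPE { URL` and made of lines of the form `Attribute{size}:<TAB>value`, where `size` is the value's length in bytes. The sketch below assumes the file's metadata have already been read into a Python dictionary and shows how they might be rendered as such a record. The attribute names (`Variable`, `Longitude-Range`, and so on) and the example URL are hypothetical and are not necessarily those used in CDC's modified Harvest.

```python
def soif_record(url, attributes, template="FILE"):
    """Render an attribute dict as a SOIF record.

    Each attribute line is  Name{N}:<TAB>value  where N is the byte length
    of the value.
    """
    lines = [f"@{template} {{ {url}"]
    for name, value in attributes.items():
        text = str(value)
        size = len(text.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{text}")
    lines.append("}")
    return "\n".join(lines) + "\n"


def netcdf_summary(meta):
    """Flatten (hypothetical) netCDF metadata into searchable attributes."""
    return {
        "Variable": meta["var"],
        "Units": meta["units"],
        "Longitude-Range": f"{meta['lon'][0]} to {meta['lon'][1]}",
        "Latitude-Range": f"{meta['lat'][0]} to {meta['lat'][1]}",
        "Time-Range": f"{meta['time'][0]} to {meta['time'][1]}",
        "Data-Range": f"{meta['range'][0]} to {meta['range'][1]}",
    }
```

Because every file carries the same metadata fields, the same summarizer applies to every file, which is what lets the search form offer users a fixed, predictable set of keywords.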
The search results are formatted to provide direct
hyperlinks to the data visualization process,
the netCDF file itself,
the metadata associated with the file(s) identified by the search,
and a description of
the data set containing the individual data file(s).
The user can then follow the appropriate hyperlinks to evaluate
and, if desired, retrieve the data. These hyperlinks are
illustrated in the Web page excerpt shown in Figure 1.
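The four hyperlinks attached to each search hit can be generated mechanically from the file's location. The sketch below assumes hypothetical URL layouts (a `plot.cgi` visualization script, a `meta.cgi` metadata script, an FTP root for the files, and a separate data set description page); the actual CDC paths are not given here.

```python
from html import escape
from urllib.parse import urlencode


def result_links(ftp_root, cgi_root, rel_path, dataset_page):
    """Return HTML for one search hit: plot, file, metadata, description."""
    query = urlencode({"file": rel_path})
    links = [
        (f"{cgi_root}/plot.cgi?{query}", "Graphically browse file data"),
        (f"{ftp_root}/{rel_path}", "netCDF file"),
        (f"{cgi_root}/meta.cgi?{query}", "File metadata"),
        (dataset_page, "Data set description"),
    ]
    # escape() keeps characters such as & and " valid inside HTML attributes.
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(label)}</a></li>'
        for url, label in links
    )
    return f"<ul>\n{items}\n</ul>"
```

Passing only the file's relative path to the CGI scripts, rather than a full filesystem path, keeps server layout details out of the page.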
Plotting It
The data visualization Web interface takes advantage of
the netCDF metadata to build a series of Web documents
customized to the data being examined.
Figure 2 shows the Web page generated when the user
follows the "Graphically browse file data"
hyperlink shown in Figure 1.
The interface in Figure 2
is constructed dynamically by a CGI process which
populates the text fields and scrolling lists with
values extracted directly from the netCDF file.
The names of the geographic, time, and level (or height)
coordinates, along with their minimum and maximum values,
are presented as defaults in the Web form's
text fields, scrolling lists, and menus.
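A minimal sketch of this form-building step, assuming the coordinate values have already been read from the netCDF file into lists. Long axes such as longitude become a pair of text fields preset to the axis minimum and maximum; short axes such as pressure levels become a scrolling list. The field names, the 20-value cutoff, and the `plot.cgi` action are hypothetical choices for illustration.

```python
from html import escape


def coord_fields(name, values, max_list=20):
    """HTML inputs for one coordinate, defaulting to its full range."""
    label = escape(name)
    if len(values) <= max_list:
        # Short axis (e.g. levels): offer every value in a scrolling list.
        opts = "".join(
            f'<option value="{v}"{" selected" if v == values[0] else ""}>{v}</option>'
            for v in values
        )
        return f'{label}: <select name="{label}" size="5">{opts}</select>'
    # Long axis (e.g. longitude): two text fields preset to min and max.
    lo, hi = min(values), max(values)
    return (
        f'{label}: <input name="{label}_min" value="{lo}"> '
        f'to <input name="{label}_max" value="{hi}">'
    )


def plot_form(nc_file, coords):
    """Build the plot form for one file from {coordinate name: values}."""
    rows = "<br>\n".join(coord_fields(n, v) for n, v in coords.items())
    return (
        '<form action="plot.cgi" method="get">\n'
        f'<input type="hidden" name="file" value="{escape(nc_file)}">\n'
        f"{rows}\n"
        '<input type="submit" value="Plot">\n'
        "</form>"
    )
```

Because the defaults come from the file itself, the user always starts from a request that is known to be valid for that file.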
After adjusting the coordinates of interest,
the user submits the form to generate
a plot of the data (Figure 3). In this case the CGI process
executes GrADS with the appropriate data parameters; GrADS
generates a GIF image, which is then included
in the resulting Web page.
This sequence gives the user direct, real-time access to
the data of interest and an
opportunity to examine the features of those data before
retrieving the file and storing it on a local disk.
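When the form is submitted, the CGI process must turn the query string into plotting parameters before invoking GrADS. A minimal sketch, assuming hypothetical field names (`file`, `var`, `lon_min`, `lon_max`, `lat_min`, `lat_max`) and clamping each requested range to the extents recorded in the file's metadata so that a mistyped value cannot request data outside the file. A production handler would also check `file` against the list of files actually served.

```python
from urllib.parse import parse_qs


def clamp_range(lo, hi, extent):
    """Order a requested (lo, hi) pair and clip it to the file's extent."""
    lo, hi = sorted((lo, hi))
    return max(lo, extent[0]), min(hi, extent[1])


def parse_plot_request(query, extents):
    """Parse a plot-form query string into validated plotting parameters.

    `extents` maps 'lon' and 'lat' to (min, max) taken from the metadata.
    Missing fields fall back to the full extent.
    """
    form = {k: v[0] for k, v in parse_qs(query).items()}
    params = {"file": form["file"], "var": form["var"]}
    for axis in ("lon", "lat"):
        lo = float(form.get(f"{axis}_min", extents[axis][0]))
        hi = float(form.get(f"{axis}_max", extents[axis][1]))
        params[axis] = clamp_range(lo, hi, extents[axis])
    return params


def result_page(gif_url):
    """CGI response: header, blank line, then a page embedding the plot."""
    return (
        "Content-Type: text/html\r\n\r\n"
        f'<html><body><img src="{gif_url}" alt="plot"></body></html>'
    )
```

Swapping reversed bounds instead of rejecting them is a forgiving choice suited to users who type a range in either order.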
Grabbing It
The user has two options for retrieving data file(s) via the
CDC Web interface: they may follow the search-results
hyperlink directly to
the netCDF file (see Figure 1),
or, after graphically browsing the data,
they may follow the "FTP the data (in netCDF format)
used to generate this image"
hyperlink at the bottom of the
visualization output (see Figure 3).
The first option, following the search-result
hyperlink directly to the netCDF file, relies on the
Web browser's FTP support to retrieve
the file. The second option, following the hyperlink
on the visualization output,
executes a CGI process that extracts the requested range of data
from the netCDF file and writes it to a new netCDF file. It then
again relies on the browser's FTP support to let the
user retrieve the file
subset. Both options shield the user from
details of the FTP site, and the second
also handles the subsetting of
the netCDF file transparently. The goal (and, we hope, the
result) of this sequence of actions is to make
locating and retrieving climate data as easy as possible.
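The core of the subsetting step is translating the requested coordinate ranges into index ranges (a "hyperslab" of start indices and counts) along each dimension, which a netCDF library can then read and copy into a new file. The library-free sketch below shows only that translation, assuming monotonic coordinate arrays; the function names and the example FTP URL are hypothetical, and writing the subset file itself is left to whatever netCDF interface the server uses.

```python
from bisect import bisect_left, bisect_right


def index_range(coord, lo, hi):
    """Return (start, count) covering the coord values within [lo, hi].

    `coord` must be monotonic. Descending axes (e.g. latitude stored from
    90 to -90) are handled by searching the reversed axis.
    """
    lo, hi = sorted((lo, hi))
    if coord[0] <= coord[-1]:
        start = bisect_left(coord, lo)
        stop = bisect_right(coord, hi)
    else:
        rev = coord[::-1]
        n = len(coord)
        start = n - bisect_right(rev, hi)
        stop = n - bisect_left(rev, lo)
    return start, max(0, stop - start)


def hyperslab(coords, request):
    """Map {dim: (lo, hi)} to {dim: (start, count)}; unrequested dims are whole."""
    return {
        dim: index_range(values, *request[dim]) if dim in request else (0, len(values))
        for dim, values in coords.items()
    }


def redirect(url):
    """CGI response sending the browser to the new subset file (e.g. via FTP)."""
    return f"Location: {url}\r\n\r\n"
```

Binary search keeps the lookup cheap even for long time axes, and the redirect hands the actual transfer back to the browser's own FTP support.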
Summary
The Web interface was designed to apply easily to
any data conforming to the netCDF conventions described
above. The Perl scripts which support the
visualization interface were originally developed for one
particular data set, but have since been integrated into the
search results output for all of our formally maintained data
holdings. Thus, any data which are
available through our search indices may also be
browsed graphically. These CGI processes may also be
applied to any user-generated data
which are stored according to our netCDF conventions.
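Because the interface depends only on the conventions, a simple check of a file's metadata can indicate whether the search, plot, and subset processes will work with it. A minimal sketch, assuming the metadata have been read into a dictionary of variables; the checks are an illustrative subset of COARDS-style requirements (units present, monotonic coordinate variables, conventional latitude/longitude units, and "`<unit>` since `<date>`" time units) and accept only the most common spellings, so this is not a complete validator.

```python
def check_conventions(variables):
    """Return a list of problems; an empty list means every check passed.

    `variables` maps each name to {"dims": tuple, "attrs": dict,
    "values": list}; "values" is only needed for coordinate variables,
    i.e. one-dimensional variables named after their own dimension.
    """
    problems = []
    expected_units = {"lat": "degrees_north", "lon": "degrees_east"}
    for name, v in variables.items():
        units = v["attrs"].get("units")
        if units is None:
            problems.append(f"{name}: missing units")
        if v["dims"] != (name,):
            continue  # data variable: only the units check applies
        vals = v.get("values", [])
        rising = all(a < b for a, b in zip(vals, vals[1:]))
        falling = all(a > b for a, b in zip(vals, vals[1:]))
        if not (rising or falling):
            problems.append(f"{name}: coordinate values not monotonic")
        if name in expected_units and units != expected_units[name]:
            problems.append(f"{name}: units should be {expected_units[name]}")
        if name == "time" and (units is None or " since " not in units):
            problems.append("time: units should look like '<unit> since <date>'")
    return problems
```

Running such a check when a researcher's file is first submitted would flag problems before the file is added to the search indices.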
The World Wide Web is an integral part of our
efforts to give a large segment of
the climate research community information about
and access to
our data holdings. The combination of Web protocols
and well-defined data storage mechanisms lets us give researchers
graphical and textual access to data
regardless of where they are located: they need only
an Internet connection and a Web browser to explore the CDC
data holdings.
References
- Hankin, S.C., cited 1997: The COARDS netCDF profile. [Available on-line from http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html.]
- Collins, J.A., 1996: Communicating Distributed Search Results to Distributed Data Servers. Preprints, Twelfth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Atlanta, GA, AMS, 397-400.
- Collins, J.A. and R.H. Schweitzer, 1995: Applying Metadata to the Search Interface: Constructing Effective Local and Distributed Searches of Web-Based Scientific Data. Poster Proceedings, Fourth International World Wide Web Conference: The Web Revolution, Boston, MA, O'Reilly & Associates, Inc., 36-37.
- Collins, J.A., J.D. Scott, C.A. Smith, and M.A. Alexander, 1997: Climate Data Visualization on the Web: Implementation Details of the Climate Diagnostics Center Web Atlas Interface. Preprints, Thirteenth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Long Beach, CA, AMS, 157-159.
- www@grads.iges.org, cited 1997: The Grid Analysis and Display System. [Available on-line from http://grads.iges.org/grads/head.html.]
- Rew, R.K., cited 1997: Unidata netCDF. [Available on-line from http://www.unidata.ucar.edu/packages/netcdf/.]
Author correspondence should be directed to
Julia Collins at:
CIRES
Campus Box 449
University of Colorado
Boulder, Colorado, USA, 80309-0449
Voice: 303-492-0842
FAX: 303-492-2468
E-mail: jac@cdc.noaa.gov