NOAA on the Web -- The Experiences and Future of one Nation-wide User

Ernest Daddio and William Turnbull
National Oceanic and Atmospheric Administration
Environmental Information Services

Abstract

NOAA is one of the pioneers in the use of MOSAIC on the Web. Explorations of MOSAIC's power have required our designers to cast off their usual constraints. In particular, hypertext permits multiple solutions to the problems of information dissemination. No longer must the designer follow a linear path through a problem set. A simple example is our organization. In the past we have been required to describe ourselves by hierarchical organization, by geography, or by discipline. Now all three are available and the user can choose the preferred route. For those with access to the Internet, NOAA data are now more available than ever before and knowledge of the existence of our data is much more widespread. We have encountered a number of issues, many of which remain open. How do we maintain consistency across information systems? Should we employ top-down control or entrepreneurial empowerment? How does this technology fit with a history of charging for data? Several future needs have also been identified, ranging from management tools for maintaining a distributed system of Web pages, to support for parallel searches across multiple servers, to direct database support, to enhanced viewers, particularly for GIS explorations.

THE ROLE OF INFORMATION MANAGEMENT AND DISTRIBUTION IN NOAA'S MISSION

NOAA has mission responsibility to predict the weather, chart the seas, assess natural and man-induced climate change, manage U.S. fisheries, and perform environmental research to advance capabilities in these areas. In carrying out its mission, the agency operates a variety of observing systems including environmental satellites, Doppler weather radar, ground-based weather sensors, an ocean-going fleet, ocean-based instruments, etc.; telecommunications facilities have been established to support the acquisition of data from these observing systems for operational use. The agency operates three National Data Centers (National Climatic Data Center, National Oceanographic Data Center, and National Geophysical Data Center) charged with maintaining the Nation's climate, ocean, and other earth science records and distributing environmental information in support of commerce, transportation, construction, education, research, etc. In addition to the formally constituted National Data Centers, NOAA operates more than 30 additional archive facilities (see Fig. 1) with total archive volumes ranging from approximately 100 megabytes to more than 10 terabytes. The total NOAA data archive currently exceeds 220 terabytes and is growing at an approximate rate of 30 terabytes per year; with the introduction of new observing systems, the rate of growth of the total archive is expected to exceed 80 terabytes annually by the end of the decade.

[Map of U.S. with NOAA Data Centers and Centers of Data]

 
Ann Arbor, MI             Camp Springs, MD           Monterey, CA
Grt Lakes Env Res Lab     Climate Analysis Ctr       Ocean Apps Br
                                                     Nat Meter Ctr 
                                                     Ocean Prods Ctr

Charleston, SC            Woods Hole, MA             Norman, OK
Coastal Data              NE Fisheries Sci Ctr       Radar Data Arc Svc Ctr
 
Asheville, NC             La Jolla, CA               Seattle, WA
Nat Climatic Data Ctr     SW Fisheries Science Ctr   Alaska Fish Sci Ctr
                                                     Equa Pacif Info Ctr
Boulder, CO               Miami, FL                  PacMar Env Res
Clim Mon & Diag Lab       Atl Ocean & Met Lab        NW Fish Sci Ctr
Clim Diag Center         
Nat Geophys Data Ctr      Silver Spring, MD
Nat Snow and Ice Data Ctr Aeronautical Charting Div
                          Coastal Zone Mgmt
Suitland, MD              Geosciences Lab
Satellite Act Archive     Hydrological Info Ctr
Joint Ice Ctr             Nat Geodetic Data
                          Part Deposition Air Res Lab
                          Photogrammetry
                          Strat Envirn Assessment
                          NOAA Central Library
                          Nat Ocean Data Ctr
                          Nat Tide & Water Lev Data Base  
 
   Fig. 1. Locations of NOAA's information centers.

Among the information products routinely generated and disseminated by NOAA are: daily weather forecasts, hurricane and tornado warnings, aviation winds, fisheries stock assessments, satellite imagery of the earth, tide predictions, navigation charts, X-ray emission from the sun, and scholarly reports of scientific findings. NOAA's major operating units are the National Weather Service, National Ocean Service, National Marine Fisheries Service, Oceanic and Atmospheric Research, and National Environmental Satellite, Data, and Information Service. Elements of NOAA's operating units are found across the U.S. in all 50 states and territories.

Characteristics of NOAA Digital Data

NOAA digital data span the spectrum of data types and formats. By far the largest volume is in the form of raster images of the earth obtained by NOAA satellites over the past decade and a half. These may be time histories in several spectral bands taken by the Geostationary Operational Environmental Satellites (GOES) from one point in space, or multispectral imagery in the form of north-south swaths under the path of NOAA's polar-orbiting satellites. The new suite of Doppler weather radars currently being deployed by NOAA is a new source of high-volume data that will soon eclipse the operational weather satellites in sheer volume. NOAA data include graphical products of contours depicting the two-dimensional distribution of sea surface and deep layer temperature across the Pacific Ocean; vertical profiles of point measurements at various levels in the atmosphere or in the ocean; horizontal profiles such as ocean chemistry obtained by ships along a navigation track line; and station histories of weather observations at Weather Service offices. NOAA data even include solar activity observations and paleoclimatic measurements from glacial ice cores.

These data are stored on a variety of media and in a variety of formats. Similarly, they are distributed in a variety of media and formats depending upon the type and volume of data and the preferences of the NOAA clientele and operating unit. Over the years, one of the great challenges to the organization has been to provide the data to the user community in a uniform way (particularly when the user requires multiple data types) so that the user can easily read the data into his own system for processing.

The total number of unique NOAA data sets extends into the several thousands, depending upon how one chooses to group the data. One of the shortcomings of these data sets (as is the case in most organizations) has been the lack of adequate documentation to allow the user to assess the value or accuracy of the data for his particular application. Still further, comprehensive inventories have been lacking that would allow the user to determine if particular data of interest covered the time periods or spatial domain of interest. An attempt to correct this shortcoming in the agency has been ongoing in the form of the NOAA Directory Services development.

Typically, the volume of data needed to satisfy a particular user request is on the order of several megabytes. It is not unusual, however, to satisfy a request requiring only several tens of bytes as, for example, in the case of law firms requiring a temperature measurement at a particular location and a particular time for litigation purposes. Alternatively, requests for data may be in the range of several hundred gigabytes as, for example, in the case of researchers requiring satellite imagery of a region for a year.

Impacts of On-line Data Availability

Traditionally, NOAA's information products have been disseminated as hardcopy or on digital media such as magnetic tapes; this continues to be an important mechanism but is rapidly being replaced by on-line systems connected to the Internet. Other information products, particularly those relating to safeguarding life and property (such as severe weather warnings) and those supporting operational data acquisition, continue to be transmitted via closed networks dedicated to supporting these activities and operated by the agency line offices.

NOAA has been and continues to be required by federal regulation to charge its users nominal fees to cover the cost of reproducing data, whether that be analog data in the form of reports or pictures or digital data as in the case of magnetic tapes. These charges, in part, help to offset the cost of maintaining the archives and reduce the cost to the taxpayer for continued archive maintenance. With the expansion of free on-line data availability, the agency is having to reassess cost recovery to defray the cost of providing information services to the Nation.

Increasingly, NOAA has utilized the great reach of the Internet to distribute its information products to its client community. Initially this was through FTP services. With the advent of Internet tools such as Gopher and WAIS, a much greater reliance is being placed on on-line electronic information dissemination. The introduction of MOSAIC (currently numbering more than 30 servers across the agency) has brought about an explosion at the NOAA grass roots level of information services being provided on-line. This effort has been responsible for the cutting-edge exploitation across the agency of MOSAIC capabilities to the point that NOAA elements now utilize MOSAIC to dynamically access, create, and display information products. Further, MOSAIC begins to address a fundamental need of the agency: to link together the various information systems into a seamless agency-wide distributed information system.

Trends and Characteristics of On-line Data Access

The introduction of the Web tools into NOAA's on-line information systems has led to dramatic increases in the number of accesses and the types of individuals accessing NOAA data. A striking example of this is access to the NOAA Directory Services server. Prior to November 1993, the only network tools available to the user were character-based telnet applications; typically up to this point the number of Directory accesses was in the range of several hundreds per month (Fig. 2). Following the introduction of Gopher and WAIS in November 1993, that number immediately went up fourfold from under 1000 to greater than 3500 accesses per month. The introduction of MOSAIC in early 1994 has resulted in the number of accesses exceeding 18,000 per month.


[Graph of NOAA Directory Usage]
Fig. 2. Trends in NOAA Directory accesses. Introduction of each new Web tool has resulted in a quantum jump in user access.

Can this increase be explained in terms of people "surfing" the Internet and stumbling on NOAA servers? We don't think so. For one, the trend for nearly a year has been a rapid increase with the introduction of each new Internet tool followed by either a leveling off at a much higher level than had previously been experienced or by a continued increase over time in the number of accesses.

The trend in aggregate information services requests at NOAA's three National Data Centers and NOAA's Directory Services has shown similar remarkable growth. Our projections to the year 2000 were initially based upon a steady rise observed since 1990. With the introduction of Web tools in 1994, the year 2000 projection was realized in 1994 -- a fivefold increase in one year.


[Graph of NOAA User Requests]
Fig. 3. Explosive growth in NOAA information services requests in 1994. Estimates to the year 2000 were initially based on observed usage growth prior to Web tools implementation. Jump in 1994 is observed growth resulting from introduction of Web tools.

MANAGEMENT APPROACH TO ON-LINE INFORMATION SERVICES DEVELOPMENT

NOAA's implementation of Web tools to date has followed a path that reflects the diversity and decentralized nature of the agency. It has been a grass roots effort rather than one being mandated by management. We believe that this is the key to the great success we have experienced thus far. The approach has led to a rapid deployment of the technology by fostering an environment of entrepreneurship and creativity. To echo a major theme of the Clinton Administration's National Performance Review, we believe that this management approach has succeeded because it "empowers" developers at the working level by removing the burden of bureaucratic controls.

There have been numerous successes in this process. The Global Climate Perspectives System of the National Climatic Data Center (NCDC), one of our earliest adopters within NOAA, was nominated in May 1994 for a "Best of the Web" international award for outstanding technical merit. NCDC developed an innovative form which permits the user to examine climate data and derive his/her own product. The user selects an area and time interval, then produces a contour of mean temperatures. The Pacific Marine Environmental Laboratory (PMEL) has recently implemented similar capabilities with their near real-time data from the equatorial Pacific Ocean and also with long-term climatologies developed by the National Oceanographic Data Center. The Space Environment Laboratory includes recent images of the sun along with a plot of near-earth X-ray activity updated at 5-minute intervals. One author was able to demonstrate the solar eclipse of last spring to an awestruck group of federal managers in the Department of Commerce as it occurred.

Another success is shown in the continued increase of the directory statistics above. What is not evident from the statistics, however, is that the "Directory" has now been transformed into the " Lynx interface is provided. Until the Catalog was implemented, users were limited to searching only NOAA's directory on only one machine. The Catalog adds two powerful features. First, the forms provide a front end which can be mapped to an arbitrary number of different metadata databases. Currently the Data Interchange Format (DIF) and the single-purpose NEDRES formats are supported. The MARC format (in wide use by the library community) and the Standard for Spatial Metadata (entering use in the GIS world) are being added to the list of supported metadata formats. The system converts the various metadata formats to HTML as a default, with links to the source data, if known. The user may also choose the native format or one of several standard metadata formats for the display, if desired. The second key capability is provided by WAIS. This, of course, is the ability to search distributed text data. No longer is it necessary to have all descriptions located on a single server; now we are able to search databases distributed across NOAA, across the Federal Government, or across the Internet.
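
As a rough illustration of the front-end idea described above (several metadata formats mapped onto one display), the following Python sketch normalizes records from two formats into a common set of fields and renders them as HTML with a link to the source data when one is known. It is not the Catalog's implementation. The format names (`format_a`, `format_b`), all field names, and the functions `normalize` and `record_to_html` are hypothetical and do not reflect the actual DIF, NEDRES, or MARC field definitions.

```python
from html import escape

# Hypothetical field maps: native field name -> common field name.
# These are illustrative only, not real DIF/NEDRES/MARC fields.
FIELD_MAPS = {
    "format_a": {"name": "title", "abstract": "summary", "url": "data_url"},
    "format_b": {"TITLE": "title", "DESC": "summary", "LOC": "data_url"},
}


def normalize(raw, fmt):
    """Translate a record in a native format into the common field names."""
    mapping = FIELD_MAPS[fmt]
    return {common: raw[native] for native, common in mapping.items() if native in raw}


def record_to_html(record):
    """Render one normalized record as a small HTML fragment."""
    parts = ["<h2>" + escape(record.get("title", "Untitled data set")) + "</h2>"]
    if record.get("summary"):
        parts.append("<p>" + escape(record["summary"]) + "</p>")
    if record.get("data_url"):
        # Link to the source data only when its location is known.
        parts.append('<p><a href="' + escape(record["data_url"]) + '">Source data</a></p>')
    return "\n".join(parts)


raw_b = {"TITLE": "Sea <surface> temperature", "LOC": "ftp://example.invalid/sst"}
print(record_to_html(normalize(raw_b, "format_b")))
```

Supporting a further metadata format in this sketch would mean adding one more entry to `FIELD_MAPS`; the display code would not change.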

One final success. One does not necessarily find on NOAA servers all of the excellent graphics of NOAA satellite images that are available from other sources such as the University of Illinois, Purdue University, and Michigan State University. We have resisted duplicating work done by others in order to focus our resources on activities which are not available elsewhere. This is one of the clear advantages of the Web; it allows information services to easily import and synthesize others' information products into value-added new products.

But we also recognize that our current development approach, which minimizes bureaucratic controls, is not without attendant risk and may need to be augmented as the technology matures and as we gain experience. Two problems posed by this entrepreneurship have been identified.

First, the user cannot count on any specific information from all NOAA servers. Several organizations include excellent staff directories, phone books and e-mail information. Many do not. One provides a useful map of its facilities. A visitor to PMEL, for example, need only consult the PMEL home page to discover exactly where PMEL is located, and how to reach it by car or by public transportation. Some organizations provide data via FTP or direct download, some provide on-line ordering, and some provide no data on-line. Some organizations point to other locations in NOAA, and some do not. Clearly, in the near future, NOAA will need to provide the user with greater consistency. A very desirable enhancement would be to provide a common "look and feel" across all of NOAA's information servers. The user has a right to be able to count on each organization to provide a minimum set of information and a predictable means of accessing that information.

Second, with no required review, neither peer review nor management review, there is no process for verifying information presented. Whether its introduction is intentional or otherwise, incorrect information could be placed on many of NOAA's servers by a knowledgeable local user. Thankfully, there are no examples at this time; however, this does not lessen our responsibility to ensure the value and reliability of information presented by NOAA on the Web.

The challenge faced by NOAA, therefore, is to strike a balance between encouraging the entrepreneurship represented by an unencumbering management approach and the natural inclination of most organizations to attempt to protect themselves through extensive management control. We feel that an approach that overemphasizes the latter will inevitably lead to a substantially reduced development pace.

DESIRABLE CHARACTERISTICS OF A WEB-BASED NOAA INFORMATION SYSTEM

NOAA's information servers must address a broad range of users and their information needs. The user will range from novice to experienced researcher; the information system must therefore be capable of delivering information in a common user interface aimed at various levels of technical understanding. It must address the needs of the casual, curious "surfer" of the Internet, while at the same time, service the individual or organization that regularly relies on environmental information for critical decision making or for value-added information product generation. For example, a number of environmental consultants provide value-added information products to American agri-business and/or farm commodities investors; generation of these products depends upon reliable access to NOAA-produced and distributed information.

The user should be able to formulate an information query whose satisfaction may require accessing one or all of NOAA's information servers. The actual physical location of the data and information across this virtual data and information system should be of little concern to the user. Once formulated by the user, the query should be broadcast across the Web to the appropriate information servers and should result in a comprehensive response. In a broader sense beyond the NOAA purview, the query should also be broadcast to other organizations' environmental databases so that the user receives a still more comprehensive response.
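
The broadcast query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing NOAA facility: each server is stood in for by a hypothetical in-memory search function (`make_catalog`), and the names `broadcast_query`, `server_a`, and `server_b` are invented for the example. A real system would replace the search functions with network calls.

```python
from concurrent.futures import ThreadPoolExecutor


def broadcast_query(query, servers):
    """Send `query` to every server concurrently and merge the hits.

    `servers` maps a server name to a callable that takes the query
    string and returns a list of matching titles (hypothetical interface).
    """
    def ask(item):
        name, search = item
        try:
            return [(name, hit) for hit in search(query)]
        except Exception:
            # One unreachable server should not sink the whole query.
            return []

    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        per_server = list(pool.map(ask, servers.items()))
    return [hit for hits in per_server for hit in hits]


def make_catalog(titles):
    """Stand-in for a remote server: case-insensitive substring search."""
    return lambda q: [t for t in titles if q.lower() in t.lower()]


servers = {
    "server_a": make_catalog(["Sea surface temperature", "Tide predictions"]),
    "server_b": make_catalog(["Air temperature station histories"]),
}
for name, title in broadcast_query("temperature", servers):
    print(name, "->", title)
```

The user sees a single merged list tagged with the originating server, so the physical location of each data set matters only if the user wants to know it.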

The system should be queriable at several different levels: (1) to get information on information inventories; (2) to retrieve the information products themselves (e.g., graphic products or text reports); (3) to retrieve source data or observations which can be imported by the user into his own application for analysis.

Other desirable characteristics include: adequate security to prevent unauthorized persons from modifying data or information, or gaining access to NOAA resources; a unified, distributed data and product ordering system (for paper products, CD-ROMs, etc.) with user authentication, and verification of billing information; and support for centralized management of multiple servers across the Internet. Even now some links are becoming stale as servers are modified, updated or reconfigured. A system which allows automatic verification of embedded links is becoming more important as the Web becomes more complex and dynamic.
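
A minimal sketch of such automatic link verification, in Python, might look like the following. It extracts the anchors from one page, resolves them against the page's own address, and reports those that a caller-supplied check finds unreachable. The names `LinkExtractor`, `find_stale_links`, and `is_reachable` are hypothetical, and the reachability check is deliberately left to the caller (it could be a real network request); the example below uses a fixed set of known-good addresses instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_stale_links(page_url, html_text, is_reachable):
    """Return the absolute URLs on a page for which is_reachable(url) is false."""
    parser = LinkExtractor()
    parser.feed(html_text)
    absolute = [urljoin(page_url, href) for href in parser.links]
    return [url for url in absolute if not is_reachable(url)]


page = '<a href="data/index.html">Data</a> <a href="http://other.invalid/old.html">Old</a>'
known_good = {"http://example.invalid/data/index.html"}
print(find_stale_links("http://example.invalid/home.html", page, known_good.__contains__))
```

Run periodically over every page on a server, a check of this kind would flag links that have gone stale since the page was written.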

CANDIDATE SYSTEM ARCHITECTURE

We envision a nationally distributed system of NOAA information servers connected to the Internet and to the follow-on National Information Infrastructure. Responsibility for the individual servers will be retained by the individual NOAA operating units with guidelines and policies developed and coordinated by a steering group of NOAA organizational representatives. Control over information content will be retained within each of the operating units of the agency.

The nationally distributed network of environmental information servers must be capable of providing on-line or near-line access to tens or hundreds of terabytes of data for perusal. We anticipate that in the next decade certain individual archives will reach into the hundreds of terabytes and up to a petabyte. The largest data sets may not be appropriate for on-line distribution due both to network bandwidth considerations and to the fact that on-line distribution requires the user to have the capacity to store the data as it is received. Typically, for the larger data archives, we foresee a front-end server running the suite of Web tools and a file management/database management system, and a back-end near-line mass store device such as a robotic system or silo. A disk system such as the currently available RAID technologies will serve as a means of caching data for on-line access. A more modest configuration will be implemented for smaller databases. Given historical precedents, the agency will likely not adopt a single server solution for all archives but will continue to depend on enabling technologies such as MOSAIC to allow integration across these heterogeneous systems.
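
The role of the disk system as a cache in front of the near-line mass store can be illustrated with a short Python sketch. The paper does not specify a caching policy; the least-recently-used eviction below is an assumption made for the example, as are the class name `DiskCache`, the capacity measured in number of files rather than bytes, and the stand-in function for the mass store.

```python
from collections import OrderedDict


class DiskCache:
    """Keep recently requested files on fast disk; fetch the rest from mass store.

    Eviction is least-recently-used, which is an assumption of this sketch.
    """

    def __init__(self, capacity, fetch_from_mass_store):
        self.capacity = capacity            # maximum number of cached files
        self.fetch = fetch_from_mass_store  # slow path, e.g. a tape robot
        self.files = OrderedDict()          # oldest entry first

    def get(self, name):
        if name in self.files:
            self.files.move_to_end(name)    # mark as most recently used
            return self.files[name]
        data = self.fetch(name)             # cache miss: go to the mass store
        self.files[name] = data
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file
        return data


mass_store_requests = []


def read_from_silo(name):
    """Stand-in for a slow near-line retrieval; records each request."""
    mass_store_requests.append(name)
    return "contents of " + name


cache = DiskCache(capacity=2, fetch_from_mass_store=read_from_silo)
for wanted in ["a", "b", "a", "c"]:
    cache.get(wanted)
print(mass_store_requests)
```

In this run the repeated request for file `a` is served from disk, so the mass store is consulted only once per distinct file.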

The current system providing information about data, the NOAA Directory Services, is a centralized database. Our plans are to greatly expand the information content of this system and to distribute the components among the information servers and archive facilities that currently house the data. The directory information will be interlinked across the nationally distributed database.

To support these ambitious plans, MOSAIC and the Web will need to continue to expand. Several of the capabilities needed include:

* Distributed parallel searches across multiple servers.

* Full support for SGML for rich document display and formatting and compatibility with word processing programs.

* The ability to integrate data from multiple servers into a single viewer, for example to support multiple layers in a GIS drawn from multiple distributed data sources.


Ernest Daddio, edaddio@esdim.noaa.gov
William Turnbull, turnbull@esdim.noaa.gov