The Use of the Information Highway to Explore Climate Variability

C. Bruce Baker [bbaker@ncdc.noaa.gov] and Danny E. Brinegar [dbrinega@ncdc.noaa.gov]



Introduction

Current concerns over global and regional climate change require a system capable of answering inquiries about the state of the climate from many sectors of society. Often, however, when the general public is confronted with studies of climate change, the results are presented as "global" averages. These averages can mask important regional changes that are key to understanding the climate system. The Global Climate Perspectives System (GCPS) is being designed to examine regional to global scale climate changes using station and gridded climate data.

The main concept behind the GCPS is to enhance the exploration of climatological data in ways that will allow an investigator to gain an understanding of the information contained in the data. This is accomplished by creating an environment that allows for the non-linear flow and visualization of information about the data. This enables an investigator to pose a particular question about climate change and use GCPS to provide quantitative information on the metadata, determine the credibility of the climatological datasets, graphically examine the data spatially and temporally, and obtain the data needed to answer the question.

An important aspect of this entire process is having an efficient way to navigate through the information and large datasets. This navigation process is a multilevel investigation of the information and data for a given inquiry. What conclusions have been drawn from the research using this dataset? What is the temporal range and spatial distribution of the data? Answers to these questions are linked using hypertext/media to journal articles, real-time displays of time series, contour plots, and ultimately access to the data, all of which are available on the Internet. These questions should be answered before embarking on other investigations with these data. This hierarchical information approach gives the scientist a unique way to assimilate scientific information. However, the information on a particular issue should also be understandable by individuals with different levels of knowledge of the subject.

The GCPS is being developed using this paradigm to explore climate change. It is a coordinated project involving groups within two major NOAA components: the National Climatic Data Center (NCDC) and the Climate Diagnostics Center (CDC).


The System

The GCPS information system is specifically designed to produce meaningful information about the climate and to allow users to browse and extract station and gridded climatological data in the time and space domain. The information is in the form of online hypermedia/hypertext journal articles or reports that pertain to the datasets and metadata. The philosophy behind GCPS is for an investigator to gain a thorough understanding of these large datasets by probing them in the time and space domain and by using the available metadata (including results from peer-reviewed journal articles) to decide which types of analyses can be used to examine regional to global scale climate changes. Figure 1 provides an overview of the construct of GCPS. It is a combination of the World Wide Web client/server software called MOSAIC, an X Window System display server, a Motif window manager, and a database access engine. The arrows indicate the linkages between these modules. One can start with any one of the modules and navigate to any of the others.

MOSAIC uses hypertext/media and provides the framework for the hierarchical information delivery system. Information about particular scientific investigations is linked together using HyperText Markup Language (HTML). For example, within the MOSAIC framework there is a discussion on global warming (Figure 2), which includes text and figures electronically linked to key words. Each of the small images can be "clicked on" to view it on the screen or to download it to the client computer as a PostScript file. The user can also download the entire dataset via anonymous ftp within the MOSAIC framework, or a subset of the data in the time and space domain using the GCPS data access engine.

The interactive portion is shown in Figure 3. The user can define the following aspects of the query:

Once submitted, the query is passed to the Grid Analysis Display System (GrADS) (Doty and Kinter, 1993), which performs the necessary calculations and generates a PostScript image. The image is then returned to the WWW client via the Common Gateway Interface (CGI) and the HyperText Transfer Protocol (HTTP).
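The request path described above can be sketched roughly as follows. This is only an illustration in Python, not the GCPS implementation. The parameter names (`variable`, `start`, `end`), the `render_postscript` stub that stands in for the GrADS step, and the placeholder PostScript body are all assumptions. Only the overall flow (parse the submitted query, produce a PostScript image, return it to the client through CGI over HTTP) comes from the text.

```python
from urllib.parse import parse_qs


def parse_query(query_string):
    """Parse a CGI query string into a flat dict (first value of each key)."""
    parsed = parse_qs(query_string)
    return {key: values[0] for key, values in parsed.items()}


def render_postscript(query):
    """Hypothetical stand-in for the plotting step (GrADS in GCPS).

    It only writes the request as a text label on an otherwise empty page.
    """
    title = "{} {}-{}".format(
        query.get("variable", "unknown"),
        query.get("start", "?"),
        query.get("end", "?"),
    )
    lines = [
        "%!PS-Adobe-3.0",
        "/Helvetica findfont 12 scalefont setfont",
        "72 720 moveto",
        "({}) show".format(title),
        "showpage",
    ]
    return "\n".join(lines).encode("ascii")


def cgi_response(query_string):
    """Build the bytes a CGI program would write to standard output."""
    body = render_postscript(parse_query(query_string))
    header = (
        "Content-Type: application/postscript\r\n"
        "Content-Length: {}\r\n\r\n".format(len(body))
    )
    return header.encode("ascii") + body


if __name__ == "__main__":
    response = cgi_response("variable=temperature&start=1900&end=1990")
    print(response.decode("ascii"))
```

The header block ends with a blank line, which is how a CGI program separates the response headers from the document body that the server forwards to the client.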

Figure 4 shows an example of an anonymous ftp dataset description page. Each dataset is displayed like a book. In this case we see the geographic distribution of all stations and an overview of the dataset's contents. Selected journal articles that show results from the analysis of this dataset are linked from the anonymous ftp page, and all documentation about the dataset is provided when it is downloaded.

The GCPS data access engine (Carroll and Baker, 1994) consists of a relational database (Empress), the Naval Environmental Operational Nowcasting System (NEONS), an X Window System display server, and a Motif window manager. The access engine provides a method to browse and extract station and gridded data. The relational tables are an integral part of the information about the datasets residing in the relational database management system (RDBMS). The database access engine provides a user interface that is coupled to all databases within GCPS. The initial window is shown in Figure 5. This window displays the datasets from which subsets of data may be extracted in the time and space domain. Just below the datasets is the query for the time domain, where the user can choose the time increment. Below this is a world map, on which the user can box in any part of the world using the cursor. These two actions define the time and space domain for extracting a subset of the data. At this point a query is formulated, and the locations of the stations and any metadata associated with those stations appear (Figure 6). Figure 6 shows the station distribution and metadata for the chosen area and time range. In this case the metadata consist of latitude, longitude, station name, and period of record. Ultimately the data and/or the metadata can be written to a file in a format specified by the user.
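The time and space selection described above amounts to a simple filter over station metadata. The following Python sketch illustrates the idea under stated assumptions: the `Station` record, the `select_stations` function, and the rule that a station qualifies when its period of record overlaps the requested years are all hypothetical, since the actual query is issued against the relational database. The fields mirror the metadata named in the text (latitude, longitude, station name, period of record).

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    lat: float        # degrees north
    lon: float        # degrees east, in [-180, 180]
    first_year: int   # start of the period of record
    last_year: int    # end of the period of record


def select_stations(stations, lat_min, lat_max, lon_min, lon_max,
                    year_start, year_end):
    """Return stations inside the box whose record overlaps the time range.

    Assumption: the box does not cross the 180-degree meridian.
    """
    selected = []
    for s in stations:
        in_box = lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max
        overlaps = s.first_year <= year_end and s.last_year >= year_start
        if in_box and overlaps:
            selected.append(s)
    return selected


if __name__ == "__main__":
    # Made-up stations, for illustration only.
    stations = [
        Station("Station A", 35.0, -80.0, 1890, 1990),
        Station("Station B", 36.5, -82.0, 1950, 1990),
        Station("Station C", 60.0, 10.0, 1880, 1990),
    ]
    for s in select_stations(stations, 30.0, 40.0, -90.0, -75.0, 1900, 1940):
        print(s.name, s.lat, s.lon, s.first_year, s.last_year)
```

In this example only "Station A" is returned: "Station B" lies inside the box but its record starts after the requested period, and "Station C" lies outside the box.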

All of the above methods of exploring and examining the information and data can be used in any order the user chooses. This eliminates the conventional problem of having to step through sequential processes. The MOSAIC interface allows one to explore information and data in a non-linear fashion on the Internet.


Databases

The databases that reside within GCPS are accessible over the Internet using MOSAIC. These datasets are peer-reviewed observational data that can be used to examine global to regional scale climate change. The crux of GCPS is to develop high-quality century-scale time series for various atmospheric constituents for examining regional and global change. The databases will have a high degree of quality control. For example, a number of globally distributed datasets have online documentation and a detailed description of their quality control. These datasets are the NOAA Baseline Climatological Datasets, which are derived from the Global Historical Climatology Network (GHCN) (Vose et al., 1993). The GHCN is modified by performing reproducible quality control procedures both temporally and spatially, producing summary statistics for each station and month, and updating the data on a seasonal to semi-annual basis (Eischeid et al., 1994). At present these datasets have a global spatial distribution and will be updated semi-annually.

The temporal check flags extreme values based on limits determined from a multiple of the interquartile range (IR) calculated for each station/month. This procedure is common in exploratory data analysis. An outlier is flagged when:


                Xi - q50 > f * IR                 (1)
where Xi is the monthly mean for year i, q50 is the median (the 50th percentile), IR is the interquartile range, and f is the multiplication factor. A value of f = 2.75 was used for temperature and f = 4.00 for precipitation.
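The temporal check in equation (1) can be illustrated with a short Python sketch. It is not the GCPS code. The function name `flag_outliers`, the quartile definition (the "inclusive" method of Python's `statistics.quantiles`), and the optional `two_sided` switch, which compares the absolute deviation instead, are assumptions. By default the comparison is applied exactly as equation (1) is written.

```python
from statistics import quantiles


def flag_outliers(values, f, two_sided=False):
    """Flag values of one station/month series according to equation (1).

    values    : monthly means Xi, one per year, for a single station/month
    f         : multiplication factor (the text uses 2.75 for temperature
                and 4.00 for precipitation)
    two_sided : assumption, not stated in the text; if True, compare
                abs(Xi - q50) instead of Xi - q50
    """
    # Quartiles of the series; the "inclusive" method is an assumed choice.
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    ir = q75 - q25  # interquartile range (IR)
    flags = []
    for x in values:
        deviation = x - q50
        if two_sided:
            deviation = abs(deviation)
        flags.append(deviation > f * ir)
    return flags


if __name__ == "__main__":
    # Made-up January mean temperatures (degrees C) for one station.
    january_means = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 25.0]
    print(flag_outliers(january_means, f=2.75))
```

For the made-up series above the median is 10.05 and the IR is 0.4, so only the last value (25.0) exceeds the limit of 2.75 * 0.4 = 1.1 above the median.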

The spatial check uses nearby simultaneous values to calculate an estimated value at a target station over the period of time for which adequate data are available. There are numerous spatial interpolation methods available for point estimation with irregularly spaced data. Typically, the choice of method depends on several factors: the meteorological variable under consideration; the geographical area; the spatial distribution of surrounding observations; and the month/season for which the target station is to be estimated. Since these estimates are required for each month separately over a variety of terrain with differing numbers of available surrounding observations, six different methods are applied and compared for each month and each station, and the best one is chosen (based on the correlation).
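The selection step can be sketched as follows, in Python. The text does not name the six interpolation methods, so the two shown here (a plain neighbor mean and inverse-distance weighting) are stand-ins chosen only for illustration. The sketch also assumes that "the correlation" means the Pearson correlation between the observed target series and each method's estimates. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def neighbor_mean(neighbors, distances):
    """Estimate each time step as the unweighted mean of the neighbors."""
    return [sum(vals) / len(vals) for vals in zip(*neighbors)]


def inverse_distance(neighbors, distances):
    """Estimate each time step with weights proportional to 1/distance."""
    weights = [1.0 / d for d in distances]
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, vals)) / total
        for vals in zip(*neighbors)
    ]


def best_method(target, neighbors, distances, methods):
    """Score every method against the target and return the best one."""
    scores = {
        name: pearson(target, estimate(neighbors, distances))
        for name, estimate in methods.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up values for one station and one calendar month.
    target = [1.0, 2.0, 3.0, 4.0, 5.0]
    neighbors = [
        [1.1, 2.1, 2.9, 4.2, 5.0],  # nearby station, 10 km away
        [3.0, 1.0, 4.0, 2.0, 5.0],  # distant station, 100 km away
    ]
    methods = {
        "neighbor_mean": neighbor_mean,
        "inverse_distance": inverse_distance,
    }
    print(best_method(target, neighbors, [10.0, 100.0], methods))
```

In this made-up case inverse-distance weighting is selected, because it leans on the nearby station, which tracks the target more closely than the distant one.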

The end result of these checks is a series of flags and a file of summary statistics for each station and month. All of this information is cross-linked through hypertext so that these datasets can be explored.

In addition to these datasets, output from recent versions of the Geophysical Fluid Dynamics Laboratory Global Climate Model 100-year and 1000-year runs is available in GCPS. There are temperature and precipitation model runs with CO2 fixed at the amount observed in 1958 (control run) and with an approximate 1% increase each year thereafter for the 100-year run, and the control run output for the 1000-year run. The data are on a Gaussian grid with monthly resolution.
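The arithmetic behind the "approximate 1% increase each year" scenario can be checked quickly. The Python sketch below assumes that the increase compounds annually, which the text does not state explicitly, and the function names are illustrative only. Concentrations are expressed relative to the starting (1958) value rather than in absolute units.

```python
from math import log


def co2_ratio(years, annual_rate=0.01):
    """CO2 relative to the starting year, assuming compound growth."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate=0.01):
    """Years needed for the concentration to double at the given rate."""
    return log(2.0) / log(1.0 + annual_rate)


if __name__ == "__main__":
    print("ratio after 100 years:", round(co2_ratio(100), 3))
    print("doubling time (years):", round(doubling_time(), 1))
```

Under that assumption the concentration is roughly 2.7 times its starting value at the end of the 100-year run, and it doubles after about 70 years.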


Information and Data Distribution

Another thrust of GCPS is developing data and information distribution nodes over the Internet. At present, several organizations on the Internet link directly to GCPS. These include the University of Illinois, Purdue University, the NASA Global Change Master Directory, the Australian Environmental Service, and the Woods Hole Oceanographic Institution, to name a few. The database access engines are located at the National Climatic Data Center and the Climate Diagnostics Center. At present only limited access is possible until the client/server version is completed. The Internet has the potential to provide credible and easily accessible scientific information and data for all levels of curiosity, and GCPS is being designed to provide that framework for the climate community.


Research Plans

Over the next few years GCPS will focus on implementing, in near-real time, the quality control procedures developed over the last year. Additional datasets being considered for inclusion in GCPS are a historical merged land/ocean gridded dataset; Microwave Sounding Unit (MSU) data, with interactive contour and time series plots; and possibly a snow/ice coverage dataset. A client/server version of GCPS is also planned for 1995.


REFERENCES

Carroll, T. C., and C. B. Baker, 1994: The Global Climate Perspectives System: An Intelligent Database System for the Analysis of Environmental Data. Tenth Conference on Interactive Information and Processing Systems, Nashville, TN, January 16-20, 1994.

Doty, B., and J. L. Kinter III, 1993: The Grid Analysis Display System (GrADS): An Update. Proceedings of the Ninth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, 1993, pp. 165-167.

Eischeid, J. K., C. B. Baker, T. R. Karl, and H. F. Diaz, 1994: The quality control of long-term climatological data using objective data analysis. Submitted to J. Appl. Meteor.

Vose, R. S., T. C. Peterson, R. L. Schmoyer, J. K. Eischeid, P. M. Steurer, R. R. Heim, and T. R. Karl, 1993: The Global Historical Climatology Network: Long-term monthly temperature, precipitation, and pressure data. Fourth AMS Symposium on Global Change Studies, Anaheim, CA, January 17-22, 1993.


Danny Brinegar, dbrinega@ncdc.noaa.gov