Abstract - The management and distribution of environmental and public health data require that information be made available in a consistent and reliable format to a wide range of groups for multiple purposes. The recent effort by the Federal Geographic Data Committee (FGDC) to standardize metadata (documentation of database contents) is an important step in that it provides a reliable means of sharing spatially referenced environmental and health data. The World Wide Web (WWW) and graphical interfaces such as Mosaic should play a critical role in expanding access to information concerning health and the environment. However, information systems currently emerging on the Internet face significant challenges if they are to serve both specialists and the taxpayers who are contributing to their creation. This presentation discusses the potential of tools such as Mosaic to provide the kind of descriptive and contextual information that can make environmental and epidemiological research data and information more accessible; however, we emphasize that information developers need to follow established standards for handling data to assure its validity in the eyes of scientific researchers and public health specialists. Tools such as those being developed on WWW have the potential to fulfill these goals while also enhancing public understanding of the contexts and benefits of such research projects, and thereby helping to assure their future support and continued funding.

The management and distribution of environmental and public health data require that information be made available in a consistent and reliable format to a wide range of groups for multiple purposes. Delivering information resources to multiple users in forms suited to their diverse needs and limitations was not feasible before the development of tools such as the World Wide Web (WWW) and Mosaic. The recent effort by the Federal Geographic Data Committee (FGDC) to establish standards for metadata ("data about data") [1] clearly shows the potential of the Internet to provide a reliable means of sharing spatially referenced environmental and public health data. While our discussion relies on geospatial and epidemiological data, our experiences with these data serve to illustrate the significant obstacles that face researchers interested in sharing a diverse range of data types, data that often arrive with little accompanying information about their file types, storage media, or use. The potential benefits of improving access to these data for researchers and policy-makers warrant our attention. Data cataloging and transfer standards enable information system developers to facilitate the description and exchange of digital data via electronic information systems and Hypertext Markup Language (HTML). Placing metadata and other information resources on the Web serves the interests of both specialists and the public by expanding access to information concerning health and the environment. [2]

We provide examples of information resources implemented under current Web topology to suggest the potential of browsers built on the model of Mosaic to provide descriptive and contextual information that can make environmental and epidemiological research more accessible to a wider group of users. Just as the evolution of the scientific method has provided uniform and repeatable techniques for conducting laboratory and field research, information development must meet a similar demand to follow established standards for handling data and assuring that scientific researchers and public health specialists accept data as valid and reliable. While our discussion contains much that is relevant to the development of user interfaces, we emphasize the need for developing and adopting standard methodologies for moving data from collection through the processes of extraction, analysis, and delivery to the user. However, information system developers face another challenge: user interfaces should also provide background and descriptive information to enhance public understanding of the contexts and benefits of such research projects, thereby helping to assure their continued support.



User interfaces have often been too narrowly associated with the capabilities of a particular technology, and discussions of the topic sometimes give the impression that the design of better computer systems alone can solve the problem of making information accessible to large numbers of users. The problem is the assumption that effective information access can be provided if one of two things happens: the optimal interface design is found or users finally become proficient with computer tools. This view shifts a burden from information system developers to either the machine or the user while ignoring many of the immediate contexts in which information systems are implemented: a lack of standard interface design, incompatible systems, and proprietary formats. Today users are still often required to learn a new set of keyboard commands and metaphors for each information system they use. [3]

The limitations of early efforts to develop interfaces for public dissemination of scientific information can be understood in terms of scientific reduction, a key feature of the scientific method. Reduction is "the idea that the laws and theories of a discipline can be re-expressed as special cases of the outworking of the laws of a more fundamental discipline," and successfully reducing complex scientific problems has obvious benefits. [4] Yet it often seems that the art of designing user interfaces has been reduced to a matter of technology or user psychology, with too little attention paid to information content and user expertise. The information systems that are emerging on the Web suggest that developers have become much more interested in users' interactions with information and data. Many of the systems on the Web have incorporated insights about information-using behaviors of not only end users but also data providers and systems developers. To understand interactions between users and scientific information systems, two factors must be addressed: the diverse character of scientific data, the raw material of both scientific research and social policy formation, and the complex actions that are performed on data during its collection, extraction, and analysis.

Ironically, developments that would usually be associated with the "back end" of an information database have significant potential to enhance the design of interfaces for searching and exchanging data over the Internet. Researchers and developers have attempted to expand database contents to include a wide range of information about file content, structure, and access, details that have traditionally been seen as ancillary to the primary purposes of scientific databases. Especially important is the emergence of standards for handling digital and spatially referenced data, standards that include but are not limited to the Federal Information Processing Standard (FIPS), Spatial Data Transfer Standard (SDTS), and FGDC's Content Standards for Digital Geospatial Metadata. [5] Interdisciplinary and inter-agency attempts to develop common data dictionaries on the model of these standards represent user-driven efforts to incorporate aspects of scientific research practices into electronic systems, in that they extend the consistency that characterizes scientific methodologies to our handling of digital data and computer technologies.



Efforts to develop digital libraries have already begun, as evidenced by the opening of the French National Library in Paris and the National Science Foundation grants awarded earlier this year. [6] The development of scientific digital libraries (also called data repositories) can be expected to foreground truly innovative information tools that provide access not just to published works but also to the information resources (the data and metadata) upon which published works are built. Information systems must provide multiple forms of access to research data, but they must also tie data to additional information resources that typically reside outside scientific databases: overviews describing the origins and contexts of the research, and metadata indicating the appropriate uses of the data and its availability or distribution. Multiple benefits can result from incorporating descriptive content such as metadata into information systems and developing procedures enabling users to search and process data. The goal of developing such multi-purpose systems should be to provide enhanced opportunities for scientific users and managers to share not just data but expertise and insight. Another goal should be to benefit non-scientific users by providing other kinds of supplemental information: background materials and contextual overviews provide access to the terms of the discussion being carried on through scientific research.

The benefits of providing electronic support for multiple uses of information resources are evident in the Web page of the Consortium for International Earth Science Information Network (CIESIN). CIESIN’s page provides the user with access to multiple forms of information and data including Thematic Guides, which provide explanations of the background and relevance of the databases as well as pointers to related information resources on the Internet. From within CIESIN's home page, users can access Catalog Services, a client-server tool developed by the Consortium for browsing the databases of member institutions of the CIESIN Information Cooperative. Professional and scientific users and the general public enter the database via the same interface; neither group is isolated from the other, and each can feel that it is participating in similar processes of knowledge discovery. Additionally, the general information contained in the Thematic Guides, along with full-text articles available via the "Information Kiosk," provides some insight into the goals and objectives of the research carried on by the Consortium and its member agencies. The "users" of this system can be seen as scientists and managers (the research participants) and the general public, and rather than an enclosed encyclopedia entry, these users are provided with a ramp that leads them from gateway materials to more detailed, specialized resources such as Data Set Guides and CIESIN’s Catalog Services.

CIESIN's Data Set Guides contain information that is somewhat less detailed and formally structured than that provided by the FGDC metadata standard. The Guides are organized around a common set of topics and include links to additional and related information concerning a particular data product. While the Data Set Guides provide information about the contents, availability, and coverage of the data products held by either CIESIN or its "Information Cooperative Partners," data exchange is directly facilitated by the user interface provided by CIESIN's Catalog Services. [7] CIESIN’s Catalog possesses several features of effectively designed browsers for data and metadata: the interface consistently provides a small number of basic controls prior to making available tools for querying, displaying, or downloading database contents. The interface design can minimize network loads as well by requiring users to perform a series of metadata search and query steps before being able to transfer data, a feature that may frustrate high-bandwidth activities by novice users and Web "wanderers."
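The pattern of requiring metadata search and query steps before any data transfer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how CIESIN's Catalog Services is implemented: the class name CatalogSession, its methods, the record fields "title" and "location", and the example.org addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogSession:
    """Hypothetical catalog session: metadata is searched before data moves."""
    records: dict                               # data set id -> small metadata dict
    seen: set = field(default_factory=set)      # ids returned by earlier searches

    def search(self, keyword):
        """Return (id, title) pairs for matching records; no data is transferred."""
        keyword = keyword.lower()
        hits = [
            (ds_id, meta["title"])
            for ds_id, meta in sorted(self.records.items())
            if keyword in meta["title"].lower()
        ]
        self.seen.update(ds_id for ds_id, _ in hits)
        return hits

    def request_data(self, ds_id):
        """Release a data location only if its metadata was found first."""
        if ds_id not in self.seen:
            raise PermissionError("search the metadata before requesting data")
        return self.records[ds_id]["location"]


# Illustrative records; titles and locations are invented for the sketch.
session = CatalogSession({
    "a1": {"title": "Coastal wetlands survey", "location": "ftp://example.org/a1"},
    "b2": {"title": "County mortality tables", "location": "ftp://example.org/b2"},
})
```

In this sketch only the lightweight metadata is touched during a search, and a transfer request for a data set the user has not yet located is refused, which is one way an interface could keep casual browsing from generating heavy network traffic.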

Although CIESIN indicates that it will support the MARC, FGDC, and SDTS metadata standards, a significant limitation is that standards such as those developed by FGDC cover one kind of data only, digital geospatial data. Organizations such as CIESIN that must manage multiple data types can look to FGDC’s metadata standard for guidance but must also improvise standard approaches to cataloging unconventional information resources. For example, the Louisiana Coastal GIS Network (LCGISN), a system developed at LSU in Baton Rouge, provides an online catalog of eight types of data ("gray" literature, maps, people, aerial photography, satellite imagery, as well as audio visual, tabular, and geotechnical information resources) relevant to the study of Louisiana's wetlands and coastal zone. LCGISN’s development began long before FGDC completed its metadata standard. Because LCGISN adopted much of the United States Machine Readable Cataloging Standard (USMARC), incorporated a pre-existing hierarchy of subject terms established by the Library of Congress, and followed the development of the FGDC standard closely, the project has benefited from incorporating existing standards from diverse sources, a cooperative pattern of development that is characteristic of the Internet as well as scientific research generally. [8]

The range of multi-disciplinary information resources available via CIESIN and LCGISN reflects attempts to incorporate the expertise of multiple domains, disciplines, and organizations into broadly distributed electronic systems. Instead of incorporating all types of information into the same interface, however, these systems rely on the expertise of professionals in different disciplines to develop methodologies for delivering information resources in ways that satisfy the scientific demand for reliable, usable data. While some information in these systems is accessible as reports, charts, tables, or even maps, providing such "fixed" resources is generally much less complex than current efforts to allow users access to tools for manipulating or processing data while still providing acceptable levels of security and data management. The pattern of development reflected in emerging scientific resources on the Internet resembles what Matheus, Chen, and Piatetsky-Shapiro describe as facilitating Knowledge Discovery in Databases (KDD). [9] According to these researchers, the most useful information often exists outside the database in the form of "domain knowledge," unstructured knowledge that enables specialist users to recognize appropriate or potential uses of data or the relevance of information. While "the extraction and codification of this knowledge is a challenging problem in the development of a KDD system" (906), metadata may be one way to indicate the appropriate applications of a particular information resource. By including complete details about a particular data type in a hypertext format that is searchable by information exchange protocols such as Z39.50, metadata can supply two kinds of information that may make scientific data more accessible to general as well as specialized users: processing history, which often takes the form of a narrative, and distribution information, which indicates the terms of the data's availability. [10]
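The two kinds of information named above, processing history and distribution information, can be pictured as parts of a single searchable record. The Python sketch below is only an illustration: the class and field names (MetadataRecord, ProcessStep, distributor, access_terms) are our own assumptions and are not the element names of the FGDC standard, the sample values are invented, and the plain substring match stands in for a real search protocol such as Z39.50.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One step in a data set's processing history (hypothetical fields)."""
    date: str
    description: str


@dataclass
class MetadataRecord:
    """Illustrative record pairing processing history with distribution terms."""
    title: str
    data_type: str
    process_steps: list = field(default_factory=list)
    distributor: str = ""
    access_terms: str = ""

    def processing_history(self):
        """Render the process steps as a short narrative, oldest first."""
        return " ".join(f"{s.date}: {s.description}" for s in self.process_steps)

    def matches(self, term):
        """Case-insensitive text search over title, history, and access terms."""
        text = " ".join([self.title, self.processing_history(), self.access_terms])
        return term.lower() in text.lower()


# Invented example values, used only to exercise the sketch.
record = MetadataRecord(
    title="Shoreline change maps",
    data_type="geospatial",
    process_steps=[
        ProcessStep("1990", "Aerial photographs digitized."),
        ProcessStep("1992", "Shorelines converted to vector format."),
    ],
    distributor="Hypothetical State Survey",
    access_terms="Available on request for research use.",
)
```

Because the narrative history and the access terms sit in the same record as the title, a general user searching for "research use" and a specialist searching for "vector" would both reach the same description of the data.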



Developing systems for sharing information and research data requires that domain knowledge be incorporated into or attached to the data themselves. The conceptual model below represents what we consider to be a "preferred dynamic" where end users become integral to how developers provide access to data and information resources. Although the model distinguishes between data providers, information system developers, and users, the distinction is somewhat artificial; in practice, the categories are not mutually exclusive but overlapping. The model illustrates the unfortunate fact that these groups often exist in isolation from one another: users often feel isolated from the agencies that collect and process data, and developers rarely share the same concerns as either data users or providers. The advantage of an open dialogue among users, system developers, and data providers is two-fold: a clearer image of user complexity is allowed to emerge, and a broad population of users may come to see themselves as participants in the development of enhanced communications concerning science and technology.

The somewhat awkward relationship between data providers and end users in the current research milieu can be generalized as follows. Users needing access to data contact providers and obtain data that is generally on magnetic media. The media type may or may not be supported under the user’s host computing environment. Experience has shown that gaining access to information about data formats, media compatibility, and appropriate use of data can be a confused and error-prone process that stifles research efforts. On many occasions end users lack the information that would enable smooth and reliable transfer of data. In other words, the arrival of new data may generate more questions than it answers.

Another relationship depicted by the model is between developers and data providers, the large agencies that collect, process, and disseminate environmental and public health data. (Commercial providers are also included in this group.) Since many of these agencies and organizations may not possess the necessary resources or incentives to develop multi-purpose interfaces to scientific information, much of the development may be done by intermediaries, the group termed developers in the model. This group may be composed of a combination of expert users and motivated providers, and their inclusion in the model is our attempt to focus particular attention on this group while also complicating the familiar user-provider dichotomy. Furthering dialogue between data providers and information system developers has great potential for promoting access to scientific data and information, a goal with significant social relevance. If access to reliable information and research data is to be enhanced by tools such as hypertext used in combination with distributed computing environments such as the Web, we believe that institutions holding the data are going to need the expertise of a diverse group of developers, domain experts, and computing professionals, all of whom need to become participants in the ongoing discussion concerning access to information, metadata, and data.



This work suggests that developing user interfaces for scientific and public information can be facilitated by including with data such supplementary information as details of data structure and formats, as well as the research methods of data collection, processing, and exchange. While the hypertext capabilities of Web browsers such as Mosaic offer significant potential to expand access to public information, metadata standards are essential to the reliable exchange of critical information resources such as spatially referenced environmental and public health data. The continued development of metadata standards appears to be one cornerstone of enhanced access to reliable sources of digital data. The capability of hypertext markup language to link data to related information resources suggests that the Web can become an environment that is conducive to the development of data repositories capable of serving multiple purposes including the distribution of public information concerning environmental and public health research. We believe that consortiums such as CIESIN provide one model of collaborative efforts where federal agencies, national and international organizations, and private industry extend access to information resources, enhance scientific research, and promote general education.