MetaCenter Science Highlights Repository

Abstract: The MetaCenter, a cooperative effort to integrate the intellectual and computational resources of the NSF-funded supercomputing centers, has developed an integrated, but distributed, Online Information System. This system is intended to provide a single point of access to information about the centers for the Internet community and a documentation system for users of our high-performance computing facilities. The Science Highlights Repository is the latest addition to the MetaCenter Online Information System and was designed specifically to take advantage of the functionality of the Web. It incorporates a sophisticated search engine across multiple servers, integrated with a dynamic browsing interface, scientific visualizations, and links to related information. The Repository required the development of tools that will be of value to other consortia of institutions interested in creating an online information system distributed across multiple servers.

Moderator: Greg McArthur, National Center for Atmospheric Research


What is the MetaCenter?

Cordelia Baron Geiken, National Center for Supercomputing Applications

The MetaCenter is a collaboration of the Computer and Information Science and Engineering (CISE) supercomputer facilities funded and supported by the National Science Foundation. With the addition of a new member in February 1994, the MetaCenter now consists of five unique supercomputer facilities: the Cornell Theory Center (CTC), the National Center for Atmospheric Research (NCAR), the National Center for Supercomputing Applications (NCSA), the Pittsburgh Supercomputing Center (PSC), and the San Diego Supercomputer Center (SDSC). In addition to these initial participants, the NSF recently announced that a number of MetaCenter Regional Alliances (MRAs) will be formed with other institutions that will work with and directly participate in the MetaCenter.

These centers work to promote joint research and facilitate technological development in high-performance computing for academic, industrial, and government science and engineering purposes. Each center has its own research focus and strengths, and the MetaCenter collaboration aims to combine these efforts to develop a comprehensive resource for advanced technologies.

The MetaCenter collectively represents all of the major US supercomputer machine architectures and boasts a combined computing power of one teraflop. In addition, the MetaCenter has been a key participant in the development and testing of high-speed network applications. With its new MRA partners, the MetaCenter intends to serve the local and regional education, training, research, and industrial outreach needs of multiple audiences by facilitating the understanding and use of high-performance computing and its relation to the National Information Infrastructure (NII).

MetaCenter Projects

The MetaCenter participants work closely on a number of collaborative projects meant to advance high-performance computing technologies or to provide better support to the research community. A sample of current projects follows:


"MetaInfo" - The MetaCenter Online Information System

Dan Dwyer, Cornell Theory Center

The MetaCenter Online Information System, or MetaInfo, is the electronic entrance to the MetaCenter. This single, integrated online information system for the National Science Foundation-funded supercomputer centers is a core activity of the MetaCenter. Development and implementation of this system has been an evolving effort over the past two and a half years. With over 1,000 accesses each month, MetaInfo has become a significant resource to the Internet community for information on computational science, high-performance computing, and new scalable parallel architectures.

The goals of the MetaInfo system are to:

In developing the MetaInfo system, we considered a number of prospective audiences, including current and potential supercomputer users, other researchers and graduate students, scientific computational support staff at other institutions, those who provide training in the use of high-performance computing, the news media, and Internet browsers in general. By developing an integrated system across all five NSF-funded supercomputer centers, we are able to better meet the needs of these audiences and reduce duplication of effort.

History

The national laboratories and NSF supercomputer centers have been discussing online information systems since 1990. Initially these discussions considered appropriate standards for vendor and locally written documents. An integrated system across the centers was not originally envisioned. With the initiation of the MetaCenter and the development and acceptance of distributed client/server-based tools such as WAIS, gopher, and World Wide Web, the time was right for a coordinated online repository for information about high-performance computing and communications.

Beginning in the summer of 1992, staff members from the Cornell Theory Center, the Pittsburgh Supercomputing Center, the National Center for Supercomputing Applications, and the San Diego Supercomputer Center began to discuss a common online information system. (The National Center for Atmospheric Research was not yet a participant in the MetaCenter.) This system had been identified as central to the MetaCenter mission by our Academic Affiliates. It was decided to develop the system using gopher, which, at that time, was the most popular information server protocol.

In March 1993 the gopher-based MetaInfo system was announced. This system was designed as an identical menu structure on the gopher servers at each MetaCenter site. While the menu structure was duplicated, each document linked into the MetaInfo structure resided only at the Center which had created it. The content and links within this structure were kept in sync by a set of Perl scripts run by each Center.

Almost immediately after announcing the gopher-based system, discussions began on how to take advantage of the functionality of the Web. By the summer of 1993 we were providing a Web interface into MetaInfo. At the end of the year, the current design was largely in place. Early in 1994 it was decided that the MetaInfo Web system could best be maintained by each Center taking ownership of a part of the larger structure. That is, the four major subcomponents of the system would be divided among and physically reside on the four Web servers of the centers. As with the gopher server, the actual documents that are linked into the MetaInfo Web system are located at the Center which created them.

In the fall of 1994, the gopher version of MetaInfo was phased out of production.

MetaInfo Philosophy

The existence of the MetaInfo system does not imply that each of the MetaCenter sites disseminates information in identical ways. Each of the five centers will choose to emphasize those technologies and protocols that best meet its individual needs and capabilities. Thus, the individual centers will have the same "look and feel" to their information services and will link into each other's systems, but they will also maintain their individual, diverse characters.

By working together we are not only able to provide an integrated interface into the information systems of the MetaCenter, but can also assist each other in the development of our individual systems. Tools which are developed or implemented at one Center can be easily shared with the others. Continued communication and interaction has allowed us all to move forward more quickly in the rapidly evolving cyberspace world.

We believe that our experiences in developing and maintaining an integrated, but distributed, online information system can assist others who wish to make use of the Web for disseminating information about their collaborative projects. We hope that the MetaInfo system is not only a resource to the high-performance computing community, but also a useful template for other collaborating institutions who wish to use the power of the Web to develop integrated information services.

Future Projects

In addition to the MetaCenter Science Highlights Repository which will be described in detail in later sections of this paper, the MetaCenter plans to focus on two additional areas in the near future. These include more tightly integrating our online materials and developing electronic educational materials.

Integration: While the MetaInfo system contains a wealth of useful information, cross-references between different sections are frequently not available. For example, while reading a research report on the latest progress in oceanic ecosystem simulations carried out on the IBM SP1, it might be desirable to quickly follow a hyperlink to a detailed technical description of that architecture or to view a list of upcoming MetaCenter workshops on simulation techniques.

We wish to provide a single, tightly integrated access point to all information available within the MetaCenter. This would include abstracts of research projects, hardware and software descriptions, results (written reports, images, animations), and bibliographic references. A user of this system should be able to navigate easily among the various components of the system, regardless of the physical location of the documents and without being limited by a predefined structure. Work has already begun to inventory MetaCenter resources (both electronic and hardcopy), review subject access design issues, and address the question of intellectual property rights. Future work by the MetaCenter and others in the Web community is needed to design databases that can store and service queries about metadata and to develop more sophisticated search engines.

Education: An important mandate from the NSF to the centers is to educate and train U.S. researchers and potential researchers to make effective use of high-performance computing. Because we have a national community to train, electronic educational techniques have the potential to greatly increase our impact at significantly lower cost. Our goals in the area of education include:

Projects already underway within the MetaCenter include converting lecture notes to HTML, evaluating "multimedia" tutorials, creating indexes of Internet resources for the communities with which we work, and investigating training over the Internet.


The MetaCenter Science Highlights Repository: Moving from Hardcopy to Electronic Publishing

Vivian M. Benton, Pittsburgh Supercomputing Center

The hunger for information about the resources available at the five NSF national supercomputing centers, how those resources are used, and the results of research projects that have used them is evidenced by the volume of hardcopy science publications the centers produce each year. In 1993 alone, the NSF centers printed 150,000 newsletters, 6,000 technical reports, and 50,000 individual science reports.

In light of this obvious demand for the printed piece, the question then becomes: "Why is the electronic publication of science reports on the Web a better medium than traditional hardcopy documents?" The answers are simple:

  1. To reach a larger audience. Researchers and the lay person will have better access to the articles.
  2. To provide articles in a more timely and dynamic manner. Individual articles can be placed on the Web as soon as they are available, not at the end of the year when the publication is produced.
  3. To provide more flexibility in the presentation of articles. Sound and animations can be included, and links to related information can be followed at the click of a key.
  4. To enable scientists to contribute publications to the repository that reside on their individual servers.
  5. And, of course, it's a lot cheaper.

Document Design and System Design Issues: Moving from "Why" to "How"

Document design and system design are extremely important issues to consider when developing and building a repository of science articles for use in a hypermedia environment. The technical nature of the material makes an aesthetically pleasing and well-designed system even more desirable. Agreement on the design of the basic template for the articles, though not required, adds to the uniformity of the system as a whole.

The size of an article in a hypermedia environment is another important consideration. Separating an article into "chunks" and creating links to supplemental articles (in hardcopy, these would be called sidebars) gives a reader the option of reading as much or as little as he wants to at any given time.

Other document design issues, no less important, are how effectively hyperlinks and multimedia components are used. Care must be taken so that the reader is not distracted, confused, or discouraged. In addition, the use of high-quality images and animations, which cannot be included in hardcopy publications, adds another dimension and increases the level of interest in what otherwise has the potential to be sterile and boring.

In addition to the above, the design of the overall system should be user friendly with minimal learning required to use it effectively. A user should be able to easily browse through articles of interest to him or use keywords to search for specific items. The searching and browsing mechanisms should be integrated in such a way that a user can effortlessly switch from one to the other at any level within the repository.


The MetaCenter Science Highlights Repository: Software Development

Joshua Polterock, San Diego Supercomputer Center

Developing in the MetaCenter on the World Wide Web

The World Wide Web, as a tool, provides the functionality required to develop a distributed repository in a MetaCenter environment. The Web allows the MetaCenter, spread across five geographically separate locations, to undertake joint research and development activities. This project is led by Jason Ng at the National Center for Supercomputing Applications (NCSA), with development occurring at the Cornell Theory Center (CTC), the Pittsburgh Supercomputing Center (PSC), and the San Diego Supercomputer Center (SDSC). The fifth member, the National Center for Atmospheric Research (NCAR), functions as our first beta test site for the software package.

The MetaCenter Science Highlights Repository has three main areas of software development: the user interface, the server maintenance scripts, and the science project files.

All project scripts use the Perl programming language.

The User Interface

The user interface contains several pages; the three primary pages are the welcome page, the search engine, and the browse engine.
  1. The welcome page introduces the project and its participants. It presents the goals of the project and links to the searching and browsing engines as well as the package distribution pages. (PSC)

  2. The search engine provides keyword searching on a project index file. Each server maintains a copy of a global index file containing the records of all project pages from each participating server site. (SDSC)

    The meta_search script dynamically generates a form based on the searchable keys (fields) found in the index file. The form allows the user to enter a search string (any Perl regular expression), select which keys to display in the search results, and rank the keys by relevance. A relevance ranking of zero (0) removes a key from the search. (A minimal sketch of this scoring scheme appears after this list.)

  3. The browse engine (CTC) provides an interface that allows the user to list project pages by field of science, as defined by the National Science Foundation's codes.

    The meta_browse script reads a formatted ASCII file containing the fields of science and their respective numbers and dynamically generates pages of HTML code based on the areas selected by the user. The MetaScience group continues to debate how many levels of hotlinks a user should have to traverse before seeing references to project files; the design of the meta_browse script allows this behavior to be altered easily. (A sketch of this page generation also appears after this list.)

  4. The package distribution page (SDSC) describes the repository software package and provides links for other administrators to retrieve the package. This package allows other information providers to create similar searchable cross-server repositories of HTML documents. (This page will be provided upon release of the software.)
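The paper describes the behavior of meta_search but not its source. As a rough illustration, the following is a minimal Perl sketch of the scoring scheme described in item 2. The one-record-per-line, tab-separated key=value index format, the master.index filename, and the hard-coded relevance ranks are all assumptions made for illustration; in the actual script, the search pattern and ranks would come from the dynamically generated form.

    #!/usr/bin/perl
    # Hypothetical sketch of meta_search-style scoring: weight each
    # searchable key by a relevance rank and score index records against
    # a user-supplied Perl regular expression.
    use strict;
    use warnings;

    # Assumed record format: one project page per line, tab-separated
    # "key=value" fields, e.g.
    #   title=Ocean Modeling<TAB>url=http://...<TAB>keywords=ocean simulation
    my $index_file = 'master.index';   # assumed filename
    die "usage: meta_search <pattern>\n" unless @ARGV;
    my $pattern = qr/$ARGV[0]/i;       # any Perl regular expression
    my %rank = (title => 3, keywords => 2, researcher => 1, url => 0);

    open my $idx, '<', $index_file or die "cannot open $index_file: $!";
    my @hits;
    while (my $line = <$idx>) {
        chomp $line;
        my %record = map { split /=/, $_, 2 } grep { /=/ } split /\t/, $line;
        my $score = 0;
        for my $key (keys %rank) {
            next unless $rank{$key};   # a rank of zero removes the key
            $score += $rank{$key}
                if defined $record{$key} && $record{$key} =~ $pattern;
        }
        push @hits, [ $score, $record{title} || '(untitled)', $record{url} || '' ]
            if $score > 0;
    }
    close $idx;

    # Emit a simple HTML hit list, best matches first.
    print "<ul>\n";
    for my $hit (sort { $b->[0] <=> $a->[0] } @hits) {
        printf qq(<li><a href="%s">%s</a> (score %d)</li>\n),
            $hit->[2], $hit->[1], $hit->[0];
    }
    print "</ul>\n";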
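In the same spirit, here is a minimal sketch of the page generation that item 3 attributes to meta_browse. The paper says only that the script reads a formatted ASCII file of fields of science and their numbers; the fields_of_science.txt filename and the code-then-name line layout, with the hierarchy encoded in the code prefix, are assumptions.

    #!/usr/bin/perl
    # Hypothetical sketch of meta_browse-style page generation: read an
    # ASCII table of NSF field-of-science codes and emit an HTML menu
    # for the branch the user selected.
    use strict;
    use warnings;

    # Assumed file format: "code<TAB>field name" per line, e.g.
    #   30      Chemistry
    #   3010    Organic Chemistry
    my $fos_file = 'fields_of_science.txt';   # assumed filename
    my $selected = shift @ARGV || '';         # code prefix chosen by the user

    open my $fh, '<', $fos_file or die "cannot open $fos_file: $!";
    print "<h2>Fields of Science</h2>\n<ul>\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($code, $name) = split /\t/, $line, 2;
        next unless defined $name;
        next unless $code =~ /^\Q$selected\E/; # keep only the chosen branch
        # Each entry links one level deeper into the browser; a real
        # installation would link leaf codes to matching project pages.
        print qq(<li><a href="meta_browse?code=$code">$name</a></li>\n);
    }
    close $fh;
    print "</ul>\n";

Because the number of menu levels is driven entirely by the code table and the selected prefix, changing how many levels a user must traverse is a matter of how the codes are grouped, which matches the paper's note that this function is easy to alter.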

The Server Maintenance Scripts

There are three scripts required to compile a master index of the project files from all participating repository servers: makeindex, fetchindex, and catindex.

  1. The makeindex script (CTC) reads project files, extracts the key information and generates index files.

  2. The fetchindex script (CTC) collects the index files from the other participating servers.

  3. The catindex script (CTC) compiles the index files into a single master index file.
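The paper names the three maintenance scripts but does not show them. The simplest of the three, catindex, might look something like the sketch below; the indexes/*.index directory layout and the master.index filename are assumptions, and fetchindex (not shown) would first retrieve each participating site's index file over the network into that directory.

    #!/usr/bin/perl
    # Hypothetical sketch of a catindex-style merge: concatenate the
    # per-site index files (already collected by fetchindex) into a
    # single master index, dropping duplicate records.
    use strict;
    use warnings;

    my @site_indexes = glob 'indexes/*.index'; # assumed: one file per site
    my $master       = 'master.index';         # assumed filename

    open my $out, '>', $master or die "cannot write $master: $!";
    my %seen;
    for my $file (@site_indexes) {
        open my $in, '<', $file or die "cannot read $file: $!";
        while (my $record = <$in>) {
            print {$out} $record unless $seen{$record}++;
        }
        close $in;
    }
    close $out;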

The Science Project Files

Inclusion in the repository requires that a project file contain at least one searchable key of the form <!KEY>arbitrary text string<!/KEY>. The repository scripts allow the writer/editor to define any generic key or metatag for searching. For our project, we defined the following keys: title, url, researcher, organization, hardware, software, keywords, and field of science. To maintain a consistent look and feel across the articles in the repository, the group defined a project file template. The article template defines the minimum requirements for a project file and its layout. (PSC)
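The <!KEY> tag syntax above is enough to sketch the key extraction that a makeindex-style script performs on each project file. In this hypothetical sketch, the tab-separated key=value output record and the use of underscores in multi-word key names (e.g. FIELD_OF_SCIENCE) are assumptions, chosen to match the index format assumed in the meta_search sketch earlier.

    #!/usr/bin/perl
    # Hypothetical sketch of makeindex-style key extraction: pull every
    # <!KEY>value<!/KEY> pair out of a project file and print one index
    # record. The tag form is from the paper; the record layout is assumed.
    use strict;
    use warnings;

    local $/;                    # slurp the whole project file
    my $page = <>;               # e.g. makeindex project.html
    die "no input\n" unless defined $page;

    my %record;
    while ($page =~ m{<!(\w+)>(.*?)<!/\1>}gs) {
        my ($key, $value) = (lc $1, $2);
        $value =~ s/\s+/ /g;     # collapse embedded newlines
        $record{$key} = $value;
    }

    # One tab-separated key=value record per project file.
    print join("\t", map { "$_=$record{$_}" } sort keys %record), "\n";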


Author Biographies

Vivian M. Benton is the Publications Coordinator at the Pittsburgh Supercomputing Center. She is editor of the center's newsletter, PSC News, and is also responsible for producing and distributing the majority of the center's hardcopy documentation and publications, such as overviews, flyers, user's guides, etc. In a past life, Vivian was a computer programmer and a manager of technical writers.

Dan Dwyer is the Project Leader for Online Information Systems at the Cornell Theory Center. In this capacity Dan organizes and oversees development of the Theory Center's extensive online information system, which contains general information for the Internet community and technical documentation for supercomputer users. Over the past two years this has included deployment of gopher, WAIS, and World Wide Web servers. Dan has also been active in collaborative work on online information with the other NSF supercomputing centers.

Cordelia Baron Geiken is a Multimedia Technology Associate in the Media Resources Group at the National Center for Supercomputing Applications. She is the Project Leader for the Digital Information System at NCSA and is also heavily involved in projects throughout NCSA developing Mosaic as a multimedia communications tool.

Greg McArthur, Ph.D., serves as the principal digital media expert in engineering hypermedia-based information networks for the Scientific Computing Division at the National Center for Atmospheric Research (NCAR), located in Boulder, Colorado. As a member of the Visualization and Digital Information Group, he works to create seamless, intuitive interfaces to information resources carried via World Wide Web servers and accessed principally via NCSA's Mosaic browser. He acts as the Webmaster for SCD and NCAR and oversees the design and development of Mosaic documents used for education and training, outreach, inter-scientist communication, scientific visualization, and online documentation via the Internet.

He recently served on NSF's Digital Library Review panel and is an ardent and outspoken advocate of WWW/Mosaic, its expanded use, and its potential for creating a truly ubiquitous, open, and universally available knowledge resource for mankind.

Joshua Polterock is a Senior Technical Editor at the San Diego Supercomputer Center. He manages the Gopher and World Wide Web network information services at SDSC and is a contributing writer/editor on multiple projects including a pilot interactive multimedia science project and the center's bi-monthly Gather/Scatter newsletter.

Dan Dwyer dwyer@tc.cornell.edu