The Web as a Computational Engine for Chemistry and
Molecular Biology.
Peter C. FitzGerald and Robert A. Pearlstein
Computational
Molecular Biology Section, Division of Computer Research and
Technology, National Institutes of Health, Bethesda, Maryland 20892
Abstract
The functionality of the World Wide Web and Mosaic
as an information distribution system has been amply demonstrated
during the past year. With the introduction of FORMS, however, Web
clients such as Mosaic have acquired the ability to act as "front ends"
to an almost limitless variety of computational applications. In
essence, Web clients may act as "universal front ends", offering
user-friendly interfaces to many kinds of remote computational tasks.
The main constraint of this approach is that a given computational task
must be capable of being initiated from a defined set of input
parameters and data, and must require no further user interaction.
While this limitation is significant, many computational tasks
may be addressed within this constraint. To determine the feasibility of
Web/Mosaic acting as a "universal front end", we have
developed a number of prototypes which address specific tasks relating
to Computational Chemistry and Computational Molecular Biology.
Introduction
The Intramural Research Program of the National
Institutes of Health (NIH) is one of the world's leading biomedical
research establishments. The NIH consists of 24 separate organizational
units housed in over 70 buildings, and has over 4,000 doctoral level
scientists involved in more than 2,000 research projects. The
computing environment serving this sizable community is a
hybrid of centralized and distributed hardware and software resources.
In this type of scientific research environment, minimal time is
available for developing computer skills. Thus, among the biggest
barriers to the use of scientific software are frequently the lack of
appropriate computer training and the difficulty of locating
the appropriate application to address a particular problem.
Additionally, much of the leading-edge academic scientific software is
underutilized because such programs are typically characterized by poor
user interfaces (e.g. command-line or script-driven) and little or no
documentation, which results in a steep learning curve for the average
user. However, coupling such programs to Web/Mosaic technology via a
FORM interface offers the potential to greatly improve the accessibility
of centrally maintained computer resources to users of all
backgrounds.
Our ultimate goal is to develop a package of useful utilities which
would specifically address the needs of the NIH intramural scientific
community and, secondarily, the scientific research community at large.
Described below are the operational and prototypical Web/Mosaic
resources which we have developed in order to provide this audience
with simple, fast, user-friendly access to a wide variety of scientific
computer applications.
Methodology
We have used NCSA's httpd server (v 1.3) and custom-built CGI programs, written in
C, to provide the link between the server and the processing program.
The programs we have interfaced to the Web have been very
varied, ranging from commercial software to public domain code written
in C or Fortran, as well as in-house developed C programs and Perl
scripts.
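As a rough illustration of the link described above, the sketch below decodes URL-encoded FORM data, passes one field to an external program on standard input, and wraps the program's output in a minimal CGI response. It is written in Python rather than the C used for the actual CGI programs, purely for brevity. The field name `sequence`, the function `handle_form`, and the stand-in program `REVERSE` are assumptions made for this example and do not come from the original work.

```python
import subprocess
import sys
from urllib.parse import parse_qs


def handle_form(query_string, command):
    """Decode FORM input, run `command` on it, and build a CGI response."""
    # FORM data arrives URL-encoded, e.g. "sequence=ACGT&format=text".
    fields = {key: values[0] for key, values in parse_qs(query_string).items()}
    sequence = fields.get("sequence", "")  # hypothetical field name

    # Hand the decoded text to the processing program on standard input
    # and collect whatever it writes to standard output.
    result = subprocess.run(command, input=sequence, capture_output=True, text=True)

    # A CGI response is a header block, a blank line, then the body.
    if result.returncode != 0:
        return "Content-Type: text/plain\n\nThe program reported an error.\n"
    return "Content-Type: text/plain\n\n" + result.stdout


# Stand-in "processing program" (an assumption): it reverses its input.
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]

if __name__ == "__main__":
    print(handle_form("sequence=ACGT&format=text", REVERSE))
```

In a real deployment the server would supply the query string (for example through the `QUERY_STRING` environment variable or the request body), and the response would be written to standard output for the server to relay to the client.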
Approach
In developing a package of useful Web/Mosaic-based utilities, our
initial approach has been to select specific computational tasks which
by their nature:
- require the input of a "relatively" small amount of text-based data
(data that can be directly entered or pasted into a Mosaic text box)
- accept defined values for operational variables (via
Mosaic buttons or input boxes)
- return as output either text, an image, or binary data in a form which can be viewed/manipulated by a
user-specified local "viewer"
- require "relatively" little CPU
time on a central server. We arbitrarily selected 5 minutes as the upper bound
for an acceptable CPU time (clearly, the power of the server dictates
which tasks are acceptable).
We were also heavily prejudiced towards tasks currently addressed by
software which was able, or could be easily modified, to run in
"batch mode" and was thus capable of being spawned by a CGI script/program
as a non-interactive process.
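The two operational criteria above, a bounded run time and non-interactive "batch mode" execution, might be enforced along the following lines. This is a minimal sketch in Python (not the C of the original CGI programs), and it makes one notable substitution: the limit is applied to elapsed wall-clock time, which is what `subprocess.run` supports directly, rather than to CPU time as in the text. The function name `run_batch` is an assumption made for this example.

```python
import subprocess
import sys

TIME_LIMIT_SECONDS = 5 * 60  # the five-minute ceiling mentioned in the text


def run_batch(command, limit=TIME_LIMIT_SECONDS):
    """Run `command` non-interactively; return its output, or None on timeout."""
    try:
        result = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,  # nothing to read: the job cannot prompt a user
            capture_output=True,
            text=True,
            timeout=limit,             # wall-clock seconds, not CPU seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None
    return result.stdout


if __name__ == "__main__":
    # A trivial stand-in job that finishes well within the limit.
    print(run_batch([sys.executable, "-c", "print('done')"]))
```

A caller that receives `None` can return an explanatory page to the user instead of leaving the browser waiting indefinitely.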
Resource Development
In developing tools for the Web/Mosaic environment, we have concentrated
on the related areas of Computational Chemistry and Computational
Molecular Biology. While our primary target audience is the NIH
intramural scientific community, our goal is to make these resources as
widely available as possible within
the limits imposed by licensing agreements and institutional policy.
Specifically, the applications we have developed are aimed at
providing:
- Access to molecular structure data for DNA, proteins
and small organic molecules.
- Analysis of primary protein sequence data.
- Analysis of primary DNA sequence data.
Included among these resources are tools which range from those
closely related to more traditional database searching problems
(involving little computation but large data storage capacity) to
those which are purely computational.
These resources are normally reached through the NIH
Molecular Modeling Home Page and the NIH
Molecular Biology Resource Page on the NIH WWW Server.
[Note: Not all application links will work from all sites;
access is restricted due to licensing
agreements and/or institutional policy.]
Discussion
While our efforts in this area are still ongoing, we have found that
Mosaic clients and Web-Server-linked application programs provide a
very powerful and user-friendly computing environment. The advantages
of using Web/Mosaic technology to provide access to central computing
facilities include:
- Traditional Client-Server functionality
  - A client-side interface which makes use of the functionality of the
    client computer and has access to local resources (printers and storage).
  - Access to remote high performance computing resources.
  - Central maintenance of server-side resources (hardware,
    software and data).
- Added Benefits
  - Universality of Interface (one client can interface to many different
    types of application).
  - Development time of interface is "minimal".
  - Easy to interface to many existing programs.
  - Machine architecture independence.
  - Interface is easy to modify and distribute since the application interface is
    defined on the server not the client.
On the development side, we have found that in many ways the selection
process for the incorporation of application programs into the
Web/Mosaic environment runs contrary to normal doctrine. The more
primitive a program's user interface, the easier it is to incorporate
the program into this type of environment. Programs which use cryptic
command-line arguments, or command files, are more easily incorporated
than those which prompt for the user selection of run-time parameters.
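To make the point about command-line arguments concrete, the sketch below translates decoded form fields into an argument list. It is a hypothetical Python example, not code from the original C programs: the program name `seqsearch`, its flags, the field names, and the use of `-` to mean "read standard input" are all assumptions made for illustration. Because every run-time parameter is fixed before the program starts, no prompting is needed, and each value can be validated before it reaches the command line.

```python
# Hypothetical mapping from FORM field names to the flags of an
# imaginary program "seqsearch"; none of these names come from the paper.
FLAG_FOR_FIELD = {"matrix": "-m", "gap_penalty": "-g", "max_hits": "-n"}
ALLOWED_MATRICES = {"PAM250", "BLOSUM62"}


def build_argv(fields):
    """Translate decoded form fields into an argument list for seqsearch."""
    argv = ["seqsearch"]

    # Menu-style fields are checked against a fixed set of choices.
    matrix = fields.get("matrix", "PAM250")
    if matrix not in ALLOWED_MATRICES:
        raise ValueError(f"unsupported matrix: {matrix}")
    argv += [FLAG_FOR_FIELD["matrix"], matrix]

    # Numeric fields are optional; int() raises ValueError on bad input.
    for name in ("gap_penalty", "max_hits"):
        if name in fields:
            argv += [FLAG_FOR_FIELD[name], str(int(fields[name]))]

    # Assumed convention: a trailing "-" tells the program to read stdin.
    argv.append("-")
    return argv


if __name__ == "__main__":
    print(build_argv({"matrix": "BLOSUM62", "max_hits": "10"}))
```

Passing the result as a list to a process-spawning call, rather than joining it into a shell command string, avoids shell interpretation of user-supplied text.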
Programs which take input from "standard in" and stream their output to
"standard out" also simplify the procedure.
Future Outlook
Based on the success we have had in implementing a number of
prototypical examples, we are working to develop a complete package of utilities
which address a broad spectrum of Computational Chemistry and Molecular
Biology tasks. Our goals include integrating this technology into a multi-computer
environment in which tasks submitted via Web/Mosaic are passed off to the
most appropriate platform.
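The text does not say how the "most appropriate platform" would be chosen, so the following Python sketch is only one possible reading: among the machines that have the requested program installed, pick the one with the shortest queue. The `Platform` record, the selection rule, and the host and program names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str             # hypothetical host label
    programs: frozenset   # programs installed on this machine
    queued_jobs: int      # current backlog on this machine


def choose_platform(program, platforms):
    """Pick the least-loaded platform that can run `program`."""
    candidates = [p for p in platforms if program in p.programs]
    if not candidates:
        raise LookupError(f"no platform offers {program}")
    return min(candidates, key=lambda p: p.queued_jobs)


if __name__ == "__main__":
    machines = [
        Platform("workstation", frozenset({"seqsearch"}), queued_jobs=4),
        Platform("compute-server", frozenset({"seqsearch", "minimizer"}), queued_jobs=1),
    ]
    print(choose_platform("seqsearch", machines).name)
```

Other criteria, such as processor architecture, memory, or licensing restrictions on a given host, could be folded into the same selection step.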
At present, Web/Mosaic does not fulfill the role of a true
"universal interface", since there are many situations in which
its limitations preclude its use. However, in its present implementation
it can solve many problems, and ongoing development offers even greater
functionality for the future.
Biographies:
Peter C. FitzGerald
- Academic Training:
  - B.A. (mod) Biochemistry, Trinity College, Dublin, Ireland - 1978
  - Ph.D. Biological Chemistry, University of Cincinnati, Medical College - 1983
- Present Position:
  - Chief of the Computational Molecular Biology Section (CMBS).
  - Division of Computer Research and Technology (DCRT)
  - National Institutes of Health (NIH)
- E-mail: Peter_FitzGerald@nih.gov
Robert A. Pearlstein
- Academic Training:
  - B.A. Chemistry, Case Western Reserve University - 1976
  - M.S. Macromolecular Science, Case Western Reserve University - 1980
  - Ph.D. Macromolecular Science, Case Western Reserve University - 1983
- Present Position:
  - Computational Chemist
  - Computational Molecular Biology Section (CMBS).
  - Division of Computer Research and Technology (DCRT)
  - National Institutes of Health (NIH)
- E-mail: Robert_Pearlstein@nih.gov
Corresponding Author:
Peter FitzGerald
Bldg. 12A, Room 2008
9000 Rockville Pike
Bethesda, Maryland 20892-5620, USA