Chemistry and the Web: Hyperactive Molecules.
Henry S. Rzepa* and Christopher Leach
Department of Chemistry, Imperial College of Science, Technology and
Medicine, London, SW7 2AY, U.K.
Abstract
The concepts of hyperactive molecules and of a chemical structure markup language
(CSML) are introduced in the context of a discussion of the solvation of
cis-cyclohexane-1,3-diol. The implications of this approach as a model for virtual
reality, for the scientific publishing of chemistry, and for its teaching are discussed.
Introduction
Chemistry is in many ways an ideal subject for the application of hypermedia
concepts.[1] More than ten million molecules
have been well documented in the scientific literature, providing a
particularly fertile database for isolating three dimensional structural themes
and relationships and associating these with the large diversity of measurable
molecular properties. Traditionally, chemists have had to rely almost exclusively
on the printed medium for their molecular communication, and have hence been
forced to adopt remarkably creative but nevertheless contrived formalisms for
representing their subject on paper.
The last 15 years or so have seen a sea change in the way the tangled mass of
molecular information is searched and delivered to users, with the gradual
introduction of computer networks and on-line databases enabling text-based
bibliographic searches to be carried out on the "worktop". Graphical, object-based
searches (in the jargon, "sub-structure searches") are a more recent
innovation, a recognition that most chemists think in terms of icons rather
than text. Molecules, after all, are collections of simple objects such as atoms
whose relationships to each other (i.e. bonds) define them, and
collections of these molecules and their three dimensional relationships
often define the macroscopic properties and activity of chemicals. Chemistry
and molecular biology really are "virtual reality" subjects par excellence, and
it turns out that much of the infra-structure available in the World-Wide-Web
system is remarkably suited for delivering molecular information in a more
innovative and instantly productive form than the traditional mechanisms.
In this presentation, we will first outline some additional chemically oriented
components that we have added to the infra-structure, illustrate these with a
specific example, and then discuss the implications of this new medium for both
chemical research and teaching.
The existing seven primary MIME (Multipurpose Internet Mail Extensions) types
are well suited for the delivery of text, a variety of two dimensional images
and multi-media video and sound. However, there is little recognition in these
existing definitions of what we might describe as "virtual reality" MIME types,
enabling the specification of three dimensional objects and their
relationships. As it happens, such definitions have been around for some time
in chemistry. For example, the Protein Data Bank or "pdb" definition[2] was specifically created in the early 1970s
to enable virtual display and navigation around large molecules such as
proteins, carbohydrates and oligonucleotides (DNA). We have identified a number
of precisely specified molecular definitions and proposed[3] that these be collectively identified
via a new primary MIME type to be known as chemical. This would
enable molecular data to be delivered chemically intact via the Web
metaphor. Next, we have identified a number of freely distributable existing
graphical packages, or "helpers" that will recognise and process this data and
produce on-screen navigable displays of what we have termed "hyperactive
molecules". These include programs such as Rasmol[4] and XMol[5], as well as
others[6,7]. Many more commercial programs are available which are also
suitable.
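The two-sided mapping on which hyperactive molecules rely (a server labelling each file with a chemical MIME type by its suffix, and a browser handing each MIME type to a local helper) can be sketched in a few lines of Python. This is a minimal illustration, not the configuration actually used: the `.csml` suffix, the helper command lines (`rasmol`, `csml`) and the function names are assumptions.

```python
import os
from typing import List, Optional

# Server side (hypothetical table): file suffix -> MIME type.
# chemical/x-pdb and chemical/x-csml are the types named in the text;
# the ".csml" suffix itself is an assumption.
SUFFIX_TO_MIME = {
    ".pdb": "chemical/x-pdb",
    ".csml": "chemical/x-csml",
}

# Client side (hypothetical table): MIME type -> helper command template.
MIME_TO_HELPER = {
    "chemical/x-pdb": ["rasmol", "{path}"],
    "chemical/x-csml": ["csml", "{path}"],
}


def mime_type_for(path: str) -> str:
    """Return the MIME type a server would attach to this file."""
    suffix = os.path.splitext(path)[1].lower()
    return SUFFIX_TO_MIME.get(suffix, "application/octet-stream")


def helper_command(mime_type: str, path: str) -> Optional[List[str]]:
    """Return the helper command line a browser would run, or None if unknown."""
    template = MIME_TO_HELPER.get(mime_type)
    if template is None:
        return None
    return [arg.format(path=path) for arg in template]


if __name__ == "__main__":
    for name in ("diol_ax.pdb", "diol_eq.pdb", "notes.txt"):
        mime = mime_type_for(name)
        print(name, mime, helper_command(mime, name))
```

Keeping the two tables separate mirrors the text: the server only labels the data, and each reader decides locally which program interprets it.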
Chemical Structure Markup Language (CSML).
The concept that text in an HTML document can have attributes defined such as
size, color, font and style specifications, and orientation within a two
dimensional space is of course very familiar. It struck us that atoms, bonds
and other molecular properties can be assigned similar visual attributes.[8] Just
as the HTML specification is used to define the markup of text and simple 2D
images in Web browsers, so molecular viewers can be thought of as formatters of
chemical structural information. Currently, existing programs such as Rasmol do
not implement any transport protocol such as HTTP, and we have therefore used
the following combination to achieve markup of chemical structures. A hyperlink
is inserted into text or 2D images in an HTML document, with a URL corresponding
to a file containing a set of CSML commands.[8] These indicate how specified atoms
or collections of atoms within a molecule will be rendered on the screen
display.[9] This file is allocated a MIME
type chemical/x-csml and is associated with a local helper script which
reads the CSML instructions and passes them to e.g. a running Rasmol
process. Local chemically cognisant computations based on the marked-up
molecules can thus be enabled, involving perhaps the calculation of interatomic
distances, molecular weights of specified residues, isotope patterns, or even
more elaborate evaluations of molecular wavefunctions, energies and other
derived properties.
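The helper-script route from a CSML hyperlink to the viewer might look like the following Python sketch. It assumes (the text does not say) that a CSML file is a plain list of Rasmol-style commands, one per line with `#` comments, and that the running viewer reads commands from a text stream such as its standard input. `read_csml` and `forward_commands` are hypothetical names, not the original csml script.

```python
import io
from typing import Iterable, List, TextIO


def read_csml(lines: Iterable[str]) -> List[str]:
    """Extract markup commands from CSML text.

    Assumption: one Rasmol-style command per line; blank lines and lines
    starting with '#' are treated as comments and skipped.
    """
    commands = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            commands.append(line)
    return commands


def forward_commands(commands: Iterable[str], viewer_input: TextIO) -> int:
    """Write each command, newline-terminated, to the viewer's input stream.

    Returns the number of commands sent. viewer_input could be, for example,
    the stdin pipe of subprocess.Popen(..., stdin=subprocess.PIPE, text=True)
    running the viewer, assuming the viewer accepts commands on stdin.
    """
    sent = 0
    for command in commands:
        viewer_input.write(command + "\n")
        sent += 1
    viewer_input.flush()
    return sent


if __name__ == "__main__":
    # Illustrative CSML: the atom serial numbers are made up.
    csml_text = """# highlight an O-H...O fragment
select atomno=7 or atomno=13 or atomno=8
colour yellow
spacefill 0.5
"""
    buffer = io.StringIO()  # stands in for a real viewer process
    count = forward_commands(read_csml(csml_text.splitlines()), buffer)
    print(count, "commands sent")
    print(buffer.getvalue(), end="")
```

Separating parsing from forwarding means the same CSML file could equally be sent to a different viewer, or inspected by a local program computing distances or molecular weights.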
Currently, the function of the Web browser and the molecular visualiser are
separated, with CSML and its helper script serving as the communication channel
between them. In the future, we expect that compound document architectures
such as OpenDoc will enable molecular functionality and CSML markup to be
integrated into the operation of the Web browser itself.
The Solvation of cis-Cyclohexane-1,3-diol: An application of Hyperactive Molecules.
The following two paragraphs describe the chemical problem in the two dimensional,
symbolic form typical of how the science is presented in most existing paper-bound
journals and books.
Although small, cis-cyclohexane-1,3-diol has several quite subtle
molecular properties, which present an interesting communicational challenge.
This molecule can exist in two rapidly interconverting molecular shapes or
"conformations", known as di-equatorial and di-axial. The precise concentration
of each form is significantly influenced by the solvent used to dissolve the
molecule.[10]
The preference of the molecule for either of these conformations can be used as
a specific indicator of the molecular energetics and the influence of the
surrounding environment on those energetics. Another feature that needs to be
discussed, and which is of paramount importance in understanding the shapes and
activities of many enzymes, is the phenomenon known as hydrogen bonding, defined
in this case as the interaction within the O-H...O subunit. Only the di-axial
conformation of cyclohexane-1,3-diol can engage in intramolecular hydrogen
bonding; the shape of the di-equatorial conformer precludes it.
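The statement that the conformational preference indicates the molecular energetics rests on the standard Boltzmann relation between a free-energy difference and the equilibrium ratio of the two forms, K = [di-axial]/[di-equatorial] = exp(-ΔG/RT). A minimal Python sketch follows; the function name is hypothetical and no measured or calculated energies for this molecule are implied.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def diaxial_fraction(delta_g_kj: float, temperature_k: float = 298.15) -> float:
    """Equilibrium fraction of the di-axial conformer.

    delta_g_kj = G(di-axial) - G(di-equatorial) in kJ/mol, so a positive
    value favours the di-equatorial form. Uses K = exp(-dG / RT) for
    K = [ax]/[eq], and fraction = K / (1 + K).
    """
    k = math.exp(-delta_g_kj / (R_KJ * temperature_k))
    return k / (1.0 + k)
```

Since RT is about 2.5 kJ/mol at room temperature, a solvent that shifts ΔG by only a few kJ/mol changes the population appreciably, which is why the proportion of each conformer is a sensitive probe of the solvent's influence.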
Such a description of chemistry in action will of course be largely familiar to
trained molecular scientists, who are used to interpreting this symbolic
language in their version of three dimensional "virtual chemical" reality. Even
so, most chemists would traditionally at this stage rush off to a set of
plastic "molecular models" and start constructing cyclohexane-1,3-diol in order
that they fully appreciate the above discussion. The following stages are used
to convert this into a description couched in terms of Web based hyperactive
molecules:
- The three dimensional coordinates of cyclohexane-1,3-diol are generated using
a 3D modelling program, in our case MOPAC, and saved as, say,
diol_ax.pdb and
diol_eq.pdb.
The suffix pdb is associated
with one of our proposed standard chemical MIME types,
chemical/x-pdb, on the HTTP server.
- Hypertext references to these two documents are inserted within the context
of the chemical discussion into a HTML document.
- The Web browser has a molecular helper associated with the
chemical/x-pdb MIME type, in our case Rasmol. Activating the hyperlink
will result in the coordinates being retrieved and visualised on the screen in
a separate window, where the molecule can be navigated under user control. The
addition of active LCD based glasses or screens linked to the video card
enables the molecule to be visualised in true 3D. Such systems have been in
routine use in chemistry and molecular biology for more than six years now,
although their scientific use outside chemistry appears limited.
- Discussion of, say, the hydrogen bonding requires the O-H...O component of
the molecules to be suitably highlighted with chemical markup commands. This is
done by inserting a hyperlink to a file allocated a MIME type of
chemical/x-csml containing CSML instructions. When activated, these
instructions are retrieved and executed using a previously acquired script
called csml, which passes the markup instructions to the running Rasmol
process.
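The highlighting step presupposes locating the O-H...O atoms in the coordinate file. The Python sketch below (Python 3.8 or later, for `math.dist`) shows one way a helper might do this: it reads ATOM/HETATM records from the fixed PDB columns, applies a common geometric rule of thumb for a hydrogen bond (O-H covalent distance at most 1.2 Å, H...O at most 2.5 Å, O-H...O angle at least 120°; these cut-offs are assumptions, not the authors' definition), and emits Rasmol-style `select`/`colour`/`spacefill` commands of the kind a CSML file could carry. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[int, str, Tuple[float, float, float]]  # (serial, element, xyz)


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Read serial number, element and coordinates from PDB ATOM/HETATM lines."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        serial = int(line[6:11])
        name = line[12:16].strip()
        element = line[76:78].strip()
        if not element:  # some programs omit columns 77-78; guess from the name
            element = next((ch for ch in name if ch.isalpha()), "X")
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((serial, element.upper(), xyz))
    return atoms


def bond_angle(a, b, c) -> float:
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def find_oh_o(atoms: List[Atom], oh_max=1.2, hb_max=2.5, min_angle=120.0):
    """Return (donor O, H, acceptor O) serial triples meeting the cut-offs."""
    oxygens = [(s, xyz) for s, el, xyz in atoms if el == "O"]
    hydrogens = [(s, xyz) for s, el, xyz in atoms if el == "H"]
    found = []
    for h_serial, h_xyz in hydrogens:
        bonded = [(math.dist(h_xyz, o_xyz), o_s, o_xyz)
                  for o_s, o_xyz in oxygens if math.dist(h_xyz, o_xyz) <= oh_max]
        if not bonded:
            continue  # not a hydroxyl hydrogen
        _, d_serial, d_xyz = min(bonded)  # nearest O is the donor
        for a_serial, a_xyz in oxygens:
            if a_serial == d_serial:
                continue
            if (math.dist(h_xyz, a_xyz) <= hb_max
                    and bond_angle(d_xyz, h_xyz, a_xyz) >= min_angle):
                found.append((d_serial, h_serial, a_serial))
    return found


def highlight_commands(triple) -> List[str]:
    """Rasmol-style commands to pick out one O-H...O fragment."""
    donor, hydrogen, acceptor = triple
    return [f"select atomno={donor} or atomno={hydrogen} or atomno={acceptor}",
            "colour yellow",
            "spacefill 0.5"]
```

Applied to diol_ax.pdb, the triples found would name exactly the atoms a CSML hyperlink needs to highlight; for the di-equatorial geometry the list would be expected to come back empty.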
We are now ready to move on to the next stage of the research project, which is
to study the energy of the di-axial conformation of cyclohexane-1,3-diol as a
function of the geometrical variables describing the precise orientation of the
two O-H groups in the molecule.[11] This is
achieved using a semi-empirical quantum mechanical program such as MOPAC, and the
derived energy can be represented as an isometric projection over the two
geometrical variables.
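The energy map is a two dimensional scan over the two O-H orientation variables. Its bookkeeping can be sketched in Python as below. `toy_energy` is a made-up stand-in with no physical meaning (in the actual study each grid point would come from a MOPAC calculation), and the 30° step and the function names are assumptions.

```python
import math
from typing import Callable, List

Grid = List[List[float]]


def scan_surface(energy: Callable[[float, float], float],
                 step_deg: float = 30.0) -> Grid:
    """Evaluate energy(phi1, phi2) on a regular grid of torsion angles.

    Row i holds phi1 = i * step_deg and column j holds phi2 = j * step_deg,
    covering 0 <= phi < 360; step_deg is assumed to divide 360 exactly.
    """
    n = int(round(360.0 / step_deg))
    return [[energy(i * step_deg, j * step_deg) for j in range(n)]
            for i in range(n)]


def toy_energy(phi1: float, phi2: float) -> float:
    """Placeholder energy surface; NOT a chemical model."""
    return math.cos(math.radians(phi1)) + math.cos(math.radians(phi2))


if __name__ == "__main__":
    surface = scan_surface(toy_energy)
    print(len(surface), "x", len(surface[0]), "grid points")
```

Keeping the first variable on rows and the second on columns gives a plain 2D array that can be rendered as an isometric projection or differenced, point by point, against another surface computed on the same grid.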
The next requirement was to isolate the effect of hydrogen bonding on the
features of this energy map. We have achieved this in a novel way by applying
the COSMO continuum solvation model to the MOPAC energy Hamiltonian to simulate
the effect of a solvent, defined in terms of the dielectric constant
ε of the solvent, varied over the range 1-80. The idea here was that
those orientations of cyclohexane-1,3-diol in which an intramolecular O-H...O
hydrogen bond is present would have a smaller solvation energy than those
conformations where such a feature was absent. The effect was anticipated to
grow as the solvent dielectric increased from 1 (a gas phase non-solvating
environment) to 80 (corresponding to the highly solvating water environment).
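One way to see why the effect should grow with ε but level off is the dielectric scaling factor of the standard COSMO formulation, f(ε) = (ε - 1)/(ε + ½), by which the conductor-like screening energy is scaled. The short Python sketch below only tabulates this factor; it is not a solvation calculation, and the function name is hypothetical.

```python
def cosmo_scaling(eps: float, x: float = 0.5) -> float:
    """COSMO dielectric scaling factor f(eps) = (eps - 1) / (eps + x).

    f(1) = 0 (gas phase, no screening); f approaches 1 as eps grows
    (ideal conductor). x = 0.5 is the usual COSMO choice.
    """
    if eps < 1.0:
        raise ValueError("dielectric constant must be at least 1")
    return (eps - 1.0) / (eps + x)


if __name__ == "__main__":
    for eps in (1.0, 2.0, 4.0, 10.0, 20.0, 80.0):
        print(f"eps = {eps:5.1f}   f = {cosmo_scaling(eps):.3f}")
```

Since f(2.5) is already 0.5 while f(80) is about 0.98, and to the extent that the solvation energy follows f(ε), most of the change over the range 1-80 happens at low ε; how the original frames were spaced in ε is not stated.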
The issue then is how to present this complex set of information in a manner
that is easily comprehended by a reader. The information comprises a calculated
solvation energy as a function of two molecular geometrical variables and a
solvent dielectric. An additional 3N-8 geometrical parameters (N = number of
atoms present in the molecule) may need to be presented visually at interesting
points in the energy map. There is a great deal of information here, and any
attempts to present it on the printed page lead to complexity and loss of
clarity to the reader.
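For cis-cyclohexane-1,3-diol (C6H12O2) the count can be made explicit: a non-linear molecule of N atoms has 3N - 6 internal coordinates, two of which are the scanned O-H orientations.

```latex
\begin{aligned}
N &= 6\ (\mathrm{C}) + 12\ (\mathrm{H}) + 2\ (\mathrm{O}) = 20,\\
3N - 6 &= 54 \quad \text{internal coordinates in total},\\
3N - 8 &= 52 \quad \text{coordinates in addition to the two scanned variables}.
\end{aligned}
```

Each point of the energy map therefore carries 52 further geometrical values that a printed figure cannot show.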
How can the multimedia Web and hyperactive molecule metaphors help clarify and
communicate the science? Our solution was as follows.
- The effect of solvation on the two dimensional energy surface was presented
as a difference map. This means subtracting the energy surface calculated at
ε = 1 from that calculated at ε > 1. The only features left are
those due entirely to solvation effects, thus removing unnecessary detail from
the images.
- The effect of systematically varying ε is presented as an MPEG or
QuickTime(TM) animation running over 16 frames, each frame comprising a
difference map evaluated for a different value of ε. The effect of
changing ε can thus be rapidly spotted by the eye as the region of
the energy map showing the most rapid change with time.
- The reader may now wish to investigate how the geometries of the molecules
differ in these various regions. A single difference map (between ε = 1
and ε = 80) is made into a clickable ISMAP image, with its regions hyperlinked to individual sets of
molecular coordinates, and via chemical MIME definitions to the Rasmol
helper program.
- The O-H...O region in the Rasmol display can be highlighted using CSML
commands activated from a suitable hyperlink either in the ISMap or the
associated text.
- The entire collection of information can be presented in this manner as an
on-line scientific article. It is anticipated that several chemically oriented
journals will commence presenting articles in this fashion during 1995.
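The difference-map, animation and image-map steps above reduce to simple array operations. A Python sketch under stated assumptions: surfaces are stored as 2D lists on a shared grid, keyed by ε; the image map treats the picture as a flat, axis-aligned rendering with square cells (a true isometric projection would need polygon regions); the output follows the NCSA imagemap `rect URL x1,y1 x2,y2` format; `url_for` and all other names are hypothetical.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]


def difference_map(surface_eps: Grid, surface_gas: Grid) -> Grid:
    """Point-by-point E(eps) - E(eps = 1): only solvation effects remain."""
    if len(surface_eps) != len(surface_gas) or any(
            len(a) != len(b) for a, b in zip(surface_eps, surface_gas)):
        raise ValueError("surfaces must share the same grid")
    return [[e - g for e, g in zip(row_e, row_g)]
            for row_e, row_g in zip(surface_eps, surface_gas)]


def difference_frames(surfaces: Dict[float, Grid]) -> List[Grid]:
    """One difference map per dielectric, in increasing eps: the animation frames."""
    gas = surfaces[1.0]
    return [difference_map(surfaces[eps], gas) for eps in sorted(surfaces)]


def ncsa_imagemap(n_rows: int, n_cols: int, cell_px: int,
                  url_for: Callable[[int, int], str]) -> List[str]:
    """NCSA-style 'rect' entries linking each grid cell to its own URL."""
    entries = []
    for i in range(n_rows):
        for j in range(n_cols):
            x0, y0 = j * cell_px, i * cell_px
            entries.append(f"rect {url_for(i, j)} {x0},{y0} "
                           f"{x0 + cell_px - 1},{y0 + cell_px - 1}")
    return entries


if __name__ == "__main__":
    gas = [[0.0, 1.0], [1.0, 2.0]]       # made-up numbers
    water = [[-3.0, 0.5], [0.0, 1.0]]    # made-up numbers
    print(difference_map(water, gas))
    for entry in ncsa_imagemap(2, 2, 40, lambda i, j: f"geom_{i}_{j}.pdb"):
        print(entry)
```

Each URL would point to a file served as chemical/x-pdb, so clicking a region of the map hands that geometry to the molecular helper.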
Discussion.
It is easy to identify many limitations in the current paper-based
journals and textbooks in which chemistry as a subject is presented. Chemical
structures have had to be presented in static two dimensional representations,
or as highly symbolic text based descriptions. Much information, such as
molecular coordinates, is regularly discarded by authors of such papers because
there is no effective and cheap method available for its publication. There is
no possibility for the reader to interact with the data; only the author's
point of view can be visualised. The infra-structure described above has the
potential for changing much of this, including the face of scientific journals
and the way in which chemistry as a subject is taught and popularised. This in
turn raises important issues.
- Chemistry as a subject has an archival history going back 200 or more years.
Will the current mechanisms of the Web, together with our proposal of
hyperactive molecular coordinates and other supplemental information, enable
archival mechanisms with lifetimes of this order to evolve? More succinctly, will
URLs and URNs or their successors survive for more than a few years?
- A proliferation of chemical data has the potential to lead to chemical
chaos unless it is suitably indexed, and its quality checked. We believe that
the introduction of chemical MIME types will allow such automated indexing and
quality assurance by a variety of independent agents. The overall effect should
be to improve the quality of published science, at a time when serious concerns
about the expected rate of expansion of chemical journals have been expressed, and
in an era where the pressures of "publish-or-perish" might have had an opposite
effect on quality.
- Much of the existing career structure and funding of scientists is based on
their published output, and "paper-counting". To become accepted as a valid
medium for recognising scientific achievements, the rich tapestry of a
hyperlinked World-Wide-Web based body of work has to have criteria by which it
can be assessed. The user feedback and usage monitoring mechanisms inherent in
the Web may become important in this regard, and peer review of published work
will be ever more critical.
- The ways in which we teach and assess chemistry may also be subject to
change. Shared, on-line instructional resources[12] may
become more common, including much greater contributions from chemical industry
and hyperlinks to other subjects.
- The cost structure of molecular data may change dramatically, with the
traditionally expensive areas of paper bound and highly indexed and correlated
information becoming much cheaper.
We believe that the implications for how chemistry is taught and communicated
are profound indeed, and that furthermore, this subject forms an interesting test-bed
for future developments in complex indexing, searching and virtual reality.
Acknowledgements: We thank Benjamin Whitaker (Leeds), Mark Winter
(Sheffield), Peter Murray-Rust, Roger Sayle and Martin Hargreaves (Glaxo), and
Glaxo Research and Development (Greenford) for a studentship and funding.
Footnotes
1 For a review of the applications to
chemistry, see H. S. Rzepa, "Chemistry and the World-Wide-Web", in "Chemistry
and the Internet", Ed. S. Bachrach, ACS Publications, 1995, to be published.
2 The Protein Data Bank, Chemistry Department, Brookhaven National
Laboratory, Upton, NY.
3 H. S. Rzepa and P. Murray-Rust, Internet Draft: Chemical Mime
Type, May-November 1994. See
ftp://cnri.reston.va.us/internet-drafts/draft-rzepa-chemical-mime-type-00.txt
4 R. Sayle, Rasmol: A Molecular Visualisation System.
5 XMol, Minnesota SuperComputer Center, Minneapolis, MN, USA; see
http://www.arc.umn.edu/GVL/Software/xmol/XMol.html
6 O. Casher, H. S. Rzepa and S. Green, J. Mol. Graphics,
1994, in press.
7 D. C. Richardson and J. S. Richardson, Protein Science,
1992, 1, 3; D. C. Richardson and J. S. Richardson, Trends in
Biochem. Sci. 1994, 19, 135-8
8 O. Casher, G. Chandramohan, M. Hargreaves, P. Murray-Rust, H. S.
Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans. 2, submitted for
publication.
9 This is distinct from the proposed chemistry SGML DTD (T. Tallant,
personal communication, Oak Ridge National Laboratory), which is to be used for
two dimensional markup of chemistry in a manner suitable for eventual
printing.
10 R. J. Abraham, E. J. Chambers and W. A. Thomas, J. Chem. Soc.,
Perkin Trans. 2, 1993, 1061.
11 O. Casher, C. Leach and H. S. Rzepa, paper submitted to the First
Electronic Computational Chemistry Conference;
http://www.ch.ic.ac.uk/eccc.html. For details of the conference, see S.
Bachrach, http://hackberry.chem.niu.edu:70/0/webpage.html
12 For example, Global Instructional Chemistry, Ed. H. S. Rzepa;
http://www.ch.ic.ac.uk/GIC/
Author Biographies
Dr Henry S. Rzepa is a Reader in Organic Chemistry at the Imperial College
of Science, Technology and Medicine, London, UK. The author of more than
150 papers in the areas of computational and structural chemistry, and organiser of
the WWW94 Chemistry Workshop at Geneva, he has long advocated
the use of computer networks for chemical information delivery. He has progressed
from using 110 baud terminals
in 1971 to an ATM-based SuperJANET connection for video and chemical conferencing
to his colleague Dr Benjamin Whitaker at Leeds University. Starting with Gopher
servers some 18 months ago, he now uses World-Wide-Web systems for disseminating his
research papers, and has recently introduced teaching materials and books in this format.
Christopher Leach is currently working on a Ph.D. program with Dr Rzepa at
Imperial College. The work presented here was completed as part of an
undergraduate research project, and we believe it represents the first such chemistry
project to be made available for scrutiny by the external degree examiners in the form
of a URL reference.
E-mail: rzepa@ic.ac.uk