As a byproduct of this work, a method for displaying subscripts (e.g. the solar subscript on M) and superscripts (e.g. those attached to T or CO) using the minimum size and number of images has been developed.
We plan to package and release this set of images at the time of this conference.
A page organizing access to these services is available at:
http://www-astro.phast.umass.edu/data.html
Some of these problems have been addressed in an ad hoc manner for at least the last decade and a half by the growth of a preprint culture, both to make research results more immediately available and to direct circulation to those most interested in the work. This does not address the issue of the availability of the data sets, but does make the approximately correct subset of the literature available to those working in a field in a more timely manner than does journal publication.
For the past decade or longer, the intermediary of the secretary/typist has been removed from the process of manuscript preparation as word-processing and formatting software has become almost universally available. The draftsman has been replaced by a variety of more and more sophisticated computer drawing tools for constructing figures. The widespread availability of the Internet has made it possible to transmit these electronic files easily and rapidly around the world. The process of writing a paper with a collaborator across the country, or in another country, has been greatly facilitated by these advances.
Most of the professional journals in astronomy are now accepting submissions in LaTeX, but each has developed its own set of macros which is designed to reproduce the traditional "on paper" appearance of that journal. Astronomers using electronic preprint servers must collect a set of LaTeX macros for each journal of interest and update their collection as new versions are issued. This very shortly becomes a burdensome task. In response, many papers were sent to the preprint servers in Postscript files, but these files, particularly when figures are included, can be very large. On the other hand, most papers that I have downloaded, LaTeX or Postscript, do not contain the figures. They must be acquired separately. In neither case is much data readily accessible to the reader. At the same time, the journals consider the acceptance of the LaTeX files as a big step in their transition to "electronic publication" but the time between acceptance of a paper and the appearance of that paper in the journal is still six months for The Astrophysical Journal and four months for the Astronomical Journal. The median "age" of a paper referenced in these journals is between four and five years, with many of the referenced papers being still in press. It is clear that advances in manuscript preparation have not been matched by increases in distribution speed - the first and most easily achievable gain which should result from some standardization of the format for manuscript submission.
For the past two years astronomers working in the field of star formation have been kept abreast of the developments in their field by the Star Formation Newsletter, edited by Bo Reipurth of the European Southern Observatory. This newsletter is distributed monthly by email to a community of over 800 astronomers worldwide. A standard LaTeX form is used in which astronomers enter the abstracts of their recently accepted papers. These are then emailed to a central collection point and assembled into the monthly newsletter. The recipients can then simply strip off the email header and process the LaTeX file to read the newsletter. Notices of upcoming meetings and new books are also included. The newsletter helps to eliminate the previous hit and miss circulation of preprints, especially to younger workers. Some of these astronomers have also submitted their preprints to an electronic preprint distribution service to make actual retrieval of the preprint possible without the distribution of a large amount of paper through the mail delivery services. However, this system suffers from the above stated disadvantages, as well as the fact that many authors have their own LaTeX or TeX macro files which they neglect to include.
There are enormous advantages to having papers available electronically with their associated data sets. They can be made available almost instantly upon the completion of the refereeing process. They are easily searchable and, if the tabular material is in an HTML document (if small) or another easily interpretable form, the data are instantly accessible. Larger data sets can be made available in standard, generally accepted formats. (Tabular data presented in Postscript format is not easily accessible in numerical form.) More sophisticated graphics are easily and cheaply included. Links can be built into the paper to references and abstracts that are on-line as well as to other non-standard materials not usually available to readers of the papers.
In order to address some of the problems outlined above and to explore the additional possibilities, we have begun a preprint and data distribution service for the papers and data of the Star Formation group in the Five College Astronomy Department from our World Wide Web server.
Since the astronomy journals are now accepting manuscripts in LaTeX,
the obvious method for conversion of these manuscripts into HTML is
by use of Nikos Drakos'
LaTeX2HTML.
This perl script will translate a LaTeX manuscript into a set of linked
HTML pages, translating the mathematical expressions into bitmaps in
older versions of the program and to transparent GIFs in the most
recent version. These images are then placed in the text as in-line images.
This program creates a new in-line image each time a math mode
entry is encountered, no matter how many times the identical image
has been created before in the same document. This results in the
creation of tens of identical images for commonly used symbols and
expressions of units, such as cm.
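
The duplication described above could in principle be avoided by keying each generated image on the text of the math expression, so that a repeated expression reuses the first image. The following is a minimal Python sketch of that idea only; it is not part of LaTeX2HTML, and the function name and the file naming scheme (img1.gif, img2.gif, ...) are assumptions made for illustration.

```python
import re

# Matches simple inline math such as $\alpha$ or $cm^{-2}$ (no nested dollar signs).
INLINE_MATH = re.compile(r"\$([^$]+)\$")


def assign_images(text):
    """Map each distinct inline math expression to one image file name.

    The naming scheme (img1.gif, img2.gif, ...) is hypothetical.
    Returns the rewritten text and the expression -> file name table.
    """
    images = {}

    def replace(match):
        expr = match.group(1).strip()
        if expr not in images:
            # First occurrence: allocate a new image name.
            images[expr] = "img%d.gif" % (len(images) + 1)
        # Every occurrence, first or repeated, points at the same file.
        return '<IMG ALIGN=MIDDLE ALT="%s" SRC="%s">' % (expr, images[expr])

    return INLINE_MATH.sub(replace, text), images
```

With this approach a unit that appears fifty times in a manuscript would cost one image file rather than fifty.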
Since all images are installed with ALIGN=BOTTOM (e.g. in units such as cm or s), the resulting text can be difficult to read. It is also true that math mode expressions that include subscripts appear to float above the rest of the text on the line, balanced on the subscript. Of course, because these expressions are inserted into the text as images, the font will rarely match that of the rest of the text.
In response to these practical and perceptual problems, I developed an image library of the most commonly used subscripts and superscripts in the astronomical manuscripts which I have converted to HTML. This library was designed with several considerations in mind.
The last two items take into consideration the default image cache size of Mosaic, 2048 kilobytes and the fact that the least recently accessed image will be the first discarded. The typical size of a 2 - 5 color image used as a figure in these preprints is 4 - 8 kilobytes. The typical size of an image to be used as a sub- or superscript is 0.06 kilobytes.
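
The sizes quoted above can be turned into a rough capacity estimate for the image cache. The sketch below is simple arithmetic in Python using only the figures given in the text; the function name is an assumption, and real cache behavior will also depend on bookkeeping overhead that is not modeled here.

```python
CACHE_KB = 2048.0        # default Mosaic image cache size quoted above
FIGURE_KB = (4.0, 8.0)   # typical size range of a figure image
SYMBOL_KB = 0.06         # typical size of a sub- or superscript image


def images_that_fit(cache_kb, image_kb):
    """Number of images of a given size that fit in the cache."""
    return int(cache_kb // image_kb)


largest_figures = images_that_fit(CACHE_KB, FIGURE_KB[1])   # 8 KB figures
smallest_figures = images_that_fit(CACHE_KB, FIGURE_KB[0])  # 4 KB figures
symbols = images_that_fit(CACHE_KB, SYMBOL_KB)
```

On these numbers the cache holds between 256 and 512 typical figures, or roughly 34,000 symbol images, so a shared symbol library occupies a negligible fraction of the cache compared with the figures.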
These three considerations thus work in concert to provide a clean-looking, easily readable paper, taking minimal time to load each page, thus allowing browsing as well as reading in depth without forcing the scientific community to adopt another, albeit temporary, shorthand for complex mathematical expressions or simply delaying the onset of electronic publication of journal papers.
The library I built included the Greek letter set previously made available, the numerals, the entire alphabet, in both capitals and lower case, and some special symbols and letter combinations of common use in astronomy. To implement this set of images, perl scripts were written to preprocess the manuscripts, searching for the most commonly used symbols and inserting the in-line images as required. LaTeX2HTML was then used to convert this manuscript to HTML pages and a post-processing perl script was used to clean up the few things that may have been affected by LaTeX2HTML. The manuscript was then ready for insertion of the images, tables and references.
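
The preprocessing step can be pictured as a table-driven substitution. The following Python sketch is illustrative only and is not the perl code used for this work: the icon directory, the file names other than smsun.GIF, and the function names are all assumptions.

```python
# Hypothetical symbol table: LaTeX fragment -> (image file, ALT text).
# Only the solar-subscript entry mirrors an example in the text; the
# other entries and the icon directory are assumptions for illustration.
ICON_DIR = "/kicons/"
SYMBOLS = {
    r"$_{\odot}$": ("smsun.GIF", "_sun"),
    r"$\alpha$": ("alpha.GIF", "alpha"),
    r"$\mu$": ("mu.GIF", "mu"),
}


def img_tag(filename, alt):
    """Build an in-line image tag with ALIGN=MIDDLE and ALT text."""
    return '<IMG ALIGN=MIDDLE ALT="%s" SRC="%s%s">' % (alt, ICON_DIR, filename)


def preprocess(text):
    """Replace each known LaTeX fragment with its library image tag."""
    # Longest fragments first, so a short key never clobbers part of a longer one.
    for fragment in sorted(SYMBOLS, key=len, reverse=True):
        filename, alt = SYMBOLS[fragment]
        text = text.replace(fragment, img_tag(filename, alt))
    return text
```

Because every occurrence of a symbol maps to the same file, the browser fetches each library image once and serves later occurrences from its cache.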
As a small sidelight, I note that the display of subscripts is a simple use of the HTML tag for displaying an in-lined image. The unexpected subscript effect occurs because, in this case, the image is smaller than the text height. Thus when you specify <IMG ALIGN=MIDDLE ALT="_sun" src="www-astro.phast.umass.edu/kicons/smsun.GIF">, the middle of the small image is aligned with the base line of the text, and the solar symbol is displayed as a subscript to M. This method is also used to place the Greek letters having trailers on the base line of the text.
I also urge users of this library to always use the ALT="xxxx" option so that people accessing your files with LYNX may more easily read your pages. This is easily accomplished when the substitutions are made in the manuscript by the perl script.
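
A converted page can also be checked after the fact for images that lack ALT text. The Python sketch below is a simple regular-expression scan rather than a full HTML parser; the function name is an assumption, and the pattern is intended only for the plain IMG tags discussed here.

```python
import re

IMG_TAG = re.compile(r"<IMG\b[^>]*>", re.IGNORECASE)
HAS_ALT = re.compile(r"\bALT\s*=", re.IGNORECASE)


def images_missing_alt(html):
    """Return every <IMG ...> tag in the page that has no ALT attribute."""
    return [tag for tag in IMG_TAG.findall(html) if not HAS_ALT.search(tag)]
```

An empty result means every in-line image on the page will show some text to readers using a text-only browser.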
Although the new version of LaTeX2HTML allows the option of conversion of postscript figures, at a user-specified scale, into transparent GIFs in-lined into the text, I prefer to exercise more control over the scaling and appearance of the figures because the figures were initially designed for display on paper. I display the file using ghostview, scale the image to the desired size, and then capture the image using xv. I have chosen to place the scales, axis labels and other labeling text in a deep blue to separate them from the body text. If there is only one line in the figure, it is also left in the same color. However, although the figure on the printed page may be very small, and we are not tightly constrained by space considerations on the HTML page, the spatial resolution available to us is still less than that in the paper representation. Therefore I have chosen to make use of the fact that color is an option available at no extra cost to the server. When different line styles or point styles are called for, different colors are substituted by making use of xpaint. When the desired changes are completed, the file is converted into a transparent GIF and placed in the HTML document at the appropriate location. There is also much more flexibility in figure placement in HTML documents as one is not forced to maneuver within fixed page sizes. For this reason, figures may be more reasonably located with respect to their textual references.
There are certainly some very complex figures for which this technique will not be appropriate. In that case a thumbnail or postage stamp image can be made, either by downscaling the entire image or by cropping a recognizable section out of the image for insertion into the text. This smaller image can then act as a link to the Postscript (or GIF) high resolution figure.
Tabular material presents different problems. Small tables can be
placed within an HTML document by using the <PRE>
tag. Large tables are only practical, at the moment, as Postscript
files linked to the text at the appropriate point (but see the
Catalogs section below).
The use of HTML+ should allow easy introduction of tabular material into the text.
Very large tables can be associated with papers in other formats more
appropriate to the data.
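
For small tables, the fixed-width layout needed inside a <PRE> block can be generated mechanically. The Python sketch below is one possible way to do it; the function name is an assumption, and any column names or values passed to it in an example are placeholders rather than real catalog data.

```python
from html import escape


def pre_table(header, rows):
    """Render rows as fixed-width text wrapped in <PRE> ... </PRE>."""
    cells = [[str(cell) for cell in row] for row in [header] + rows]
    # Column width is the longest entry in that column, header included.
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    lines = []
    for row in cells:
        line = "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        # Pad first, then escape, so entities do not disturb the alignment.
        lines.append(escape(line, quote=False))
    return "<PRE>\n" + "\n".join(lines) + "\n</PRE>"
```

Padding with spaces works here because browsers render the contents of <PRE> in a fixed-width font.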
With original manuscripts in LaTeX using the AASTeX macros, the conversion of the listing of references is not easily handled by LaTeX2HTML. However, it is a trivial matter to use global substitutions to replace the macros used for journal abbreviations with the text appropriate for the references section of the paper. The large added value available to astronomers is the possibility of linking to the ADS Abstract server. By adding these links, the reader has immediate access to the abstracts of the papers referenced and can judge the relevance of each paper to the information sought. This can be especially important to people located at institutions where they may have easy access to the Internet but no library down the hall.
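
The global substitution of journal macros can be sketched as follows. This Python example is illustrative and is not the script used for this work: the table covers only a few macros, the expanded abbreviations follow the style of the reference list in this paper, and the function name is an assumption.

```python
import re

# A few journal macros and the plain-text abbreviations to substitute.
# The table is deliberately incomplete; extend it as needed.
JOURNALS = {
    "apj": "Ap. J.",
    "apjs": "Ap. J. Suppl.",
    "aj": "A. J.",
}

# A backslash followed by letters, i.e. a LaTeX control word.
MACRO = re.compile(r"\\([A-Za-z]+)")


def expand_journal_macros(reference):
    """Replace known journal macros; leave every other macro untouched."""
    def replace(match):
        return JOURNALS.get(match.group(1), match.group(0))

    return MACRO.sub(replace, reference)
```

Matching the whole control word before looking it up keeps a short macro such as \apj from being substituted inside a longer one such as \apjs.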
The Herbig-Bell Catalog of Emission Line Stars has always been somewhat awkward to use because the data for a single star was spread over one line on two pages which were printed in landscape mode. There were many columns empty for all but the most frequently observed stars making it difficult to follow the correct line across the page. In making the electronic version, I have linked the catalog number of the object to its companion entries on the alternate pages so that the top of the window will act as a guide line across the columns. An asterisk in the notes column is linked to the note for that object. All references contained within the database of journal article abstracts used by the ADS Abstract Server were linked to these abstracts to provide more information on the contents of the reference.
Because of the difficulty of reading the table and the unlikelihood of the need for all of the information spread across the two pages, I also provided a forms-based access to the data. This form requests the catalog number of the object desired and then returns all of the most commonly desired data for the object as well as any other data that was requested in a nicely formatted page. This catalog has been successfully (and thankfully) used in the middle of the night from the tops of several mountains.
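
The logic behind such a form can be sketched as a lookup keyed on catalog number that always returns a default set of columns plus any extras the user asks for. The Python example below is a hypothetical illustration, not the code behind the actual form: the field names, the default column list, and the sample record are placeholders and do not reproduce real catalog entries.

```python
# Hypothetical "most commonly desired" columns returned for every query.
DEFAULT_FIELDS = ["name", "ra", "dec"]


def lookup(catalog, number, extra_fields=()):
    """Return the default fields, plus any requested extras, for one object.

    `catalog` maps catalog number -> dict of column name -> value.
    Returns None when the catalog number is not present.
    """
    record = catalog.get(number)
    if record is None:
        return None
    fields = DEFAULT_FIELDS + [f for f in extra_fields if f not in DEFAULT_FIELDS]
    # Silently skip requested columns that are empty for this object.
    return {f: record[f] for f in fields if f in record}


# Placeholder record for illustration only; not a real catalog entry.
SAMPLE = {1: {"name": "Star A", "ra": "00 00 00", "dec": "+00 00 00", "notes": "*"}}
```

Returning only the populated columns mirrors the motivation given above: most objects have many empty columns, and the reader rarely needs all of them at once.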
There were two problems posed by the references in this catalog. Among the references that predated the beginning of the compilation of the abstract database by NASA were approximately 25 references with dates ranging from 1894 to 1960, references which might be difficult to obtain for people located at smaller institutions. The librarian at Kitt Peak National Observatory kindly copied either the title page and abstract when available, or just the first page of the paper and forwarded these copies to me. From these copies, either the title, author, citation and abstract were entered, or the first few paragraphs of the paper were used in place of the abstract. In a few cases, the papers in question were notes so short that the entire paper was entered.
Due to the extremely active state of this field of research, there were a very large number of papers (25 - 30% of the references) too recent to be found in the abstract database as yet. To make the abstracts for these papers available, we placed the entire archive of the Star Formation Newsletters on line, using LaTeX2HTML and the perl scripts, and linked the catalog references to these abstracts. This procedure was almost entirely successful, leaving only those papers which are still in preparation and the papers published between 1960 and 1975, which I felt were more easily accessible than were the pre-1960 papers. As the papers which were still in preparation appear in the Star Formation Newsletters, the links will be made to the catalog references.
Susan Kleinmann made the data set available to me, and I used a utility in STSDAS to convert the ASCII files into standard IRAF image files for plotting purposes. Plots were made for all of the spectra to accompany the distribution of the data files. The spectra were written as standard FITS (Flexible Image Transport System) files and made available as a compressed tar file. The plot files were also made available as a compressed tar file. A hypertext README file describing how to display the plots was provided, as well as a hypertext README file, written by Susan Kleinmann, on the spectra themselves.
The spectra of these stars will be made available, with the associated paper (Allen & Strom, in press), as FITS files of images in the IRAF multispec format. Plots of a selected set of spectra will also be available, as well as tables of the stellar identifications, positions, magnitudes and colors.
Herbig, G.H. & Bell, K. Robbins 1988, Lick Observatory Bulletin No. 1111.
Herbig, G.H. & Rao, N.K. 1972, Ap. J., 174, 401.
Kleinmann, S.G. & Hall, D.N.B. 1986, Ap. J. Suppl., 62, 1986.
Reipurth, B. 1994, A General Catalogue of Herbig-Haro Objects, electronically published via anon. ftp to ftp.hq.eso.org, directory /pub/Catalogs/Herbig-Haro.
Over the past year she has devoted a large fraction of her time to exploring the use of the World Wide Web and, in particular, the Mosaic browser, in the distribution of large data sets, electronic publication and interactive educational software.
In the past she has had several one or two person photographic exhibitions as well as exhibiting in many juried competitions. She is currently working on a hypertext book to be distributed over the World Wide Web.
kstrom@hanksville.phast.umass.edu