Electronic Publication and Data Distribution
for the Star Formation Group
in the Five College Astronomy Department

by

Karen M. Strom
Research Professor of Astronomy

Abstract

The Star Formation Group at FCAD has begun to make use of the World Wide Web to:
  1. explore the advantages of hypermedia presentations for the distribution of preprints and observatory publications as a first step toward expanding the definition of electronic publication;
  2. create hyperlinked catalogs of astronomical data which enable not only the recovery of tabular data but instantaneous links to abstracts of the associated reference material;
  3. create on-line catalogs of spectroscopic and image data.
We here illustrate the first results of this effort. The Herbig-Bell Catalog of Emission Line Stars (Herbig & Bell 1988) and the Catalog of Herbig-Haro Objects (Reipurth 1994) have both been translated into HTML and linked to the ADS abstract server, to each other, to HTML versions of the Star Formation Newsletter (for abstracts of newer papers) and to papers that are available on line. Abstracts of pre-1960 references, many not easily available, have also been placed on line and linked to these catalogs.

As a byproduct of this work, a method for displaying subscripts (e.g. M_sun, Delta) and superscripts (e.g. T^4, C^18O) using the minimum size and number of images has been developed. We plan to package and release this set of images at the time of this conference.

A page organizing access to these services is available at:
http://www-astro.phast.umass.edu/data.html

Introduction

The explosion in scholarly literature has recently far outstripped the traditional, paper-based methods of scanning the literature and locating that fraction of the published work relevant to the research of an individual or group. Many researchers no longer subscribe to even the most basic journals in their fields because they cannot hope to store the massive amount of paper in the ever smaller offices that are supplied by their institutions. Libraries are confronted by ever increasing subscription rates, partly to compensate for the decline in individual subscriptions, and are therefore being forced to choose between journals and books. Departmental libraries are also becoming a thing of the past in many institutions where their collections have been turned over to central libraries which have moved them ever farther from those who require them on a daily basis. As an adjunct to this problem, the rapidly growing databases, combined with the costs of publication, have meant that a smaller and smaller fraction of the data associated with a given paper is made publicly available. An associated phenomenon is the disappearance of catalogs of similar objects from the archival literature. A few years ago, these could be found among the Observatory publications (see the Catalog of Emission Line Stars, Herbig & Bell 1988). Today catalogs are issued in electronic form only (see A General Catalogue of Herbig-Haro Objects, Reipurth 1994), usually via anonymous FTP, perhaps in LaTeX format.

Some of these problems have been addressed in an ad hoc manner for at least the last decade and a half by the growth of a preprint culture, both to make research results more immediately available and to direct circulation to those most interested in the work. This does not address the issue of the availability of the data sets, but does make the approximately correct subset of the literature available to those working in a field in a more timely manner than does journal publication.

For the past decade or longer, the intermediary of the secretary/typist has been removed from the process of manuscript preparation as word-processing and formatting software has become almost universally available. The draftsman has been replaced by a variety of more and more sophisticated computer drawing tools for constructing figures. The widespread availability of the Internet has made it possible to transmit these electronic files easily and rapidly around the world. The process of writing a paper with a collaborator across the country, or in another country, has been greatly facilitated by these advances.

Most of the professional journals in astronomy are now accepting submissions in LaTeX, but each has developed its own set of macros which is designed to reproduce the traditional "on paper" appearance of that journal. Astronomers using electronic preprint servers must collect a set of LaTeX macros for each journal of interest and update their collection as new versions are issued. This very shortly becomes a burdensome task. In response, many papers were sent to the preprint servers in Postscript files, but these files, particularly when figures are included, can be very large. On the other hand, most papers that I have downloaded, LaTeX or Postscript, do not contain the figures. They must be acquired separately. In neither case is much data readily accessible to the reader. At the same time, the journals consider the acceptance of the LaTeX files as a big step in their transition to "electronic publication" but the time between acceptance of a paper and the appearance of that paper in the journal is still six months for The Astrophysical Journal and four months for the Astronomical Journal. The median "age" of a paper referenced in these journals is between four and five years, with many of the referenced papers being still in press. It is clear that advances in manuscript preparation have not been matched by increases in distribution speed - the first and most easily achievable gain which should result from some standardization of the format for manuscript submission.

For the past two years astronomers working in the field of star formation have been kept abreast of the developments in their field by the Star Formation Newsletter, edited by Bo Reipurth of the European Southern Observatory. This newsletter is distributed monthly by email to a community of over 800 astronomers worldwide. A standard LaTeX form is used in which astronomers enter the abstracts of their recently accepted papers. These are then emailed to a central collection point and assembled into the monthly newsletter. The recipients can then simply strip off the email header and process the LaTeX file to read the newsletter. Notices of upcoming meetings and new books are also included. The newsletter helps to eliminate the previous hit-and-miss circulation of preprints, especially to younger workers. Some of these astronomers have also submitted their preprints to an electronic preprint distribution service to make actual retrieval of the preprint possible without the distribution of a large amount of paper through the mail delivery services. However, this system suffers from the disadvantages stated above, as well as from the fact that many authors have their own LaTeX or TeX macro files which they neglect to include.

There are enormous advantages to having papers available electronically with their associated data sets. They can be made available almost instantly upon the completion of the refereeing process. They are easily searchable and, if the tabular material is small enough to be placed in an HTML document or is supplied in another easily interpretable form, the data is instantly accessible. Larger data sets can be made available in standard, generally accepted formats. (Tabular data presented in Postscript format is not easily accessible in numerical form.) More sophisticated graphics are easily and cheaply included. Links can be built into the paper to references and abstracts that are on-line as well as to other non-standard materials not usually available to readers of the papers.

In order to address some of the problems outlined above and to explore the additional possibilities, we have begun a preprint and data distribution service for the papers and data of the Star Formation group in the Five College Astronomy Department from our World Wide Web server.

Preprints

In order to circumvent some of the problems discussed above, I undertook to make the preprints from our Star Formation group available in HTML over the World Wide Web. Since over 800 astronomers had embraced the Star Formation Newsletter, it would appear possible to convince them to obtain their preprints on line in an instantly readable format. This format would allow me to pursue alternate means for presentation of graphical material (for instance, the use of color instead of different line or point styles), the possible inclusion of movies, and the ability to link to other papers on line, to the ADS Abstract database, and to other material available on line but absent from the archival literature.

Since the astronomy journals are now accepting manuscripts in LaTeX, the obvious method for conversion of these manuscripts into HTML is by use of Nikos Drakos' LaTeX2HTML. This Perl script will translate a LaTeX manuscript into a set of linked HTML pages, translating the mathematical expressions into bitmaps in older versions of the program and into transparent GIFs in the most recent version. These images are then placed in the text as in-line images. The program creates a new in-line image each time a math mode entry is encountered, no matter how many times the identical image has been created before in the same document. This results in the creation of tens of identical images for commonly used symbols and expressions of units, such as cm^2. Since all images are installed with ALIGN=BOTTOM (e.g. cm^2 or s^-1), the resulting text can be difficult to read. Math mode expressions that include subscripts also appear to float above the rest of the text on the line (e.g. T_eff, balanced on the subscript). Finally, because these expressions are inserted into the text as images, the font will rarely match that of the rest of the text.
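
The duplication described above can be avoided by keying each rendered image on the expression that produced it. The following is a minimal sketch in Python, not a description of how LaTeX2HTML works internally; the function name `assign_images` and the file naming scheme are assumptions made purely for illustration.

```python
import re

# Matches inline math such as $cm^{-2}$. Deliberately simple: it ignores
# escaped dollar signs and display math.
MATH = re.compile(r"\$([^$]+)\$")

def assign_images(text):
    """Map each distinct math expression to a single image file name.

    Returns (images, total): images maps expression -> file name, and
    total counts every math-mode occurrence found in the text.
    """
    images = {}
    total = 0
    for match in MATH.finditer(text):
        expr = match.group(1)
        total += 1
        if expr not in images:
            # Hypothetical naming scheme: img1.gif, img2.gif, ...
            images[expr] = "img%d.gif" % (len(images) + 1)
    return images, total

sample = "Flux in erg $cm^{-2}$ $s^{-1}$, per $cm^{-2}$, again per $cm^{-2}$."
images, total = assign_images(sample)
print(len(images), "distinct images for", total, "math entries")
```

In this sample four math entries need only two image files, which is the saving a shared image library is meant to provide.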

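In response to these practical and perceptual problems, I developed an image library of the most commonly used subscripts and superscripts in the astronomical manuscripts which I have converted to HTML. This library was designed with three considerations in mind:
  1. the images should blend with the surrounding client-generated text;
  2. each image should be as small as possible;
  3. the number of distinct images should be kept to a minimum.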

The first item on this list concerns the fact that, when the entire expression is captured as an image, with default font settings, as in LaTeX2HTML, the expression tends to look out of place in the rest of the text generated by Mosaic, no matter which font is selected. However, if only the sub- and superscripts are generated as images, this problem is greatly alleviated. Many times letters are used for the main symbols and are therefore simply part of the client-generated text. The sub- and superscripts are always generated in another font and are much smaller, so that few differences in the structure of the character are possible or noticeable.

The last two items take into consideration the default image cache size of Mosaic, 2048 kilobytes and the fact that the least recently accessed image will be the first discarded. The typical size of a 2 - 5 color image used as a figure in these preprints is 4 - 8 kilobytes. The typical size of an image to be used as a sub- or superscript is 0.06 kilobytes.
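
The arithmetic behind these figures is easy to check. The short Python sketch below uses only the sizes quoted above and ignores any per-image overhead the browser may add, which is an assumption.

```python
CACHE_KB = 2048        # Mosaic default image cache size, from the text
FIGURE_KB = (4, 8)     # typical size range of a figure, from the text
SCRIPT_KB = 0.06       # typical size of a sub-/superscript image, from the text

figures_min = CACHE_KB // FIGURE_KB[1]   # every figure at the 8 KB end
figures_max = CACHE_KB // FIGURE_KB[0]   # every figure at the 4 KB end
scripts = int(CACHE_KB / SCRIPT_KB)      # sub-/superscript images alone

print(figures_min, figures_max, scripts)
```

The cache therefore holds a few hundred figures but tens of thousands of sub- and superscript images, so a small shared library is unlikely to be displaced from the cache while a paper is being read.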

These three considerations thus work in concert to provide a clean-looking, easily readable paper, taking minimal time to load each page, thus allowing browsing as well as reading in depth without forcing the scientific community to adopt another, albeit temporary, shorthand for complex mathematical expressions or simply delaying the onset of electronic publication of journal papers.

The library I built included the Greek letter set previously made available, the numerals, the entire alphabet in both capitals and lower case, and some special symbols and letter combinations in common use in astronomy. To implement this set of images, Perl scripts were written to preprocess the manuscripts, searching for the most commonly used symbols and inserting the in-line images as required. LaTeX2HTML was then used to convert this manuscript to HTML pages, and a post-processing Perl script was used to clean up the few things that may have been affected by LaTeX2HTML. The manuscript was then ready for insertion of the images, tables and references.

As a small sidelight, I note that the display of subscripts is a simple use of the HTML tag for displaying an in-lined image. The unexpected subscript effect occurs because, in this case, the image is smaller than the text height. Thus when you specify <IMG ALIGN=MIDDLE ALT="_sun" src="www-astro.phast.umass.edu/kicons/smsun.GIF">, the middle of the small image is aligned with the base line of the text. Thus a subscript, M_sun, is displayed. This method is also used to place the Greek letters having descenders, e.g. beta and xi, on the base line of the text. I also urge users of this library always to use the ALT="xxxx" option so that people accessing your files with Lynx may more easily read your pages. This is easily accomplished when the substitutions are made in the manuscript by the Perl script.
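
The substitution step can be sketched as follows. This is a Python illustration rather than the Perl scripts actually used. The symbol table is hypothetical, and only the smsun.GIF entry and the icon directory come from the text.

```python
# Icon directory as quoted in the text.
ICON_BASE = "www-astro.phast.umass.edu/kicons/"

# Hypothetical symbol table: LaTeX fragment -> (ALT text, image file).
SYMBOLS = {
    r"_{\odot}": ("_sun", "smsun.GIF"),   # file name from the text
    r"_{eff}": ("_eff", "smeff.GIF"),     # assumed file name
}

def img_tag(alt, filename):
    """Build the in-line image tag, always with ALT for text-only browsers."""
    return '<IMG ALIGN=MIDDLE ALT="%s" src="%s%s">' % (alt, ICON_BASE, filename)

def substitute(text):
    """Replace each known LaTeX subscript fragment with its image tag."""
    for fragment, (alt, filename) in SYMBOLS.items():
        text = text.replace(fragment, img_tag(alt, filename))
    return text

result = substitute(r"a mass of 2 M_{\odot}")
print(result)
```

Because the ALT text is generated alongside the tag, a text-only browser shows M_sun where a graphical browser shows the subscript image.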

Although the new version of LaTeX2HTML allows the option of conversion of Postscript figures, at a user-specified scale, into transparent GIFs in-lined into the text, I prefer to exercise more control over the scaling and appearance of the figures because the figures were initially designed for display on paper. I display the file using ghostview, scale the image to the desired size, and then capture the image using xv. I have chosen to place the scales, axis labels and other labeling text in a deep blue to separate it from the text. If there is only one line in the figure, it is also left in the same color. However, although the figure on the printed page may be very small, and we are not tightly constrained by space considerations on the HTML page, the spatial resolution available to us is still less than that in the paper representation. Therefore I have chosen to make use of the fact that color is an option available at no extra cost to the server. When different line styles or point styles are called for, different colors are substituted by making use of xpaint. When the desired changes are completed, the file is converted into a transparent GIF and placed in the HTML document at the appropriate location. There is also much more flexibility in figure placement in HTML documents as one is not forced to maneuver within fixed page sizes. For this reason, figures may be more reasonably located with respect to their textual references.

There are certainly some very complex figures for which this technique will not be appropriate. In that case a thumbnail or postage stamp image can be made, either by downscaling the entire image or by cropping a recognizable section out of the image for insertion into the text. This smaller image can then act as a link to the Postscript (or GIF) high resolution figure.

Tabular material presents different problems. Small tables can be placed within an HTML document by using the <PRE> tag. Large tables are only practical, at the moment, as Postscript files linked to the text at the appropriate point (but see the Catalogs section below). The use of HTML+ should allow easy introduction of tabular material into the text. Very large tables can be associated with papers in other formats more appropriate to the data.
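
As an illustration of the <PRE> approach for small tables, the Python sketch below pads each column to a fixed width so that the columns line up in a monospaced font. The function name `pre_table` and the sample values are placeholders invented for the example, not catalog data. Cells containing `<` or `&` would need escaping, which is omitted here.

```python
def pre_table(header, rows):
    """Render a small table as fixed-width text wrapped in <PRE>."""
    table = [header] + rows
    # Each column is as wide as its longest cell.
    widths = [max(len(str(row[i])) for row in table)
              for i in range(len(header))]
    lines = []
    for row in table:
        cells = [str(cell).ljust(width) for cell, width in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    return "<PRE>\n" + "\n".join(lines) + "\n</PRE>"

# Placeholder values only.
html = pre_table(["Star", "V", "B-V"],
                 [["A", 12.3, 0.85], ["Longer name", 9.1, 1.2]])
print(html)
```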

With original manuscripts in LaTeX using the AASTeX macros, the conversion of the listing of references is not easily handled by LaTeX2HTML. However, it is a trivial matter to use global substitutions to replace the macros used for journal names with the abbreviations appropriate for the references section of the paper. The large value added that is available for astronomers is the possibility of linking to the ADS Abstract server. By adding these links, the reader has immediate access to the abstracts of the papers referenced, allowing the reader to judge the relevance of each paper to the information sought. This can be especially important to people located at institutions where they may have easy access to the Internet but no library down the hall.
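
A global substitution of this kind might look like the following Python sketch. The macro names are of the kind AASTeX defines for journal titles, and the expansions follow the abbreviation style of the reference list in this paper. Both should be treated as assumptions to be checked against the macro package actually in use.

```python
import re

# Assumed journal macros and the abbreviations chosen to replace them.
JOURNALS = {"apj": "Ap. J.", "apjs": "Ap. J. Suppl."}

# A backslash followed by letters, matched greedily so that \apjs is
# never mistaken for \apj followed by "s".
MACRO = re.compile(r"\\([A-Za-z]+)")

def expand_journals(reference):
    """Replace known journal macros; leave every other macro untouched."""
    def repl(match):
        return JOURNALS.get(match.group(1), match.group(0))
    return MACRO.sub(repl, reference)

print(expand_journals(r"1972, \apj, 174, 401"))
```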

Catalogs

In April of this year I received a request from Bo Reipurth of the European Southern Observatory to read over his preliminary Catalog of Herbig-Haro Objects and to make any corrections or additions necessary before he made it available via anonymous FTP from the ESO FTP server. Herbig-Haro objects are shock-excited nebulosities associated with bipolar outflows from young stellar objects. These objects may be the only optically visible manifestation of star formation occurring within a dense molecular cloud, since the outflow may have punched a hole through to the cloud exterior. Therefore a comprehensive catalog of these objects is a great aid in the study of the early stages of star formation. The imminent appearance of a new catalog of objects associated with the star formation process, solely in electronic form, motivated me to think of making an older, but still extremely useful, database, the Herbig-Bell Catalog of Emission Line Stars, available in electronic form as well. Instead of making these catalogs available solely for on-line browsing, I elected to create hypertext documents, linking them not only internally, but also to the online database of abstracts of astronomical papers.

The Herbig-Bell Catalog of Emission Line Stars

The Third Catalog of Emission Line Stars (Herbig & Bell 1988; HBC) was distributed solely as a Lick Observatory publication. It is a catalog of pre-main sequence stars which have had slit spectra taken to confirm their nature, as opposed to the much larger list of objects detected by objective prism or filter techniques. (The second edition (Herbig & Rao 1972) was published in The Astrophysical Journal.) Included in this catalog of approximately 750 objects are the complete coordinate information, magnitude, color, and variability in bands from the X-ray through the radio (less the IRAS data), spectral type and emission line information, both radial and rotational velocity information, and the references for all of this information, as well as the identification of the molecular cloud in which each star is found. Of course, not all of this information was available for every star, but it is a considerable database of detailed information bound together in an easily portable reference. While a few copies of the catalog were distributed as 80-column card images on 9-track tapes, the catalog has basically been carried from observatory to observatory in briefcases for almost 8 years.

The Herbig-Bell Catalog of Emission Line Stars has always been somewhat awkward to use because the data for a single star was spread over one line on two pages which were printed in landscape mode. Many columns were empty for all but the most frequently observed stars, making it difficult to follow the correct line across the page. In making the electronic version, I have linked the catalog number of the object to its companion entries on the alternate pages so that the top of the window will act as a guide line across the columns. An asterisk in the notes column is linked to the note for that object. All references contained within the database of journal article abstracts used by the ADS Abstract Server were linked to these abstracts to provide more information on the contents of the reference.
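
The cross-linking between the two halves of each entry can be sketched as below. This is a Python illustration under assumed names. The file names hbc1.html and hbc2.html, the anchor format and the function `catalog_row` are hypothetical, not the ones used on our server.

```python
def catalog_row(number, page, cells):
    """Build one table row whose catalog number links to its companion row.

    page is 1 or 2. Following the link jumps to the same anchor in the
    other file, which the browser places at the top of the window.
    """
    other = 2 if page == 1 else 1
    anchor = "hbc%03d" % number
    link = '<A NAME="%s" HREF="hbc%d.html#%s">%03d</A>' % (
        anchor, other, anchor, number)
    return link + "  " + "  ".join(cells)

row = catalog_row(25, 1, ["col a", "col b"])
print(row)
```

Each row carries both a NAME, so that it can be the target of the link from the other page, and an HREF pointing back, so the reader can move between the two halves in either direction.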

Because of the difficulty of reading the table and the unlikelihood of the need for all of the information spread across the two pages, I also provided forms-based access to the data. This form requests the catalog number of the object desired and then returns, in a nicely formatted page, the most commonly desired data for the object as well as any other data that was requested. This catalog has been successfully (and thankfully) used in the middle of the night from the tops of several mountains.
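
The logic behind such a form can be sketched in a few lines of Python. The record, the field names and the function `lookup` are placeholders invented for the example. They are neither entries from the Herbig-Bell Catalog nor the script running on our server.

```python
# Hypothetical in-memory catalog keyed by catalog number.
CATALOG = {
    1: {"name": "Star A", "spectral_type": "K7",
        "cloud": "Cloud X", "vsini": "15"},
}
# Fields returned for every query; others are returned only on request.
DEFAULT_FIELDS = ["name", "spectral_type"]

def lookup(number, extra_fields=()):
    """Return an HTML fragment for one object, or a short error message."""
    record = CATALOG.get(number)
    if record is None:
        return "<P>No entry for catalog number %d.</P>" % number
    fields = DEFAULT_FIELDS + [f for f in extra_fields
                               if f in record and f not in DEFAULT_FIELDS]
    items = ["<DT>%s<DD>%s" % (f, record[f]) for f in fields]
    return "<H2>HBC %d</H2>\n<DL>\n%s\n</DL>" % (number, "\n".join(items))

page = lookup(1, ["cloud"])
print(page)
```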

The Reipurth Catalog of Herbig-Haro Objects

When Reipurth released his public version of the Catalog of Herbig-Haro Objects, I began the process of converting it into a hypertext document. The Catalog was formatted in LaTeX and consisted of a table of the basic data and extensive notes on each object, heavily referenced to the literature: published papers, preprints and work still in preparation. The catalog contains approximately 250 objects and lists for each object the catalog number, any other designations, the best available positions, the source of the outflow, if known, the name of the star formation region in which it is found and the distance of the object. Each catalog number was linked to the notes for that object. In some instances the suspected source of the outflow was also listed in the HBC. In those cases, the source was linked to the entry in the HBC. The majority of the references in this currently very active field are available in the ADS Abstract Server and have been linked to those abstracts.

There were two problems posed by the references in this catalog. Among the references that predated the beginning of the compilation of the abstract database by NASA were approximately 25 references with dates ranging from 1894 to 1960, references which might be difficult to obtain for people located at smaller institutions. The librarian at Kitt Peak National Observatory kindly copied either the title page and abstract when available, or just the first page of the paper and forwarded these copies to me. From these copies, either the title, author, citation and abstract were entered, or the first few paragraphs of the paper were used in place of the abstract. In a few cases, the papers in question were notes so short that the entire paper was entered.

Due to the extremely active state of this field of research, there were a very large number of papers (25 - 30% of the references) too recent to be found in the abstract database as yet. To make the abstracts for these papers available, we placed the entire archive of the Star Formation Newsletters on line, using LaTeX2HTML and the Perl scripts, and linked the catalog references to these abstracts. This procedure was almost entirely successful, leaving unlinked only those papers which are still in preparation and the papers published between 1960 and 1975, which I felt were more easily accessible than were the pre-1960 papers. As the papers still in preparation appear in the Star Formation Newsletters, links will be added to the catalog references.

The Star Formation Newsletters

The Star Formation Newsletters were placed on line in order to have immediate searchable access to the most recent abstracts of refereed papers. However, in doing so we found that it was necessary to integrate into the text a much larger number of symbols than was necessary for the proper display of our preprints because of the wider purview of the Newsletter. As a result we undertook a general expansion of the image library to cover most of the cases we would encounter. We also constructed the table of contents for these newsletters, each title linked to the corresponding abstract, in two forms:
  1. a single document containing all of the papers included thus far in the newsletters with each title linked to the abstract for the paper. This document is large but would allow an easy search through the entire database for the desired paper.
  2. individual table of contents files for each issue of the newsletter. These documents are much smaller but require knowledge of the publication date to be used effectively. These titles were also linked to the abstracts.
When the new issues arrive each month they are added to the available database. When papers are published, their final references are added to the newsletters to complete the cycle.
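
Both forms of the table of contents can be generated from the same list of titles. The Python sketch below shows one way to do so. The file name pattern sfnNN.html, the anchor names and the function `toc` are assumptions made for illustration.

```python
def toc(issues):
    """Build the combined and the per-issue tables of contents.

    issues maps an issue number to a list of (anchor, title) pairs.
    Returns (combined_html, per_issue), where per_issue maps each issue
    number to its own HTML list.
    """
    per_issue = {}
    combined = []
    for number in sorted(issues):
        links = ['<LI><A HREF="sfn%02d.html#%s">%s</A>' % (number, anchor, title)
                 for anchor, title in issues[number]]
        per_issue[number] = "<UL>\n" + "\n".join(links) + "\n</UL>"
        combined.extend(links)
    return "<UL>\n" + "\n".join(combined) + "\n</UL>", per_issue

# Placeholder titles only.
issues = {2: [("a1", "Second")], 1: [("a1", "First"), ("a2", "Other")]}
combined, per_issue = toc(issues)
print(combined)
```

Adding a new monthly issue then means adding one entry to `issues` and regenerating both forms.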

Additional Data Services

We have applied some of the ideas developed for use in electronic manuscript conversion and catalog delivery to some specific databases useful to astronomers engaged in the study of young stellar objects and red and infrared spectroscopy. I will briefly describe here a few of these services.

Pre-Main Sequence Evolutionary Tracks

While we have plans to make extensive databases of our own available through our World Wide Web pages, the first data set that we made available was the set of pre-main sequence evolutionary tracks computed by Francesca D'Antona and Italo Mazzitelli (D'Antona & Mazzitelli 1994). These authors did not have access to a high speed connection to the Internet and agreed that we should make their models available from our server. Accompanying the data files themselves are a hypertext README file written by Dr. D'Antona and a README file for a set of plot macros supplied as a guide to the use of the tracks. Access to these files has been welcomed by the star formation community and references to this online resource are beginning to appear in the literature.

2µm Spectra of Standards

Spectra in the 2µm window for a set of 26 spectral standards were published in 1986 by Susan Kleinmann & Donald Hall. These spectra have gained in usefulness with time as the increase in sensitivity of infrared detectors has allowed us to take spectra in this region of many fainter stars, otherwise inaccessible to optical techniques. Infrared spectroscopy has now taken its place as a commonly used technique for the exploration of the character of newly accessible types of objects in the Universe. With this development, access to this never before available data set became crucial for many groups of astronomers.

Susan Kleinmann made the data set available to me, and I used a utility in STSDAS to convert the ASCII files into standard IRAF image files for plotting purposes. Plots were made for all of the spectra to accompany the distribution of the data files. The spectra were written as standard FITS (Flexible Image Transport System) files and made available as a compressed tar file. The plot files were also made available as a compressed tar file. A hypertext README file describing how to display the plots was provided, as well as a hypertext README file, written by Susan Kleinmann, on the spectra themselves.

Standard Spectra in the Red (5600 - 9000Å) Region

In order to establish standards for spectral classification in the red region for use in our investigations of the stellar populations in star forming regions, we concentrated on obtaining a large number of spectra of stars in the Praesepe cluster (age = 70 Myr) and in M67 (age ~ 3 - 5 × 10^8 yr). We have obtained 107 spectra of stars on the Praesepe main sequence with spectral types ranging from F1 - M4. In M67 we have spectra of 16 stars on the giant branch, 26 subgiants (F5 - K0) and 104 stars on the main sequence with spectral types from B8 - G9.

The spectra of these stars will be made available, with the associated paper (Allen & Strom, in press), as FITS files of images in the IRAF multispec format. Plots of a selected set of spectra will also be available, as well as tables of the stellar identifications, positions, magnitudes and colors.

Standard Spectra in the near infrared region (1.25µm & 1.65µm)

In a related project, Michael Meyer, Suzan Edwards, Stephen Strom and Kenneth Hinkle have had a long-term observing project on the Kitt Peak National Observatory 4-meter telescope using the Fourier Transform Spectrometer to obtain an almost complete set of 1.25µm and 1.65µm spectra of the Morgan-Keenan spectral standards. We hope that by November we will be able to make these spectra available as well. There will be approximately 75 stars over the entire range of temperature and luminosity represented in this sample.

Acknowledgements

I wish to express my gratitude to my summer student and aide, Jessica Norman, without whose help much of the work on the Star Formation Newsletters and the Catalog of Herbig-Haro Objects would not yet be finished. I also wish to thank Cathy Van Atta, the librarian at Kitt Peak National Observatory, for her help in obtaining the older reference material.

References

D'Antona, F. & Mazzitelli, I. 1994, Ap. J. Suppl., 90, 467.

Herbig, G.H. & Bell, K. Robbins 1988, Lick Observatory Bulletin No. 1111.

Herbig, G.H. & Rao, N.K. 1972, Ap. J., 174, 401.

Kleinmann, S.G. & Hall, D.N.B. 1986, Ap. J. Suppl., 62, 501.

Reipurth, B. 1994, A General Catalogue of Herbig-Haro Objects, electronically published via anon. ftp to ftp.hq.eso.org, directory /pub/Catalogs/Herbig-Haro.


Karen M. Strom is a Research Professor of Astronomy at the Amherst Campus of the University of Massachusetts and a member of the Five College Astronomy Department. She is the author or co-author of over 150 papers published in refereed astronomy journals. Her current interest in astronomical research lies in the area of star formation and early stellar evolution.

Over the past year she has devoted a large fraction of her time to exploring the use of the World Wide Web and, in particular, the Mosaic browser, in the distribution of large data sets, electronic publication and interactive educational software.

In the past she has had several one or two person photographic exhibitions as well as exhibiting in many juried competitions. She is currently working on a hypertext book to be distributed over the World Wide Web.

kstrom@hanksville.phast.umass.edu