
Implementation of an Astrophysics Information System on the World-Wide Web

Alan Richmond, Margo Duesterhaus (Hughes STX); Dr. Nick White (NASA)


Abstract

Many astrophysics and astronomy tools have been developed using Mosaic and the Web. Several were developed at the High Energy Astrophysics Science Archive Research Center (HEASARC) at NASA's Goddard Space Flight Center, and are now used by scientists world-wide. StarTrax, NASA's first forms-based relational catalog browser on the World-Wide Web (WWW), provides access to catalogs and astronomical data, with an emphasis on data from high energy astrophysics satellites. These satellites, launched by NASA and other space agencies, observe X-rays and Gamma-rays from all kinds of astronomical objects, such as stars, galaxies, supernova remnants, clusters of galaxies and active galactic nuclei. The StarTrax interface is simple: the user types the name of an object or its coordinates, and all the corresponding entries in the specified catalog are retrieved. If any data files are associated with a particular entry, they can be retrieved by following a hyperlink. We give an overview of the implementation of this system.


The High Energy Astrophysics Science Archive Research Center

HEASARC was created by NASA in 1990 as a site for X-ray and Gamma-ray archive research. Its purpose is to provide a multi-mission archive for the high energy data from the ROSAT, GRO, BBXRT, ASCA, and XTE missions, alongside the archival data from past missions such as Einstein, HEAO 1, HEAO 3, OSO 8, SAS 2 and 3, Uhuru, and Vela 5B. Data from non-US missions, e.g. EXOSAT and Ginga, are also provided as international agreements allow. The total data volume will be of the order of 1 Terabyte by 1995. These data are available on-line for immediate access.

The HEASARC is located at the Goddard Space Flight Center and is a collaboration between Goddard's Laboratory for High Energy Astrophysics (LHEA) and the National Space Science Data Center (NSSDC). The LHEA is responsible for the science content of the archive; the NSSDC is responsible for the data archive management. The HEASARC data holdings will consist of data from past, current, and future missions.

Per WWW, ad Astra

The LHEA Computing Environment pages on the World-Wide Web contain information about the computer facilities in the Laboratory for High Energy Astrophysics. The purpose of this on-line documentation is to provide easily accessible answers to questions that a computer user in the LHEA might have. The pages include lists of workstations, staff members, and available hardware and software, as well as a reference to all of the printers and their locations. There is also a menu of frequently asked questions, with links to the pages where the answers can be found. The pages make use of Mosaic's fill-out form capabilities as well: the forms provided on the LHEA Computing Pages allow a user to request a new account, request a new IP address, or file a problem report. This information is sent directly to the system managers.

A Basic HTML Style Guide was written to help maintain a consistent, recognizable HEASARC identity on the WWW, and to help reduce the difficulty of maintaining pages written by diverse people. We soon discovered that people would move or delete their Web pages and not tell us, but fortunately the task of discovering this is now automatic: we use verify_links from the Webtest tool suite. The link verifier starts from a given URL and traverses links outward, subject to a specified search profile, producing a report on the state of all the discovered links. The tool employs a non-redundant breadth-first traversal search strategy to unearth all HTTP HREFs in SRC, A, FORM, LINK, and BASE tags.
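The traversal strategy described above can be sketched in a few lines. The following is a minimal illustration in Python, not the verify_links implementation: the `fetch` callback, the `max_depth` stand-in for a search profile, the tag-to-attribute table, and the in-memory pages are all assumptions made for the example, and relative URLs are not resolved.

```python
from collections import deque
from html.parser import HTMLParser

# Attribute that carries the link for each tag of interest (illustrative subset).
LINK_ATTRS = {"a": "href", "img": "src", "form": "action", "link": "href", "base": "href"}


class LinkExtractor(HTMLParser):
    """Collect link targets from the tags listed in LINK_ATTRS."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def verify_links(start, fetch, max_depth=2):
    """Breadth-first, non-redundant traversal outward from `start`.

    `fetch(url)` is a caller-supplied function returning the page's HTML,
    or None if the page cannot be retrieved.
    Returns a report of the form {url: "ok" or "broken"}.
    """
    report = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            report[url] = "broken"
            continue
        report[url] = "ok"
        if depth == max_depth:        # search profile: do not expand further
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:      # each URL is queued at most once
                seen.add(link)
                queue.append((link, depth + 1))
    return report


# A tiny in-memory "web" standing in for real HTTP requests (hypothetical pages).
PAGES = {
    "/index.html": '<a href="/style.html">Style guide</a> <a href="/gone.html">Moved</a>',
    "/style.html": '<a href="/index.html">Home</a> <img src="/logo.gif">',
    "/logo.gif": "",
}
report = verify_links("/index.html", PAGES.get)
```

The `seen` set is what makes the traversal non-redundant: the link from the style guide back to the home page is discovered but never queued a second time.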

The WWW offers the opportunity to `mix and match' reusable components in a rapid prototyping approach; we were astonished in our early days (only 1 year ago!) at how quickly we could deliver useful functionality. Later we began to realise that some of our early optimism was perhaps too naïve: the need for good software engineering practices is probably even greater than before. There are many more components being interconnected and interfaced, and exponentially more failure points. Perl helps you to hack up something real fast without worrying too much about the engineering aspects, until it is too late.


Architecture

[Figure: architecture diagram with numbered arrows; each numbered arrow is explained in the correspondingly numbered step below.]


  1. Display Observatory/Mission Choices: Browse.html. The user will choose one of the following:
    [Master] An overview of all Browse catalogs
    [ASCA] Advanced Satellite for Cosmology & Astrophysics
    [ROSAT] The Röntgen Satellite
    [CGRO] The Compton Gamma-Ray Observatory
    [Einstein] The 2nd High Energy Astrophysics Observatory
    [EXOSAT] The European X-ray Astronomy Satellite
    [HEAO-1] The 1st High Energy Astrophysics Observatory
    [BBXRT] The Astro 1 Broad Band X-ray Telescope

    Other missions ( Ariel-V, COS-B, Ginga, EUVE, HST, IRAS, SMM, TD1, Uhuru, Vela 5B )
    Astronomical catalogs
    The casual user might start with the master catalogs, which summarize the contents of the HEASARC database. These master catalogs are divided into observation logs, optical, radio and X-ray catalogs. Once the mission (e.g. ROSAT) or catalog containing the source has been established, a more detailed overview is obtained by going to that mission or catalog (by a direct hyperlink from the master catalog entry found to the original). This sends a URL to the HEASARC WWW server, which invokes the next phase.

  2. The corresponding search form is generated by Squery.pl, which builds the HTML form for the specified mission; e.g.

    MASTER_CATALOG Databases:

    Select 1 or more:
    The number before the catalog name is the default search radius.

    Enter name or coordinates to search: Coordinates can be given like 12 34 56.0, -78 12 34.0 or 12 34, -78 12 or 123.12, -67.0 (please separate them with a comma or +/-).
    Name search resolver:

    Input coordinate type:
    Fields:
    Input and output equinox:
    Search radius (arc mins):

    If an object name is typed, a name-to-coordinate resolver is used to give the coordinates on the sky for that object. Two different name resolvers are provided. The SIMBAD resolver sends a query to the CDS database in Strasbourg, France. The results of any successful queries to that database are cached locally, so that future requests for that object do not require a further request over the network. While SIMBAD is probably the most complete database of source object names, there may be some missing X-ray and Gamma-ray objects. A HEASARC name resolver is provided for those special cases.


  3. A HEASARC catalog is queried for records and data products by squery.pl, which decodes the submitted form and generates SQL to query the HEASARC database (using pre-existing Browse/SQL software). The results are returned as a list of selectable targets:

    Results from SIMBAD name resolver: M1 05 34 05, 22 01 11 (J2000)

    Here is the documentation for ALLDATA.
    SQL: equinox 2000";" select '$lsm' from ALLDATA where @\(05 34 05, 22 01 11\) \<= 60.0

    Name        RA(2000)    Dec(2000)  Time                 Exp  Database
                (hh mm ss)  (o  '  ")  (yy.ddd hh:mm:ss)
    ----------  ----------  ---------  -----------------  -----  --------
    [..omissions..]
    CRABPULSA   05 34 26.2  21 54 03   85.294 10:41:36     2913  CMAIMAGE
    CRABPULSA   05 34 26.2  21 54 03   85.294 05:47:28    17389  CMAIMAGE
    [..omissions..]
    

    When the original query came from the MASTER catalog, it is repeated on the source catalog (`Database'):

    Results from SIMBAD name resolver: M1 05 34 05, 22 01 11 (J2000)

    Here is the documentation for CMAIMAGE.
    SQL: equinox 2000";" select '$lsm' from CMAIMAGE where @\(05 34 05, 22 01 11\) \<= 60.0

    Name         Seq  Expos  Time      Filter  RA(2000)    DEC(2000)  Inst
                      (sec)  (yy.ddd)          (hh mm ss)  (o  '  ")
    ----------  ----  -----  --------  ------  ----------  ---------  ----
    [..omissions..]
    CRABPULSA   1863   2913  85.294         8  05 34 26.2  21 54 03   L1
    CRABPULSA   1863  17389  85.294         8  05 34 26.2  21 54 03   L1
    [..omissions..]
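The queries in this step select every record within a given angular radius (60 arc minutes here) of the resolved position. As a worked illustration of that cone search, the Python sketch below computes the angular separation directly; it is not the Browse/SQL implementation, the haversine formula is our choice for the example, and the second row of the sample data is invented.

```python
import math


def sexagesimal_to_degrees(text, hours=False):
    """Convert 'dd mm ss' (or 'hh mm ss' when hours=True) to decimal degrees."""
    sign = -1.0 if text.strip().startswith("-") else 1.0
    parts = [abs(float(p)) for p in text.split()]
    parts += [0.0] * (3 - len(parts))            # allow 'dd mm' or plain 'dd'
    value = parts[0] + parts[1] / 60.0 + parts[2] / 3600.0
    return sign * value * (15.0 if hours else 1.0)


def separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation in arc minutes (inputs in degrees), by haversine."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 60.0


def cone_search(rows, ra, dec, radius_arcmin):
    """Keep the rows lying within radius_arcmin of the position (ra, dec)."""
    return [r for r in rows
            if separation_arcmin(ra, dec, r["ra"], r["dec"]) <= radius_arcmin]


# Search centre: the position returned by the name resolver for M1.
centre_ra = sexagesimal_to_degrees("05 34 05", hours=True)
centre_dec = sexagesimal_to_degrees("22 01 11")

rows = [
    {"name": "CRABPULSA",
     "ra": sexagesimal_to_degrees("05 34 26.2", hours=True),
     "dec": sexagesimal_to_degrees("21 54 03")},
    {"name": "FARAWAY",                          # hypothetical row, far off-target
     "ra": sexagesimal_to_degrees("12 00 00", hours=True),
     "dec": sexagesimal_to_degrees("-30 00 00")},
]
hits = cone_search(rows, centre_ra, centre_dec, 60.0)
crab_sep = separation_arcmin(centre_ra, centre_dec, rows[0]["ra"], rows[0]["dec"])
```

With these inputs the CRABPULSA record lies roughly 9 arc minutes from the search centre, well inside the 60 arc minute radius, which is why it appears in the listings above.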
    

  4. The user may select a target having data products. This invokes hdprods.pl, which decodes the URL to get the cache identifier and record number. The major `limiting characteristic' of HTTP is statelessness. For our Browse interface we want to engage in a dialog with the user which starts from nothing and builds up to a state where we have identified a target satisfying given constraints, and can offer associated data products. So we save state in a file identified by a sequential number, and encode only the identifier in the URL. This state holds the result of the first data product query, so that potentially lengthy queries need not be repeated later (i.e. they are cached). From this we extract information such as type, description, format, and location, and generate HTML which is displayed back on the browser. The type is displayed as a hyperlink. The file name, size, description, and format are also listed.

    Click on an underlined record to display/retrieve the data. Before doing this for the first time, please ensure your system is configured correctly. Note that in NCSA Mosaic for the X Window System you can use the switch Load To Local Disk (formerly Binary Transfer Mode) under the Options menu, to replace spawning of a viewer.

    IMAGE	(0.04-2.0 keV, FITS, 81887 bytes):	a63259.img.Z
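The state-saving scheme of this step (a file named by a sequential number, with only that number carried in the URL) might look like the following Python sketch. It is an illustration only: the JSON file format, the `cache` and `rec` parameter names, the `/cgi-bin/` path, and the `bytes` field are assumptions, and a real multi-process server would need locking around the counter.

```python
import json
import os
import tempfile
from urllib.parse import parse_qs, urlencode, urlparse


class StateCache:
    """Save query results in files named by a sequential identifier."""

    def __init__(self, directory):
        self.directory = directory
        self.counter_path = os.path.join(directory, "counter")

    def _next_id(self):
        # Read, increment, and store the counter (no locking in this sketch).
        try:
            with open(self.counter_path) as f:
                last = int(f.read())
        except FileNotFoundError:
            last = 0
        with open(self.counter_path, "w") as f:
            f.write(str(last + 1))
        return last + 1

    def save(self, records):
        cache_id = self._next_id()
        with open(os.path.join(self.directory, f"{cache_id}.json"), "w") as f:
            json.dump(records, f)
        return cache_id

    def load(self, cache_id):
        with open(os.path.join(self.directory, f"{int(cache_id)}.json")) as f:
            return json.load(f)


def product_url(cache_id, record):
    # Assumed URL layout; only the script name hdprods.pl comes from the text.
    return "/cgi-bin/hdprods.pl?" + urlencode({"cache": cache_id, "rec": record})


def decode_product_url(url):
    query = parse_qs(urlparse(url).query)
    return int(query["cache"][0]), int(query["rec"][0])


workdir = tempfile.mkdtemp()
cache = StateCache(workdir)
cache_id = cache.save([
    {"type": "IMAGE", "format": "FITS", "file": "a63259.img.Z", "bytes": 81887},
])
url = product_url(cache_id, 0)
found_id, rec = decode_product_url(url)
product = cache.load(found_id)[rec]
```

Because the URL carries only two small integers, the lengthy query result never has to be re-encoded into a link or recomputed when the user follows it.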

  5. The user may select an associated data product by clicking on one of those listed. This invokes HTTP directly on the machine holding the data products: the client interacts directly with the server at the data publishing site to retrieve images and relevant textual data. This need not be the same site as the query server. One of our major errors was to try to have an HTTP server on one machine initiate FTP transfers from another to the user; i.e. the links we built pointed to the FTP server on a third machine, and we relied on users having modified their mime.types so that their local browser could invoke the appropriate viewer. We finally realised that it is far better to have an HTTP server on the third machine, so as to ensure the reliability of the MIME-typing.

  6. The data product is delivered to the user. The client side has a mechanism for specifying which `viewer' to use for a given data type, e.g. xv for GIFs. These have to be MIME (Multipurpose Internet Mail Extensions) types or sub-types (e.g. x-fits). The browser consults the mime.types configuration file to determine what action to take with the data. All astronomical data files are stored in a standard format called FITS. FITS images can be viewed directly by setting up your .mime.types and .mailcap files to spawn the appropriate FITS image viewer (saoimage, nihimage, etc.).
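The lookup the browser performs (file name to MIME type, MIME type to viewer) can be mimicked with Python's standard `mimetypes` module. This is a sketch of the idea, not of any browser's code: the `image/x-fits` registration, the `.fits` extension, the file names, and the `VIEWERS` table are assumptions for illustration, with the viewer names taken from the text.

```python
import mimetypes

# A private type table, so the example does not alter global settings.
types = mimetypes.MimeTypes()
types.add_type("image/x-fits", ".fits")      # assumed type/extension pairing

# Stand-in for a .mailcap file: which viewer to spawn for each MIME type.
VIEWERS = {"image/x-fits": "saoimage", "image/gif": "xv"}


def choose_viewer(filename):
    """Return (MIME type, content encoding, viewer) for a file name."""
    mime_type, encoding = types.guess_type(filename)
    return mime_type, encoding, VIEWERS.get(mime_type)


fits_choice = choose_viewer("crab.fits.Z")   # compressed FITS image
gif_choice = choose_viewer("plot.gif")
```

The `.Z` suffix is reported separately as a content encoding (`compress`), so a compressed FITS file still maps to the FITS viewer once it has been uncompressed.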

    When we were writing GUIs we had complete control over our interface and could simply create user-side code as we saw fit, e.g. to pop up some dialog. We've given up that flexibility for the time being, because the advantages we gained were very significant, but we hope that ultimately there will be some client-side equivalent of CGI, so that we can, for example, interface data analysis tools to browsers.

    Another problem is variations in presentation via different browsers/platforms. We really would like to see more uniformity between browsers. Often one may spend considerable time getting the layout to look good on one (with all defaults selected), only to find that it's a horrible mess on another. We found this especially problematic in our Remote Proposal Submission system (RPS), which provides facilities for submitting Observation Request forms.

    One of the major problems in implementing this was how to get a satisfactory layout. After some experimentation we settled on parsing a LaTeX template to get position information, and making the screen layout resemble the actual paper form by using preformatted text. Without preformatted text, a widget and its label may get separated by a line break.

We're now working to make this software reusable by others. The current modules are: coordinate convertor; coordinate/name parser; name resolver [SIMBAD/HEASARC]; conversions to/from RA/Dec - HMS format. There are also some tools integrated with StarTrax, e.g. the Coco coordinates convertor, and Viewing, which gives the dates when an object or coordinate is observable from a specified mission.
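To give a flavour of what the coordinate parser and the RA/Dec conversion modules do, here is a minimal Python sketch that accepts the three input styles shown on the search form (`12 34 56.0, -78 12 34.0`, `12 34, -78 12`, and `123.12, -67.0`). It is not the HEASARC code. In particular, it assumes that a multi-field right ascension is in hours while a single number is already in degrees; that convention is our assumption, not something the text states.

```python
import re

# RA, then either a comma with an optional sign or a bare +/- sign, then Dec.
COORD_RE = re.compile(r"^\s*([\d.\s]+?)\s*(?:,\s*([+-]?)|([+-]))\s*([\d.\s]+?)\s*$")


def _sexagesimal(fields):
    """Combine 'units minutes seconds' fields into one decimal value."""
    return sum(abs(float(field)) / 60.0 ** i for i, field in enumerate(fields))


def parse_coordinates(text):
    """Return (ra_degrees, dec_degrees) parsed from a coordinate string."""
    match = COORD_RE.match(text)
    if not match:
        raise ValueError(f"cannot parse coordinates: {text!r}")
    ra_text, sign_after_comma, bare_sign, dec_text = match.groups()
    ra_fields, dec_fields = ra_text.split(), dec_text.split()
    if not ra_fields or not dec_fields:
        raise ValueError(f"cannot parse coordinates: {text!r}")
    ra = _sexagesimal(ra_fields)
    if len(ra_fields) > 1:
        ra *= 15.0                    # assumption: sexagesimal RA is in hours
    sign = -1.0 if "-" in (sign_after_comma, bare_sign) else 1.0
    return ra, sign * _sexagesimal(dec_fields)


def degrees_to_hms(ra_degrees):
    """Format a right ascension in degrees as 'hh mm ss.s'."""
    tenths = round(ra_degrees / 15.0 * 36000) % (24 * 36000)   # tenths of a second
    hours, rest = divmod(tenths, 36000)
    minutes, tenths = divmod(rest, 600)
    return f"{hours:02d} {minutes:02d} {tenths / 10:04.1f}"


full = parse_coordinates("12 34 56.0, -78 12 34.0")
short = parse_coordinates("12 34 -78 12")      # separated by the sign alone
decimal = parse_coordinates("123.12, -67.0")
crab_ra, crab_dec = parse_coordinates("05 34 26.2, 21 54 03")
crab_hms = degrees_to_hms(crab_ra)
```

Working in tenths of a second before splitting into hours and minutes avoids results such as `59 60.0` that can arise when each field is rounded separately.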


Planned Enhancements

A dynamic observatory selection page.
The user needs to be able to look at more than one mission at a time. From the page where the catalog is selected and the coordinates are entered, the user would be given a window listing all the currently available missions, and could select one or more observatories or catalogs. The result would be used to query and fill the catalog selection window of the subsequent search page. In this way one could select ROSAT and ASCA, get all their tables in the subsequent page, and search both ASCA and ROSAT simultaneously.

A general parameter search page.
This is a scheme for a general STARCAT-like WWW browser. There are two ways to go about this: one is to overwhelm the user with all the possible parameters they can select on; the other is to let the user select the parameters they want, both to search on and to display. Once they have selected the ones to search on, one gives helpful information such as the valid range for each parameter or, if there are only a few valid values (say fewer than 10?), a selection window listing them. This would require some preprocessing of the database in advance to have all the values ready. This facility should allow joins between tables. The first bit is actually an improvement to the current way we provide catalog selection, and is in a sense independent of the development for the more general catalog selection.

One way to get to the more complex query page is to send back the more complex form when no name or coordinates are given. At this point the user will select one or more catalogs, as they do now. Alternatively, or in addition, there could be an option on the Browse observatory selection page. The first page allows the user to choose which parameters they wish to select on. The format would be

parameter name, description, & type
	
For multiple table selection, the parameter name would be preceded by the database name, e.g. rospublic.name (which is needed for the database query anyway). We would include a cone search as a search parameter. The next page is the entry form for the selected parameters, where the user specifies and runs the search. The user gets a series of boxes to fill in for the parameters selected previously. We would allow users to save this page so they can bring it back at a future time without having to reselect parameters. This form would be
Parameter range:  min value, max value
	
The user might give 10000:20000, meaning they want all values between 10000 and 20000; or >20000; or only the values 10000 and 20000. On this page we also repeat all the parameters and allow the user to choose which ones they want displayed as a result of the query. The default will be the ones they are selecting on.
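One possible way to turn such range specifications into database conditions is sketched below in Python. This is a design illustration for the planned page, not existing software. The `low:high` and `>value` forms come from the text; the comma-separated list for discrete values, the `<value` form, the inclusive reading of `low:high`, and the column `rospublic.exposure` are assumptions made for the example.

```python
import re

# A parameter is a column name, optionally preceded by a database name.
PARAMETER_RE = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)?")


def _number(text):
    """Validate that text is numeric and return its canonical form."""
    return repr(float(text))


def parse_constraint(parameter, spec):
    """Translate a range specification into a SQL condition string."""
    if not PARAMETER_RE.fullmatch(parameter):
        raise ValueError(f"bad parameter name: {parameter!r}")
    spec = spec.strip()
    if ":" in spec:                               # 10000:20000
        low, high = spec.split(":", 1)
        return f"{parameter} BETWEEN {_number(low)} AND {_number(high)}"
    if spec.startswith(">"):                      # >20000
        return f"{parameter} > {_number(spec[1:])}"
    if spec.startswith("<"):                      # <20000 (assumed counterpart)
        return f"{parameter} < {_number(spec[1:])}"
    values = ", ".join(_number(v) for v in spec.split(","))
    return f"{parameter} IN ({values})"           # 10000,20000 (assumed syntax)


conditions = [
    parse_constraint("rospublic.exposure", "10000:20000"),
    parse_constraint("rospublic.exposure", ">20000"),
    parse_constraint("rospublic.exposure", "10000,20000"),
]
```

Converting every value through `float` and checking the parameter name against a pattern means that nothing typed into the form is passed to the database unvalidated.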


Observations

We have found the WWW method of development to be remarkably effective when applied to client-server style information systems. The greatest advantage for the developer accrues from the high level of functionality built into the browsers & servers. The price to pay is some loss of control & flexibility; you cannot do everything you might wish. In particular, the browser side is relatively dumb: it's currently not easy to add much functionality there, in contrast to the server side through the CGI. In spite of these sometimes severe disadvantages, we are sure the WWW is a very effective platform for rapid information systems development in astronomy & astrophysics.

The major difference between the `classical' and the WWW modes of development, is that because the WWW provides a substantial pre-existing infrastructure supporting a client-server architecture on the Internet, one can deliver application functionality at a much faster rate - say, an order of magnitude faster (to be quantified later). In the `classical' mode - which we started this project in - the developer(s) spend a great deal of time wallowing in relatively low level code. We haven't yet achieved the goal of `reusability' in spite of the promises of modern software engineering methodologies.

In WWW mode, using HTML (HyperText Markup Language) (& ideally, Perl), you build on that infrastructure and so can deliver functionality very quickly. This delivers not only the promises of rapid prototyping (throw one away), but also those of rapid development. The development was done by two of us: one is a scientist with a positive interest in this development; the other is a software developer with technical competence in the WWW. The work was very much a synergy: the scientist would propose a design, and the software developer would explain to him why it was impossible to implement. A little while later (depending on how impossible) the developer would invite the scientist to try out his design; the scientist would then propose impossible improvements.

The iteration cycle time was often of the order of minutes, rather than the hours or days of classical development. This was due not only to the pre-existing WWW functionality, but also to our choice of Perl instead of C. Perl is ideally suited to this kind of work, because it too provides a great deal of ready-made high-level functionality. For example, we converted some 30 lines of C code, for decoding URLs, into 3 lines of Perl. This method of development was found to be an order of magnitude faster than classical methods (e.g. GUI builders and C).
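To show why URL decoding shrinks so dramatically in a language with built-in regular expressions, here is an equivalent written in Python. It is our own illustration, not the Perl mentioned above, and the query string is a made-up example; it handles single-byte characters only.

```python
import re
from urllib.parse import unquote_plus


def decode_url_component(text):
    """Decode a form-encoded string: '+' becomes a space, %XX becomes a character."""
    text = text.replace("+", " ")
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


decoded = decode_url_component("Crab+Nebula%2C+M1")
library_result = unquote_plus("Crab+Nebula%2C+M1")   # the standard-library equivalent
```

For plain ASCII input the two results agree; in practice the standard-library function is preferable because it also handles multi-byte encodings.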


Related Publications

Richmond, A., et al., 1994,
Design of a Remote Proposal Submission System, to be demonstrated at the Fourth Annual Conference on Astronomical Data Analysis Software and Systems.

Richmond, A., White, N., 1994,
The Design and Architecture of an Astrophysics Information System, to be published in the Bulletin of the American Astronomical Society.

Richmond, A., 1994,
Towards an Astrophysical Cyberspace: The Evolution of User Interfaces, to be published in the proceedings of The Third Annual Conference on Astronomical Data Analysis Software and Systems.

Richmond, A., et al., 1994,
StarTrax - The Next Generation User Interface, to be published in the proceedings of The Third Annual Conference on Astronomical Data Analysis Software and Systems.

Pasian, F., Richmond, A., 1991,
User Interfaces in Astronomy, in Data Handling and Analysis in Astronomy, ed. D. Egret, M. Albrecht, Kluwer Academic Publishers, June 1991.


Biographies

Alan Richmond

Principal Systems Engineer and Group Leader at NASA's Goddard Space Flight Center. Pioneered the presentation of astrophysics satellite data on the World-Wide Web (WWW), through the StarTrax interface. Administrator for the WWW Virtual Library section on WWW Development. Created A Basic HTML Style Guide; WebCalc, A Pocket Calculator Demonstration; StarChild, A NASA - K12 Proposal Demonstration; WebStars, A Directory of Resources for Astronomers; CyberWeb, A Directory of Resources for WWW Developers; Remote Proposal Submission; and tools integrated with StarTrax: the Coco coordinates convertor, and Viewing (dates when an object or coordinate is observable from a specified mission). Built software for several international scientific research projects, e.g. the European Synchrotron Radiation Facility (ESRF); the NASA/ESA Hubble Space Telescope (HST); and the Joint European Torus (JET). Over 16 years' software development experience; has been a member of several major computer societies, and has published several papers on software development. Degrees in physics and mathematics from King's College, London, and the Open University (UK), and a Eur.Ing (Paris) and CEng (UK).

Margo Duesterhaus

Margo Duesterhaus is currently a senior programmer/analyst for the High Energy Astrophysics Science Archive Research Center (HEASARC). She is the Proposal Manager for ROSAT, ASCA, and XTE and is responsible for maintaining the proposal databases for each project. She is leading the development of new Remote Proposal Submission (RPS) software that has both an electronic mail and an X-Mosaic interface. She is also leading the development of a new Mission Information and Planning System (MIPS).

Dr. Nicholas (Nick) White

Dr. Nicholas White is currently the head of the Office of Guest Investigator Programs (OGIP) in the Laboratory for High Energy Astrophysics at the Goddard Space Flight Center. The OGIP organization includes the ROSAT Guest Observer Facility, the ASCA Guest Observer Facility, the XTE Science Operations Center, and the Compton Gamma-Ray Observatory Science Support Center. Nick White is the director of the HEASARC and the Project Scientist for the ASCA mission, with the specific responsibility of directing the ASCA Guest Observer Facility (GOF). His astrophysics interests include studies of X-ray binaries and stellar coronae. He has over 100 publications in refereed journals. Prior to joining GSFC in November 1990, Nick White was the EXOSAT Project Scientist and was responsible for the EXOSAT Observatory activities (1986-1990). This function included creating the EXOSAT archival data products and catalogs, and the EXOSAT database system. The EXOSAT database system is now called the HEASARC Browse facility, and is in use for accessing all the HEASARC archival data. He worked as part of the EXOSAT Observatory team from 1982 to 1985. Nick White obtained his Ph.D. in Astrophysics at University College London in 1977 using Copernicus (OAO-3) and Ariel V X-ray data. Upon receiving his doctorate, he worked at GSFC from 1978 to 1982 on HEAO 1, HEAO 2 and OSO 8.


http://guinan.gsfc.nasa.gov/Confs/MW94.html
richmond@guinan.gsfc.nasa.gov