David J. Bianco
Systems Analyst
Computer Sciences Corporation
NASA Langley Research Center
The World Wide Web [1] and NCSA Mosaic [2] have become part of the information dissemination architecture for many LaRC projects, including fulfilling LaRC's commitment to technology transfer [3]. It is part of NASA's mission to "research, develop, and transfer advanced aeronautics, space and related technologies." However, technology transfer is not as simple as placing computer codes, reports and telephone numbers on the Web. Before transfer can occur, the potential receiving party must be aware that a technology is available which can be adapted to their needs. Advertising and displaying available LaRC technologies on the Web has proven to be a highly effective tool in demonstrating the national relevance of Langley Research Center's mission.
The application of WWW to technology transfer projects at LaRC can be categorized as follows: the retroactive cataloging and automation of existing or previous non-electronic services, the adaptation or upgrade of existing electronic technology services, and the creation of totally new WWW technology services. This paper explores the design of, and the lessons learned from, retroactively placing the entire Technology OPportunities Showcase (TOPS 93) on-line, adapting the Langley Technical Report Server (LTRS) to the Web, and creating a totally new service, the NASA Technical Report Server (NTRS). Finally, future directions for WWW technology access services are discussed.
The Technology OPportunities Showcase (TOPS 93) was held October 19-21, 1993. The purpose of TOPS was to showcase critical LaRC technology, expand potential dual-use opportunities, strengthen existing strategic industrial partnerships, and cultivate new ones. At the initial conference, approximately 850 attendees from 400 organizations visited 185 exhibits. Attendees left over 3500 requests for more information, and 42 significant potential industrial contacts were identified. Attendees were given one-page Technical Information Sheets for each exhibit and additional overview material to take back with them, but for the most part, when the showcase was over and the displays were broken down, access to the technology showcase ceased. For a company that was unable to attend the showcase, the full impact would be difficult to reproduce.
To address this issue, the center director asked that an on-line repository be constructed to preserve this information and allow for the continued use and display of this tremendous institutional investment. At that point, WWW and NCSA Mosaic were just beginning to enjoy widespread popularity at LaRC, so the TOPS database was to be accessible via WWW. Creating a TOPS database involved maintaining both textual and multimedia information. At a minimum, the technical information sheets plus the photographs of each booth had to be available. Other information would be added if available and appropriate.
The creation of the TOPS database is an interesting story in itself. Many people came together in an interdisciplinary grassroots effort to assemble the database in just a few months. Most of the preparation time was spent because none of the Technical Information Sheets or photographs were available electronically. All of the information had to be scanned in, and the OCR (Optical Character Recognition) output had to be corrected by hand. The full process for the conversion of the conference from paper form to electronic form is covered in [4].
The TOPS database design was guided by the model of having a guaranteed base-line functionality: the HTML technical information sheet. Hyperlinks to the appropriate Points of Contact (POCs) would query the LaRC phonebook in real time. If any technical references exist on-line, they should be linked in as well. If the POCs of a particular exhibit were interested in providing more information, a "tour" of the exhibit is available through a hyperlink from the technical information sheet.
The TOPS home page offers several levels of functionality, including: browsing the entire collection of technical information sheets, browsing by subject category, or searching by keyword. A map of the exhibit layout is also available, and allows the user to choose an exhibit and view the associated technical information sheet. Additionally, all of the photographs taken at TOPS 93 are available. They can be browsed through small subject oriented collections of inlined thumbnail GIFs which contain hyperlinks to full-size JPEG images.
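The thumbnail arrangement described above can be sketched in a few lines of Python. This is only an illustration of the idea (small inlined GIFs that link to full-size JPEGs), not the tooling actually used for TOPS; the function name, file paths, and captions are hypothetical.

```python
from html import escape

def thumbnail_index(title, photos):
    """Build an HTML fragment of inlined thumbnails linking to full-size images.

    photos is a list of (thumbnail_gif, full_size_jpeg, caption) tuples.
    """
    lines = [f"<h2>{escape(title)}</h2>"]
    for thumb, full, caption in photos:
        # Each small GIF is displayed inline and wrapped in a link to the JPEG.
        lines.append(
            f'<a href="{escape(full, quote=True)}">'
            f'<img src="{escape(thumb, quote=True)}" '
            f'alt="{escape(caption, quote=True)}"></a>'
        )
    return "\n".join(lines)

# Hypothetical file names, for illustration only.
page = thumbnail_index(
    "Structures and Materials",
    [
        ("thumbs/booth12.gif", "photos/booth12.jpg", "Booth 12"),
        ("thumbs/booth13.gif", "photos/booth13.jpg", "Booth 13"),
    ],
)
print(page)
```

Keeping the thumbnails small means a whole collection can be browsed quickly, and the larger JPEG is transferred only when the user asks for it.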
The TOPS database takes advantage of several interesting Web technologies, including building upon other Web databases. The links to the POCs are hyperlinks to the existing LaRC phonebook databases, ensuring that the most up-to-date information is returned. TOPS also builds upon the Langley Technical Report Server (LTRS), so if the references in the technical information sheet are available on-line, a link is provided to the report residing in the LTRS database. The full architecture of the TOPS database is provided in figure 1. Additionally, TOPS provides for automated tracking of metrics. When a customer wishes to request more information about a certain exhibit, an HTML form is filled out and the information is mailed to the appropriate POCs and to a central repository. This facilitates tracking the status of the request and automates the inquiry response.
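The request-routing step can be illustrated with a short Python sketch. It assumes the form handler already has the exhibit identifier and the POC addresses in hand; the repository address, the function name, and the sample values are hypothetical, and the sketch only builds the message rather than sending it.

```python
from email.message import EmailMessage

# Hypothetical address of the central repository that receives every request.
CENTRAL_REPOSITORY = "tops-requests@example.gov"

def build_request_message(exhibit_id, poc_addresses, requester, question):
    """Build one message addressed to the exhibit POCs and copied to the repository."""
    msg = EmailMessage()
    msg["From"] = requester
    msg["To"] = ", ".join(poc_addresses)
    msg["Cc"] = CENTRAL_REPOSITORY  # the copy that makes central tracking possible
    msg["Subject"] = f"TOPS information request: exhibit {exhibit_id}"
    msg.set_content(question)
    return msg

# Hypothetical form submission, for illustration only.
message = build_request_message(
    "A-12",
    ["poc1@example.gov", "poc2@example.gov"],
    "engineer@example.com",
    "Please send more details on this exhibit.",
)
print(message)
```

Because every request also reaches the repository, the number of requests per exhibit can be counted without relying on each POC to report them.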
The TOPS home page has been visited over 1550 times since 6/1/94. LaRC has received over 25 requests for more information since the TOPS database went on-line. The TOPS database also serves as useful reference material for both internal and external customers. Perhaps the largest drawback for the TOPS database is the problem of increasing customer awareness of its existence. Currently, many of the intended primary customers for TOPS are not well connected, and may not have full Internet access. However, as the popularity of such tools as NCSA Mosaic continues to grow, the intended audience that can take advantage of this resource should increase.
On January 14, 1993, NASA Langley Research Center (LaRC) made approximately 130 formal, "unclassified, unlimited" technical reports available via the anonymous FTP Langley Technical Report Server (LTRS) [5]. LaRC was the first organization to provide a significant number of aerospace technical reports for open electronic dissemination. LTRS has been successful in its first 18 months of operation: it has distributed over 11,000 reports and has helped lay the foundation for electronic document distribution for NASA. LTRS strives to provide technical publications in the most cost-efficient manner using the best open systems technology available. As a result, LTRS provides researchers with quick and easy access to Langley information in the aerospace-related fields.
In the attempt to provide the best system possible, several design requirements became evident. First, LTRS should not require significant human intervention or maintenance. Second, it should support as many hardware platforms as possible. Third, to be counted a success, LTRS must be "better, cheaper, faster."
A high maintenance system negates any benefits gained from an electronic report distribution system. LTRS is highly automated, and requires minimal operator intervention during production. By adapting existing tools, and by implementing new ones when necessary, there is minimal time required for general maintenance and the inclusion of reports into LTRS.
Because it was designed to be platform independent, LTRS ensures that the widest possible customer base is reached. Limiting access to information based upon the computer preference of the user places arbitrary restrictions on its potential market. There is a point of diminishing returns; not all potential customer system configurations can be supported. However, it is advantageous if a single system can simultaneously support a range of modern computing platforms. It is even better if a conceptually similar access method exists for multiple platforms. As a Web application, LTRS is able to leverage existing technology to easily provide platform independence.
LTRS was intended as a rapid prototype, not an extended software development effort. The system was made available when it appeared reasonably stable, but advertised with the appropriate caveats about being an experimental service. As such, all resources had to be readily available at little to no cost. This implied the use of existing tools, methods and protocols where possible. LTRS is accessible via numerous Internet-based information access methods [6], so a custom front-end did not need to be developed. Report retrieval is based on the File Transfer Protocol (FTP) and the Hyper-Text Transfer Protocol (HTTP). Indexing and searching are implemented with a public domain version of the Wide Area Information Server (WAIS) [7].
Currently, over 350 reports are available via LTRS. This number reflects the fact that inclusion of many of these reports depends on the authors submitting them to the system. A more automated method of inclusion is being developed. Despite the relatively small sample, LTRS has so far distributed over 11,000 documents to users all over the world, during the time frame of 1/93 to 7/94. Table 1 shows the number of reports served to the different Internet domains during the first 18 months of operation. While foreign usage of LTRS remains significant at 34%, it is interesting to note that, according to an internal LaRC report, foreign addresses account for 35% of the paper report distribution for NASA in the "General Aeronautics" category.
Building upon the experiences of LTRS, the NASA Technical Report Server (NTRS) is an inter-center effort to provide uniform access to various distributed publication servers residing on the Internet. It currently provides access to documents from 9 different NASA organizations spanning the United States: Dryden Flight Research Center, Numerical Aerodynamic Simulation Division (NAS) of NASA Ames Research Center, Goddard Institute for Space Studies (GISS), Institute for Computer Applications in Science and Engineering (ICASE) at NASA Langley, the SCAN (Selected Current Aerospace Notices) and RECON databases maintained by the NASA STI (Scientific and Technical Information) Program, the STELAR (Study of Electronic Literature in Astronomical Research) Project from Goddard Space Flight Center and the Astrophysics Data System (ADS) Abstract Service.
As with LTRS, the emphasis of NTRS is ease of use and conceptual simplicity. When users access NTRS, keywords are entered in the dialog box. If they wish, they may also select which collections of documents to search. NTRS then returns a list of documents matching the specified search terms, from which the user selects abstracts to view. If, after viewing an abstract, users are interested in reading the associated paper, they can choose either to view or to download a PostScript file. If an on-line copy of the paper is for some reason unavailable, they are told how to order the printed document through more traditional means.
The time for these on-line searches and retrievals is measured in seconds, not the days or even weeks normally associated with receiving a hardcopy report. A search across all NTRS databases may be completed in as little as 15 seconds. This is a tremendous savings in time over normal library searches, and is completed entirely at one's desktop.
While LTRS solves many of the problems associated with document delivery from a single, centrally located database, it does very little to address the larger issues of dealing with multiple databases. A distributed information system is a special class of problem: logically centralized, yet physically distributed. As an experimental service, NTRS must also reuse existing resources whenever possible to provide maximum functionality with minimum new resources.
NTRS must present a unified view to the user, but should take advantage of the distributed nature of WWW to allow for flexible construction. Users do not wish to spend time learning how to ferret information out of all of the different caches of data available on-line, so a uniform user interface is mandatory. At the same time, large-scale monolithic databases do not scale well, and as the number of reports and users increases, the administrative burden on the maintaining organization increases to the point where quality control and accuracy become difficult.
NTRS maintains simplicity for the user without introducing a high administrative load by separating the user interface layer from the database searching layer. Users do not need to know that they are searching several databases with one query. As long as the query engine interface remains constant, contributors are free to organize the actual databases as they see fit, and to provide whatever access controls they deem appropriate. Administrative control over documents remains with the contributing organization.
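The separation of the user interface layer from the searching layer can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the NTRS implementation: each contributor is modeled as a function that accepts keywords and returns records, the toy back ends search in-memory dictionaries, and all titles and abstracts are invented for the example.

```python
from typing import Callable, Dict, List, Optional

# A contributor exposes only a query function: keywords in, records out.
# How the contributor stores or indexes its reports stays hidden behind it.
SearchBackend = Callable[[List[str]], List[dict]]

def make_backend(collection: str, abstracts: Dict[str, str]) -> SearchBackend:
    """Build a toy in-memory back end; a real one would wrap a remote server."""
    def search(keywords: List[str]) -> List[dict]:
        hits = []
        for title, abstract in abstracts.items():
            text = f"{title} {abstract}".lower()
            if all(keyword.lower() in text for keyword in keywords):
                hits.append({"collection": collection, "title": title})
        return hits
    return search

def federated_search(
    backends: Dict[str, SearchBackend],
    keywords: List[str],
    collections: Optional[List[str]] = None,
) -> List[dict]:
    """Send one query to every selected collection and merge the results."""
    results: List[dict] = []
    for name, backend in backends.items():
        if collections is not None and name not in collections:
            continue  # the user chose not to search this collection
        results.extend(backend(keywords))
    return results

# Hypothetical collections and contents, for illustration only.
BACKENDS = {
    "ltrs": make_backend("ltrs", {
        "Wing flutter analysis": "Aeroelastic study of a transport wing.",
        "Composite panel tests": "Buckling of stiffened panels.",
    }),
    "icase": make_backend("icase", {
        "Multigrid methods for wing flows": "Numerical algorithms for the Euler equations.",
    }),
}

for hit in federated_search(BACKENDS, ["wing"]):
    print(hit["collection"], "-", hit["title"])
```

As long as each back end keeps the same calling convention, a contributor can reorganize its own database without any change to the front end, which is the property the design relies on.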
Web resource reuse was the second main requirement. In achieving requirement 4.1.1, NTRS could not break or request a rewrite of existing abstract and technical reports databases. NTRS could not assume that all resources would change to meet its interfaces, but rather NTRS should accommodate as many diverse databases and systems as appropriate. Figure 2 illustrates a simplified version of the NTRS architecture.
Initial customer feedback has been very positive. NTRS made its debut on June 6, 1994 and has been accessed 10,232 times as of September 15, 1994. A total of 10,484 queries have been performed, at an average rate of 103 queries per day, with a significant usage increase each month. It is important to note that these numbers represent database queries, not document retrievals. Since documents may be stored in any of several available databases, retrieval statistics for NTRS are more difficult to assess than those for LTRS. At the time of this writing, 5956 unique sites have used the service. Of these, 5266 were non-NASA addresses. User feedback is received via e-mail, or by an on-line form within NTRS. Work is underway for the inclusion of other NASA centers and related institutes.
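The usage figures above can be cross-checked with a little arithmetic. The sketch below uses only the dates and counts quoted in the text; the variable names are illustrative.

```python
from datetime import date

# Figures quoted in the text.
debut = date(1994, 6, 6)
report_date = date(1994, 9, 15)
total_queries = 10484
unique_sites = 5956
non_nasa_sites = 5266

days_online = (report_date - debut).days          # elapsed days of service
queries_per_day = total_queries / days_online     # average daily query rate
non_nasa_share = non_nasa_sites / unique_sites    # fraction of sites outside NASA

print(days_online)
print(f"{queries_per_day:.1f} queries per day")
print(f"{non_nasa_share:.1%} of sites are non-NASA")
```

The service had been available for 101 days, which gives roughly 103.8 queries per day, consistent with the quoted rate of 103. About 88% of the sites using the service were outside NASA.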
Judging from feedback received from users of the various systems, it is apparent that Langley is providing quality implementations of necessary services, but NASA's mission is to look ever forward. Experience and expertise gained during the past year provide ideas and guidance for the future.
The TOPS effort taught several valuable lessons, the most important of which was "A bit of planning is worth a terabyte of work." Since the original planning committee did not anticipate making TOPS '93 available electronically, the team who put the Web version together had to duplicate much of the work of collecting exhibitor information and writing display descriptions. The concept of an on-line version of a physical event proved so successful that it was applied to an Internet Fair held at Langley in June of 1994 [8]. As a result, the TOPS '95 committee determined from the start to organize an on-line TOPS, even prior to the physical one.
The usage statistics for the Langley and NASA Technical Report Servers are clear indicators that the public is interested in the results of NASA's research. The sustained success of Oak Ridge National Lab's NETLIB software distribution server indicates that there is also a demand for network accessible computer programs [9]. Many projects, for example, produce software which may be of use to others in their field. The Langley Software Server (LSS) will allow quick and easy access to the body of locally developed source code and binary distributions.
An interesting new report in LTRS is the 1993 Research and Technology Highlights (NASA TM 4575). This compendium of research highlights is produced each year by the Research Publications and Printing Branch (RPPB) and is intended to give LaRC's customers an overview of the breadth of LaRC's research involvements, many of which have not been published. These annual summaries have significant value, providing points of contact for a variety of projects, and it is desirable to disseminate as many copies as possible. However, documents of this nature pose an interesting dilemma: if the presentation quality of the document is upgraded to make it attractive for general consumption, the accompanying rise in production costs limits the number of copies that LaRC can afford to print and distribute.
The answer is to use WWW to produce an accompanying on-line version of the report. This solves the problem of distribution costs, since Web access is both convenient and "free". Some presentation problems are also overcome. For example, color images can now be included at no additional direct cost. Other services can also be provided that increase the usefulness of the electronic version of the document beyond that of the paper version. For example, keyword searching is now included within the document. Multimedia additions, such as sample data sets, representative videos, and audio narrations, are now possible. These capabilities allow the WWW version of a document to far surpass the capability of the printed version. The WWW version of the Research and Technology Highlights has not been available long enough to report usage statistics here, but it is not difficult to imagine that this could be the most widely read and successful one yet.
Of course, it is often the case that a paper contains a reference to an associated software package, and that the software in question contains references to other papers. Many customers also want access to the data sets, visualizations, and other assorted materials. Thus, there is value in having a unified index to search for multiple representations of a technology. A proposed project, the Langley On-line Research Explorer (LORE), will serve as a central interface for any type of Langley-generated information: technical reports, conference papers, software packages or even multimedia experimental datasets will all be accessed from the same logically central point.
The availability of network technology services not only benefits the existing class of technology transfer issues, but also introduces a number of new considerations. Among these are the opportunity for more meaningful metrics, new formats and presentation of data, and security concerns regarding this collective body of technology.
One advantage of having all of the resources on-line is the possibility of increased meaningful metrics and user feedback. Now, by tracking the activity in the server logs, it is possible to determine which resources are favored by whom. Through the use of feedback forms and other tools, users now have the capability of providing instant feedback about the technology services. If these metrics are applied in a feedback loop, they can answer the question of where to apply limited resources in developing new services and retro-fitting legacy systems. However, it should be noted that currently not all disciplines and customer classes are equally represented on the Web. This should be kept in mind when drawing conclusions based on usage metrics.
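As an illustration of the kind of metric that server logs make possible, the Python sketch below counts requests per top-level Internet domain. It assumes log lines in the common log format written by many HTTP servers, with the client hostname as the first field; the sample entries, paths, and function names are hypothetical.

```python
from collections import Counter

def top_level_domain(host):
    """Return the last label of a hostname, or 'unresolved' for a numeric address."""
    last_label = host.rstrip(".").rsplit(".", 1)[-1]
    return "unresolved" if last_label.isdigit() else last_label.lower()

def requests_by_domain(log_lines):
    """Count requests per top-level domain, reading the host from the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[top_level_domain(fields[0])] += 1
    return counts

# Hypothetical log entries, for illustration only.
SAMPLE_LOG = [
    'host1.example.com - - [15/Sep/1994:10:02:11 -0400] "GET /tops/index.html HTTP/1.0" 200 2048',
    'lab.example.edu - - [15/Sep/1994:10:05:42 -0400] "GET /ltrs/ltrs.html HTTP/1.0" 200 1024',
    'host2.example.com - - [15/Sep/1994:10:07:03 -0400] "GET /tops/photos.html HTTP/1.0" 200 4096',
    '192.0.2.17 - - [15/Sep/1994:10:09:30 -0400] "GET /ntrs/ HTTP/1.0" 200 512',
]

domain_counts = requests_by_domain(SAMPLE_LOG)
for domain, count in domain_counts.most_common():
    print(domain, count)
```

Counts of this kind describe only the customers who already have Web access, so the caveat above about unequal representation applies to any conclusion drawn from them.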
Through the use of WWW, it is now possible to present information in ways not possible in the current paper medium. This includes video, audio, real-time queries to databases, and easy references to other hypermedia works. Users are already requesting more highly integrated data to be delivered from systems like TOPS and NTRS. However, care should be taken not to bombard the user with an array of irrelevant hypermedia choices. After an initial period of experimentation and assessment of the needs of the intended audience, it should be apparent what level of hypermedia a document should possess.
When discussing WWW resources, the question of security is often raised. With the increased focus on partnerships with industry, LaRC now must be conscious of both traditional classified material and the new area of "proprietary information." When partnerships involve multiple companies, the hierarchy of which information can be shared with whom becomes difficult. Added to this are concerns of electronic espionage by both foreign and domestic competitors. So how is security achieved in World Wide Web technology access services?
The first order of security would be offering Internet hostname screening. Unfortunately, while this sounds attractive at first, it has a number of holes. The most troublesome is that there are no good rules for which Internet domain names are "safe" for a given purpose. For example, ".com" and ".edu" are not restricted to just domestic companies and universities. Even if access were to be specifically allowed for just some companies, such as "ford.com", there are methods to "spoof" domain hostnames, thus bypassing the security. Even if the hostnames could be guaranteed to be "safe," there is no mechanism for authenticating the user at a particular host. So if a machine at ford.com is compromised, the intruder at that machine would have full access. The final insecurity is that the Internet is much like a telephone party line: it is not difficult to listen in to what others are saying. Even if a legitimate transaction is occurring, an untrusted third party could be "sniffing" packets passing through the Internet.
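A hostname screen of the kind described above amounts to a suffix comparison, as the following Python sketch shows. It is an illustration only: the allow list and function name are hypothetical, and the check is deliberately as simple as the text describes.

```python
# Hypothetical allow list: only hosts within ford.com may connect.
ALLOWED_DOMAINS = ("ford.com",)

def host_is_allowed(hostname, allowed_domains=ALLOWED_DOMAINS):
    """Accept a host if its name is an allowed domain or ends with '.<domain>'."""
    name = hostname.lower().rstrip(".")
    return any(
        name == domain or name.endswith("." + domain)
        for domain in allowed_domains
    )

# The check sees only the name the connection claims to come from.
# It cannot tell whether that name was spoofed, who is sitting at the
# machine, or whether the traffic is being read in transit.
for host in ("design.ford.com", "ford.com", "ford.com.example.org", "notford.com"):
    print(host, host_is_allowed(host))
```

The comparison itself is easy to get right; the weaknesses listed in the text lie in what the comparison is given, not in how it is performed.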
One method of authenticating users (independent of hostname exclusion) is the ability to password protect certain hierarchies of information. This allows for some level of protection, but it is not perfect either. Even if the problem of managing passwords is ignored, this method still allows unprotected data (including the password!) to travel across the Internet. Hostname exclusion and password protection are easy ways to achieve a modicum of protection, which may be sufficient for certain applications. However, for real security, an encryption technique must be employed.
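To see why a password sent this way is effectively unprotected, consider the HTTP Basic authentication scheme, in which the user name and password are only base64-encoded. The text does not name a specific scheme, so Basic authentication is an assumption here, and the credentials and function names below are hypothetical.

```python
import base64

def basic_auth_header(user, password):
    """Encode credentials the way HTTP Basic authentication sends them."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def recover_credentials(header_value):
    """Decode a captured header value; no key or secret is required."""
    _scheme, token = header_value.split(" ", 1)
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password

# Hypothetical credentials, for illustration only.
header = basic_auth_header("partner", "s3cret")
print(header)
print(recover_credentials(header))
```

Base64 is an encoding, not encryption, so anyone able to observe the request can reverse it in one step.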
Given the potential for wide-scale commercial transactions on the Web, it is probably just a short time before the majority of Web clients and servers have some form of encryption capability. Currently, it appears that the PGP (Pretty Good Privacy) package will be the encryption method of choice [10]. PGP is based on public/private key encryption, and allows for the open, safe exchange of confidential, sensitive or proprietary material. A number of companies are finalizing HTTP servers and Web clients with internal support of PGP or related algorithms, and they should be readily available within the next 6 months.
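The public/private key idea can be illustrated with textbook RSA on deliberately tiny numbers. This Python sketch is a teaching aid under stated assumptions: it is not PGP, it does not use PGP's message formats, and keys this small offer no security. The primes, exponent, and function names are chosen for the example, and the modular inverse call requires Python 3.8 or later.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent: inverse of e modulo phi

public_key = (e, n)          # may be published openly
private_key = (d, n)         # kept secret by the recipient

def encrypt(message, key):
    """Encrypt an integer smaller than n with the recipient's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    """Recover the integer with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

message = 65
ciphertext = encrypt(message, public_key)
print(ciphertext, decrypt(ciphertext, private_key))
```

Anyone may encrypt with the published key, but only the holder of the private key can decrypt, which is what allows sensitive material to cross an open network.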
Currently, nothing on the LaRC Web or any other part of the NASA Web is sensitive or proprietary. Since the technology for true protection is not yet widely available, the information that is on the Web is never more than what people, domestic or foreign, could obtain through conventional channels. The current model is to provide awareness of a certain technology, and then establish a contact between LaRC and the other institution. Authentication, if necessary, can be done in a conventional secure manner.
Langley Research Center has been on the Web for approximately 15 months. During this time, WWW has grown to be an integral information dissemination tool for many projects. Among the most promising applications of WWW are technology transfer services. At LaRC, entire non-electronic events, such as the Technology OPportunities Showcase (TOPS), have been successfully placed on the Web retroactively. Some electronic resources, such as the Langley Technical Report Server (LTRS), have been successfully transitioned from less user-oriented methods such as anonymous FTP to an intuitive, full WWW application. The WWW has also allowed for the creation of entirely new services, such as the NASA Technical Report Server (NTRS). More technology transfer services, such as the Langley Software Server, are expected to be available in the near future. Users will be able to check the Technology Applications Group Home Page for new services as they become available.
TOPS has been visited 1550 times and has produced 25 requests for more information concerning LaRC technology. LTRS has distributed over 11,000 reports in its first 18 months. LTRS is still accessible as an anonymous FTP site, but WWW has now become the primary method of access. NTRS has proven quite successful in its first 4 months of operation, servicing over 10,000 queries from over 5,900 hosts. Many LaRC researchers are enthusiastically anticipating the release of the Langley Software Server (LSS), which should be another successful WWW technology access service. LaRC will continue to experiment and combine new and existing technology transfer resources in the hopes of producing a better user interface to the wealth of information that is on the Web.
Since coming to work for Computer Sciences Corporation in November of
1993, David J. Bianco
Please direct all correspondence to: M.L.Nelson@LaRC.NASA.GOV