Mike Crandall
Mark C. Swenson
Abstract:
Web technology has found a fertile breeding ground in large corporations because of its ease of use and initial deployment over existing high bandwidth network infrastructures. Making good use of the Web has proved to be a more difficult task, and issues such as long term maintenance and information management have become more visible as the Webs grow in size and visibility.
The Boeing Technical Libraries are building upon previously developed electronic information delivery systems to use the Web as an integrating tool for both internally generated and externally published information across the Boeing Company. Through a central indexing and validation service this information is being organized and made available to the entire company.
Development of these services has pointed out several issues and lessons which may be of value to other large distributed internal Web sites. These include the centralization of a few key services (indexing, link checking, HTML validation), and the value of having an organization with prior responsibility for information management and distribution integrate these services with its current processes.
Keywords:
Information Management, World Wide Web, Text Retrieval, Indexing, Publishing, Libraries, Corporate Web Applications, Information Filtering
The Boeing Company is one of the largest aerospace companies in the world, with over 105,000 employees in multiple locations throughout the United States and internationally. The business units and operating divisions within the company have historically developed their own information systems based upon customer requirements and internal needs, resulting in a diverse networking and computing environment, with virtually every major (and minor) manufacturer or vendor represented somewhere within the Company.
The one common thread that ties Boeing's systems together is the TCP/IP based backbone network. In early 1994, Boeing began implementing an internal Web site to take advantage of the cross platform connectivity provided by the Web browsers then available. Subsequent developments in Web technology have made the Web an attractive solution for many cross-company services that were previously impossible to implement.
The following paper discusses the use of Web technology to provide library services to the user population in Boeing, using the existing systems and networks. While this is probably not a state-of-the-art implementation of the tools available in today's rapidly changing Web world (1), it illustrates some real-world applications that have been enabled only because of the platform independent nature of the Web. It may also provide some insights into directions that Web technology developers and providers should channel their development efforts to provide the tools and services needed by large Intranet customers.
The Technical Libraries at Boeing have traditionally been the focal point for distribution of external information services to Boeing employees. They also index the internal company documents into their on-line catalog system. A full-time staff of 42 people manages the collections and provides services to over 25,000 employees a year. The library has four separate geographic locations in the Seattle area, with smaller independent libraries in Philadelphia, Wichita and Huntsville which share the on-line catalog system for their documents.
Because of the wide geographic distribution of Boeing employees, many library services are provided remotely. A staff of 12 research librarians provides mediated access to on-line information resources, producing over 14,000 customized research packages for employees every year, along with 825 ongoing updates on specific topics for individual employees from these on-line systems. Nearly 7,500 employees have accounts on the online catalog, which is currently based on an IBM mainframe platform and uses the internal SNA network for access. The libraries also are the purchasing agents for publications distributed throughout the company, and provide routing services and direct delivery for newsletters, newspapers, periodicals and reports used by employees at the various locations.
Over the past four years, in response to employee requests and changes in the publishing industry, the Boeing Technical Library has been developing an automatic distribution system for full-text electronic publications. Because Boeing uses virtually every computing platform available, system development has been constrained by two basic requirements: the need to use existing delivery systems and to provide a commonly readable output.
The initial system development was in response to a request from the Boeing Defense and Space Group's business development organization for a specific daily publication, Commerce Business Daily. The Technical Library was routing multiple paper copies of this publication to users throughout the country, and the time delay through normal routing was unacceptable because of the necessity of quick reaction to items contained in the publication (primarily requests for proposals for government contract opportunities). The publication averages over 30 pages of fine print per day, and only small (but varying) portions were of interest to each employee needing access to the publication.
After evaluating several information filtering products available at the time, TOPIC from Verity, Inc. (2) was selected as offering the best performance and potential for future growth. The distribution mechanism was, by default, the company e-mail systems, which were the only paths allowing direct access to users in various locations on multiple computing platforms. This meant that the system would be limited to ASCII text delivery only, since image delivery and display varied from e-mail system to e-mail system, and images also displayed differently on users' workstations.
Working within these constraints, an automated delivery system was set up to capture and redistribute the incoming publication on a daily basis during the night. Subject filters were built by the library staff for each user and stored in the system, so each employee would find only those items of interest to them waiting in their electronic mailbox every morning.
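As an illustration only, the nightly filtering step might be sketched as follows. This is a minimal sketch that assumes plain keyword matching, which is far simpler than the TOPIC engine actually used; the names `Profile`, `matches` and `route_items` are hypothetical and do not come from the production system.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A hypothetical subject filter stored for one employee."""
    address: str
    keywords: set[str] = field(default_factory=set)


def matches(profile: Profile, item_text: str) -> bool:
    """Return True if any profile keyword occurs in the item (case-insensitive)."""
    text = item_text.lower()
    return any(keyword.lower() in text for keyword in profile.keywords)


def route_items(items: list[str], profiles: list[Profile]) -> dict[str, list[str]]:
    """Map each employee address to the items that match that employee's filter."""
    mailboxes: dict[str, list[str]] = {p.address: [] for p in profiles}
    for item in items:
        for profile in profiles:
            if matches(profile, item):
                mailboxes[profile.address].append(item)
    return mailboxes
```

Each mailbox ends up holding only the items that matched its owner's profile, which mirrors the behavior described above: an employee sees a small, relevant slice of the day's publication rather than all of it.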
With the delivery mechanism, a good filtering capability, and both individual and broadcast delivery capabilities for internal distribution in place, it became a relatively easy task to add other publications. The major difficulties lay in convincing the publishers to offer a blanket copyright license for the company (at a reasonable cost), and in working with them to develop their ability to provide electronic text as a replacement for the paper copies currently being purchased.
In the past two years, six additional publications have been added to the system, all but one full text (the lone exception is NASA SCAN, which is a bi-weekly collection of abstracts from the world's aerospace and aviation literature). The current sources available are shown in Figure 1.
Figure 1: Sources available through electronic distribution.
The system has allowed the Technical Library enough flexibility to provide full access to all publications of interest through bulletin boards, filtered output to individuals when needed, or selective distribution to only certain user groups as appropriate. Savings of over $300,000 in annual subscription costs have been realized, and more than 50,000 users are being reached through the broadest bulletin board distributions. Almost 600 individual filtering profiles have been built for the various databases, and the information is getting to every individual on the day of publication, since all processing is done the night before the publication date. This usage, in combination with demand for more publications and a growing need for graphics delivery, was putting pressure on existing resources by mid-1994 (see Figure 2).
Figure 2: Monthly e-mail messages from the Technical Libraries automated delivery system.
The first opportunity to provide a centralized information repository that would be accessible from all platforms within the company came with the advent of Web technology. In January 1995, the Technical Libraries moved many of the full-text publications being distributed via e-mail to an internal Web site, and modified the e-mail distribution to work in conjunction with the Web.
Users now receive filtered notices of new publications as they arrive in summary format, with the URL of the full article as part of the mail message so that the document can be retrieved in full (with any associated images) from the Web. This combination of push (e-mail notification) and pull (Web repository) works well to reduce the total amount of e-mail traffic and to get more complete information to the users than was possible with the text-only e-mail system (Figure 3 and Figure 4).
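The push half of this arrangement can be sketched with a short example. It is a hedged illustration, not the library's actual mailer: the function name `build_notice`, the subject wording and the sample URL are all assumptions.

```python
from email.message import EmailMessage


def build_notice(recipient: str, title: str, summary: str, url: str) -> EmailMessage:
    """Build a text-only notice that points to the full document on the Web."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"New document: {title}"
    # Only the summary travels by mail; the full text and any images
    # stay on the Web server and are fetched on demand through the URL.
    msg.set_content(f"{summary}\n\nFull text: {url}\n")
    return msg
```

Because the message carries only a summary and a pointer, its size stays small no matter how large the underlying document or its images are.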
Figure 3: e-mail notification of new document arrival.
Figure 4: Document pointed to from e-mail message.
As the Boeing Web grew, it became clear that it was an effective communication tool for the entire company, and the corporate offices began to explore the possibility of placing company-wide information bulletins on the Web as part of their communications plan. The Technical Library assisted in this effort, since many of the requirements for currency and management of the information were similar to those already being met for the full-text publications being offered on the library server. One of the major difficulties in this effort was the standardization of input from multiple information producers to allow automatic posting of the bulletins as they were released. Over a period of several months, a set of guidelines was established and agreed to by the bulletin writers. This allowed the material to be e-mailed to the library server, parsed automatically, and posted to the appropriate Web page without any human intervention. Currently, approximately 10 different authors, from locations throughout the United States, send their copy to the library server via e-mail, and it is automatically posted to 6 different Web pages.
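To show why an agreed input format matters, the sketch below parses a bulletin that follows one possible convention. The convention itself (a subject line of the form `[page-name] Headline`) is purely an assumption for illustration, as are the names `parse_bulletin` and `render_entry`; the guidelines actually agreed with the bulletin writers are not reproduced here.

```python
import re
from email import message_from_string
from email.policy import default
from html import escape

# Hypothetical convention: "Subject: [page-name] Headline text"
SUBJECT_PATTERN = re.compile(r"^\[(?P<page>[a-z0-9-]+)\]\s+(?P<headline>.+)$")


def parse_bulletin(raw_message: str) -> tuple[str, str, str]:
    """Return (page, headline, body), or raise ValueError if the format is off."""
    msg = message_from_string(raw_message, policy=default)
    match = SUBJECT_PATTERN.match(msg["Subject"] or "")
    if match is None:
        raise ValueError("subject does not follow the agreed format")
    body = msg.get_content().strip()
    return match["page"], match["headline"], body


def render_entry(headline: str, body: str) -> str:
    """Render one bulletin as an HTML fragment, escaping special characters."""
    return f"<h2>{escape(headline)}</h2>\n<p>{escape(body)}</p>\n"
```

A message that does not follow the convention is rejected rather than guessed at, which is the property that makes posting without human intervention safe.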
As the volume of material available on the internal Boeing Web grew, it became apparent that some cross-company retrieval tool was needed for Web information. In December 1994, we met with Brian Pinkerton from the University of Washington to discuss his efforts with the WebCrawler technology (3). In May 1995, we asked for and received the source code, purchased a Pentium machine, installed the WebCrawler on it and started indexing the Boeing Web. The query screen is shown in Figure 5.
Figure 5: Boeing WebCrawler query screen.
This is now the primary retrieval tool for employees using the Boeing Web, and the source of material for the Company's "What's New" page (Figure 6) and a subject classification of Boeing Web sites, much like the Yahoo (4) directory (Figure 7).
Figure 6: What's New on the Boeing Web.
Figure 7: Subject Classifications on the Boeing Web.
The subject classifications are based on codes (derived from the Committee on Scientific and Technical Information, or COSATI, codes) used by the library for indexing its paper publications in the catalog, and are integrated with the Company thesaurus, which will allow finer breakdowns as the volume of material grows with time. Individual information owners on the Boeing Web are responsible for registering their own information, and for selecting the appropriate classifications for the content from the subject codes through a fill-in form.
In May 1995, we purchased the TOPIC Internet Server (5) as a longer term, more robust solution to Web indexing. We are currently working with Verity to define the requirements and solutions for a large distributed Web site, and have begun using the indexer for the large full-text collections on the Library's server, as well as a few key servers in other parts of the company.
Each of the library's individual collections and the information being posted by the Company Communications offices are searchable as separate collections from the main index page for the material. It is also possible to search across any combination of collections as needed through the central search page (Figure 8). The results integrate internal company information with full-text publications available on the library Web server, allowing employees to make use of both external and internal sources.
Figure 8: Library server search page.
As of January 1, 1996 over 250 separate sites are being indexed and classified under the subject codes on the library web server, and about 20,000 users are accessing the materials provided through the server. Transfer volumes are about 600 megabytes/day, and the library server is being hit 60,000 times a day on a consistent basis. Growth rates are approximately 4% per month, with no end in sight. The WebCrawler is handling over 2,000 queries per day (Figure 9).
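To put the growth figure in perspective, the short calculation below assumes the 4% monthly rate compounds steadily, which is an assumption on our part rather than a measured trend. The function names are hypothetical.

```python
import math


def doubling_time_months(monthly_rate: float) -> float:
    """Months needed for volume to double at a steady compound monthly rate."""
    return math.log(2) / math.log(1 + monthly_rate)


def annual_growth_factor(monthly_rate: float) -> float:
    """Factor by which volume grows over twelve months of compounding."""
    return (1 + monthly_rate) ** 12
```

Under that assumption, `doubling_time_months(0.04)` comes to between 17 and 18 months, and `annual_growth_factor(0.04)` is roughly 1.6, or about 60% growth per year.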
Figure 9: Growth on the WebCrawler searching engine.
The Technical Library has purchased a replacement system for the mainframe based on-line catalog, and will be converting to the new system in 1996. One of the primary requirements for the new system was that access would be available through a Web browser. This capability, along with the system's ability to provide cataloged records of Web sites, will allow integration of the library's traditional collections with the electronic resources available through the Web, and provide a valuable management tool for the portions of the Web, both internal and external, which are judged significant enough to benefit Boeing's business operations.
Also in 1996, the TOPIC Internet Server will be used to index all registered company Web sites in-depth, and the Library will be cataloging those sites in the appropriate subject codes for cross-site or subject specific retrieval. In conjunction with this effort, a link checking service will be run on registration to help webmasters check their documents on the Web and keep them up to date. As people register their sites, the sites will automatically be link-checked, with the results e-mailed to the submitter, and subsequently queued for rechecking at a preset interval.
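The core of such a link check can be sketched briefly. This is a minimal illustration under stated assumptions, not the planned service: the names `LinkCollector` and `find_broken_links` are hypothetical, and the reachability test is passed in as a function so that the sketch does not depend on any particular network library.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of anchor tags in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(
    page_url: str, html: str, is_reachable: Callable[[str], bool]
) -> list[str]:
    """Return absolute URLs linked from the page that fail the reachability test."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(page_url, link) for link in collector.links]
    return [url for url in absolute if not is_reachable(url)]
```

The list returned for a page is what would be e-mailed to the submitter; rerunning the same function at the preset interval gives the recurring check.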
The link checker will also check expiration dates in the meta tags to ensure that outdated documents are not included in the index. The expiration date is recommended in the Company Web authoring style guide (6), and will eventually be a required field for inclusion of a site in the index. Sites submitted for indexing will also be automatically checked for HTML validity to ensure that the information is readable across the wide variety of browsers currently running within Boeing.
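An expiration check of this kind might look like the sketch below. The tag form it reads, `<meta name="expires" content="YYYY-MM-DD">`, is an assumption made for illustration; the format actually recommended by the style guide is not given here, and the names `ExpiresFinder` and `is_expired` are hypothetical.

```python
from datetime import date
from html.parser import HTMLParser
from typing import Optional


class ExpiresFinder(HTMLParser):
    """Find an assumed <meta name="expires" content="YYYY-MM-DD"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.expires: Optional[date] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        if (fields.get("name") or "").lower() == "expires":
            try:
                self.expires = date.fromisoformat(fields.get("content") or "")
            except ValueError:
                self.expires = None  # unreadable date: treat as missing


def is_expired(html: str, today: date) -> bool:
    """Return True only if the page declares an expiration date before today."""
    finder = ExpiresFinder()
    finder.feed(html)
    return finder.expires is not None and finder.expires < today
```

In this sketch a page with no readable expiration date is treated as not expired. Once the field becomes required, the same check could instead reject such pages.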
As the volume of information available on the internal Web grows, implementation of intelligent agent technology is planned to supplement the existing library mediated filtering service. The deep indexing done by the TOPIC Internet Server will provide the agent with regular update cycles for information throughout the company, ensuring that users will receive notification when information on subjects of interest to them is available.
So what does all this tell us about the use of the Web in a large corporate Intranet environment? Clearly, the Web's platform independent nature is the one key attribute that allowed the services described above to be implemented as quickly and easily as they were. The ease of use and installation are also key elements - the growth of a company-wide Web within Boeing took only 6 months to implement.
One key factor in all this effort was the cross-organizational cooperation that allowed the rapid implementation of the infrastructure and support tools needed to keep the momentum alive and sustainable. Organizations stepped outside normal boundaries to share responsibility for key activities, and much of the original development work was done by cross-company teams of technical and management employees, including development of a company-wide web policy and guidelines for use (7, 8).
The traditional role of the Technical Libraries, as managers of the company's external information resources and providers of access to published information, has broadened somewhat with the Web, but the principles of information management and retrieval remain the same. The expertise and tools already available in the libraries provided a focal point for managing the information on the Web, and allowed integration of Web-based information into the resources already available through other means.
What is not so clear is how the next steps will be implemented. As requirements for information delivery through the Web become more sophisticated, the original simple model is rapidly becoming obsolete. Issues such as accurate information retrieval, validation of information, management of information, and support of a large and varied hardware and software environment are becoming more and more important.
Much of the work planned in 1996 is the direct result of the increasing visibility of outmoded and difficult to find information on the internal Boeing Web. Because the Web developed from a grass-roots level, the planning and forethought varied greatly depending upon the individual site owner's capabilities and discipline.
The implementation of a centralized validation tool, along with an indexing service to provide cross company access to information in distributed repositories, seems to be the key to long term accessibility of information on the internal Web. The ability to integrate company information with external information is also important, since much of the material needed by employees can only be obtained from outside resources available through site licenses administered through the Technical Libraries. Inclusion of the library catalog in the Web allows the pool of resources already available to employees through the Technical Library to be tied to the very different information available on the Web.
The Web offers the potential of providing a valuable tool for accessing information in a large distributed company like Boeing, but also brings along with it the responsibility for providing the services that ensure the long term usefulness of that information. Rapid implementation on a central site of a few key tools and services, based on pre-existing resources, seems to be the key to success in Boeing's integration of its electronic resources. The next year will be an interesting one, and should provide more valuable lessons for others on the verge of jumping into this maelstrom. We hope to share them with you in 1997.
Mike Crandall
External Information Systems Requirements Librarian
The Boeing Company
PO Box 3707, M/S 62-LC
Seattle, WA 98124-2207
crandall@atc.boeing.com
Mike Crandall received his MLS from the University of Washington in 1986, and has been working on the development of the electronic systems described in this paper for the past 5 years. His primary interests are in the area of filtered and broadcast electronic information distribution, and effective management of that information.
Mark C. Swenson
System Analyst
The Boeing Company
PO Box 3707, M/S 6C-98
Seattle, WA 98124-2207
mark.swenson@boeing.com
Mark C. Swenson received his BS in Computer Science from the University of Idaho in 1985, and has been working on the systems described in this paper since their inception in 1990. His primary interests include information retrieval, information filtering, and the World Wide Web.
1. "Here Comes the Intranet". Business Week, Feb. 26, 1996.
2. Bair, J. "Verity's Veracity as a Veritable Leader in Content Retrieval". Gartner Group, Office Information Systems, June 21, 1995.
3. Pinkerton, Brian. "Finding What People Want: Experiences with the WebCrawler". Electronic Proceedings of the Second World Wide Web Conference '94: Mosaic and the Web.
5. Verity, Inc. TOPIC Internet Server product information.
6. Boeing Web Style Guide, Document No. 6-6500-WEB-95-01. August 5, 1995.
7. Boeing Internal Web Use, Intracompany Procedure IC-CDN-321. September 7, 1995.
8. User Guide and Process Flows for Employees Accessing or Publishing Information on the Boeing Internal Web, Document No. BCS-G-3368. July 29, 1995.