Ethical Web Agents(1)
David Eichmann
Repository Based Software Engineering Program
Research Institute for Computing and Information Science
University of Houston -- Clear Lake
2700 Bay Area Boulevard
Houston, TX 77058
eichmann@rbse.jsc.nasa.gov
As the Web continues to evolve, the programs employed to interact with it will also increase in sophistication. Web agents, programs acting autonomously on some task, are already present in the form of spiders. Agents offer substantial benefits and hazards, and because of this, their development must involve not only attention to technical details, but also to the ethical concerns relating to their resulting impact. These ethical concerns will differ for agents employed in the creation of a service and agents acting on behalf of a specific individual. An ethic is proposed that addresses both of these perspectives. The proposal is predicated on the assumption that agents are a reality on the Web, and that there are no reasonable means of preventing their proliferation.
The ease of construction and potential Internet-wide impact of autonomous software agents on the World Wide Web [1] have spawned a great deal of discussion and occasional controversy. Based upon our experience in the design and operation of the RBSE Spider [5], such tools can provide substantial value to users of the Web. Unfortunately, agents can also be pests, generating substantial loads on already overloaded servers and generally increasing Internet backbone traffic.
Much of the discussion to date has been directed towards this single perspective, the impact of an agent (and indirectly, the operator of that agent) on the Web. Attention has more recently turned to the impact that an agent can have upon its operator - both positively, when needed information resources are ferreted out, and negatively, when an agent's actions result in mail-bombings and (in the near future) substantial financial charges for its operator.
This paper addresses our work in building a spider that is both a good Web citizen and a provider of a generally useful resource, and how such approaches can be scaled to the increasingly massive information infrastructure of the Web. I contrast this approach with the provider-focused approach of systems such as ALIWEB [17] and the agents hinted at in such venues as the recent television commercials by AT&T. I conclude with a proposed architecture to support ethical behavior while still allowing operator-defined search in the rapidly evolving Web.
This paper's motivation is the confluence of what were, until recently, two distinct threads of research: the development of intelligent software agents and the development of hypermedia systems (more specifically, distributed hypermedia systems). This confluence was first exhibited with the emergence of Web spiders(2) in 1993, when the Web grew to a size sufficient to make it interesting to study as an artifact in its own right. A brief review of recent work in software agents and Web spiders will be useful in setting a context for later sections of the paper.
An agent is a program that interacts with and assists an end user. Research in this area has been driven primarily by the artificial intelligence research community, since an unintelligent agent is not of much use. The special issue of Communications of the ACM edited by Riecken [30] provides a good overview of the area. Agent research presents two useful perspectives for a discussion of ethics.
The first concerns the nature of interaction between agents themselves. Genesereth and Ketchpel [12] present an overview of the Knowledge Interchange Format (KIF) and the Knowledge Query and Manipulation Language (KQML), developed as part of the ARPA Knowledge Sharing Effort. Guha and Lenat [13] describe the Cyc project's evolution from an emphasis on addressing brittleness in expert systems to an emphasis on assisting with information management.
The second perspective concerns the nature of interaction between an agent and an end user. Kautz et al. [16] focus on what they refer to as "mundane tasks - setting up meetings, sending out papers, locating information in multiple databases, tracking the whereabouts of people, and so on." They distinguish between userbots (mediators for a specific user) and taskbots (responsible for carrying out a specific task). Maes [22] focuses on the reduction of information overload through agents that are trained using machine learning techniques to handle electronic mail, news filtering and meeting scheduling.
AT&T's recent television commercial involving an animated dog posting a note on a computer screen and responding to voice commands (including praise from its presumed owner!) is the natural (albeit still fictional) extension of this work. Norman [28] offers a reflective perspective on how the field might evolve.
A Web spider is a program that autonomously explores the structure of the Web and takes some action upon the artifacts thereby encountered. This action might be as simple as counting the number of artifacts found, or as complex as a full text indexing of the contents of the artifact. Given the relative ease with which a spider can be constructed, it is actually somewhat surprising that there are only twenty-odd spiders documented to date [19]. A representative sample of research work in spiders follows; refer to [7] for a more complete survey.
The RBSE Spider [5] retains both the structure of the Web as a graph representation in a relational database and a full text index of the HTML documents encountered. Searches can be specified either as SQL queries against the relation, supporting information such as that displayed in Figures 1 and 2, or against the full text index, providing relevance-ranked results. McBryan's World Wide Web Worm (WWWW) [23] retains a more limited information base, comprised of the document's title and contained anchor information. Search is limited to scanning individual records using pattern matching.
Fielding's MOMspider [10] was designed as a maintenance tool for large webs of HTML documents. It reaches out only to validate the existence of a document corresponding to the URLs found in documents appearing in its maintenance list.
The fish search mechanism of De Bra and Post [3] falls into a completely distinct category, based upon an executing instance of a modified Mosaic and supporting a spreading activation of URL retrievals similar in behavior to schooling fish (hence the name). Fish search results are transient, available only to the specific Mosaic user and only for the duration of that execution of Mosaic.
There are distinct benefits to be had from Web spiders, the most obvious of which is improvement in user satisfaction through effective search of the Web -- its scale has completely outstripped an individual's ability to assess and comprehend it. The problem has become similar to that experienced by users of anonymous FTP. Archie arose as a response to the increasing difficulty of locating specific software packages among hundreds of FTP servers. Search-directed access has the potential to reduce traffic by reducing revisitation and casual browsing to "see what's there." Whether this potential will be realized is an open question. Spiders also offer the opportunity to support archivists in the construction of virtual neighborhoods of information. Currently, archivists are dependent upon the suggestions of users of the archives or upon providers who volunteer a description of their information.
Poorly designed spiders can severely impact both overall network performance and the performance of the servers that they access. Many of the known spiders fail to control document retrieval rates, repetition of requests or the retrieval of low-value artifacts (e.g., GIFs). Authors of new spiders frequently fail to address the question of infinite regress, whether in the form of cycles in the graph (obvious) or in the form of dynamic documents (not so obvious). Many of the early attempts at spider construction resulted in information sinks, where information flowed in and little or no information flowed out (frequently, all that flowed out was a simple metric, such as x servers located, or y documents retrieved). Much of this was due to a mixture of naivete and the cloak of anonymity. As providers shifted from pride over access counts to load management, peer pressure and education limited the more egregious cases.
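By way of illustration, the retrieval-rate, repetition and low-value-artifact failures listed above amount to bookkeeping that a spider author can make explicit before issuing any request. The following is a minimal Python sketch, not a description of any existing spider; the suffix list, the per-host cap and the helper name are assumptions made purely for the example.

    from urllib.parse import urldefrag, urlparse

    SKIP_SUFFIXES = (".gif", ".jpeg", ".mpeg", ".zip")  # low value for a text index (illustrative list)
    MAX_PER_HOST = 100                                  # arbitrary per-run cap chosen for the sketch

    seen = set()             # suppresses repeated requests and simple cycles
    per_host_count = {}      # limits the load placed on any single server

    def should_retrieve(url):
        url, _ = urldefrag(url)                   # "#section" variants are the same document
        if url in seen or url.lower().endswith(SKIP_SUFFIXES):
            return False
        host = urlparse(url).netloc
        if per_host_count.get(host, 0) >= MAX_PER_HOST:
            return False                          # stop well short of hammering one server
        seen.add(url)
        per_host_count[host] = per_host_count.get(host, 0) + 1
        return True

A visited set of this kind handles cycles in the graph; the dynamic-document form of infinite regress additionally requires a bound on traversal depth, as in the bounded-traversal sketch accompanying the future work discussion near the end of the paper.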
Hypertext browsing is easily argued to be one of the principal strengths of the Web and a reason for its success. The Web has demonstrated a practical realization of much of the theoretical work in hypertext. At the same time, key aspects were missing from the Web's technological infrastructure at its inception, the most critical of which was support for indexing.(3) McAleese [25] argues that browsing is central to effective hypertext -- making a distinction between navigation and browsing -- but also observes that a variety of tools are required to most effectively select where to go next within hypermedia. This is not to say that browsing is not central to hypertext in general and the Web in particular.(4) For example, Campagnoni and Ehrlich [2] found that users of a hierarchical hypertext preferred browsing over indices for navigation. My claim is instead that it is time for designers and providers of Web services and infrastructure to look to the literature for empirical evidence of what works and what doesn't, and to act accordingly.
The important question to ask here is not "what can be built in such a manner as to limit resource consumption?" which is the refrain of opponents of even the concept of Web agents, but rather "what can be built in order to make usage of the Web more effective?" The explosive growth of the Web(5) is making it increasingly difficult to accomplish an effective search for information in a specific area. Effective is defined here as finding everything that is relevant to the search and not finding things that are not relevant -- the traditional sense of the word in library science. Multiple access modes are becoming a necessity as the Web scales up.
Navigation itself has been demonstrated to have distinct modes. Monk [26] distinguishes between directed navigation, where the user traverses a known path to a known artifact, and exploratory navigation, where the user is attempting discovery of previously unknown artifacts. His personal browser foreshadows the Mosaic hotlist as a facility supporting directed navigation. Frisse and Cousins [11] distinguish between local and global navigation, and claim that global navigation requires an index space distinct from the document space. This is a key distinction in relating traditional results in hypermedia research, where the corpus resided on a single host, to the Web, where the basic premise is massive decentralization.
The nature of the artifacts themselves can color user preferences for access mechanisms. Dillon et al. [4] and Van Dyke Parunak [32] comment upon the likelihood of disorientation as nonlinearity increases. Wright and Lickorish [34], comparing an index-based navigation scheme with one more hierarchical in nature, found that readers preferred the latter for book-like text, but the former for more modular information.
Just what is the structure of the Web? One of the rationales for the separation in the RBSE Spider [5] of the discovery and storage of Web structure from the indexing of the HTML documents thereby retrieved was the ability to generate queries concerning the characteristics of the Web itself. The February 1994 run of the RBSE Spider produced a graph comprised of approximately 36,000 HTML documents, 62,000 distinct target artifacts and 182,000 total edges. As can be seen in Figures 1 and 2, the modularity of the Web is extremely high. While the number of hyperlinks contained within a given document is fairly high (>10 links per document in our index), only a third of those hyperlinks are to other documents (~3.5 inter-document links per document in our index), and the documents so referenced are in general referred to by only a few other documents (59% of the documents have a single inbound link and 96% have five or fewer). There are, of course, notable exceptions in the data:
Figure 1: External URL in documents
Figure 2: Inbound URL references to documents
- pruned from the right side of Figure 1 are two documents, with 3469 and 1184 outbound hyperlinks, respectively;
- most points pruned from the top of Figure 1 are in the range of 400 or fewer documents, but there are 12,398 documents with precisely 5 links (the spider mapped an on-line thesaurus with a very regular structure); and
- pruned from the right side of Figure 2 are 4 documents with approximately 12,000 inbound links (we'll leave to the reader just which URLs these might be... ).
Clearly, if this is a representative sample of the Web, having over half of the artifacts reachable from only a single other document in the Web implies that much of users' browsing traffic is likely to be redundant -- users must wander from node to node to discover information, and frequently revisit nodes in doing so.
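Because the graph is stored relationally, distributions like those underlying Figures 1 and 2 reduce to short SQL queries. The following Python sketch uses an invented two-column edge table and toy data rather than the RBSE Spider's actual schema; it is intended only to show the shape of such a query.

    import sqlite3

    # Hypothetical schema and toy data; not the RBSE Spider's actual relation.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE link (source TEXT, target TEXT)")
    con.executemany("INSERT INTO link VALUES (?, ?)",
                    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])

    # Inbound-reference distribution (cf. Figure 2): for each in-degree,
    # how many documents are referenced by exactly that many other documents?
    rows = con.execute("""
        SELECT in_degree, COUNT(*) AS documents
        FROM (SELECT target, COUNT(*) AS in_degree
              FROM link GROUP BY target) AS per_target
        GROUP BY in_degree
        ORDER BY in_degree""").fetchall()
    print(rows)    # [(1, 2), (2, 1)] for the toy edges above

An analogous query over the source column yields the outbound-link distribution of Figure 1.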
The concept of an ethic for what is basically an information system might at first seem strange. In fact, we have an excellent example of how a virtual community evolves a set of ethics in a setting of potentially complete chaos -- Usenet news. The readership has evolved over the years a consensus as to how newsgroups come into existence, and what behavior is appropriate within specific groups. The only true constraints are those generated by peer pressure on the offending user and their system administrator. This section begins with a review of the consensus that has arisen amongst participating spider authors and resource providers, describes a similar manifesto originating in the intelligent software agent community, and concludes with a proposal for a web ethic.
Koster's guidelines [18], authored as a means of addressing the increasing load placed on his server by spiders, were the first wide-ranging attempt at Web ethics. They served as a basis for discussion amongst spider authors and resource providers, and in conjunction with his "list of robots" [19], began to create the first community pressure on spider authors to act ethically. Briefly, the guidelines entail:
- reconsider... (is another spider really needed?);
- identify the spider, sources of additional information, and yourself;
- test locally;
- moderate your speed (within a single run) and frequency (between runs) of access to any given host;
- retrieve only what you can handle - both in format and in scale;
- monitor your runs (there are "black holes" out there!); and
- share your results.
The guidelines were not intended to suppress legitimate research or resource discovery done as a part of the creation of an information service, but rather were intended to stem the tide of "because it's there" spider implementations. As mentioned above, the result has been an emerging consensus on appropriate Web behavior that is still operating reasonably well.
The difficulty with the guidelines is that they still don't provide a means for an information provider to indicate to a running spider that portions, or even the entirety, of their server's file space should be off-limits. The robot exclusion standard [20] was defined to address just this. The scheme entails the creation of a file on the server with a standard path (/robots.txt) and contents detailing the nature of the desired constraints. For example, the definition appearing in Figure 3 indicates that all spiders should avoid this server, with the exception of alpha and beta, which are granted access to /private, and gamma, which is granted complete access. Further details are available in [20]. Note that while a provider can specify as many constraints on spiders as desired through an exclusion definition, it is still up to the spider itself to check for the existence of the file and adhere to its constraints.
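Figure 3 itself is not reproduced here; the following is a hedged reconstruction of an exclusion file consistent with the description above, using only the User-agent and Disallow fields defined in [20]. Because that syntax expresses exclusions rather than permissions, granting alpha and beta access to /private alone means explicitly listing the remaining (here, invented) top-level paths.

    # All other spiders: the entire server is off-limits
    User-agent: *
    Disallow: /

    # alpha and beta: everything except /private is off-limits
    # (the directory names below are illustrative)
    User-agent: alpha
    User-agent: beta
    Disallow: /home
    Disallow: /archive

    # gamma: an empty Disallow value grants complete access
    User-agent: gamma
    Disallow: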
Etzioni et al. define a softbot as
"an agent that interacts with a software environment by issuing commands and interpreting the environment's feedback. ... [a] softbot's sensors are commands meant to provide the softbot with information about the environment... Due to the dynamic nature and sheer size of real-world software environments it is impossible to provide the softbot with a complete and correct model of its environment... "[8]
envisioning a construct very similar to a Web agent. Softbots also effect change on their environment, leading to the formulation of a collection of softbotic laws (intentionally derivative of Asimov's laws of robotics) [9, 33]:
- Safety -- The softbot should not destructively alter the world.
- Tidiness -- The softbot should leave the world as it found it.
- Thrift -- The softbot should limit its consumption of scarce resources.
- Vigilance -- The softbot should not allow client actions with unanticipated results.
Any scheme of ethics for Web agents must address the motivations of the users who employ these agents, of the individuals who author them, and of the providers of the information resources that they access. Users are seeking guidance and organization in a chaotic, dynamic information framework. They are in a process of exploration when using the results of agents, since other mechanisms (i.e., hotlists and personal pages) serve only the need for directed navigation. Web agent authors respond to this need with a service that, if we believe the feedback received, far outweighs the problems created by their progeny. Even those authors who act in isolation are in a mode of exploration - the concept of "power user" carried to the extreme, where the entire world is at their fingertips. Information providers are interested in the dissemination of their artifacts (why else publish them?). The issue is hence one of striking an appropriate balance between interests and concerns - accessibility for the individual balanced against accessibility for the community.
The guidelines/exclusion perspective and the softbotics perspective offer similar approaches to attempting such a balance. However, neither operates from assumptions that match completely with what the Web promises to become.
- The Web is a distributed information resource, and because of this, much of the research results available relating to more traditional hypermedia is not directly applicable.
- The Web is highly dynamic, and will remain so for the foreseeable future.
- Agents are too valuable a user resource given the chaotic nature of the Web for them not to be employed, even if their usage is forced into simulation of user behavior in order to avoid detection.
- Commercial offerings of spider-generated information services will appear in the near future. (When was the last time that you took a really close look at your log files?)
Factoring agent functionality into smaller categories offers a useful means of examining the issues. As shown in Figure 4, these categories include user-agent interaction, agent-agent interaction, and agent-server interaction. I specifically exclude user-server interaction, as this is the normal mode of activity on the Web and doesn't involve agents. Koster's ALIWEB scheme [17] entails agent-server interaction in its accretion mode, when the server extracts current index information from participating servers and forms its aggregate index, and user-agent interaction when a user employs a Web client to search the index. Kahle's Wide Area Information Server (WAIS) system [15] entails a degenerate form of agent-server interaction when a server generates a local index for a collection of artifacts, a meta form of agent-agent interaction when a server posts a description of itself to a directory of servers, and two forms of user-agent interaction - one at a meta level, when interrogating the directory of servers for information about servers, and one at a normal level, when interrogating a server about its contents. Other approaches include that of McKee [24], where a WAIS index is generated for the documents available from a particular server (another degenerate form of agent-server interaction), and the MORE system [6], which supports localized searching of metadata about artifacts residing anywhere in the Web; here two forms of user-agent interaction take place - one by the librarians posting information to the repository concerning already known artifacts, and one by the users interrogating the repository for referrals to specific artifacts on specific servers.
Figure 4: Categories of Agent Functionality
The spiders discussed so far fall into the general category of task agents - accomplishing a specific task and generating a result for subsequent multiple uses. The single exception to this is the fish search mechanism, which is readily identifiable as a user agent. An architecture that is capable of scaling to millions of artifacts on tens of thousands of servers accessed by millions of users requires aspects of both of these categories. Because of this, I propose a separation of design concerns into those which relate to service agents - those interacting with servers on the Web in the formation of information bases (thereby making themselves servers in their own right), and user agents - those interacting with servers on the Web in direct support of a particular individual.
A service agent should adhere to the following ethical guidelines:
- Identity - a service agent's activities should be readily discernible and traceable back to its operator.
- Openness - the information generated by a service agent should be generally accessible to a community whose size relates directly to the scope of the agent's activities.
- Moderation - the pace and frequency of information acquisition should be appropriate for the capacity of the server and the network connections lying between the agent and that server.
- Respect - a service agent should respect the constraints placed upon it by server operators.
- Authority - a service agent's services should be accurate and up-to-date.
Clearly a balance must be struck between the concerns of openness, moderation and respect, which limit a service agent's scope and activities, and the concern of authority, which broadens them. A service agent that is not authoritative will not be employed, but one that is renegade [29] is as damaging to the Web as the impact that its supporting community avoided by relying upon it.
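As one concrete illustration of how the identity, moderation and respect guidelines surface in code, consider the following minimal Python sketch. It is not a reference implementation of any system discussed in this paper; the agent name, contact address and delay are placeholders chosen for the example.

    import time
    import urllib.request
    from urllib import robotparser
    from urllib.parse import urljoin, urlparse

    AGENT = "example-spider/0.1 (operator@example.org)"  # Identity: traceable to an operator
    DELAY = 60                                           # Moderation: seconds between hits on one host

    _robots = {}      # per-host parsed exclusion files
    _last_visit = {}  # per-host time of the most recent request

    def polite_fetch(url):
        host = urlparse(url).netloc

        # Respect: honor the server's exclusion file before retrieving anything else.
        if host not in _robots:
            parser = robotparser.RobotFileParser(urljoin(url, "/robots.txt"))
            parser.read()
            _robots[host] = parser
        if not _robots[host].can_fetch(AGENT, url):
            return None

        # Moderation: pace requests to any single host.
        wait = DELAY - (time.time() - _last_visit.get(host, 0))
        if wait > 0:
            time.sleep(wait)
        _last_visit[host] = time.time()

        request = urllib.request.Request(url, headers={"User-Agent": AGENT})
        with urllib.request.urlopen(request) as response:
            return response.read()

Openness and authority, by contrast, are properties of the service built from the retrieved data rather than of the retrieval loop itself.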
A user agent should adhere to the following ethical guidelines:
- Identity - a user agent's activities should be readily discernible and traceable back to its user.
- Moderation - the pace and frequency of information acquisition should be appropriate for the capacity of the server and the network connections lying between the agent and that server.
- Appropriateness - a user agent should pose the proper questions to the proper servers, relying upon service agents for support regarding global information and servers for support for local information.
- Vigilance - a user agent should not allow user requests to generate unanticipated consequences.
The current state of the Web makes the goal of appropriateness difficult to attain for a user agent. The service agents in existence at the current time are for the most part experimental, and hence still struggling with their own issues, particularly in the areas of moderation, respect and authority. Vigilance is a concern that has not yet even begun to generate the impact that it will. As commercial providers become an increasing presence on the Web and evolve towards services with strong authority, limiting the impact upon a user of their agent's activity against fee-for-service agents will become a critical concern. Pottmyer has proposed some initial approaches to address this area [29].
While it's easy to argue that there really are not any resolved issues concerning Web ethics at this point, there are some key areas as yet unaddressed that deserve mention:
- How do we construct virtual neighborhoods of information? If much of the network traffic generated currently is by users in only partially successful search for information, generating stronger bindings of conceptually related artifacts should reduce that traffic.
- How do we support user comprehension of the existence of and access to these virtual neighborhoods? The "flying" mechanism of Lai and Manber [21] offers a novel means of generating an overall sense of organization of a hypertext, similar to flipping through the pages of a book. This type of visualization mechanism might assist in the formulation of users' mental models of the Web.
- Pottmyer, in referring to the "organic nature" of the Internet [29], mentions cooperation for the good of the organism. The Web has a great deal of user-level cooperation underway, and is predicated upon the assumption of server-level cooperation. Agent research should address these issues as well.
Text-based indices are not a panacea for the Web. Monk observes that keyword-based indices exhibit problems similar to those of browsers once a hypertext becomes sufficiently large [27]. Future work by the RBSE research group will include:
- Use of the spider to construct indices of sub-webs based on number of transitive hyperlinks from a starting point (result is an index of a connected subgraph);
- Use of the spider to sweep the Web identifying artifacts matching a domain profile and indexing only the matches (result is an index of a (potentially) disconnected subgraph); and
- Temporal studies of the Web, allowing assessment of mutation rates and fine-grained growth patterns.
The resulting knowledge will be used to modify the spider for more intelligent access patterns, integration of exploration results into the MORE storage scheme, and definition of a conceptual (as opposed to keyword) search engine.
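For the first of these items, the phrase "number of transitive hyperlinks from a starting point" can be read operationally as a bounded breadth-first traversal: index every document whose shortest hyperlink distance from a root is at most k, yielding a connected subgraph. The following Python sketch assumes a hypothetical out_links helper standing in for the spider's retrieval and parsing step; it illustrates the idea and is not the planned implementation.

    from collections import deque

    def subweb(root, out_links, k):
        """Return the documents within k hyperlink hops of root, with their distances."""
        distance = {root: 0}
        frontier = deque([root])
        while frontier:
            url = frontier.popleft()
            if distance[url] == k:
                continue                      # do not expand beyond the bound
            for target in out_links(url):     # out_links(url) -> URLs referenced by url
                if target not in distance:    # breadth-first order gives shortest distances
                    distance[target] = distance[url] + 1
                    frontier.append(target)
        return distance                       # the connected subgraph rooted at root

The second item replaces the distance bound with a domain-profile predicate applied to each retrieved artifact, which is why its result may be a disconnected subgraph.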
References
[1] Berners-Lee, T., R. Cailliau, A. Luotonen, H. F. Nielsen and A. Secret, "The World-Wide Web," Communications of the ACM, v. 37, n. 8, August 1994, p. 76-82.
[2] Campagnoni, F. R. and K. Ehrlich, "Information Retrieval Using a Hypertext-Based Help System," ACM Transactions on Information Systems, v. 7, n. 3, 1989, p. 271-291.
[3] De Bra, P. M. E. and R. D. J. Post, "Information Retrieval in the World-Wide Web: Making Client-Based Searching Feasible," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 137-146.
[4] Dillon, A., C. McKnight and J. Richardson, "Navigation in Hypertext: A Critical Review of the Concept," Proc. of IFIP INTERACT '90: Human-Computer Interaction, Detailed Design: Hypermedia, 1990, p. 587-592.
[5] Eichmann, D., "The RBSE Spider -- Balancing Effective Search Against Web Load," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 113-120.
[6] Eichmann, D., T. McGregor and D. Danley, "Integrating Structured Databases Into the Web: The MORE System," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 369-378.
[7] Eichmann, D., "Advances in Network Information Discovery and Retrieval," submitted to the International Journal of Software Engineering and Knowledge Engineering.
[8] Etzioni, O., N. Lesh and R. Segal, Building Softbots for UNIX (Preliminary Report), University of Washington, Seattle, WA, November 1992.
[9] Etzioni, O. and D. Weld, "A Softbot-Based Interface to the Internet," Communications of the ACM, v. 37, n. 7, July 1994, p. 72-76.
[10] Fielding, R. T., "Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 147-156.
[11] Frisse, M. E. and S. B. Cousins, "Information Retrieval from Hypertext: Update on the Dynamic Medical Handbook Project," ACM Hypertext '89, Information Retrieval I, 1989, p. 199-212.
[12] Genesereth, M. R. and S. P. Ketchpel, "Software Agents," Communications of the ACM, v. 37, n. 7, July 1994, p. 48-53, 147.
[13] Guha, R. V. and D. B. Lenat, "Enabling Agents to Work Together," Communications of the ACM, v. 37, n. 7, July 1994, p. 126-142.
[14] Hayes, B., "The World Wide Web," American Scientist, v. 82, September-October 1994, p. 416-420.
[15] Kahle, B., Wide Area Information Server Concepts, Thinking Machines Inc., November 1989.
[16] Kautz, H. A., B. Selman and M. Coen, "Bottom-Up Design of Software Agents," Communications of the ACM, v. 37, n. 7, July 1994, p. 143-147.
[17] Koster, M., "ALIWEB -- Archie-Like Indexing in the WEB," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 91-100.
[18] Koster, M., "Guide for Robot Writers," Nexor Corp., http://web.nexor.co.uk/mak/doc/robots/guidelines.html.
[19] Koster, M., "List of Robots," Nexor Corp., http://web.nexor.co.uk/mak/doc/robots/active.html.
[20] Koster, M., "A Standard for Robot Exclusion," Nexor Corp., http://web.nexor.co.uk/mak/doc/robots/norobots.html.
[21] Lai, P. and U. Manber, "Flying Through Hypertext," ACM Hypertext '91, Presentation Issues, 1991, p. 123-132.
[22] Maes, P., "Agents that Reduce Work and Information Overload," Communications of the ACM, v. 37, n. 7, July 1994, p. 30-40, 146.
[23] McBryan, O. A., "GENVL and WWWW: Tools for Taming the Web," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 79-90.
[24] McKee, D., "Towards Better Integration of Dynamic Search Technology and the World-Wide Web," First International Conference on the World Wide Web, Geneva, Switzerland, May 25-27, 1994, p. 129-135.
[25] McAleese, R., "Navigation and Browsing in Hypertext," HYPERTEXT I: Theory into Practice, 1988, p. 6-44.
[26] Monk, A., "The Personal Browser: A Tool for Directed Navigation in Hypertext Systems," Interacting with Computers, v. 1, n. 2, 1989, p. 190-196.
[27] Monk, A. F., "Getting to Known Locations in a Hypertext," HYPERTEXT II: State of the Art, Navigation and Browsing, 1989, p. 20-27.
[28] Norman, D. A., "How Might People Interact with Agents," Communications of the ACM, v. 37, n. 7, July 1994, p. 68-71.
[29] Pottmyer, J., "Renegade Intelligent Agents," SIGNIDR V -- Proc. Special Interest Group on Networked Information Discovery and Retrieval, McLean, VA, August 4, 1994. Presentation slides available as http://www.wais.com/SIGNIDR/Proceedings/SA3/.
[30] Riecken, D., "Intelligent Agents: Introduction to Special Issue," Communications of the ACM, v. 37, n. 7, July 1994, p. 18-21.
[31] Schatz, B. R. and J. B. Hardin, "NCSA Mosaic and the World Wide Web: Global Hypermedia Protocols for the Internet," Science, v. 265, August 12, 1994, p. 895-901.
[32] Van Dyke Parunak, H., "Hypermedia Topologies and User Navigation," ACM Hypertext '89, Navigation in Context, 1989, p. 43-50.
[33] Weld, D. and O. Etzioni, "The First Law of Robotics (A Call to Arms)," Proc. of the 12th National Conference on AI, Seattle, WA, July 31 - August 4, 1994.
[34] Wright, P. and A. Lickorish, "An Empirical Comparison of Two Navigation Systems for Two Hypertexts," HYPERTEXT II: State of the Art, Navigation and Browsing, 1989, p. 84-93.
David Eichmann is an assistant professor of software engineering at the University of Houston - Clear Lake and director of research and development for the Repository Based Software Engineering Program. Besides normal academic duties, his responsibilities include management of a research and development group working in the areas of reuse repositories and reengineering. He joined the UHCL software engineering faculty in August of 1993 after visiting for a year in his role with RBSE. He previously held positions at West Virginia University, where he led the Software Reuse Repository Lab (SoRReL) group, and at Seattle University.
Email: eichmann@rbse.jsc.nasa.gov
Footnotes
(1) This work has been supported by NASA Cooperative Agreement NCC-9-16, RICIS research activity RB02.
(2) A number of terms are used for programs that autonomously navigate Web structure: wanderers, robots, worms, even fish. My use of spider is intended to include all of these.
(3) That is, most critical from the perspective of this paper's focus. Other key aspects relating, for example, to expressiveness (support for tables and equations in HTML, etc.) are not relevant to the discussion here.
(4) The requirement for navigation is frequently cited as, in part, a cause of the demise of network database systems, and non-navigational access as, in part, a reason for the success of relational database systems.
(5) Just how explosive is reflected in the fact that articles appearing in such general-interest venues as American Scientist [14] and Science [31] discuss the phenomenon of the Web as much as they do the technology.