This paper presents an environment for publishing information on the World-Wide Web (WWW). Previous work has pointed out that the explosive growth of the WWW is in part due to the ease with which information can be made available to Web users [Weibel 94]. Yet this property can have negative impacts on the ability to find appropriate information as well as on the integrity of the information published. We present a prototype environment that facilitates the publishing of documents on the Web by automatically generating meta-information about the document, communicating this to a local scalable architecture (e.g. WHOIS++), verifying the document's HTML compliance, maintaining referential integrity within the local database, and placing the document in a Web accessible area. Additionally, maintenance and versioning facilities are provided. This paper first discusses an idealized publishing environment, then describes our implementation, followed by a discussion of salient issues and future research areas.
This paper describes a system we have implemented that enables people to share structured in-place annotations attached to material in arbitrary documents on the WWW. The basic conceptual decisions are laid out, and a prototypical example of the client-server interaction is given. We then explain the usage perspective, describe our experience with using the system, and discuss other experimental usages of our prototype implementation, such as collaborative filtering, seals of approval, and value-added trails. We show how this is a specific instantiation of a more general "virtual document" architecture in which, with the help of light-weight distributed meta information, viewed documents can incorporate material that is dynamically integrated from multiple distributed sources. Development of that architecture is part of a larger project on Digital Libraries that we are engaged in.
This paper addresses the production, use, and maintainability of hypermedia-based on-line documentation in industrial plants. The proposed method for addressing all these issues is the utilization of an information model. This paper describes how an information model can be used for structuring the documentation, for automatic generation of hypertext links, and as a basis for the design of user interfaces that support information retrieval. The tools for modelling and automatic link generation are briefly described. Two case studies utilizing the WWW for the implementation of on-line documentation are discussed.
New WWW services may be created either by extending Web protocols or by adding services in a lower layer. The DCE Web toolkit demonstrates that a broad array of new services can be provided in a layer below HTTP. Toolkit services include security, naming, and a transport-independent communications interface. Web applications can take advantage of these services by running their current protocols, such as HTTP, over the toolkit layer. The toolkit provides our prototype Web implementation with many new features, including security and location-independent hyperlinks, without modification of the HTTP protocol and with only minor changes to Web applications.
A Uniform Resource Name (URN) capability is being developed for the WWW. While this will provide important new capabilities for the WWW, the reliability of the name servers on which URN resolution will depend is questionable. This paper looks at the basis of unreliability for name servers and proposes a scalable-cost solution for addressing the reliability problem.
Repeated access to WWW pages currently makes inefficient use of available network bandwidth. A Distribution Point Model is proposed in which large and relatively static sets of pages (e.g. magazines or other such media) are distributed via bulk multicast to LAN distribution points for local access. Some access control issues are discussed. Hopwise Reliable Multicast (HRM) is proposed to simplify reliable multicast of non-real-time bulk data between LANs. HRM uses TCP for reliability and flow control on a hop-by-hop basis throughout a multicast distribution tree created by today's Internet MBone.
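As a rough illustration of the hop-by-hop idea (not the authors' implementation; host addresses, ports, and framing are assumed), a LAN distribution point could relay bulk data to its children in the tree like this:

```python
# Hypothetical sketch of a hop-by-hop relay in the spirit of HRM: each LAN
# distribution point receives bulk data from its parent over an ordinary TCP
# connection (giving reliability and flow control on that hop) and re-sends
# it over TCP to each child in the multicast distribution tree.
import socket

CHUNK = 64 * 1024

def relay(parent_conn: socket.socket, children: list[tuple[str, int]]) -> None:
    """Forward one bulk transfer from the parent hop to all child hops."""
    child_socks = [socket.create_connection(addr) for addr in children]
    try:
        while True:
            chunk = parent_conn.recv(CHUNK)
            if not chunk:                      # parent finished the transfer
                break
            for cs in child_socks:             # per-hop TCP gives flow control
                cs.sendall(chunk)
    finally:
        for cs in child_socks:
            cs.close()
```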
In this paper, a technique for structuring large amounts of interdependent data is presented. This approach, which facilitates graph-based hierarchical structuring and allows for the definition of arbitrary views on graph structures, can be applied to a broad range of very different application areas. Based on this concept, we implemented a distributed software development environment supporting cooperative work on top of the World Wide Web. In general, the approach is intended to serve as a basis for decentralized efforts to tame the immense and hardly manageable collection of data accessible in the Web.
The National Space Science Data Center (NSSDC), located at NASA's Goddard Space Flight Center, provides access to a wide variety of data from NASA spaceflight missions. Traditionally, the NSSDC has made data available in a variety of hard media and via on-line systems. However, the WWW provides an exciting new enhancement by allowing users to examine and browse the data before retrieval. This paper presents OMNIWeb, a WWW-based system that allows users to produce plots and retrieve data. The browsing and retrieving capability was designed to aid researchers in identifying trends and in obtaining the data through an uninterrupted process.
This paper presents the initial results from the second World-Wide Web User Survey, which was advertised and made available to the Web user population for 38 days during October and November 1994. The survey is built on our architecture and Web technologies, which together offer a number of technical and surveying advantages. In particular, our architecture supports the use of adaptive questions, and supports methods for tracking users' responses across different surveys, allowing more in-depth analyses of survey responses. The present survey was composed of three question categories: general demographic questions, browsing usage, and questions for Web information authors. We also added an experimental category addressing users' attitudes toward commercial use of the Web and the Internet. In just over one month, we received over 18,000 total responses to the combined surveys. To the best of our knowledge, the number of respondents and range of questions make this survey the most reliable and comprehensive characterization of WWW users to date. It will be interesting to see if and how the user trends shown in our results change as the Web gains in global access and popularity.
This paper proposes a series of improvements to Web authoring tools, in the hopes of stimulating the Web developer community to raise authoring tools to a higher standard. Web authoring tools are in their infancy. Most are little more than editors modified to output and display HTML. Few handle multimedia, integrate external conversion utilities, mimic the features of a server necessary to test the full function of the resulting HTML, or allow viewing the resulting HTML using more than one client. None handle more than one proposed extension to the current HTML standard. These and other features proposed here would improve authors' productivity, but more importantly the quality of their creations. When authoring tools integrate client and server functions, authors will be able to see their work as users see it, and this will allow them to better evaluate their material. In order to help the reader understand how these suggested improvements relate to existing capabilities, a review of authoring tools available on Unix platforms is included. The reviewed tools are HoTMetaL Pro, tkHTML, WWWeasel, WebMagic, and ASHE.
Authoring documents for the World-Wide Web is not always an easy task. Most authors either directly type HTML syntax with a text editor or convert files that they produce with various document preparation systems, but both methods pose problems. We propose another approach, based on a structured document editor, Grif. The main characteristics of HTML documents are analyzed and the extensions that these documents have imposed on the Grif editor are presented. With these extensions, Grif becomes a comfortable environment for authoring WWW documents, and it allows better and more rigorously structured documents to be produced. It also allows a smooth evolution towards SGML.
There is a real need for a tool to enable effective collaborative authoring of documents on the WWW. A number of sophisticated tools allow browsing of local and remote files but do not as yet allow authors to modify them. Our approach is to promote the creation of information directly on the WWW and so enable interaction between the different contributors. This approach relies on the use of a structured editing tool which recognizes the structured content of HTML documents and works directly over the network. We discuss various cooperative strategies and user interface issues and how SGML might help in the generalization of collaborative authoring on the WWW.
CoReview, an interactive document and data retrieval tool, has been developed to provide a seamless environment that assists groups and individuals distributed across the Internet in evaluating and interacting on the progression of a project. It can also help individuals put together a document interactively and collaboratively. CoReview is based on the strengths of the World-wide Web server, Mosaic, and XTV, an X-window teleconferencing system. While Mosaic is used to manage the project documents and reviewer annotation files involved in proposals and their evaluation, XTV supports real-time remote collaboration among a group of users. CoReview incorporates the XTV features into a user friendly graphical interface and enables Mosaic to be shared by multiple networked participants. The system architecture embeds the concept of a chair and a set of participants for each document to be managed. The CoReview chair manages the shared resources. CoReview allows for easy creation of a pool of reviewers or proposal writers and automates the process of creating the necessary infrastructure---daemons and directories---at the needed sites.
Human cooperation and interaction is highly dependent on efficient means for communication. While the World Wide Web provides powerful tools for information retrieval, it still lacks support for document authoring. Distributed authoring of documents requires not only the possibility to jointly view and edit a document, but also the ability to create and share annotations. This article examines the issues behind an integrated authoring environment, and introduces a prototype for information access, telecooperation and distributed authoring in a wide area network.
Because of its characteristics, educational use of the WWW can evolve along two major axes: on the one hand, applying the technology to a closed corpus of educational material, exploiting the hypermedia and distance-delivery capabilities of the web; on the other, applying it to an organized structure of links over an open corpus of material that was not necessarily meant for educational use but that can be "redirected" and exploited in guided educational explorations. These two axes are not antagonistic and can be exploited alternatively or in combination. In this paper, we analyze those features of the WWW that are most interesting for educational purposes and show possible sophisticated pedagogical uses of the web.
The World Wide Web has already been demonstrated to be an excellent mechanism for the distribution of educational resources. However, current browsers do not provide much support for users navigating the Web. This is particularly true in cases where users follow a link from a previously prepared 'trail' and then experience difficulties in returning to the point where they left. This paper describes a guided tour mechanism, known as Footsteps, which has been developed in an attempt to solve this problem.
In this paper we describe a project aimed at addressing the lack of teaching material in high performance computing. The Teaching and Learning Technology programme involves producing courseware modules in hypertext format for students in computer science, as well as in mathematics, engineering, the biological sciences and economics. We consider the educational aspects of constructing sound courseware and detail the approach we have taken.
NCSA is adding support for group and public annotations to its HTTP server and Mosaic client. The primary concern addressed in this paper is how to ensure that the feature is scalable. Our solution requires each document server to tell the client where to get public annotations for a document, whereas the user tells the client where to get group annotations for a document. We argue that our solution is no less scalable than the web itself. Finally, we address the problem of finding out what is new.
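Purely as an assumed sketch of the division of responsibility described above (the header name and query URL shape are invented here, not NCSA's actual protocol), a client might gather annotation sources as follows:

```python
# Illustrative sketch only: the document server tells the client where public
# annotations live, while the user configures group annotation servers.  The
# "Annotations" header and the lookup URL shape are assumptions.
from urllib.request import urlopen
from urllib.parse import quote

GROUP_ANNOTATION_SERVERS = ["http://annotations.example.org"]  # user-configured

def annotation_sources(doc_url: str) -> list[str]:
    with urlopen(doc_url) as resp:
        public_server = resp.headers.get("Annotations")  # hypothetical header
    servers = list(GROUP_ANNOTATION_SERVERS)
    if public_server:
        servers.append(public_server)
    # Each server is asked only about this one document, which keeps the
    # scheme roughly as scalable as fetching the document itself.
    return [f"{s}/lookup?doc={quote(doc_url, safe='')}" for s in servers]
```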
The provision and maintenance of truly large-scale information resources on the World-Wide Web necessitates server architectures offering substantially more functionality than simply serving HTML files from the local file system and processing CGI requests. This paper describes Hyper-G, a large-scale, multi-protocol, distributed, hypermedia information system which uses an object-oriented database layer to provide information structuring and link maintenance facilities in addition to fully integrated attribute and content search, a hierarchical access control scheme, support for multiple languages, interactive link editing, and point-and-click document insertion.
This paper focuses on the problem of accessing interactive information servers ("interactive sessions" in W3-terminology, e.g. an existing library information system) directly with a W3-browser instead of opening an additional viewer (e.g. a terminal emulation program). IDLE is a language designed to describe the user interface of interactive information servers in a way that allows automatic translation of W3-requests to server-commands. Interactive information servers described with IDLE can be accessed by any W3-browser via a special translation server. Describing the user interface of an interactive information server is typically much easier than designing a fully featured gateway for each service.
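The IDLE syntax itself is not reproduced in this abstract; the following toy sketch only illustrates the translation idea, with the dialogue description format and all names assumed for illustration:

```python
# Hypothetical sketch: a declarative description of one legacy dialogue step
# (prompt, command template, end-of-result marker) lets a gateway map an HTTP
# form request onto the command sent to the interactive server.
SEARCH_DIALOGUE = {
    "prompt": "SEARCH>",                 # prompt the legacy system prints
    "command": "FIND TITLE {title}",     # command template to send
    "result_end": "END OF LIST",         # marker that the result is complete
}

def to_command(form: dict, dialogue: dict) -> str:
    """Map an HTTP form submission onto the legacy command the gateway sends."""
    return dialogue["command"].format(**form)

# to_command({"title": "polar oceanography"}, SEARCH_DIALOGUE)
# -> 'FIND TITLE polar oceanography'
```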
The objective of InfoHarness [shk94] is to provide integrated and rapid access to huge amounts of heterogeneous legacy information through WWW browsers. This is achieved with the help of metadata that contains information about the type, representation, and location of physical data. The proposed InfoHarness Repository Definition Language (IRDL) aims to simplify the metadata generation process. It provides high flexibility in associating typed logical information units with portions of physical data and in defining relationships between these units. The proposed stable abstract class hierarchy provides support for statements of the language that introduce new data types, as well as new indexing technologies.
We have built an HTTP based resource discovery system called Discover that provides a single point of access to over 500 WAIS servers. Discover provides two key services: query refinement and query routing. Query refinement helps a user improve a query fragment to describe the user's interests more precisely. Once a query has been refined and describes a manageable result set, query routing automatically forwards the query to the WAIS servers that contain relevant documents. Abbreviated descriptions of WAIS sites called content labels are used by the query refinement and query routing algorithms. Our experimental results suggest that query refinement in conjunction with query routing provides an effective way to discover resources in a large universe of documents. Our experience with query refinement has convinced us that the expansion of query fragments is essential in helping one use a large, dynamically changing, heterogeneous distributed information system.
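A minimal sketch, under assumed content labels and a simple overlap rule (Discover's actual algorithms are more involved), of routing a refined query to candidate servers:

```python
# Each WAIS site is summarized by a "content label" (a set of terms here);
# the refined query is forwarded only to sites whose label overlaps it.
# Labels, threshold, and site names are illustrative, not Discover's data.
CONTENT_LABELS = {
    "wais://cs.example.edu/tech-reports": {"parallel", "compiler", "network"},
    "wais://bio.example.edu/genbank":     {"genome", "protein", "sequence"},
}

def route(query_terms: set[str], min_overlap: int = 1) -> list[str]:
    """Return the servers whose content label shares terms with the query."""
    return [server for server, label in CONTENT_LABELS.items()
            if len(label & query_terms) >= min_overlap]

# route({"parallel", "network"}) -> ['wais://cs.example.edu/tech-reports']
```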
With the continuously growing amount of Internet-accessible information resources, locating relevant information on the World Wide Web becomes increasingly difficult. Recent developments provide scalable mechanisms for maintaining indexes of network-accessible information. In order to implement sophisticated retrieval engines, a means of automatically analyzing and classifying document meta-information has to be found. We propose the use of methods from the mathematical theory of concept analysis to analyze and interactively explore the information space defined by wide-area resource discovery services.
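As a toy illustration of concept analysis applied to document meta-information (not the authors' system; the documents and index terms are invented), a tiny formal context yields its formal concepts by brute force:

```python
# Documents are objects, index terms are attributes, and a formal concept is
# a maximal pair (set of documents, set of terms shared by exactly them).
from itertools import combinations

CONTEXT = {                      # document -> terms assigned by the index
    "doc1": {"www", "index"},
    "doc2": {"www", "retrieval"},
    "doc3": {"www", "index", "retrieval"},
}
TERMS = set().union(*CONTEXT.values())

def extent(terms):               # documents having all the given terms
    return {d for d, ts in CONTEXT.items() if terms <= ts}

def intent(docs):                # terms shared by all the given documents
    return set.intersection(*(CONTEXT[d] for d in docs)) if docs else TERMS

concepts = set()
for r in range(len(TERMS) + 1):
    for combo in combinations(sorted(TERMS), r):
        docs = extent(set(combo))
        concepts.add((frozenset(docs), frozenset(intent(docs))))

for docs, terms in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(docs), sorted(terms))   # e.g. ['doc1', 'doc3'] ['index', 'www']
```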
Substantial efforts to establish standards for encoding and accessing electronic resources have occurred over the past five years. We have designed a Web-based tool, called Spectrum, to enable individuals without specialized knowledge of library cataloging or markup to create records for describing and accessing networked electronic resources of various types. System users may create descriptions of electronic resources and view them as formatted USMARC bibliographic records, TEI headers and URCs. Because we anticipate continued volatility in the definition of data element standards, the Spectrum system is designed to allow maximum flexibility in the design of the input formats.
Why should web technologists care about intellectual property? This paper includes a general overview of intellectual property law, analyzes the impact on the web community, and provides evidence that web technology has important implications with respect to copyright issues. Based on the finding that web technology undermines the protection of intellectual property law, we encourage web technology designers to become more active in taking over the role of legislators in shaping the framework for creative work. It is our conclusion that web technology must better support the revenue of authors and the creation of derivative work.
Information providers have taken great advantage of the WWW to disseminate information. Many would benefit from the ability to offer additional information to authenticated users. Methods for authentication are available with some Web servers. These authentication techniques do not address the needs of the Web community because they lack flexibility and are burdensome. We offer a solution that adds session-level authentication to existing Web servers, requires no server software modification, and does not dictate a specific web browser. Sessioneer provides a framework with which Web administrators can implement session-level authentication schemes that suit specific needs.
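A minimal sketch, assuming a cookie-based session token and a CGI gateway placed in front of the unmodified server (not Sessioneer's actual design), of how a session-level check can be added without touching server code:

```python
#!/usr/bin/env python3
# Hypothetical CGI gateway: serve the protected page only when a valid
# session cookie is present, otherwise redirect the browser to a login form.
import os
import sys

VALID_SESSIONS = {"3f2a"}          # would normally live in a session store

def current_session() -> str | None:
    for part in os.environ.get("HTTP_COOKIE", "").split(";"):
        name, _, value = part.strip().partition("=")
        if name == "SESSION":
            return value
    return None

if current_session() in VALID_SESSIONS:
    sys.stdout.write("Content-Type: text/html\r\n\r\n<p>Protected page.</p>")
else:
    sys.stdout.write("Status: 302 Found\r\nLocation: /login.html\r\n\r\n")
```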
Existing authorization schemes for the WWW present a client administration problem when hyperlinked documents and contents are stored on different servers. A new distributed authorization model is proposed where information servers are grouped into authorization domains. User administration is simplified as only one server in the domain needs to know its potential clients. Extensions to the model provide support for document and content migration and for the implementation of user groups. A prototype of the model was implemented over an existing WWW system.
This paper presents a new Web cataloguing strategy based upon the automatic analysis of documents stored in a proxy server cache. This could be an elegant method of Web cataloguing as it creates no extra network load and runs completely automatically. Naturally such a mechanism will only reach a subset of Web documents, but at an institute such as the Alfred Wegener Institute, where the scientists themselves tend to act as quite good search engines, the cache usually contains large numbers of documents related to polar and marine research. Details of a database for polar, marine and global change research, based upon this cache scanning mechanism, are given, and it is shown that it is becoming an increasingly useful resource.
A problem with any collection of information about Web documents is that it quickly becomes out of date. Strategies have been developed to maintain the consistency of the database with respect to changes on the Web, while attempting to keep network load to a minimum. These have been found to provide a better quality of response and appear to be keeping information in the database current. Such strategies are of interest to anyone attempting to create and maintain a Web document location resource.
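An illustrative sketch, with assumed cache layout and keywords (not the Institute's actual scanner), of building such a catalogue from a proxy cache:

```python
# Walk the proxy cache, keep HTML documents whose text mentions the research
# keywords, and record title and last-seen time so stale entries can later be
# re-checked against the live Web.  Paths and keywords are assumptions.
import os
import re
import time

CACHE_DIR = "/var/cache/proxy"
KEYWORDS = re.compile(r"\b(polar|marine|sea ice|glaciolog\w+)\b", re.I)
TITLE = re.compile(r"<title>(.*?)</title>", re.I | re.S)

def scan_cache(catalogue: dict) -> None:
    for root, _, files in os.walk(CACHE_DIR):
        for name in files:
            path = os.path.join(root, name)
            with open(path, errors="ignore") as fh:
                text = fh.read()
            if KEYWORDS.search(text):
                title = TITLE.search(text)
                catalogue[path] = {
                    "title": title.group(1).strip() if title else name,
                    "last_seen": time.time(),  # used to schedule re-checks
                }
```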
Current World Wide Web technologies concentrate on presenting documents to human readers. Although HTML identifies structures within a document, it does not allow the semantic content of document sections to be specified explicitly. We investigate a small extension to HTML which allows parts of a document to be mapped onto an underlying database schema. This allows automatic identification and extraction of key information from a web using standard database techniques. Such "lightweight" databases may span servers, with searches being performed at client- or server-side. We have applied this approach to generating "flattened" versions of hypertext documents suitable for printing.
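A hypothetical example of the mapping idea (the attribute name and fields are invented for illustration; the paper's actual HTML extension may differ): document fragments tagged with a field name are collected into a database record.

```python
# Fragments carrying a schema-field attribute are extracted into a record;
# everything else in the page is ignored.  "dbfield" is an assumed attribute.
import re

PAGE = """
<h1 dbfield="title">Sea Ice Thickness Survey</h1>
<p dbfield="author">A. Researcher</p>
<p>Narrative text that is not part of any record.</p>
"""

FIELD = re.compile(r'<(\w+)[^>]*dbfield="([^"]+)"[^>]*>(.*?)</\1>', re.S)

def extract_record(html: str) -> dict:
    return {field: value.strip() for _, field, value in FIELD.findall(html)}

print(extract_record(PAGE))
# {'title': 'Sea Ice Thickness Survey', 'author': 'A. Researcher'}
```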
Cataloging and searching procedures in traditional library systems are expensive, time consuming and often incomplete. OMNIS is a novel multimedia information retrieval system for the administration of documents in libraries and offices. Using the fulltext database system Myriad combined with scanning and OCR technologies, it offers cataloging, archiving and searching functions at drastically reduced cost and with much greater precision. Documents may contain page images, full-length PostScript or other media information and offer the user much better insight into documents. At Technische Universität München a considerable number of computer science documents have been made searchable by a simple fulltext query language. To make this document retrieval system available to WWW clients, the OMNIS document access function was implemented as an OMNIS-WWW server, which is already in operation. This paper presents the essential features of OMNIS (especially its searching function) and discusses the concepts and the implementation of its WWW server.
The original WAIS implementation by Thinking Machines et al. treats documents as uniform bags of terms. Since most documents exhibit some internal structure, it is desirable to provide users with the means to exploit this structure in their queries. In this paper, we present extensions to the freeWAIS [1] indexer and server which allow access to document structure using the original WAIS protocol. Major extensions include:
This paper presents the results of a study conducted at Georgia Institute of Technology that captured client-side user events of NCSA's XMosaic. Actual user behavior, as determined from client-side log file analysis, supplemented our understanding of user navigation strategies as well as provided real interface usage data. Log file analysis also yielded design and usability suggestions for WWW pages, sites and browsers. The methodology of the study and findings are discussed along with future research directions.
Overview diagrams are one of the best tools for orientation and navigation in hypermedia systems. However, constructing effective overview diagrams is a challenging task. This paper describes the Navigational View Builder, a tool which allows the user to interactively create useful visualizations of the information space. It uses four strategies to form effective views. These are binding, clustering, filtering and hierarchization. These strategies use a combination of structural and content analysis of the underlying space for forming the visualizations. This paper discusses these strategies and shows how they can be applied for forming visualizations for the World-Wide Web.
Busy Internet archives generate large logs for each access method being used. These raw log files can be difficult to process and to search. This paper describes a system for reading these growing logs, a combined log file format into which they are re-written, and a system that automates this building and integration for multiple access methods. Automated summarizing of the information is also provided, giving statistics on accesses by user, site, path name and date/time, among others.
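A rough sketch, with an assumed combined field layout (real per-method log formats differ and are not shown), of normalizing accesses into one format and summarizing them:

```python
# Every access, whatever the method (HTTP, FTP, Gopher, ...), is re-written
# as one tab-separated line with identical fields, which makes searching and
# summarizing uniform.  The field layout here is an illustrative assumption.
import collections
import datetime

def combined_record(method: str, host: str,
                    when: datetime.datetime, path: str) -> str:
    return f"{when:%Y-%m-%dT%H:%M:%S}\t{method}\t{host}\t{path}"

def summarize(records: list[str]) -> dict:
    by_site = collections.Counter(r.split("\t")[2] for r in records)
    by_day = collections.Counter(r.split("\t")[0][:10] for r in records)
    return {"by_site": by_site, "by_day": by_day}
```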
This paper describes DeckScape, an experimental World-Wide Web browser based on a "deck" metaphor. A deck consists of a collection of Web pages, and multiple decks may exist on the screen at once. As the user traverses links, new pages appear on top of the current deck. Retrievals are done using a background thread, so all visible pages in any deck are active at all times. Users can move and copy pages between decks, and decks can be used as a general-purpose way to organize material, such as hotlists, query results, and breadth-first expansions.
This paper describes how to integrate the World Wide Web (WWW) with applications. By means of the web widget, which is part of Hush, the WWW is made available to Tcl/Tk/Hush programmers. Apart from using WWW as part of an application, it also allows one to embed scripts into a web page. This results in a mutual integration of applications and the WWW. Both forms of integration will be described. Some new possibilities of embedded scripts such as inline MPEG, interactive games and navigation facilities will be discussed.
The World Wide Web provides new opportunities for distance education over the Internet. The Web, when combined with other network tools, can be used to create a virtual classroom to bring together a community of learners for interactive education. The Cornell Theory Center, a national center for high performance computing, is investigating the use of emerging network technologies for training computational scientists and researchers in the concepts of parallel processing. This effort is being built on electronic educational materials already on the Web and will evaluate the effectiveness of various collaborative tools.
Developments in computer networks offer not only a wide range of possibilities for spreading knowledge, but also many problems in retrieving and obtaining the stored informative content that users are interested in. Such a networking environment could play an important role from an informative and educational point of view. Indeed, by using computer networks it is feasible to realize a virtual classroom that offers remote students full, interactive participation in a class that would previously have been restricted to students attending locally (1,1993).
In this paper the authors present a WWW student-centered educational environment that has been realized using a particular methodology which places students inside an informative area. In this area they are able to focus on the topics they prefer, to satisfy their particular instructional needs, and to follow an educational and engaging path through a specific set of tools. A Mosaic sensitive map makes it possible to start navigating the organized cyberspace by choosing one of the displayed topics. After choosing the topic they are interested in, students have a set of tools through which they can access the following functionalities:
a) virtual classroom;
b) digital libraries and museums;
c) attending seminars and conferences;
d) cooperative work.
This set of tools can differ according to the chosen topic.