CrystalWeb - A Distributed Authoring Environment for the World Wide Web

Ralph Peters
Fraunhofer Institute for Computer Graphics,
Wilhelminenstr. 7, Darmstadt, Germany
peters@igd.fhg.de

Christian Neuss
Fraunhofer Institute for Computer Graphics,
Wilhelminenstr. 7, Darmstadt, Germany
neuss@igd.fhg.de
Abstract:
Human cooperation and interaction are highly dependent on efficient means of communication. While the World Wide Web provides powerful tools for information retrieval, it still lacks support for document authoring. Distributed authoring of documents requires not only the possibility to jointly view and edit a document, but also the ability to create and share annotations. This article examines the issues behind an integrated authoring environment, and introduces a prototype for information access, telecooperation and distributed authoring in a wide area network.
Keywords:
Authoring Environment

1 Introduction

Today's use of personal computers in office and information management increasingly involves telecommunications networks. Back in the 80s, personal computers were mainly used for text processing and spreadsheet calculation. In the 90s, we experience a trend towards inter-personal computing, allowing for communication via electronic mail, file transfer and information access. There are an estimated 50 million PCs in use in businesses in Europe, half of which have network access capabilities.

A large part of computer supported office work involves documents. Being able to electronically access them in a wide area network is an important element of telework. By providing an easy to use interface for information browsing as well as a flexible mechanism for hypermedia access, the World Wide Web is the ideal basis for a distributed information retrieval and authoring system. It allows for transparent transfer of heterogeneous documents and provides a point and click metaphor for interlinked hypertext.

Its ease of use and the availability of browsing software on a variety of platforms make it an interesting candidate for implementation of a corporate information system. However, the Web currently lacks a number of features necessary for distributed authoring and management of documents and document related meta-information.

If a team of authors is working together on a common set of documents (e.g. technical manuals), their write access to these documents has to be coordinated. Thus, the document repository has to be augmented with document database functionality that provides transaction mechanisms and access control. One problem here is that structural changes to the documents require manual maintenance of hyperlinks. On a conventional Web server, documents are created and maintained as physical files, with links based on relative or absolute pathnames. Whenever a file is moved or renamed, links to that document become inconsistent. Instead of their physical access path, links to documents could be based on an abstract, persistent naming scheme. Following this approach, a server would use a directory service for mapping an abstract name (which could be encoded into the document URL) to the actual physical location.
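
The persistent naming idea can be illustrated with a minimal sketch. The class `NameDirectory`, the `doc:` name prefix and the file paths are hypothetical and chosen for illustration only; they do not describe an existing directory service. The sketch only shows that a link which carries the abstract name stays valid when the file is moved, because only the directory entry changes.

```python
class NameDirectory:
    """Hypothetical directory service: persistent name -> physical path."""

    def __init__(self):
        self._locations = {}

    def register(self, name, path):
        # Publish a document under an abstract, persistent name.
        self._locations[name] = path

    def move(self, name, new_path):
        # Only the directory entry changes; links using the name stay valid.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_path

    def resolve(self, name):
        # Map the abstract name to the current physical location.
        return self._locations[name]


directory = NameDirectory()
directory.register("doc:manual-chapter-3", "/docs/manuals/ch3.html")

# The file is reorganized on disk; hyperlinks still carry the same name.
directory.move("doc:manual-chapter-3", "/docs/archive/1994/ch3.html")
resolved = directory.resolve("doc:manual-chapter-3")
```

A server following this approach would perform the `resolve` step for every request, so that moving or renaming a file requires no changes to the documents that link to it.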

Finally, an integrated environment for distributed information authoring and retrieval not only has to support remote document access and resource discovery, but must also provide digital annotations and facilities for holding virtual conferences with co-workers. In order to create such an environment, Web browsers/editors have to be augmented with sophisticated teleconference and annotation tools that support distributed authoring of hypermedia documents.

We will analyze the requirements for distributed authoring in a Wide Area Network (WAN), and introduce CrystalWeb, a prototypical implementation of an integrated environment for distributed authoring.

2 Collaboration in Wide Area Networks

This section discusses the World Wide Web as a basic element of an integrated authoring environment as well as existing tools and applications for collaboration in a wide area network (WAN).

2.1 The World Wide Web

The World Wide Web (WWW) [LCG92] is an information retrieval system on the Internet with an estimated 30 million individual users. It has created the first true global hypermedia network, with more than 3000 multimedia databases of hyperlinked documents. The World Wide Web is conceived as a seamless world in which information from any source can be accessed in a consistent and simple way. It was originally developed in 1989 at the European Particle Physics Laboratory CERN as a means of sharing research information throughout the organization. By the end of 1990, the first World Wide Web browser software was introduced on a NeXT machine. The World Wide Web has since then experienced tremendous growth and widespread popularity. For example, the World Wide Web has been mentioned by CNN, the Wall Street Journal, the Economist, Fortune magazine and the New York Times. The Web is already heavily used: over 10 Gigabytes of World Wide Web data are transferred over the NSF backbone every day.

The great popularity the Web currently enjoys results from the availability of easy to use "point-and-click" browsing tools which make access to information a task as simple as the use of a word processor. The most famous among them is probably Mosaic, which was developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. Web browsers have been developed for a variety of platforms including Macintosh and Intel PCs, which makes the Web not only a vendor independent but also a platform independent standard. Its ease of use makes the Web an attractive vehicle for electronic publishing and for conducting various business transactions. It offers encryption, authentication and data security for the exchange of sensitive documents. Tools for client side searches [BrPo94] and for the creation of indexes, both locally on a server [NeHo94] and centrally [Kos94], establish mechanisms for resource discovery. Document formatting and hypertext information is stored in an SGML derived format called HTML (Hypertext Markup Language). HTML includes logical structures like headings, paragraphs, lists and emphasis (e.g. rendered in a different font). The World Wide Web also allows for seamless integration of other information services. For example, a gateway program can provide access to a relational database system which contains product specifications.

2.2 Joint Viewing and Conferencing Systems

In order to provide computer support for collaboration, teleservices and tools have been developed which supply an audio/video connection. These services can basically be seen as a replacement for communication over telephone, enhancing it with a number of features like the ability to establish a conference and provision of a real time video connection.

Systems like the Multi Media Collaboration Service [AD93] within the BERKOM project, SunSoft's ShowMe or Intel's ProShare provide not only an audio/video conferencing tool, but also joint viewing functionality. Stand-alone joint viewing systems like Shared X [Alt90], XMX [Baz90] and XTV [AWF91] are often X windows based; an overview can be found in [Jon93]. Their input can be shared via a so-called floor control system, which basically defines from which machine the application is controlled. By passing the input focus to another conference member, cooperative work with the application is possible. Additional support for cooperation is provided by displaying a so-called shared pointer or telepointer, which makes the mouse movements of one party visible to others.

However, audio/video connections and joint viewing are not the only services that are useful in a conference situation. The ability to make sketches and notes on a whiteboard is an important part of a conference. This paradigm is being used by whiteboard systems: the conference participants can make drawings and notes on a shared drawing area. An interesting feature of these group sketching tools is the ability to load an image as a background into the drawing area and thus make annotations on this image. This approach is taken by systems like SketchPad [SNR91][NS91], Wscrawl [Mal94] or GroupSketch [RG92]. Furthermore, some audio/video conferencing systems like SunSoft's ShowMe also have an integrated whiteboard.

Using a shared whiteboard system for annotating and discussing documents can be done by taking the output of a non-shared document processing system as a background image. The output of the document system is captured by taking a "snapshot" of the screen output, and placed as an image in the background of the conferencing tool. It is then possible to make graphical and textual annotations, which can be shared with other conference participants. The problem with whiteboard based systems is that only a small fraction of the document (the page which is visible in the snapshot) can be annotated. This view is static; existing systems do not reflect changes in the document, or allow for navigation in the document. Furthermore, it is not possible to have conference members actually edit the document. This causes the need for annotating a living document which requires an approach entirely different to current whiteboard systems [PNB94].

3 An Integrated Authoring Environment

A distributed information system must provide methods for access and navigation in a remote document repository. For example, an author can use a hypermedia browser to access various documents like specification sheets or technical documentation, go through the minutes of a previous meeting, or comments and suggestions from co-workers.

Since the World Wide Web allows for seamless integration of heterogeneous information sources, it can be used to provide an integrated information system. However, the World Wide Web as it exists today lacks support for authoring. State of the art servers use the file system as data repository; requests for documents are mapped directly to file system paths. A server does not know about a document until it actually serves it. In particular, it has no control over the creation and modification of documents. Normally, documents are modified by loading them into a standard text editor and saving them back to the file system, and the server is unaware of this process.

3.1 Requirements

In order to support team members with a mechanism for reviewing documents, an annotation facility has to be provided. Annotations can include sketches, text or even a digital recording of a voice remark. Since the annotations are displayed right on the document, making corrections and incorporating changes can be done in a very convenient way.

In addition, an audio/video connection to one or more co-workers can be opened in case a question arises which needs discussion. To support this discussion, the document being viewed by the author can simultaneously be displayed on the other workstations (shared viewing). Finally, once the document has been saved to the remote document repository, the new version is immediately available to the other team members.

In order to support distributed authoring in a wide area network, an authoring environment must provide the following functionality:

3.2 System Architecture

Figure 1 shows the software components of an integrated environment for distributed authoring: a distributed document database system stores hypertext files, annotations and secondary data, and makes them accessible in a wide area network. The World Wide Web serves as distribution mechanism: a hypertext browser and an information resource discovery frontend are used for identifying and retrieving documents. The authoring system component consists of a Desktop Publishing (DTP) system, an annotation facility which provides an additional sketching facility, and an audio/video communication tool. Including a mechanism for audio/video conferences and for sharing an application between several users allows for online discussions without the need for physical presence. In order to be able to use the DTP system in a distributed environment, it can be shared among co-workers through the use of a sharing component. To close the circle of information access and authoring, a finished document is published via a hypertext converter and stored in a proper format in the database.

    FIGURE 1. System architecture
Not only does this approach allow for keeping a single source document for both the electronic and the paper version of a file, it also offers the features of a professional document processing system such as built-in drawing tools, import filters for pictures, spell checking, and an integrated thesaurus.

An authoring environment must handle a document as a compound entity: while the HTML version of a document is adequate for electronic access, it lacks layout information required for a printable version. Thus, both versions must be available, either stored separately or generated dynamically through a filter. Furthermore, in order to support electronic annotations, a digital version of the sketch or the voice remark is also a required part of the document.

    FIGURE 2. Synchronous and asynchronous editing
The document database treats the electronic document as an opaque entity. Depending on the type of action an author performs, different views of this entity are used. Figure 2 shows the information flow for synchronous and asynchronous work on a document. The asynchronous case is shown on the left side. The author requests write access to a document. If write access is granted, the document is locked and transferred to the author. The author can make changes and create or modify annotations. After the document has been saved, another author or reviewer can in turn access and edit it. There is no direct communication between the authors; all information transfer is handled via HTTP. In the case of synchronous work, the situation is different. After the document has been retrieved, it is edited in a cooperative session. The clients communicate directly with each other via the sharing component, which multiplexes the input and output of the DTP system and provides a floor control mechanism.
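
The asynchronous case (request write access, lock, edit, save, release) can be sketched as follows. This is an illustrative in-memory model under our own assumptions, not the CrystalWeb database: the class `DocumentRepository`, its method names and the document identifier `spec` are hypothetical.

```python
class DocumentRepository:
    """Hypothetical in-memory repository with one write lock per document."""

    def __init__(self):
        self._content = {}
        self._locks = {}  # document id -> author holding the write lock

    def add(self, doc_id, content):
        self._content[doc_id] = content

    def check_out(self, doc_id, author):
        # Grant write access only if no other author holds the lock.
        holder = self._locks.get(doc_id)
        if holder is not None and holder != author:
            raise PermissionError(f"{doc_id} is locked by {holder}")
        self._locks[doc_id] = author
        return self._content[doc_id]

    def check_in(self, doc_id, author, new_content):
        # Saving stores the new version and releases the lock.
        if self._locks.get(doc_id) != author:
            raise PermissionError(f"{author} does not hold the lock on {doc_id}")
        self._content[doc_id] = new_content
        del self._locks[doc_id]


repo = DocumentRepository()
repo.add("spec", "draft 1")

text = repo.check_out("spec", "peters")
try:
    repo.check_out("spec", "neuss")      # refused while the lock is held
    second_author_blocked = False
except PermissionError:
    second_author_blocked = True

repo.check_in("spec", "peters", text + " (revised)")
after = repo.check_out("spec", "neuss")  # now succeeds and sees the new version
```

In this model the second author never communicates with the first; the repository alone serializes write access, which mirrors the asynchronous flow described above.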

4 The CrystalWeb environment

This section describes CrystalWeb, an environment for distributed authoring created at the Fraunhofer Institute for Computer Graphics. It augments the functionality of the World Wide Web with a database system to manage hypertext documents, a professional DTP system which allows for creating both paper and hypertext versions of a document, and a set of tools for collaboration and communication. The Fraunhofer Institute for Computer Graphics has been working on World Wide Web applications since spring 1993; the focus has been on information resource discovery and authoring environments. A prototype for distributed authoring was demonstrated at the Telework 94 conference in Berlin. It combines a Web editing tool with a facility for making digital document annotations [PNB94]. A Web browser is used for information retrieval and access to the document pool. When a document has been reviewed by a co-worker, the Web browser not only opens it in the document processor but also starts up an annotation viewing facility. In order to discuss questions with other team members, a conference can be initiated. The annotated document can then be viewed simultaneously by all conference participants. For publishing we use the document processing system WebMaker (for information see http://www.cern.ch/WebMaker). WebMaker is a configurable converter of FrameMaker documents to the World Wide Web, which enables authors to publish both the printed and the WWW versions of a document simultaneously.

The following sections will show how the CrystalWeb components form an authoring environment. The elements of the authoring process include content based retrieval of documents, viewing and editing of digital annotations, as well as holding teleconferences with co-workers.

4.1 Information retrieval

The first step is the authentication process: a username and a password have to be entered in order to access the system. Then, a document search query is initiated (see Figure 3). The query is entered with the help of a so-called forms interface. Forms are an element of HTML that provides the user with an interactive interface in which values can be entered and simple checkboxes and selections can be made. After a forms interaction is finished, the data is encoded and then transferred to the Web server for evaluation. The CrystalWeb search interface allows for retrieving documents based on title, author, or through free text searches that can optionally be restricted to document fields. As the result of the query (see Figure 4), a list of hyperlinks leading to the corresponding documents is returned.
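
The encoding step of a forms interaction can be illustrated with Python's standard `urllib.parse` module. The field names `title`, `author` and `text` below are assumptions made for this sketch; the actual field names of the CrystalWeb query form are not given here.

```python
from urllib.parse import urlencode, parse_qs

# Values a user might enter into the query form (hypothetical field names).
query = {"title": "Annual Report", "author": "Peters", "text": "floor control"}

# Browser side: the form data is encoded into a single query string.
encoded = urlencode(query)

# Server side: the gateway decodes the string again before evaluating it.
decoded = parse_qs(encoded)
```

Here `encoded` is the string `title=Annual+Report&author=Peters&text=floor+control`, and `decoded` maps each field name to a list of values, since a form field may occur more than once.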

    FIGURE 3. Query forms interface
Selecting the corresponding hyperlink retrieves the document. For those documents for which the user has write permission, a virtual link is dynamically created and inserted into the hypertext document. Selecting this link calls up a CGI gateway which marks the corresponding document as 'checked out for writing', and returns the document's FrameMaker source. The Web browser then starts a FrameMaker session with this document.

Finally, after finishing the changes to the document, it has to be transferred back to the server in order to make the new version available to the other team members. Since a well defined interface for the 'publishing' process is being employed, the document repository can not only make the new version available (and store the previous version under a different name), but also initiate an update of the index for free text searches.
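
The publishing step can be sketched as a single operation that keeps the previous version and refreshes the free text index. The class `PublishingRepository` and its in-memory word index are hypothetical simplifications; the sketch makes no claim about how the CrystalWeb repository stores versions or builds its index.

```python
import re
from collections import defaultdict


class PublishingRepository:
    """Hypothetical repository: publish keeps old versions and reindexes."""

    def __init__(self):
        self.current = {}                  # document id -> latest text
        self.previous = defaultdict(list)  # document id -> older versions
        self.index = defaultdict(set)      # word -> ids of documents containing it

    def publish(self, doc_id, text):
        if doc_id in self.current:
            old = self.current[doc_id]
            self.previous[doc_id].append(old)   # keep the previous version
            for word in self._words(old):       # drop stale index entries
                self.index[word].discard(doc_id)
        self.current[doc_id] = text
        for word in self._words(text):          # index the new version
            self.index[word].add(doc_id)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))


pub = PublishingRepository()
pub.publish("minutes", "Whiteboard demo planned")
pub.publish("minutes", "Teleconference demo planned")

hits_new = pub.search("teleconference")
hits_old = pub.search("whiteboard")
kept = pub.previous["minutes"]
```

Because publishing goes through one well defined entry point, the index can never lag behind the stored document, which is not guaranteed when files are written to the file system directly.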

    FIGURE 4. Query result

4.2 Teleconferencing

Distributed authoring requires communication between team members and computer-based systems which provide an 'interface to a shared environment to support groups of people engaged in a common task or goal' [EGR92]. An electronic conferencing system enables the replacement of face-to-face meetings by computer supported conferences between geographically dispersed persons. It is responsible for the administration of the teleconference itself as well as for additional components like audio/video. In this workstation integrated conference system, a desktop publishing application is distributed to all users via a window sharing component to support shared viewing and editing. Such a joint viewing or editing tool displays the graphical output of an application simultaneously on multiple computers in a network, and thus gives all participants the same view of e.g. a document. The input to the shared application is managed by a so-called floor control system; an author has to ask for the floor (the right to address an assembly) in order to perform an action. Figure 5 shows a screendump of the CrystalWeb authoring environment.
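
A floor control policy of this kind can be sketched in a few lines. The class `FloorControl` and its first-come, first-served queue are assumptions made for illustration; the text above does not specify which policy the sharing component actually uses.

```python
from collections import deque


class FloorControl:
    """Hypothetical floor control: only the floor holder's input is accepted."""

    def __init__(self, initial_holder=None):
        self.holder = initial_holder
        self._waiting = deque()

    def request(self, participant):
        # Ask for the floor; it is granted at once if nobody holds it.
        if self.holder is None:
            self.holder = participant
        elif participant != self.holder and participant not in self._waiting:
            self._waiting.append(participant)

    def release(self):
        # Pass the floor to the next waiting participant, if any.
        self.holder = self._waiting.popleft() if self._waiting else None
        return self.holder

    def accept_input(self, participant, event):
        # Input from anyone but the floor holder is discarded.
        return event if participant == self.holder else None


floor = FloorControl("peters")
floor.request("neuss")

ignored = floor.accept_input("neuss", "key:a")   # neuss does not hold the floor
new_holder = floor.release()                     # peters passes the floor on
accepted = floor.accept_input("neuss", "key:a")  # now the input is forwarded
```

All participants still see the same output; floor control only decides whose keyboard and mouse events reach the shared application.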

    FIGURE 5. Distributed authoring in a teleconference
Besides the direct telecommunication with the other authors, in a distributed authoring scenario it is important to be able to communicate ideas by making freehand sketches and annotations. The shared desktop publishing application should be overlaid by the annotation facility like a set of transparencies. Each participant gets a private transparency where he or she can sketch comments in a unique color. Sound annotation allows an immediate addition of succinct comments to a text sector. If the comments consist of longer passages, text annotation should be offered in form of a simple editor. On demand, these passages may be integrated into the document later, e.g. through a clipboard mechanism.

4.3 Digital annotations

A document which has been reviewed and contains annotations is retrieved in the same way. The only difference is that not only FrameMaker is started, but also the annotation facility of CrystalWeb, which loads the annotations and displays them on top of the FrameMaker client.

The annotation facility augments the DTP system with a mechanism for including e.g. sketches and short sections of text. However, annotating a living document poses some problems for the annotation system which have to be taken into account. The problem with the electronic document is that any navigation in it, or even scrolling up or down a page, will cause the position of an annotation to lose its meaning. Even worse, the content itself is of course also subject to change. Whenever a reformatting takes place, the annotation will have to be moved. This could be triggered by the insertion or deletion of text, but also by changing attributes like font size or page margins. Figure 6 illustrates the problem: even scrolling backward or forward in the text causes the annotation to lose its meaning, since the relation to the text parts gets lost. The same can happen due to the insertion or deletion of text. Annotations on dynamically changing documents are possible through introducing invisible tags in the underlying system, and using them as anchors for annotation mark-up.
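
The anchoring idea can be sketched with a simplified document model in which every paragraph carries a stable, invisible tag. The class `AnchoredDocument`, the tag format `a0`, `a1`, ... and the paragraph granularity are assumptions made for this illustration; they do not describe how the tags are realized in the DTP system.

```python
class AnchoredDocument:
    """Hypothetical document model: each paragraph has a stable invisible tag."""

    def __init__(self, paragraphs):
        self._next_tag = 0
        self.paragraphs = []  # list of (tag, text) pairs in display order
        for text in paragraphs:
            self.append(text)

    def _new_tag(self):
        tag = f"a{self._next_tag}"
        self._next_tag += 1
        return tag

    def append(self, text):
        tag = self._new_tag()
        self.paragraphs.append((tag, text))
        return tag

    def insert_before(self, tag, text):
        # Editing shifts positions, but existing tags are never changed.
        new_tag = self._new_tag()
        self.paragraphs.insert(self.position_of(tag), (new_tag, text))
        return new_tag

    def position_of(self, tag):
        # Resolve a tag to its current position at display time.
        for index, (candidate, _) in enumerate(self.paragraphs):
            if candidate == tag:
                return index
        raise KeyError(tag)


doc = AnchoredDocument(["Introduction", "Requirements", "Architecture"])

# The annotation refers to a tag, not to a page or screen position.
annotation = {"anchor": "a1", "note": "Add a list of requirements here"}

before = doc.position_of(annotation["anchor"])
doc.insert_before("a0", "Abstract")   # text is inserted above the annotation
after = doc.position_of(annotation["anchor"])
annotated_text = doc.paragraphs[after][1]
```

The annotation moves from position 1 to position 2 but still points at the paragraph "Requirements", whereas an annotation stored as a fixed screen coordinate would now sit on the wrong text.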

    FIGURE 6. Problems with scrolling or modifying text
The author can edit the document and make changes according to the annotations. For displaying and editing annotations, the CrystalWeb environment uses CrystalPad [PNB94], which was developed in a project financed by Siemens. For shared authoring, a desktop publishing (DTP) application is distributed to all users via a window sharing component. In this way, CrystalWeb supports shared viewing and editing in a multimedia teleconferencing environment, as well as annotations made by overlaying an application with a virtual drawing area which can be used like a transparency film.

4.4 Hypermedia annotations

Although the ability to directly write sketches and short notes onto a document is an important feature in an editing process, other types of annotations must also be supported. Sound annotations allow for an immediate addition of succinct comments to a text sector, while longer comments can be made in the form of an attached text file. On demand, these passages may be integrated into the document later, e.g. through a clipboard mechanism. This also applies to image annotations illustrating a document section. The World Wide Web already supports the concept of external viewers: a hyperlink can point to a sound or an image as well as to another hypertext file. Upon selecting the corresponding link, an external viewer (depending on the file type) is started to display the object.

CrystalWeb allows for inclusion of multimedia annotations via the annotation facility. The annotation facility also handles the distribution of the multimedia annotations during a conference session. Upon writing the document to the distributed database, an HTML version is being generated in which multimedia annotations are converted to hyperlinks.

Figure 7 shows the design of hypermedia annotations. Together with a "simple" annotation, a link managed by the annotation facility is created. Internally, this link is resolved to the proper annotation in the form of sound, text, images, video, or other media. When the document is converted, an external file for each hypermedia annotation and the corresponding hyperlinks in the HTML file are generated automatically.
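
The conversion of hypermedia annotations into hyperlinks can be sketched as follows. The function `annotation_links`, the file naming scheme and the mapping from media type to file extension are hypothetical; the sketch only illustrates that each annotation becomes an external file plus one HTML anchor pointing to it.

```python
from html import escape

# Assumed mapping from annotation media type to file extension.
EXTENSIONS = {"sound": "au", "text": "txt", "image": "gif"}


def annotation_links(doc_name, annotations):
    """Return one HTML hyperlink per (media, label) annotation."""
    links = []
    for number, (media, label) in enumerate(annotations, start=1):
        # Each annotation is written to its own external file (name assumed).
        filename = f"{doc_name}-note{number}.{EXTENSIONS[media]}"
        links.append(f'<A HREF="{escape(filename)}">{escape(label)}</A>')
    return links


links = annotation_links(
    "report",
    [("sound", "Spoken remark"), ("text", "Comment on <Section 2>")],
)
```

A browser that follows such a link starts the external viewer registered for the file type, so no changes to the browser are needed to present sound or image annotations.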

    FIGURE 7. Hypermedia Annotations

5 Conclusions

In order to support retrieval and distributed authoring of documents, teleservices and information systems have to be combined into an integrated environment. The World Wide Web provides powerful and easy to use tools for information retrieval, but lacks support for document authoring. An integrated environment for distributed information authoring and retrieval not only has to support remote document access and resource discovery, but must also provide digital annotations and facilities for holding virtual conferences with co-workers.

We have analyzed the requirements for an integrated environment and presented a solution for combining electronic access of information, digital annotations and teleconferencing tools.

The ability to make annotations onto a living document while it is being modified provides a natural way of discussing changes with others while working them into a new version. Since these annotations can be stored together with the document, they can also be used outside a conference situation.

Both document and annotations can be distributed via the World Wide Web and are accessible by any team member. Future work will aim at support of workflow control and use of a database management system for the document repository.

6 Acknowledgement

The work on the annotation facility CrystalPad was performed in a project financed and supported by Siemens ICK / Siemens ZFE München and Siemens ZFE Saarbrücken.

7 References

[AWF91]
Hussein M. Abdel-Wahab and Mark A. Feit. XTV: A Framework for Sharing X Window Clients in Remote Synchronous Collaboration. X11R5 contrib distribution, contrib/clients/xtv, August 1991.
[AD93]
Altenhofen, M.; Dittrich, J.; Hammerschmidt, R.; Käppner, T.; Kruschel, C.; Kückes, A.; Steinig, T.: The BERKOM Multimedia Collaboration Service, Proceedings of the ACM Symposium on MultiMedia, The Association for Computing Machinery, New York, 1993.
[Alt90]
Michael P. Altenhofen. Shared X. Technical report, NESTOR Project, Digital Equipment Corporation, CEC, Karlsruhe, 1990.
[Baz90]
John Bazik. XMX: An X Protocol Multiplexor, Version 1.0. FTP: wilma.cs.brown.edu: pub/xmx.tar.Z, December 1990.
[BrPo94]
De Bra, P.M.E.; Post, R.D.J.: Information Retrieval in the World Wide Web: Making Client Based searching feasible, Proceedings of the First International World-Wide Web Conference, CERN, Switzerland, May 1994.
[EGR92]
Ellis, C.A.; Gibbs, S.J.; Rein, G.L.: Groupware: Some issues and experiences. Communications of the ACM, 34(1): 39-58, January 1992.
[Jon93]
Oliver Jones. Multidisplay Software in X. In The X Resource, Issue Six. O'Reilly and Associates, Inc., 1993.
[Kos94]
Koster, Martijn: ALIWEB - Archie-Like Indexing in the WEB, Proceedings of the First International World-Wide Web Conference, CERN, Switzerland, May 1994.
[LCG92]
Berners-Lee, T.; Cailliau, R.; Groff, J.; Pollermann, B.: World-Wide Web: The Information Universe, in Electronic Networking: Research, Applications and Policy Vol.1 No.2, Meckler, Westport CT, Spring 1992.
[Mal94]
Pal S. Malm. The unOfficial Yellow Pages of CSCW. Technical report, University of Tromso, January 1994.
[NS91]
S. Noll and M.G. Schendel. Cooperative sketching in a network environment for the automotive industry in Europe. In Proceedings of the Eurographics 1991, Technical Report Series, Vienna, 1991.
[PNB94]
Peters, R.; Neuss, C.; Bernhard, M.: CrystalPad: Shared Authoring with Annotations, Postersession Proceedings IWACA '94 Workshop, Heidelberg, 1994.
[RG92]
Mark Roseman and Saul Greenberg. GroupKit: A groupware toolkit for building real-time. In Jon Turner and Robert Kraut, editors, Proceedings of the Conference on Computer-Supported Cooperative Work, Toronto, pages 43-50. The Association for Computing Machinery, New York, 1992.
[SNR91]
M.G. Schendel, S. Noll, and J. Rix. Distributed SketchPad System: A tool for cooperative sketching in a network environment. In Contribution to COMICS-Workshop, Toulouse, June 1991.