Extending WWW for Synchronous Collaboration

Thane J. Frivold, Software Engineer
Ruth E. Lang, Senior Computer Scientist
Martin W. Fong, Senior Software Engineer

SRI International

ABSTRACT

The World-Wide Web (WWW), in conjunction with such tools as Mosaic, is an extremely effective mechanism for individuals to share distributed information. However, access to this information is unidirectional, asynchronous, and limited by a client/server model in which only predefined data are provided. We describe augmenting WWW to support bidirectional, synchronous collaboration between data producers and their consumers. This is accomplished by exploiting WWW's ease of access and use, and by incorporating a peer-to-peer model that provides real-time collaboration services.

1 INTRODUCTION

The World-Wide Web (WWW) [BERN], in conjunction with such tools as Mosaic, is an extremely effective mechanism for individuals to share distributed information. In effect, WWW and its associated viewers constitute a one-stop shopping center for static text, images, audio, virtual documents, etc. in a virtual and international market. However, the user's interaction is unidirectional, asynchronous, and limited to a client/server model in which only predefined (and sometimes dynamically determined) data are provided. In effect, as a shopping center, WWW is product-driven, where the goods are simply passive data supplied by remote servers acting as proxies for data producers.

Given WWW's convenience of centralized access, we describe augmenting WWW to support service-driven, real-time user interactions. The form of these interactions is bidirectional, synchronous collaboration between data producers and their consumers. Specifically, the producers and consumers are brought together in real time, along with their respective tools and data, to discuss the published WWW data. For example, these discussions may take the form of virtual office hours between teachers and students, or ad hoc discussions between authors and readers. This is accomplished by exploiting WWW's ease of access and use, and by incorporating a peer-to-peer model that provides real-time collaboration services.

This paper discusses our recent work in merging this capability into WWW's framework. Specifically, we describe how client/server and peer-to-peer paradigms are reconciled, the supporting collaborative infrastructure, the measures taken to mitigate security risks, and lastly, the scope and future directions of the current implementation.

2 TYPICAL USAGE SCENARIO

Alejandro, a high-school science teacher, wishes to include several hands-on experiments as part of the curriculum he is developing for his upcoming class. When using Mosaic to browse through a collection of project descriptions developed by master teachers, he notices one that uses data visualization techniques for plate tectonic analysis. Although he feels that this project is ideal for his class, he has questions about the data visualization techniques.

Then he notices a "Contacting the Authors" section of the project description that permits the readers to contact the authors, either Ms. Tijara, the geologist, or Mr. VanDorn, the master teacher; it also describes their availability. Because he is interested in the data visualization techniques, and because he sees that Ms. Tijara is scheduled to be in her office, he traverses the hypertext link and contacts her.

While at her workstation, Ms. Tijara notices an invitation box appear on her screen. It states that Alejandro has questions about the WWW document on data visualization for plate tectonics that she developed with Mr. VanDorn. She accepts the invitation and enters into a synchronous collaborative work session with Alejandro. After saying "Hello," using a microphone connected to her workstation, she sees windows appear with the Mosaic experiment description page and the simulation process windows that Alejandro was viewing at the time he contacted her. (These windows correspond to applications that are part of the shared workspace.) Alejandro uses his mouse to highlight sections of the text, and then uses his mouse to gesture at the three-dimensional view of the corresponding simulation data while asking his question. Ms. Tijara explains the correlation between the two, reruns the simulation, and narrates for Alejandro as the simulation progresses. To eliminate ambiguity, she brings into the session another simulation from her private workspace that shows a more detailed view of the fault area. With his questions answered, Alejandro thanks her for her time, and they close the collaborative work session. He then proceeds to complete his lesson plan by incorporating the plate tectonic analysis.

3 SUPPORTING SYNCHRONOUS COLLABORATION

The functionality described in this scenario has been achieved by merging two pre-existing systems, each designed to support differing forms of collaboration. WWW provides excellent, asynchronous worldwide access to information. SRI International's COllaborative Multimedia Environment Technology (COMET) [CRAI] provides synchronous collaboration in the form of shared access and control of unmodified X11 applications and collaborative network audio. The combination of WWW and COMET provides a mechanism that permits a user not only to browse through a wealth of static information, but also to contact the authors and discuss this information with them as a natural extension of the browsing process.

3.1 World-Wide Web

WWW supports asynchronous interaction among authors who publish information and browsers who view it. If WWW is being used as a repository for information produced in common (e.g., a paper written by geographically separated members of a project team), the interaction among authors who work at different times constitutes asynchronous collaboration.

The data exchange protocol that WWW employs, HTTP (Hypertext Transfer Protocol), manifests this asynchrony by employing a strict client/server model. A client program submits an explicit request to a server, which composes and delivers a response to that client. While the data delivered may constitute products of the author, the author is not directly involved; rather, the HTTP server acts as a proxy for the author in these transactions.

3.2 SRI International's COMET

COMET supports synchronous interaction among users who work together to review and produce information. The goal of COMET is to create a collaboration environment in which users can consult one another with the ease of a telephone call, and can include in the conversation the active use of their own unmodified applications.

Because any conference participant may originate a shared action (e.g., speaking, gesturing, or modifying a shared document), COMET uses peer-to-peer communication to permit this real-time interaction. A client/server paradigm, such as the one employed by WWW, would restrict the natural multidirectional flow of information that occurs in a conversational paradigm.

3.3 Bridging the Gap Between Systems

Within the WWW client/server model, the server never transmits unsolicited data to a client. However, within the COMET peer-to-peer model, data may flow freely among peers. To bridge the gap between the two models embedded in each system, an adjunct process was introduced that acquires data through the standard WWW client/server mechanism and transfers it to the peer-to-peer collaborative environment.

Thus, the data retrieved from an HTTP transaction are treated not as an end result, but as the starting point for collaboration. Specifically, conference configuration information stored at an HTTP server serves as the basis for initiating, terminating, or otherwise modifying the configuration of a collaborative session.

The steps involved in the initiation of a collaborative session between a browser and an author are shown in Figure 1 and described below.

FIGURE 1. COLLABORATIVE SESSION INITIATION

Browser Selects "Contact Author" Option.
The browser clicks on a hypertext link in the current Mosaic document that advertises the ability to contact the document's author. Mosaic submits a request to the HTTP server to resolve the selected hypertext link.
HTTP Server Delivers Contact Information.
The HTTP server accepts the request and retrieves the appropriate conference configuration, drawing from auxiliary information the author stored in parallel with the original HTML document. This information is then delivered to Mosaic as a virtual document containing conference configuration data.
Mosaic Forwards Data.
Mosaic accepts the reply, identifies the helper application registered to handle conference configuration data, and forwards the data to that application.
Helper Application Forwards Data.
The helper application adds the browser's contact information to the conference configuration information, then passes the information to the browser's CCMA.
CCMAs Connect.
The browser's CCMA contacts the author's CCMA, then passes the conference configuration information to its peer. By sharing the conference information, all participants are informed of who will be involved and what resources are needed to participate in the conference.
Invitation Issued.
The browser's CCMA issues an invitation to the author's CCMA. Accompanying the invitation is a description of what actions the author's CCMA should take based on the author's response.
Author Responds to the Invitation.
A conference invitation appears on the author's screen. The author may accept, decline (with an optional reason), or fail to respond (in which case, the invitation times out).
Author's CCMA Responds.
On receiving the author's response, the browser's CCMA either establishes a conference or informs the browser why the author could not join the conference.
CCMAs Establish Shared Workspace.
As part of establishing a conference, the browser's and author's CCMAs collectively establish a shared workspace. Each CCMA ensures that the required collaborative media agents are available, properly configured, and active.
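
The invitation exchange in the last steps above can be sketched as a small decision function. This is an illustration only, not COMET's actual protocol: the names InvitationResponse and resolve_invitation, the returned messages, and the 30-second timeout are assumptions introduced here.

```python
from enum import Enum
from typing import Optional, Tuple


class InvitationResponse(Enum):
    ACCEPT = "accept"
    DECLINE = "decline"


def resolve_invitation(response: Optional[InvitationResponse],
                       elapsed_seconds: float,
                       timeout_seconds: float = 30.0,  # assumed value
                       reason: str = "") -> Tuple[bool, str]:
    """Return (conference_established, message_for_browser).

    `response` is None while the author has not answered the invitation.
    """
    if response is None:
        if elapsed_seconds >= timeout_seconds:
            return False, "invitation timed out"
        return False, "waiting for author"
    if response is InvitationResponse.ACCEPT:
        return True, "conference established"
    # Declined; the author may have supplied an optional reason.
    return False, ("declined: " + reason) if reason else "declined"
```

In this sketch the browser's CCMA would either go on to establish the shared workspace (first element True) or report the message to the browser.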

3.4 Shared Workspace Description

The result of establishing a collaborative session is the creation of a "shared workspace" that permits users to talk to each other (via a network audio channel) as well as see and interact with each other's applications. In this way, they can not only share control of an application (e.g., cooperatively review a WWW document within Mosaic), they can speak, gesture, and point as if they were working at the same computer (e.g., say "What does this mean?" while gesturing at a sentence in the document using the mouse).

As shown in Figure 2, the shared workspace is implemented using SRI International's X11 COMET Pseudo-Server (XCPS) and Lawrence Berkeley Laboratory's Visual Audio Tool (VAT), along with applications that represent the work products of the collaborators. Both XCPS and VAT are configured and managed by the CCMA.

FIGURE 2. SHARED WORKSPACE

The CCMA serves as a proxy for an individual user and responds on that user's behalf. It communicates with other CCMAs, each representing a different user, to establish and dynamically reconfigure multimedia, multiparty sessions among media agents.

XCPS supports synchronous multiuser access to unmodified single-user X11 applications by managing tool I/O to and from the keyboard, mouse, and display. It ensures that input to single-user applications is generated by one user at a time. This is accomplished by an activity-sensing contention resolution algorithm (i.e., floor control) that permits users to share applications in an informal and conversational manner [GARC].
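
One way to picture activity-sensing floor control is the sketch below. It is not the algorithm from [GARC]; it only illustrates the idea that input is accepted from one user at a time and that the floor becomes available once the current holder has been idle. The class name FloorControl and the idle threshold are assumptions.

```python
class FloorControl:
    """Grant input rights to one user at a time, based on recent activity."""

    def __init__(self, idle_seconds: float = 2.0):
        self.idle_seconds = idle_seconds  # assumed threshold, not from the paper
        self.holder = None                # user currently holding the floor
        self.last_activity = 0.0          # time of the holder's last accepted input

    def offer_input(self, user: str, now: float) -> bool:
        """Return True if the input event from `user` at time `now` is accepted."""
        floor_free = (self.holder is None
                      or now - self.last_activity >= self.idle_seconds)
        if user == self.holder or floor_free:
            self.holder = user
            self.last_activity = now
            return True
        return False  # another user is still actively typing or pointing
```

No explicit "request floor" action is needed in this scheme, which is what makes sharing feel conversational: a user simply starts typing or pointing once the other has paused.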

Recalling the usage scenario described above, Figure 2 shows that the browser's Mosaic window, running through XCPS, is also seen on the author's screen, providing her with context from the browser's workspace. Similarly, the author has made a window from another application available to the browser in order to illustrate a particular concept. Although Alejandro and Ms. Tijara may be hundreds of miles apart, VAT allows them to talk and XCPS allows them to view and cooperatively control applications as if they were both working in the same office.

4 SYSTEM INTEGRATION

Existing configuration and extension mechanisms within the WWW architecture were used exclusively in bridging the gap between the asynchronous document browsing provided by WWW and COMET's synchronous collaboration capability. No modifications were made to the HTTP server or Mosaic client code. Specifically, the bridge consists of the following elements:

Server extensions in the form of data stored at the HTTP server plus a Common Gateway Interface (CGI)-compliant script [MCCO],
Client extensions to support a new Multipurpose Internet Mail Extensions (MIME) [BORE] content type, and
A helper application.

4.1 Server Extensions

Conference configuration information is stored within the server, which lists both the contact information for the author(s) of the document and the components that make up the conference itself (e.g., collaborative audio tool, shared workspace support application). To simplify the creation and use of this conference configuration information, the server employs templates that contain a variety of parameters naming:

The conference itself (e.g., PlateTectonicAnalysis),
The conference operation to be performed (e.g., Initiate), and
Any scoping on that operation (e.g., TwoWay).

These options are implicitly selected when the browser traverses the associated document hypertext reference; e.g.,

/ccma-bin/ccma-script?PlateTectonicAnalysis+Initiate+TwoWay

In turn, this invokes the CGI script that validates the conference identifier and options, constructs a virtual document containing information described by these parameters, and returns it to the client.
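
The parsing and validation of such a reference might look like the following sketch. A CGI script receives the text after the "?" in the QUERY_STRING environment variable; everything else here (the ALLOWED table, the function name, and the returned dictionary) is hypothetical and stands in for the script's real checks.

```python
# Hypothetical table of permitted (conference, operation, scope) combinations.
ALLOWED = {
    ("PlateTectonicAnalysis", "Initiate", "TwoWay"),
}


def parse_conference_query(query_string: str) -> dict:
    """Split a '+'-separated query into its three parameters and validate it."""
    parts = query_string.split("+")
    if len(parts) != 3:
        raise ValueError("expected conference+operation+scope")
    conference, operation, scope = parts
    if (conference, operation, scope) not in ALLOWED:
        raise ValueError("configuration not permitted")
    return {"conference": conference, "operation": operation, "scope": scope}
```

A script would typically call parse_conference_query(os.environ.get("QUERY_STRING", "")) and build the virtual document only if no exception is raised.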

4.2 Client Extensions

We extended the client by defining a new content type to permit delivery of conference configuration information to a browser's CCMA, and exploited the ability of Mosaic to map any content type to a particular support application. All conference configuration information is transmitted using our own MIME content type, application/x-ccma-configuration, and associated data format. This approach ensures that the information is passed to our collaboration helper application on the client's side.
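
The content-type dispatch can be illustrated as follows. The sketch shows a CGI reply that labels its body with the new content type, and a lookup that mimics how a client maps that type to a helper program. The helper name "ccma-helper" and the HELPERS table are assumptions; Mosaic's real mapping is configured outside the program rather than in code.

```python
CONTENT_TYPE = "application/x-ccma-configuration"

# Hypothetical viewer table; "ccma-helper" is an assumed program name.
HELPERS = {CONTENT_TYPE: "ccma-helper"}


def cgi_response(body: str) -> str:
    """Compose a CGI reply: a Content-Type header, a blank line, then the body."""
    return "Content-Type: " + CONTENT_TYPE + "\r\n\r\n" + body


def helper_for(response: str):
    """Return the helper registered for the reply's content type, or None."""
    header, _, _ = response.partition("\r\n\r\n")
    for line in header.split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            return HELPERS.get(value.strip().lower())
    return None
```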

4.3 Helper Application

The role of the helper application is to inject data from the WWW client/server paradigm into the peer-to-peer synchronous collaboration environment. The helper application is invoked by Mosaic as a side effect of resolving the application/x-ccma-configuration content type. The helper application dynamically determines the name, host address, and X11 display of the browser, and then determines how to contact the browser's CCMA. The helper application then establishes a connection to the browser's CCMA and delivers the static conference configuration information (i.e., the virtual document created by the HTTP server), along with dynamically determined browser contact information. The CCMA uses this information in peer-to-peer communication with other CCMAs to initiate, terminate, or augment collaborative sessions.
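
A minimal sketch of the helper's data-gathering step is shown below. It assumes a line-oriented "key: value" configuration format and "browser-" field names, neither of which is specified in this paper, and it omits the connection to the CCMA because the transport is not described here.

```python
import getpass
import os
import socket


def browser_contact_info(environ=None) -> dict:
    """Collect the browser's user name, host address, and X11 display."""
    environ = os.environ if environ is None else environ
    hostname = socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:
        address = "127.0.0.1"  # fall back when the host name does not resolve
    return {
        "user": environ.get("USER") or getpass.getuser(),
        "host": address,
        "display": environ.get("DISPLAY", ":0"),
    }


def augment_configuration(static_config: str, contact: dict) -> str:
    """Append browser contact fields (hypothetical format) to the server's data."""
    extra = "".join(
        f"browser-{key}: {value}\n" for key, value in sorted(contact.items())
    )
    return static_config.rstrip("\n") + "\n" + extra
```

The result of augment_configuration is what the helper would hand to the browser's CCMA: the static document from the server plus the dynamically determined contact fields.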

5 SECURITY

The major source of flexibility in the WWW framework is the ability to tailor server and/or client behavior through the use of scripts and helper applications. Unfortunately, the features that provide this flexibility also have the potential to be misused by malicious clients and servers.

In particular, the issue of security must be addressed any time a command shell or interpreter is invoked by and passed data from a potentially untrusted source. This situation occurs when an untrusted client sends requests to a server, or a client receives responses from an untrusted server.

5.1 Server-Related Security

We have added conference configuration validation to our CGI scripts. Because the hypertext reference that a client requests has embedded conference parameters (each implicitly mapped to a data file, some of which are interpreted as shell scripts), a malicious client might attempt to access other data by modifying the specified hypertext reference. However, the CGI scripts validate all conference options against a table of allowable configurations. Furthermore, because the options implicitly map to files, all filenames are inspected to ensure that they do not access data from outside the server's conference configuration area.
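
The filename inspection can be sketched as a containment check. This is one common way to implement such a check, not necessarily the one used in our scripts; the function name and the example directory are hypothetical.

```python
import os


def resolve_config_file(config_root: str, option: str) -> str:
    """Map a conference option to a file path, refusing anything outside config_root."""
    root = os.path.realpath(config_root)
    # realpath collapses ".." components and follows symbolic links.
    candidate = os.path.realpath(os.path.join(root, option))
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError("option escapes the conference configuration area")
    return candidate
```

For example, resolve_config_file("/tmp/ccma-config", "../../etc/passwd") raises PermissionError, while a plain option name resolves to a path inside the configuration area.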

While an author might wish to make a document generally available, he or she may also wish to restrict access to certain conference options to a limited audience. However, note that access control must be performed by the server. Thus, the CGI scripts grant or deny access to a conference option based on the requesting host.
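
Host-based access control of this kind might be sketched as follows. A CGI script can read the requesting host from the REMOTE_HOST environment variable; the ACCESS table, its domain names, and the "Terminate" entry are hypothetical examples.

```python
# Hypothetical access table: conference option -> domains allowed to request it.
ACCESS = {
    "Initiate": ["sri.com"],
    "Terminate": ["author-host.sri.com"],
}


def host_may_use(option: str, remote_host: str) -> bool:
    """Return True if `remote_host` is in a domain permitted to request `option`."""
    host = remote_host.lower()
    for domain in ACCESS.get(option, []):
        # Match the domain itself or any host beneath it, but not look-alikes
        # such as "evil-sri.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```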

If an error or an access denial occurs, the CGI scripts generate virtual HTML documents to inform the user of the problem.
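
Generating such a virtual document could look like the sketch below. The function name and page layout are assumptions; the point illustrated is that any client-supplied text echoed into the page is escaped first.

```python
import html


def error_document(title: str, detail: str) -> str:
    """Build a CGI reply containing a small HTML page that explains a failure."""
    safe_title = html.escape(title)
    safe_detail = html.escape(detail)
    return (
        "Content-Type: text/html\r\n\r\n"
        f"<html><head><title>{safe_title}</title></head>"
        f"<body><h1>{safe_title}</h1><p>{safe_detail}</p></body></html>\n"
    )
```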

5.2 Client-Related Security

With any application that processes MIME documents, care must be taken when choosing to interpret any data that can trigger the execution of a script or process. While conventional wisdom dictates that users always closely examine "enabled mail" components, this inspection step is cumbersome and, in the current context, defeats the goal of real-time collaboration.

Nevertheless, our proof-of-concept implementation transmits both shell scripts and Tcl commands [OUST] from the server, and the client interprets them without user inspection. Because this implementation is clearly insecure, access to this WWW extension has been limited to a few trusted users and hosts.

6 SCOPE AND FUTURE DIRECTIONS

The current system, which requires UNIX workstations running the X11 window system, is a proof-of-concept prototype that allows a browser to "drop in" and collaborate with the author(s) of a given document. It has been used internally by engineers at SRI International in an extended user study focusing on informal synchronous collaboration, as well as for several cross-country demonstrations.

While this prototype has met the goals of the original effort, several enhancements have been identified that will extend the usability of the system:

Allowing multiple browsers to connect simultaneously to more than one author (in the current model, one browser may connect to one or more authors).
Allowing "latecomers" to enter an existing session.
Extending the HTTP server to report the state (and existence) of ongoing COMET sessions.

BIBLIOGRAPHY

[BERN]	Berners-Lee, Tim, Robert Cailliau, Jean-Francois Groff, 
	and Bernd Pollermann, "World-Wide Web: The Information 
	Universe," Electronic Networking, Vol. 2, No. 1, 
	pp. 52-58, Spring 1992.

[BORE]	Borenstein, N. and N. Freed, "MIME (Multipurpose Internet 
	Mail Extensions): Mechanisms for Specifying and Describing 
	the Format of Internet Message Bodies," RFC 1521.

[CRAI]	Craighill, Earl, Ruth Lang, Martin Fong, and Keith Skinner,
	"CECED: A System For Informal Multimedia Collaboration,"
	Proceedings of the ACM MULTIMEDIA '93 Conference, 
	Anaheim, CA, August 1993.

[GARC]	Garcia-Luna-Aceves, J. J., E. J. Craighill, and R. Lang, 
	"Floor Management and Control for Multimedia Computer 
	Conferencing," Proceedings of MULTIMEDIA '89, 2nd IEEE 
	Comsoc International Multimedia Communications Workshop, 
	Ottawa, Ontario, Canada, April 1989.

[MCCO]	McCool, Rob, "The CGI Specification,"
	http://hoohoo.ncsa.uiuc.edu/cgi/interface.html.

[OUST]	Ousterhout, John K., Tcl and the Tk Toolkit,
	Addison-Wesley Publishing Company, Reading, MA, April 1994.

VITAE

Thane J. Frivold is a Software Engineer in the Augmented Collaborative Environments Program at SRI International. His specialized professional competence includes applied research, design, and implementation in the following areas: collaborative and multimedia systems, X11 programming, geographic database and display systems, graphical user interfaces, distributed databases, object-oriented design and development, and systems and network programming. Since joining SRI in 1986, Mr. Frivold has participated in the design and implementation of a multimedia conferencing system environment; led the development of a geographical map display/data fusion system; implemented dynamic control capabilities for a distributed database system; developed encoding schemes and decision analysis methods using fuzzy set paradigms; and developed analysis tools for study of magneto-encephalography brain wave data. Mr. Frivold received his B.A. in Computer Science from Dartmouth College in 1986.

Ruth E. Lang is a Senior Computer Scientist in the Augmented Collaborative Environments Program at SRI International. Her specialized professional competence includes applied research, design, and implementation in the following areas: multimedia and collaborative systems, multimedia user interfaces, directory services (X.500), and distributed, interactive applications and environments. Since joining SRI in 1985, Ms. Lang has led projects for research into multimedia information and collaboration systems, fielding of X.500 in the ARPA Internet, and application of artificial intelligence technology to solve problems with the DoD Directory based on X.500. She has participated in the design and implementation of the following systems or components: a multimedia conferencing system prototype, a multimedia collaborative editor, integrated system architecture for distributed interactive applications, and an object-oriented application that supports military situation display and assessment. Ms. Lang received a B.S. in Mathematics and Computer Science from the University of California at Los Angeles in 1981, and an M.S. in Computer Science from the University of California at Berkeley in 1982.

Martin W. Fong is a Senior Software Engineer in the Augmented Collaborative Environments Program at SRI International. His specialized professional competence includes computer graphics, scientific applications programming, utilities programming, software design, and systems analysis and programming. Since joining SRI in 1979, Mr. Fong has performed component and system design and implementation for a real-time multimedia conferencing system; designed and implemented a structured, two-dimensional, object-oriented graphics package and editor; served as senior designer for a C3I application for situation map displays; served as principal designer of a mainframe-independent data tape format and data access package software for multicontractor use; and led a team of engineers in developing a compiler, linker, and run-time executive for a special-purpose computer language. Mr. Fong received a B.A. in Astronomy in 1976 from the University of California at Berkeley.

CONTACT INFORMATION

Thane J. Frivold, (415)859-2786, tfrivold@std.sri.com
Ruth E. Lang, (415)859-5608, rlang@std.sri.com
Martin W. Fong, (415)859-4251, mwfong@std.sri.com

Augmented Collaborative Environments Program
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025

This paper was approved for public release with unlimited distribution by ARPA Directorate for Security Review, OASD (PA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Advanced Research Projects Agency or the U.S. Government. This work was conducted under Contract MDA972-92-C-0023, for DARPA Initiative in Concurrent Engineering (DICE).