Extending WWW for Synchronous Collaboration
Thane J. Frivold, Software Engineer
Ruth E. Lang, Senior Computer Scientist
Martin W. Fong, Senior Software Engineer
SRI International
ABSTRACT
The World-Wide Web (WWW), in conjunction with such tools as Mosaic, is
an extremely effective mechanism for individuals to share distributed
information. However, access to this information is unidirectional,
asynchronous, and limited by a client/server model in which only
predefined data are provided. We describe augmenting WWW to support
bidirectional, synchronous collaboration between data producers and
their consumers. This is accomplished by exploiting WWW's ease of
access and use, and by incorporating a peer-to-peer model that
provides real-time collaboration services.
1 INTRODUCTION
The World-Wide Web (WWW) [BERN], in conjunction with such tools as
Mosaic, is an extremely effective mechanism for individuals to share
distributed information. In effect, WWW and its associated viewers
constitute a one-stop shopping center for static text, images, audio,
virtual documents, etc. in a virtual and international
market. However, the user's interaction is unidirectional,
asynchronous, and limited to a client/server model in which only
predefined (and sometimes dynamically determined) data are
provided. Viewed as a shopping center, WWW is product-driven:
the goods are simply passive data supplied by remote servers
acting as proxies for data producers.
Given WWW's convenience of centralized access, we describe augmenting
WWW to support service-driven, real-time user interactions. The form
of these interactions is bidirectional, synchronous collaboration
between data producers and their consumers. Specifically, the
producers and consumers are brought together in real time, along with
their respective tools and data, to discuss the published WWW
data. For example, these discussions may take the form of virtual
office hours between teachers and students, or ad hoc discussions
between authors and readers. This is accomplished by exploiting WWW's
ease of access and use, and by incorporating a peer-to-peer model that
provides real-time collaboration services.
This paper discusses our recent work in merging this capability into
WWW's framework. Specifically, we describe how client/server and
peer-to-peer paradigms are reconciled, the supporting collaborative
infrastructure, the measures taken to mitigate security risks, and
lastly, the scope and future directions of the current implementation.
2 TYPICAL USAGE SCENARIO
Alejandro, a high-school science teacher, wishes to include several
hands-on experiments as part of the curriculum he is developing for
his upcoming class. When using Mosaic to browse through a collection
of project descriptions developed by master teachers, he notices one
that uses data visualization techniques for plate tectonic
analysis. Although he feels that this project is ideal for his class,
he has questions about the data visualization techniques.
Then he notices a "Contacting the Authors" section of the project
description that permits the readers to contact the authors, either
Ms. Tijara, the geologist, or Mr. VanDorn, the master teacher; it also
describes their availability. Because he is interested in the data
visualization techniques, and because he sees that Ms. Tijara is
scheduled to be in her office, he traverses the hypertext link and
contacts her.
While at her workstation, Ms. Tijara notices an invitation box appear
on her screen. It states that Alejandro has questions about the WWW
document on data visualization for plate tectonics that she developed
with Mr. VanDorn. She accepts the invitation and enters into a
synchronous collaborative work session with Alejandro. After saying
"Hello," using a microphone connected to her workstation, she sees
windows appear with the Mosaic experiment description page and the
simulation process windows that Alejandro was viewing at the time he
contacted her. (These windows correspond to applications that are part
of the shared workspace.) Alejandro uses his mouse to highlight
sections of the text, and then uses his mouse to gesture at the
three-dimensional view of the corresponding simulation data while
asking his question. Ms. Tijara explains the correlation between the
two, reruns the simulation, and narrates for Alejandro as the
simulation progresses. To eliminate ambiguity, she brings into the
session another simulation from her private workspace that shows a
more detailed view of the fault area. With his questions answered,
Alejandro thanks her for her time, and they close the collaborative
work session. He then proceeds to complete his lesson plan by
incorporating the plate tectonic analysis.
3 SUPPORTING SYNCHRONOUS COLLABORATION
The functionality described in this scenario has been achieved by
merging two pre-existing systems, each designed to support differing
forms of collaboration. WWW provides excellent, asynchronous worldwide
access to information. SRI International's COllaborative Multimedia
Environment Technology (COMET) [CRAI] provides synchronous
collaboration in the form of shared access and control of unmodified
X11 applications and collaborative network audio. The combination of
WWW and COMET provides a mechanism that permits a user not only to
browse through a wealth of static information, but also to contact the
authors and discuss this information with them as a natural extension
of the browsing process.
3.1 World-Wide Web
WWW supports asynchronous interaction among authors who publish
information and browsers who view it. If WWW is being used as a
repository for information produced in common (e.g., a paper written
by geographically separated members of a project team), the
interaction among authors who work at different times constitutes
asynchronous collaboration.
The data exchange protocol that WWW employs, HTTP (Hypertext Transfer
Protocol), manifests this asynchrony by employing a strict
client/server model. A client program submits an explicit request to a
server, which composes and delivers a response to that client. While
the data delivered may constitute products of the author, the author
is not directly involved; rather, the HTTP server acts as a proxy for
the author in these transactions.
3.2 SRI International's COMET
COMET supports synchronous interaction among users who work together
to review and produce information. The goal of COMET is to create a
collaboration environment in which users can consult one another with
the ease of a telephone call, and can include in the conversation the
active use of their own unmodified applications.
Because any conference participant may originate a shared action
(e.g., speaking, gesturing, or modifying a shared document), COMET
uses peer-to-peer communication to permit this real-time interaction.
A client/server paradigm, such as the one employed by WWW, would
restrict the natural multidirectional flow of information that occurs
in a conversational paradigm.
3.3 Bridging the Gap Between Systems
Within the WWW client/server model, the server never transmits
unsolicited data to a client. However, within the COMET peer-to-peer
model, data may flow freely among peers. To bridge the gap between the
two models embedded in each system, an adjunct process was introduced
that acquires data through the standard WWW client/server mechanism
and transfers it to the peer-to-peer collaborative environment.
Thus, the data retrieved from an HTTP transaction are treated not as
an end result, but as the starting point for
collaboration. Specifically, conference configuration information
stored at an HTTP server serves as the basis for initiating,
terminating, or otherwise modifying the configuration of a
collaborative session.
The steps involved in the initiation of a collaborative session
between a browser and an author are shown in Figure 1 and described
below.
FIGURE 1. COLLABORATIVE SESSION INITIATION
- Browser Selects "Contact Author" Option.
The browser clicks on a hypertext link in the current Mosaic document
that advertises the ability to contact the document's author. Mosaic
submits a request to the HTTP server to resolve the selected hypertext
link.
- HTTP Server Delivers Contact Information.
The HTTP server accepts the request and retrieves the appropriate
conference configuration, drawing from auxiliary information the
author stored in parallel with the original HTML document. This
information is then delivered to Mosaic as a virtual document
containing conference configuration data.
- Mosaic Forwards Data.
Mosaic accepts the reply, identifies the helper application registered
to handle conference configuration data, and forwards the data to that
application.
- Helper Application Forwards Data.
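The helper application adds the browser's contact information to the
conference configuration information, then passes the information to
the browser's CCMA, the agent that acts on the user's behalf in
establishing and reconfiguring conferences (described in Section 3.4).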
- CCMAs Connect.
The browser's CCMA contacts the author's CCMA, then passes the
conference configuration information to its peer. By sharing the
conference information, all participants are informed of who will be
involved and what resources are needed to participate in the
conference.
- Invitation Issued.
The browser's CCMA issues an invitation to the author's
CCMA. Accompanying the invitation is a description of what actions the
author's CCMA should take based on the author's response.
- Author Responds to the Invitation.
A conference invitation appears on the author's screen. The author may
accept, decline (with an optional reason), or fail to respond (in
which case, the invitation times out).
- Author's CCMA Responds.
The author's CCMA returns the author's response to the browser's
CCMA. On receiving it, the browser's CCMA either establishes a
conference or informs the browser why the author could not join the
conference.
- CCMAs Establish Shared Workspace.
As part of establishing a conference, the browser's and author's CCMAs
collectively establish a shared workspace. Each CCMA ensures that the
required collaborative media agents are available, properly
configured, and active.
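The steps above can be summarized in a short sketch. This is an
illustration only, not the COMET implementation: the names
ConferenceConfig, Response, Outcome, and initiate_session are
hypothetical, the author's reaction is passed in as a parameter rather
than collected from an invitation box, and no network communication
between CCMAs is modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Response(Enum):
    ACCEPT = "accept"
    DECLINE = "decline"
    TIMEOUT = "timeout"  # the author never answered the invitation


@dataclass
class ConferenceConfig:
    name: str
    participants: List[str]  # author contact information stored at the server
    media_agents: List[str] = field(default_factory=list)


@dataclass
class Outcome:
    established: bool
    reason: Optional[str] = None
    active_agents: List[str] = field(default_factory=list)


def initiate_session(config: ConferenceConfig, browser: str,
                     response: Response,
                     decline_reason: Optional[str] = None) -> Outcome:
    # Helper application step: add the browser's contact information.
    config.participants.append(browser)
    # Invitation result, as reported back to the browser's CCMA.
    if response is Response.TIMEOUT:
        return Outcome(False, "invitation timed out")
    if response is Response.DECLINE:
        return Outcome(False, decline_reason or "declined")
    # Accepted: the CCMAs bring up the media agents the conference requires.
    return Outcome(True, active_agents=list(config.media_agents))
```

In this sketch, a declined or timed-out invitation produces only a
reason to report to the browser; media agents are started only after
an acceptance.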
3.4 Shared Workspace Description
The result of establishing a collaborative session is the creation of
a "shared workspace" that permits users to talk to each other (via a
network audio channel) as well as see and interact with each other's
applications. In this way, they can not only share control of an
application (e.g., cooperatively review a WWW document within Mosaic),
they can speak, gesture, and point as if they were working at the same
computer (e.g., say "What does this mean?" while gesturing at a
sentence in the document using the mouse).
As shown in Figure 2, the shared workspace is implemented using SRI
International's X11 COMET Pseudo-Server (XCPS) and Lawrence Berkeley
Laboratory's Visual Audio Tool (VAT), along with applications that
represent the work products of the collaborators. Both XCPS and VAT
are configured and managed by the CCMA.
FIGURE 2. SHARED WORKSPACE
The CCMA serves as a proxy for an individual user and responds on that
user's behalf. It communicates with other CCMAs, each representing a
different user, to establish and dynamically reconfigure multimedia,
multiparty sessions among media agents.
XCPS supports synchronous multiuser access to unmodified single-user
X11 applications by managing tool I/O to and from the keyboard, mouse,
and display. It ensures that input to single-user applications is
generated by one user at a time. This is accomplished by an
activity-sensing contention resolution algorithm (i.e., floor control)
that permits users to share applications in an informal and
conversational manner [GARC].
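To make the idea of activity-sensing floor control concrete, the
following sketch shows one simple policy. It is not the algorithm of
[GARC] or the XCPS implementation: it assumes that the floor belongs
to whoever generated input most recently, and that another user can
take the floor only after the holder has been idle for a fixed
interval. The class name ActivityFloor and the idle_timeout parameter
are hypothetical.

```python
class ActivityFloor:
    """Pass input from one user at a time, based on recent activity."""

    def __init__(self, idle_timeout=1.0):
        self.idle_timeout = idle_timeout  # seconds of silence that free the floor
        self.holder = None                # user currently holding the floor
        self.last_activity = None         # time of the holder's last input

    def offer_input(self, user, now):
        """Return True if this user's input event should reach the application."""
        floor_free = (self.holder is None
                      or now - self.last_activity >= self.idle_timeout)
        if user == self.holder or floor_free:
            # The user keeps or takes the floor; record the activity.
            self.holder = user
            self.last_activity = now
            return True
        # Someone else is still active: drop the event.
        return False
```

With a one-second timeout, input from a second user is ignored while
the first is typing or moving the mouse, and is accepted once the
first user pauses, which approximates turn-taking in conversation
without an explicit "request floor" action.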
Recalling the usage scenario described above, Figure 2 shows that the
browser's Mosaic window, running through XCPS, is also seen on the
author's screen, providing her with context from the browser's
workspace. Similarly, the author has made a window from another
application available to the browser in order to illustrate a
particular concept. Although Alejandro and Ms. Tijara may be hundreds
of miles apart, VAT allows them to talk and XCPS allows them to view
and cooperatively control applications as if they were both working in
the same office.
4 SYSTEM INTEGRATION
Existing configuration and extension mechanisms within the WWW
architecture were used exclusively in bridging the gap between the
asynchronous document browsing provided by WWW and COMET's synchronous
collaboration capability. No modifications were made to the HTTP
server or Mosaic client code. Specifically, the bridge consists of the
following elements:
- Server extensions in the form of data stored at the HTTP
server plus a Common Gateway Interface (CGI)-compliant script [MCCO],
- Client extensions to support a new Multipurpose Internet Mail
Extensions (MIME) [BORE] content type, and
- Helper application.
4.1 Server Extensions
Conference configuration information is stored within the server,
which lists both the contact information for the author(s) of the
document and the components that make up the conference itself (e.g.,
collaborative audio tool, shared workspace support application). To
simplify the creation and use of this conference configuration
information, the server employs templates that contain a variety of
parameters naming:
- The conference itself (e.g., PlateTectonicAnalysis)
- The conference operation to be performed (e.g., Initiate)
- Any scoping on that operation (e.g., TwoWay).
These options are implicitly selected when the browser traverses the
associated document hypertext reference; e.g.,
/ccma-bin/ccma-script?PlateTectonicAnalysis+Initiate+TwoWay
In turn, this invokes the CGI script that validates the conference
identifier and options, constructs a virtual document containing
information described by these parameters, and returns it to the
client.
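A minimal sketch of such a script's logic follows. Only the query
format (three parameters joined by "+") and the content type
application/x-ccma-configuration come from this paper; the table
ALLOWED, the function names, and the "key: value" layout of the
virtual document are assumptions made for illustration, since the
actual data format is not described here.

```python
# Table of allowable (conference, operation, scope) combinations (hypothetical).
ALLOWED = {
    ("PlateTectonicAnalysis", "Initiate", "TwoWay"),
}


def handle_query(query_string):
    """Return (content_type, body) for a conference request."""
    parts = query_string.split("+")
    if len(parts) != 3 or tuple(parts) not in ALLOWED:
        # Report problems as an ordinary HTML document.
        return ("text/html",
                "<html><body><p>Unknown or disallowed conference "
                "request.</p></body></html>")
    conference, operation, scope = parts
    # Virtual document; this key/value layout is illustrative only.
    body = (f"conference: {conference}\n"
            f"operation: {operation}\n"
            f"scope: {scope}\n")
    return ("application/x-ccma-configuration", body)


def cgi_response(query_string):
    """Format the reply as a CGI script would write it to standard output."""
    content_type, body = handle_query(query_string)
    return f"Content-Type: {content_type}\r\n\r\n{body}"
```

A real CGI script would read the query from the QUERY_STRING
environment variable and print the result of cgi_response to standard
output.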
4.2 Client Extensions
We extended the client by defining a new content type to permit
delivery of conference configuration information to a browser's CCMA,
and exploited the ability of Mosaic to map any content type to a
particular support application. All conference configuration
information is transmitted using our own MIME content type,
application/x-ccma-configuration, and associated data
format. This approach ensures that the information is passed to our
collaboration helper application on the client's side.
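The mapping from content type to support application can be pictured
as a lookup table. The sketch below is not Mosaic's code; the helper
command name ccma-helper, the {file} placeholder, and the function
helper_command are hypothetical, and only the content type itself is
taken from this paper.

```python
# Content type -> command template (all command names are hypothetical).
HELPERS = {
    "application/x-ccma-configuration": ["ccma-helper", "{file}"],
    "application/postscript": ["ghostview", "{file}"],
}


def helper_command(content_type, path):
    """Return the command to run for a reply of this type, or None."""
    # Ignore any parameters that follow the type, e.g. "; charset=...".
    base = content_type.split(";", 1)[0].strip().lower()
    template = HELPERS.get(base)
    if template is None:
        return None
    # Substitute the file that holds the reply body.
    return [arg.replace("{file}", path) for arg in template]
```

Because the client consults only the content type, registering one
new table entry is enough to route conference configuration data to
the helper application; no client code needs to change.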
4.3 Helper Application
The role of the helper application is to inject data from the WWW
client/server paradigm into the peer-to-peer synchronous collaboration
environment. The helper application is invoked by Mosaic as a side
effect of resolving the application/x-ccma-configuration
content type. The helper application dynamically determines the name,
host address, and X11 display of the browser, and then determines how
to contact the browser's CCMA. The helper application then establishes
a connection to the browser's CCMA and delivers the static conference
configuration information (i.e., the virtual document created by the
HTTP server), along with dynamically determined browser contact
information. The CCMA uses this information in peer-to-peer
communication with other CCMAs to initiate, terminate, or augment
collaborative sessions.
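The data-gathering part of this role might look like the sketch below.
It is an assumption-laden illustration rather than the actual helper:
the function names and the "browser-..." field names are hypothetical,
the host is reported by name rather than by address, and the
connection to the CCMA is not shown.

```python
import getpass
import os
import socket


def browser_contact_info(environ=None):
    """Collect the browser's name, host, and X11 display at run time."""
    environ = os.environ if environ is None else environ
    return {
        "name": environ.get("USER") or getpass.getuser(),
        "host": socket.gethostname(),
        # DISPLAY is the standard X11 variable; ":0" is a fallback guess.
        "display": environ.get("DISPLAY", ":0"),
    }


def augment_configuration(config_text, contact):
    """Append the dynamic contact fields to the static server document."""
    lines = [config_text.rstrip("\n")]
    for key in ("name", "host", "display"):
        lines.append(f"browser-{key}: {contact[key]}")
    return "\n".join(lines) + "\n"
```

The static document from the server is left untouched; the helper
only appends what can be known solely on the browser's machine.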
5 SECURITY
The major source of flexibility in the WWW framework is the ability to
tailor server and/or client behavior through the use of scripts and
helper applications. Unfortunately, the features that provide this
flexibility also have the potential to be misused by malicious clients
and servers.
In particular, the issue of security must be addressed any time a
command shell or interpreter is invoked by and passed data from a
potentially untrusted source. This situation occurs when an untrusted
client sends requests to a server, or a client receives responses from
an untrusted server.
5.1 Server-Related Security
We have added conference configuration validation to our CGI
scripts. Because the hypertext reference that a client requests has
embedded conference parameters (each implicitly mapped to a data file,
some of which are interpreted as shell scripts), a malicious client
might attempt to access other data by modifying the specified
hypertext reference. However, the CGI scripts validate all conference
options against a table of allowable configurations. Furthermore,
because the options implicitly map to files, all filenames are
inspected to ensure that they do not access data from outside the
server's conference configuration area.
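The filename inspection can be sketched as a containment check. This
is one common way to perform such a check, not necessarily the one
used in our scripts; the directory CONFIG_ROOT and the function
resolve_config_file are hypothetical.

```python
import os

# Hypothetical location of the server's conference configuration area.
CONFIG_ROOT = "/usr/local/etc/httpd/ccma-conf"


def resolve_config_file(option, root=CONFIG_ROOT):
    """Map a conference option to a file, refusing paths outside root."""
    root_real = os.path.realpath(root)
    # Resolve "..", absolute paths, and symbolic links before comparing.
    candidate = os.path.realpath(os.path.join(root_real, option))
    inside = os.path.commonpath([root_real, candidate]) == root_real
    if not inside or candidate == root_real:
        raise ValueError("option does not name a file in the configuration area")
    return candidate
```

Requests such as "../secret" or "/etc/passwd" resolve to paths
outside the configuration area and are rejected before any file is
opened or interpreted.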
While an author might wish to make a document generally available, he
or she may also wish to restrict access to certain conference options
to a limited audience. However, note that access control must be
performed by the server. Thus, the CGI scripts grant or deny access to
a conference option based on the requesting host.
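A host-based check of this kind might be sketched as follows.
REMOTE_HOST and REMOTE_ADDR are the variables a CGI script receives
for the requesting host; the table ACCESS_TABLE, its example patterns,
and the function host_allowed are hypothetical.

```python
from fnmatch import fnmatch

# (conference, operation) -> host patterns allowed to use it (hypothetical).
ACCESS_TABLE = {
    ("PlateTectonicAnalysis", "Initiate"): ["*.example.edu", "192.0.2.*"],
}


def host_allowed(conference, operation, environ):
    """Return True if the requesting host may use this conference option."""
    # Options with no entry are denied by default.
    patterns = ACCESS_TABLE.get((conference, operation), [])
    hosts = [environ.get("REMOTE_HOST", ""), environ.get("REMOTE_ADDR", "")]
    return any(host and fnmatch(host.lower(), pattern)
               for host in hosts
               for pattern in patterns)
```

Denying by default means that a newly added conference option is
unreachable until the author lists the hosts permitted to use it.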
If an error or an access denial occurs, the CGI scripts generate
virtual HTML documents to inform the user of the problem.
5.2 Client-Related Security
With any application that processes MIME documents, care must be taken
when choosing to interpret any data that can trigger the execution of
a script or process. While conventional wisdom dictates that users
always closely examine "enabled mail" components, this inspection step
is cumbersome and, in the current context, defeats the goal of
real-time collaboration.
Nevertheless, our proof-of-concept implementation transmits both shell
scripts and Tcl commands [OUST] from the server, and the client
interprets them without user inspection. Because this implementation
is obviously insecure, access to this WWW extension has been limited
to a few trusted users and hosts.
6 SCOPE AND FUTURE DIRECTIONS
The current system, which requires UNIX workstations running the X11
window system, is a proof-of-concept prototype that allows a browser
to "drop in" and collaborate with the author(s) of a given
document. It has been used internally by engineers at SRI
International in an extended user study focusing on informal
synchronous collaboration, as well as for several cross-country
demonstrations.
While this prototype has met the goals of the original effort, several
enhancements have been identified that will extend the usability of
the system:
- Allowing multiple browsers to connect simultaneously to more
than one author (in the current model, one browser may connect to one
or more authors).
- Allowing "latecomers" to enter an existing session.
- Extending the HTTP server to report the state (and existence) of
ongoing COMET sessions.
BIBLIOGRAPHY
[BERN] Berners-Lee, Tim, Robert Cailliau, Jean-Francois Groff,
and Bernd Pollermann, "World-Wide Web: The Information
Universe," Electronic Networking, Vol. 2, No. 1,
pp. 52-58, Spring 1992.
[BORE] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies," RFC 1521.
[CRAI] Craighill, Earl, Ruth Lang, Martin Fong, and Keith Skinner,
"CECED: A System For Informal Multimedia Collaboration,"
Proceedings of the ACM MULTIMEDIA '93 Conference,
Anaheim, CA, August 1993.
[GARC] Garcia-Luna-Aceves, J. J., E. J. Craighill, and R. Lang,
"Floor Management and Control for Multimedia Computer
Conferencing," Proceedings of MULTIMEDIA '89, 2nd IEEE
Comsoc International Multimedia Communications Workshop,
Ottawa, Ontario, Canada, April 1989.
[MCCO] McCool, Rob, "The CGI Specification,"
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html.
[OUST] Ousterhout, John K., Tcl and the Tk Toolkit,
Addison-Wesley Publishing Company, Reading, MA, April 1994.
VITAE
Thane J. Frivold is a Software Engineer in the Augmented
Collaborative Environments Program at SRI International. His
specialized professional competence includes applied research, design,
and implementation in the following areas: collaborative and
multimedia systems, X11 programming, geographic database and display
systems, graphical user interfaces, distributed databases,
object-oriented design and development, and systems and network
programming. Since joining SRI in 1986, Mr. Frivold has participated
in the design and implementation of a multimedia conferencing system
environment; led the development of a geographical map display/data
fusion system; implemented dynamic control capabilities for a
distributed database system; developed encoding schemes and decision
analysis methods using fuzzy set paradigms; and developed analysis
tools for study of magneto-encephalography brain wave
data. Mr. Frivold received his B.A. in Computer Science from Dartmouth
College in 1986.
Ruth E. Lang is a Senior Computer Scientist in the Augmented
Collaborative Environments Program at SRI International. Her
specialized professional competence includes applied research, design,
and implementation in the following areas: multimedia and
collaborative systems, multimedia user interfaces, directory services
(X.500), and distributed, interactive applications and
environments. Since joining SRI in 1985, Ms. Lang has led projects for
research into multimedia information and collaboration systems,
fielding of X.500 in the ARPA Internet, and application of artificial
intelligence technology to solve problems with the DoD Directory based
on X.500. She has participated in the design and implementation of the
following systems or components: a multimedia conferencing system
prototype, a multimedia collaborative editor, integrated system
architecture for distributed interactive applications, and an
object-oriented application that supports military situation display
and assessment. Ms. Lang received a B.S. in Mathematics and Computer
Science from the University of California at Los Angeles in 1981, and
an M.S. in Computer Science from the University of California at
Berkeley in 1982.
Martin W. Fong is a Senior Software Engineer in the Augmented
Collaborative Environments Program at SRI International. His
specialized professional competence includes computer graphics,
scientific applications programming, utilities programming, software
design, and systems analysis and programming. Since joining SRI in
1979, Mr. Fong has performed component and system design and
implementation for a real-time multimedia conferencing system;
designed and implemented a structured, two-dimensional,
object-oriented graphics package and editor; served as senior designer
for a C3I application for situation map displays; served as principal
designer of a mainframe-independent data tape format and data access
package software for multicontractor use; and led a team of engineers
in developing a compiler, linker, and run-time executive for a
special-purpose computer language. Mr. Fong received a B.A. in
Astronomy in 1976 from the University of California at Berkeley.
CONTACT INFORMATION
Thane J. Frivold, (415)859-2786, tfrivold@std.sri.com
Ruth E. Lang, (415)859-5608, rlang@std.sri.com
Martin W. Fong, (415)859-4251, mwfong@std.sri.com
Augmented Collaborative Environments Program
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025
This paper was approved for public release with unlimited distribution
by ARPA Directorate for Security Review, OASD (PA). The views and
conclusions contained in this document are those of the authors and
should not be interpreted as representing the official policies,
either expressed or implied, of the Advanced Research Projects Agency
of the U. S. Government. This work was conducted under Contract
MDA972-92-C-0023, for DARPA Initiative in Concurrent Engineering
(DICE).