WWW5 Fifth International World Wide Web Conference
May 6-10, 1996, Paris, France


PageSpace: An Architecture to Coordinate Distributed Applications on the Web

Paolo Ciancarini
Dept. of Computer Science; Univ. of Bologna; Pza. di Porta S.Donato, 5; I-40127 Bologna
cianca@cs.unibo.it
Andreas Knoche
Technische Universität Berlin; Project KIT-PageSpace; FR 6-10; Franklinstr. 28/29; D-10587 Berlin
knoche@cs.tu-berlin.de
Robert Tolksdorf
Technische Universität Berlin; Project KIT-PageSpace; FR 6-10; Franklinstr. 28/29; D-10587 Berlin
tolk@cs.tu-berlin.de
Fabio Vitali
Dept. of Mathematics; Univ. of Bologna; Pza. di Porta S.Donato, 5; I-40127 Bologna
fabio@cirfid.unibo.it

Keywords: Java, Linda, Coordination, Web Applications, Open Distributed Systems

Abstract

Most applications on the Web require active processing and coordination of services and components. Today, activity within the Web is tied to server machines, and there is no integrated mechanism for coordinating activity located at clients, such as applets. In order to allow for truly distributed applications on the Web, such coordination platforms have to be built.

The PageSpace is a platform to support open distributed applications on top of the Web. It utilizes Java to execute distributed agents that coordinate their exchange of services by Linda-like coordination technology. The PageSpace architecture comprises a set of agent classes. The user interface is manifested as Alpha agents displayed in Web browsers. The representation of the user on the net is the user's homeagent, called Beta, which uses services on behalf of the user. Applications are formed by Delta agents that offer and use services. The coordination amongst agents is performed using a shared space of information and Linda-like primitives that operate on it.

With the PageSpace architecture, distributed applications on top of the Web and the Internet are enabled, as the platform decentralizes activity. By combining coordination technology with the Web and Java, the centralized, server-bound structure of today's Web applications is replaced with a truly open distributed system.

Introduction

Since 1993, the Internet has been rapidly popularized and commercialized. The availability of the World Wide Web across all relevant hardware platforms has made the Internet the dominant networking platform today. Networking enables applications that support collaborative work and distributed information services. Within the Internet, the World Wide Web has become the standard integrative platform for accessing Internet services, and it is well suited as a basis for building applications. However, the Web in its current state does not provide enough support for applications such as groupware or workflow modeling, as its basic nature is that of a passive information system.

Activity within the Web is tied to server machines that are able to execute code using the CGI mechanism. Modern browsers that support applet languages, such as Java, allow activity at the user interface established by browsers which in turn are clients to Web-servers.

Currently, there is no integrated mechanism for coordinating activity tied to multiple, distributed clients that together form an application. Coordination has to be done centrally at some server to which all users participating in an application have to connect. Thus, the activity located at the browser does not really make the application distributed: an applet in the user's browser can provide an active user interface, but it cannot connect directly to other applets that provide services.

In this paper we present the architecture of the PageSpace, a platform to support distributed applications. It is organized as follows. First, we discuss the requirements for such a platform and review the building blocks of PageSpace. We then describe how a distributed application in the PageSpace is composed of agents and discuss how the PageSpace is engineered.

Supporting Distributed Internet Applications

The PageSpace provides a platform on top of the WWW to support applications and to make them available using the standard Web interface. It does so by introducing a notion of active entities - the agents - that are executed somewhere in the net. These agents are able to use and provide services from and to other agents, without requiring centralized coordination from servers.

The PageSpace has to be able to cope with the requirements of open distributed systems to provide an infrastructure in which agents use and offer services from and to others. As these agents work concurrently, communication and synchronization - in one word coordination - amongst them becomes necessary. A solution to the coordination problem in open systems has to provide the "glue" that holds together the components and enables them to cooperate.

Distributed PageSpace applications involve heterogeneous machine-, network- and operating-system architectures. Coordinating agents in such an environment means to make these heterogeneities transparent to the programmer. Transparency of network heterogeneity is achieved by using the Internet underlying the communication platform. Transparency of different hardware architecture and operating systems is provided by the Java language as one building block of PageSpace. It masks machine heterogeneity by introducing a virtual machine which executes Java programs in a non-native byte code representation.

An open system also exhibits potentially high dynamics, as agents join and leave without restriction. An open distributed system has no defined beginning or end and is formed by the agents that are currently part of it. There should be no restriction on when agents join or leave, for example to allow their replacement by new versions.

Coordinating open systems means avoiding restrictions and providing mechanisms that can deal with these dynamics. In the PageSpace, this is achieved by using coordination technology as the final building block. It allows the absence of an agent to be masked by asynchrony, and it abstracts from the presence of a specific agent by associative addressing.

Distributed cooperative applications usually support asynchronous collaboration. This requires that users can leave or shut down the user interface without affecting their participation in the application. PageSpace therefore introduces a clean distinction between the agent which performs activity on behalf of the user, and the access to this agent and the results it has achieved. This is manifested by uncoupling the access interface - located in a running Web browser - from an agent process which is active all the time and represents the user on the network.

Open distributed applications work at a very large scale - potentially world-wide. The PageSpace architecture aims at a design which is scalable by introducing as much distribution and mutual independence of agents as possible. The platform will be installed at the sites of the participating institutions, providing a testbed for our approach at a European scale.

The Building Blocks of PageSpace

The PageSpace is built on a set of existing technologies:

The PageSpace integrates these basic building blocks and thereby adds value to them. In the following, we describe these building blocks in detail.

Coordination Technology

Coordination technology was initiated by the language Linda ([3]), developed at Yale University. The underlying basic concept of uncoupled coordination has been the subject of a large variety of research projects, focusing on parallel, distributed and open distributed systems, on its theoretical foundations by giving semantics to coordination languages, and on implementation-oriented questions concerning the embedding of coordination languages into programming languages and their efficient implementation.

Linda is a language for parallel and distributed processing that is based on uncoupling by means of a shared data space, called the tuple space. It provides particular operations on the tuple space which together form a coordination language ([4]), i.e. a language focused on: (1) the creation of activities, (2) their synchronization, (3) the communication amongst activities.

The tuple space is a collection of finite ordered sets, called tuples (such as <10,'a'>). The fields of a tuple have a type and are either an actual value or a placeholder for a value of a given type, in which case they are called formals (such as <?int,'a'>). A tuple containing formals is also called a template. A tuple is placed in the tuple space by an agent performing an out operation. From there it can be read or withdrawn by other agents using the rd or in operations, which both take a template as an argument. The tuple space is then searched for a tuple that matches the template. Matching tuples have an identical number of fields, which are pairwise of the same type. For actuals the values must be identical, whereas formals match any value of the given type.

If no matching tuple is available, in and rd block, until some other agent performs an out with a matching tuple. The agent receives it as the result of the in or rd operation. In an embedding of Linda in a concrete programming language, the formal fields of a template can be related to program variables (such as <amount:?int,'a'>), which are then bound to the respective values from the matched tuple.
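The matching and blocking rules described above can be sketched in Java as follows. This is only a minimal illustration under our own assumptions, not the PageSpace classes: the class name TupleSpace, the use of a Class object (e.g. Integer.class) to stand for a formal such as ?int, and the size helper are hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a Linda-like tuple space; not the PageSpace classes. */
class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    /**
     * Assumption: a formal is written as a Class object (Integer.class stands
     * for ?int); every other field is an actual. Null fields are not supported.
     */
    static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) return false;      // same number of fields
        for (int i = 0; i < template.length; i++) {
            Object field = template[i];
            if (field instanceof Class) {
                // formal: matches any value of the given type
                if (!((Class<?>) field).isInstance(tuple[i])) return false;
            } else if (!field.equals(tuple[i])) {
                return false;                                    // actual: values must be identical
            }
        }
        return true;
    }

    /** out: place a tuple in the space and wake up blocked agents. */
    synchronized void out(Object... tuple) {
        tuples.add(tuple.clone());
        notifyAll();
    }

    /** rd: block until a matching tuple exists and return a copy of it. */
    synchronized Object[] rd(Object... template) {
        return find(template, false);
    }

    /** in: block until a matching tuple exists and withdraw it. */
    synchronized Object[] in(Object... template) {
        return find(template, true);
    }

    synchronized int size() {
        return tuples.size();
    }

    // Called only from synchronized methods, so wait() holds the monitor.
    private Object[] find(Object[] template, boolean withdraw) {
        while (true) {
            for (Iterator<Object[]> it = tuples.iterator(); it.hasNext(); ) {
                Object[] candidate = it.next();
                if (matches(template, candidate)) {
                    if (withdraw) it.remove();
                    return candidate.clone();
                }
            }
            try {
                wait();   // no match yet: block until some agent performs an out
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while blocked", e);
            }
        }
    }
}
```

With this sketch, out(10, 'a') followed by rd(Integer.class, 'a') returns the tuple and leaves it in the space, while in(Integer.class, 'a') withdraws it; the first field of the returned array plays the role of the bound program variable.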

The blocking of in and rd together with the value-binding to formals provides a powerful combination of synchronization techniques with communication. The fourth operation of Linda, eval, creates new processes by means of active tuple-fields that denote functions to be computed. After their parallel evaluation, a tuple containing the results is put into the tuple space.
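The eval operation can be illustrated in the same spirit. The following sketch makes several simplifying assumptions that are ours, not Linda's definition: active fields are represented by java.util.function.Supplier objects, all active fields of one tuple are evaluated sequentially in a single new thread rather than in parallel, and the space is reduced to a queue without template matching. The names EvalSketch, takeNext and demo are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical sketch of Linda's eval; the space is simplified to a queue. */
class EvalSketch {
    private final BlockingQueue<Object[]> space = new LinkedBlockingQueue<>();

    /**
     * eval: create a new activity that computes the active fields (Suppliers)
     * and then puts the resulting passive tuple into the space.
     */
    void eval(Object... fields) {
        Thread worker = new Thread(() -> {
            Object[] result = new Object[fields.length];
            for (int i = 0; i < fields.length; i++) {
                result[i] = (fields[i] instanceof Supplier)
                        ? ((Supplier<?>) fields[i]).get()   // active field: compute it
                        : fields[i];                         // passive field: copy it
            }
            space.add(result);
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Simplified in: blocks for the next tuple, without template matching. */
    Object[] takeNext() {
        try {
            return space.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while blocked", e);
        }
    }

    /** Usage example: the tuple <"square", 7, f()> becomes <"square", 7, 49>. */
    static Object[] demo() {
        EvalSketch sketch = new EvalSketch();
        Supplier<Integer> square = () -> 7 * 7;
        sketch.eval("square", 7, square);
        return sketch.takeNext();   // blocks until the evaluation has finished
    }
}
```

The caller of eval continues immediately; only the agent that later asks for the result is synchronized with its availability.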

Linda is specific to the requirements of parallel processing, namely the assumption of a single program to be executed and the striving for fast processing. For the PageSpace, these assumptions do not hold. The set of agents and their structure is not known in advance, as we are aiming at open applications, in which components can be added and removed at any time without the possibility of predicting their presence or absence. Also, networked applications do not mainly rely on optimized components, but on an efficient orchestration of their cooperation.

However, the key-concepts of Linda are well applicable to reach the goals of PageSpace in supporting open distributed applications. The following list identifies characteristic concepts of Linda that we take as key-issues for solutions of the coordination problem in open systems:

Laura

Laura is a coordination language for agents in open distributed systems that offer and request services via a service space. An agent willing to use a service puts a request in the service space and gets the results returned. The service space is a large collection of forms describing service offers, requests, and their results that are accessed by the three operations of Laura.

Let a functionality that looks up the e-mail address of a person in some phone book be the desired service. In Laura, a request for such a service is issued with the service-operation: service(<PHONEBOOK, GETEMAIL, "Alice Liddell", ?email>).

PHONEBOOK is merely a textual macro for the service description (GETEMAIL: string -> string) to shorten the examples. It does not introduce a name, whereas GETEMAIL is the name of an operation. Service descriptions contain the interface of a service, listing the available operations and their argument and result types.

The example puts a form in the service space, consisting of a service description, an operation name, an argument according to the type defined in the service description for the operation, and a formal to take a result into a program variable. The service space abstraction ensures that this form will be taken by an agent performing the service and replacing the formal result fields with actual result values. Upon completion of the service, these results are bound to the program variables. service is blocking and therefore synchronizes the requesting agent with the existence of the service result.

To perform a service, an agent tries to retrieve a request from the service space with serve(<PHONEBOOK, ?op, ?person, ?string>).

This blocking operation will return upon the presence of a matching request-form in the service space with arguments bound to the program variables of the serving agent. It then performs the service and delivers the result with result(<PHONEBOOK, GETEMAIL, person, email>).
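The interplay of service, serve and result can be sketched in Java as follows. The sketch is only an illustration under our own assumptions: the class ServiceSpace, the Form type, and the reduction of a service description to a plain interface name with one string argument and one string result are hypothetical simplifications, and matching is done by name only.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch of Laura's three operations; all names are assumptions. */
class ServiceSpace {
    /** A request form: interface name, operation name and argument. */
    static final class Form {
        final String iface;
        final String op;
        final String arg;
        Form(String iface, String op, String arg) {
            this.iface = iface;
            this.op = op;
            this.arg = arg;
        }
    }

    private final Queue<Form> requests = new ArrayDeque<>();
    private final Map<Form, String> results = new HashMap<>();  // keyed by form identity

    /** service: put a request form in the space and block until its result exists. */
    synchronized String service(String iface, String op, String arg) {
        Form form = new Form(iface, op, arg);
        requests.add(form);
        notifyAll();
        while (!results.containsKey(form)) block();
        return results.remove(form);
    }

    /** serve: block until a request for the given interface is present and take it. */
    synchronized Form serve(String iface) {
        while (true) {
            for (Form form : requests) {
                if (form.iface.equals(iface)) {
                    requests.remove(form);
                    return form;             // return at once, the iterator is not reused
                }
            }
            block();
        }
    }

    /** result: deliver the result for a form obtained from serve. */
    synchronized void result(Form form, String value) {
        results.put(form, value);
        notifyAll();
    }

    private void block() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while blocked", e);
        }
    }
}
```

A serving agent would loop over serve("PHONEBOOK"), compute the answer for the returned form, and call result; a requesting agent simply calls service("PHONEBOOK", "GETEMAIL", "Alice Liddell") and is blocked until some server has delivered the result. Neither side needs to know the identity or location of the other.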

The central idea of Laura's service space is to apply a Linda-like matching process on the forms produced by the operations. However, Laura does not use a shared space of data, but one of service requests and offers. Thus, matching is applied to the service description. A system of interface types is defined for Laura, which includes a contravariant subtyping relation. Matching in Laura is based on this relation: A service request matches an offer if the type of the interface of the offered service is a subtype of the one requested.

Thereby, offers of services that provide more operations or allow fewer arguments than the requested ones match requests. This means that the request for a specific service can be satisfied by agents that offer a more general set of operations.
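One part of this matching rule, namely that an offer with more operations satisfies a request for fewer, can be sketched as follows. The sketch is deliberately incomplete and rests on our own assumptions: an interface is represented as a map from operation names to signature strings, signatures are compared for equality only, and the subtyping of argument and result types that Laura defines is not modelled. The names InterfaceMatching, describe and offerMatchesRequest are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified sketch of interface matching (operation sets only). */
class InterfaceMatching {
    /** Builds an interface description from (operation name, signature) pairs. */
    static Map<String, String> describe(String... nameSignaturePairs) {
        Map<String, String> description = new HashMap<>();
        for (int i = 0; i + 1 < nameSignaturePairs.length; i += 2) {
            description.put(nameSignaturePairs[i], nameSignaturePairs[i + 1]);
        }
        return description;
    }

    /**
     * An offer matches a request if every requested operation is offered with
     * the same signature. Extra operations in the offer do not matter.
     * Assumption: signature equality stands in for Laura's type rules.
     */
    static boolean offerMatchesRequest(Map<String, String> offer, Map<String, String> request) {
        for (Map.Entry<String, String> requested : request.entrySet()) {
            String offered = offer.get(requested.getKey());
            if (offered == null || !offered.equals(requested.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

For example, an agent offering both GETEMAIL and a second, hypothetical GETPHONE operation satisfies a request that only asks for GETEMAIL, but not the other way round.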

These three operations are a very abstract model to describe service coordination. In order to program agents, they have to be embedded in a programming language, which provides the computational power in addition to Laura's coordination functionality. In PageSpace, this is Java enhanced by a PageSpace class.

Shade

Shade is an object-based coordination language. It offers a basic abstraction called the object space, which is similar to a tuple space with the difference that it contains objects. In fact, the object space is a distributed collection of objects and messages. Each object encapsulates a state in the form of a multiset of tuples and methods in the form of rewriting rules.

ShaDe objects are active, i.e. they are units (places) of computation, having the ability to react to the reception of messages sent by other objects with an internal - parallel - activity defined by methods. The state of an object is a multiset of tuples, so that the object itself can be considered as a tuple space, whereas the object space is a meta tuple space supporting inter object associative communication. In fact, ShaDe objects can use several types of communication mechanisms: namely unicast, multicast, and broadcast.
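The idea of an object whose state is a multiset and whose methods are rewriting rules can be illustrated loosely in Java. The following is not Shade's syntax or semantics; it is a sketch under our own assumptions: tuples are reduced to strings, a rule consumes one multiset of tuples and produces another, rules are applied sequentially rather than in parallel, and a step limit guards against rule sets that do not terminate. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an object with multiset state and rewriting rules. */
class RewritingObject {
    private static final int MAX_STEPS = 1000;   // guard against non-terminating rules

    private static final class Rule {
        final String[] consumes;
        final String[] produces;
        Rule(String[] consumes, String[] produces) {
            this.consumes = consumes;
            this.produces = produces;
        }
    }

    private final Map<String, Integer> state = new HashMap<>();   // tuple -> multiplicity
    private final List<Rule> rules = new ArrayList<>();

    void addRule(String[] consumes, String[] produces) {
        if (consumes.length == 0) throw new IllegalArgumentException("a rule must consume something");
        rules.add(new Rule(consumes.clone(), produces.clone()));
    }

    /** Receiving a message adds a tuple to the state and triggers the rules. */
    void receive(String tuple) {
        state.merge(tuple, 1, Integer::sum);
        react();
    }

    int count(String tuple) {
        return state.getOrDefault(tuple, 0);
    }

    private boolean applicable(Rule rule) {
        Map<String, Integer> needed = new HashMap<>();
        for (String tuple : rule.consumes) needed.merge(tuple, 1, Integer::sum);
        for (Map.Entry<String, Integer> entry : needed.entrySet()) {
            if (count(entry.getKey()) < entry.getValue()) return false;
        }
        return true;
    }

    /** Applies rules until none is applicable or the step limit is reached. */
    private void react() {
        int steps = 0;
        boolean fired = true;
        while (fired && steps < MAX_STEPS) {
            fired = false;
            for (Rule rule : rules) {
                if (applicable(rule)) {
                    for (String tuple : rule.consumes) state.merge(tuple, -1, Integer::sum);
                    for (String tuple : rule.produces) state.merge(tuple, 1, Integer::sum);
                    fired = true;
                    steps++;
                }
            }
        }
    }
}
```

A rule that consumes two "bid" tuples and produces one "pair" tuple leaves the state unchanged after the first bid arrives and rewrites it as soon as the second one is received.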

For the PageSpace, the most interesting feature of Shade is that coordination is expressed by rules. We intend to exploit this feature to build "coordination" services enacting declarative cooperation laws. (Sequential) computation, in contrast, can be expressed in any programming language. In the first prototype Shade was combined with Prolog to obtain a distributed logic programming language. To implement the basic PageSpace abstractions we are currently implementing Shade as the combination Java/Linda (Jada).

Web and Java Technology

The World Wide Web, the most popular Internet service, was developed at CERN in 1989. Initiated by the free distribution of Mosaic, Web technology is available wherever access to the Internet is provided. We understand the World Wide Web as a distributed system of communicating agents. They form an information system guided by three simple conventions.
  1. Communication of agents is standardized by a set of well-defined protocols (e.g. HTTP, CGI, etc.). HTTP establishes the uniform information transport mechanism based on a fixed set of methods. CGI defines an interface between a Web server and a program written in an arbitrary language and executed on the same machine.
  2. Resources are referenced by a global naming scheme, the Uniform Resource Locator (URL).
  3. The logical structure of a text is described using the Hypertext Markup Language (HTML) that offers the possibility to link related documents.

With respect to PageSpace the World Wide Web is the platform of choice. Its popularity provides a uniform access to applications, its platform independence allows uniform user-interfaces for PageSpace applications.

The Java programming language was designed to be as simple and small as possible but with the power of a class-based object-oriented language. In 1990 Sun started to define Java based on C++, but without its unclean and unsafe features. New principles and structures were inherited from other object-oriented languages such as Eiffel or Smalltalk. The massive growth of the WWW soon demanded a portable and distributed programming platform to embed applications in the Web. It turned out that Java could easily match these requirements, and its release became the major Web-related event in 1995.

The Java compiler translates sources to an intermediate machine-independent and platform-independent byte-code which is interpreted by a Java virtual machine. Because of the architecture-neutral definition of the intermediate code, only the virtual machine has to be adapted to a specific machine and operating system. Further advantages arise from the run-time linking, which can retrieve required classes from the net.

When applets embedded in HTML documents are loaded across the network, security becomes an important issue. To prevent programmers from creating "subversive" applets, several layers of defense are provided. The byte-code is checked for invalid access to local system resources (memory, file-system, etc.), the loading of classes is always performed in a consistent and safe manner, and the networking packages released with Java are configurable to local security requirements.

In respect of the outlined qualities, Java is the language of choice for the PageSpace:

  1. Hardware independence is a basic demand of open distributed applications.
  2. Embedding applets in documents enhances the user-interface of PageSpace applications.
  3. The runtime linking provided by the Java Interpreter will simplify the transport of agents.

PageSpace uses Java both as the implementation language for the platform and as the application programming language, enhanced with high-level coordination support for distributed, concurrent agents.

The PageSpace adds value to the underlying building blocks by integrating them in a manner in which the extensive reuse and combination of existing and wide-spread complementary technology leads to a platform that aims at supporting real-life applications. Our guideline is to add the glue to bind these technologies together, rather than extending them individually. We expect to achieve the following:

By using a highly abstract coordination model, we hope to ease the programming of distributed applications on the Internet, as it incorporates a very small set of operations and meets the coordination needs of applications, not of their technical platforms.

The PageSpace Architecture

The architecture of the PageSpace can be described from two viewpoints. The user and the programmer are provided with an abstract conception in which distribution and heterogeneity are invisible. The PageSpace platform itself has to establish this abstraction and is concerned with its efficient implementation.

We identified the centralized nature of the current Web as its major drawback: All actual processing for the application is tied to the server-machine. Also, in most advanced applications on the Internet, the WWW is only used as a static access mechanism to an existing external environment: DCE components, CORBA objects, DBMS applications. The PageSpace architecture distributes the activity by incorporating coordination technology and decentralizes applications.

The combination of coordination technology and the Web is demanding. Coordination languages and the WWW belong, conceptually and technically, to two completely different and independent worlds, with different properties and benefits. The WWW emphasizes a strong distinction between clients and servers:

This means that the HTTP protocol is directional, stateless, simple (one-step); it is conceived for interaction mechanisms where at the one end there is a software service-provider, and at the other there is a human requesting the service. Therefore it is a user-centered protocol, since it gives control, initiative and freedom to the human end, and sets the software end to dependence.

On the other hand, any tuple-space-like implementation implicitly denies any difference between clients and servers, thereby implying a completely different situation:

This means that a coordination protocol is bidirectional (peer-to-peer), stateful, complex. These protocols are conceived for software tasks, where control and initiative depends on the state and type of computation, not on a fixed authority.

The Agents of PageSpace

The PageSpace is populated by agents. Some of them are concerned with the application itself, others have more specific tasks. We distinguish the following entities in the PageSpace, each denoted by a different Greek letter:

Figure 1 shows the logical structure of a PageSpace application.

Figure 1: The logical structure of a PageSpace application

An Alpha interface connects via HTTP to the associated Beta agent. The Beta carries out interactions within the PageSpace with other Delta agents on behalf of the user. It uses in/out or service/serve-result coordination provided by the Gamma environment. A Zeta provides a gateway to other worlds, here the CORBA world. The Epsilons can provide special services to the Deltas, which are mainly location dependent.

All agents (probably except Alpha) are programmed in Java using special PageSpace classes and executed as byte-code by Java virtual machines. In the following sections we describe their respective characteristics.

The Alpha Interface

The Alpha interface is the way data from agents is displayed to the human user. It can be as simple as a plain HTML document, or as complex as a heavily graphical Java applet. Alpha depends on the capabilities of the user's machine, be it a Newton PDA with a simple browser or a Java-enabled browser on a workstation.

While an Alpha is only concerned with displaying data, all computations are performed by the corresponding Delta agent within the PageSpace. This is a way to rethink the roles of clients and servers and to apply them to the coordination system as well: the Alpha does the interface, while the Delta does the coordination with other agents. There is complete freedom in designing the Alpha interface. Yet, a few issues must be faced:

This behavior leads to a fixed set of messages that Alpha and Beta exchange. This Alpha-Beta protocol is implemented within standard HTTP-mechanisms and relies on form-interaction and CGI-scripting. Our initial design of Alpha divides the general interface in three parts:

Whenever the user selects a message in the Message Board, the appropriate interface is shown in the Page panel, and the actions on it are performed. Figure 2 shows an Alpha interface displaying the page of a poker playing application.

Figure 2: An Alpha interface for a poker application

The Beta Homeagent

We expect that the user will access the PageSpace with a WWW browser. Yet, there is no way for a WWW browser to be called by the WWW environment, since the HTTP protocol is directional and subject to the initiative of the client.

As previously stated, this is considered positive, since it allows human users to maintain control of the interaction and leaves them autonomous and free to come and go and to do other things at the same time. Therefore, a different strategy of communication must be employed. An intermediate, specialized agent is built, whose task is to collect messages and requests from the user's various agents and deliver them to the user on request. Furthermore, it collects all responses from the user and distributes them to the appropriate agents.

The role of Beta is thus that of the avatar (Stephenson, Snow Crash), a software agent that acts the part of the human user within the PageSpace, and is the single access point for her to the shared environment.
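The store-and-forward role of Beta described in this section can be sketched as follows. This is a minimal illustration, not the PageSpace implementation: the class BetaHomeAgent and its method names are hypothetical, messages are plain strings, and the HTTP transport between Alpha and Beta is left out, so that onUserRequest merely stands for the moment at which the browser contacts Beta.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a Beta homeagent as a two-way mailbox. */
class BetaHomeAgent {
    private final List<String> pendingForUser = new ArrayList<>();
    private final Map<String, List<String>> repliesForAgents = new HashMap<>();

    /** An agent of the user leaves a message; the user may be absent. */
    synchronized void fromAgent(String agentName, String message) {
        pendingForUser.add(agentName + ": " + message);
    }

    /** Called when the user's browser contacts Beta: hand over everything collected. */
    synchronized List<String> onUserRequest() {
        List<String> batch = new ArrayList<>(pendingForUser);
        pendingForUser.clear();
        return batch;
    }

    /** The user answers; Beta keeps the response for the addressed agent. */
    synchronized void fromUser(String agentName, String response) {
        repliesForAgents.computeIfAbsent(agentName, name -> new ArrayList<>()).add(response);
    }

    /** An agent picks up the responses addressed to it. */
    synchronized List<String> collectReplies(String agentName) {
        List<String> replies = repliesForAgents.remove(agentName);
        return replies == null ? new ArrayList<String>() : replies;
    }
}
```

Because the initiative always lies with the caller of onUserRequest, the sketch respects the directional nature of HTTP: Beta never has to contact the browser.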

The Delta Agents

We assume that several independent applications run within the same PageSpace. For instance, some players are playing poker, while others are taking part in a distributed auction. Within the PageSpace, only agents interact; their user interface is transported to Alpha. The autonomously interacting agents that are part of applications are called Delta agents and reside constantly in the PageSpace, where they use and provide application-specific services. For instance, there will exist a poker agent for every player, indirectly connected with the human user via Beta and Alpha and coordinating with the other agents in order to play the game. The software agent can be assumed to be reliably accessible by other agents for the whole length of the game, while the user can come and go at her leisure.

Deltas can exhibit a varying degree of autonomy in responding to other agents' requests, even in the absence of the human master. The response can range from a simple "My master cannot respond" to a more complex and autonomous behavior. For instance, a user can instruct her auction agent to wait until the object she is interested in is being offered and then perform offers of $100 higher than the current level until the maximum price of $5000 is reached, and then leave.
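The auction instruction from this example can be written down as a small decision rule. The sketch below is one possible reading under our own assumptions: the class AuctionDelta and the method onOffer are hypothetical, the agent is informed about the current price by calls to onOffer, and a bid of exactly the maximum price is still allowed before the agent leaves.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of an autonomous bidding rule for an auction Delta. */
class AuctionDelta {
    private final String wantedItem;
    private final int increment;
    private final int maxPrice;
    private boolean left = false;

    AuctionDelta(String wantedItem, int increment, int maxPrice) {
        this.wantedItem = wantedItem;
        this.increment = increment;
        this.maxPrice = maxPrice;
    }

    /**
     * Reacts to an offer at the given current price. Returns the next bid,
     * or an empty value if the agent does not bid (wrong item, or it has left).
     */
    OptionalInt onOffer(String item, int currentPrice) {
        if (left || !item.equals(wantedItem)) {
            return OptionalInt.empty();          // keep waiting for the wanted item
        }
        int bid = currentPrice + increment;
        if (bid > maxPrice) {
            left = true;                         // maximum reached: leave the auction
            return OptionalInt.empty();
        }
        return OptionalInt.of(bid);
    }

    boolean hasLeft() {
        return left;
    }
}
```

An agent created as new AuctionDelta("vase", 100, 5000) ignores offers for other items, answers a current price of 4800 with a bid of 4900, and leaves once the current price has reached 5000.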

The communication amongst Delta agents (and Betas) uses the coordination technology in PageSpace. Thus, messages are exchanged by an in/out mechanism to a shared information space. The Delta-Delta protocol is application specific: A poker agent will speak a Poker-game-protocol, while a chess Delta speaks a chess protocol, each requesting or providing poker or chess services from others. In addition to these application specific messages, a basic set of common services is defined for Deltas, such as telling others about their location.

The Zeta Gateways

The Zeta gateways provide access to other external environments, such as NNTP news, CORBA objects, DBMS applications, etc. Similarly to Betas, they provide an interface between an external entity and the PageSpace, and must talk both the PageSpace protocols and the specific external one.

Engineering the PageSpace

While the previous section described the architecture of PageSpace applications from the perspective of the programmer (and the agents), the platform has to establish the abstractions provided. The platform forms an environment that is realized by the Epsilon agents.

The Epsilon Agents and the Gamma Environment

Epsilons are agents that have administrative tasks within the PageSpace. They are able to do privileged things that are prohibited to Deltas and Betas. Each machine belonging to the same PageSpace has exactly one Epsilon agent constantly running.

Epsilon has two faces. On the one hand, it offers a set of common privileged services to Deltas and Betas. They include:

In addition, Epsilon actually implements "behind the scenes" the coordination structure which we call the Gamma environment. It is based on concepts from Linda, Laura, and ShaDe. Figure 3 shows the implementation of the PageSpace platform where distribution is visible.

Figure 3: The distributed implementation of the PageSpace platform

The Deltas (and Betas and Zetas) communicate indirectly by manipulating the shared information space. They do so by the primitives of the embedded coordination language. On each machine, there is an Epsilon which - besides the common Epsilon services - implements a local information space and communicates with other Epsilons to distribute the space.

This requires that each machine supporting the PageSpace has to run a privileged Epsilon. In order to allow machines without this kernel to participate in the PageSpace, the coordination classes used in Deltas are preconfigured to contact a remote Epsilon if no local Epsilon is found.
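The fallback from a local to a remote Epsilon can be sketched as a simple selection rule. The sketch makes no statement about how the PageSpace classes actually probe for an Epsilon: the class EpsilonLocator, the representation of an Epsilon by an address string, and the isRunning predicate are hypothetical stand-ins for whatever mechanism the platform uses.

```java
import java.util.function.Predicate;

/** Hypothetical sketch: prefer the local Epsilon, fall back to a preconfigured remote one. */
class EpsilonLocator {
    /**
     * Returns the address of the Epsilon a Delta should use.
     * Assumption: isRunning tells whether an Epsilon answers at an address.
     */
    static String choose(String localAddress, String remoteAddress, Predicate<String> isRunning) {
        if (isRunning.test(localAddress)) {
            return localAddress;                 // normal case: one Epsilon per machine
        }
        if (isRunning.test(remoteAddress)) {
            return remoteAddress;                // machine without its own Epsilon
        }
        throw new IllegalStateException("no Epsilon reachable");
    }
}
```

A Delta on a machine without an Epsilon would thus transparently use the remote one, at the price of the location-dependent services then referring to the remote machine.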

The role of Epsilon is central to our architecture. Its implementation goes far beyond just providing the common services. First, Epsilon is responsible for managing the Beta homeagents, just as a Web server manages homepages. Second, in order to provide the access mechanism to Betas, Epsilon includes an HTTP server. Third, Epsilon includes all mechanisms concerning the coordination platform.

Internally, Epsilon starts privileged Deltas for the common Epsilon services and for the HTTP server. So finally, Epsilon is responsible for the coordination of all involved privileged, user and application agents. It does so by using standard mechanisms of coordination technology.

Fault Tolerance

An open system has to be able to tolerate failures of individual components. For the PageSpace, four sources of failures can be identified:

Mobility of PageSpace Agents

In the PageSpace architecture, all processes are executed by a Java virtual machine. Thus, the implementation and the executable representation of an agent are not specific to some hardware or operating system. Also, the coordination technology used does not introduce a concept of a specific agent location. It is therefore possible to introduce mobility of Delta agents.

The Delta-Delta protocol in the PageSpace is location independent, as other agents are addressed associatively, or by name. The Delta-Epsilon protocol is implicitly location dependent, as it addresses the single Epsilon agent colocated with the Delta agent.

In order to move an agent, there must be some means to express the destination location of an agent. An agent should be able to ask its Epsilon for its current location. This can be achieved by a simple service of the Epsilon agents, allowing a Delta to communicate its location to other agents. Thus, any Epsilon offers a service where_am_i to the Delta agents on the same machine, which in turn offer a service where_are_you to remote Deltas. The location data type can be implemented using the identifiers of the underlying communication mechanism, such as IP numbers, or identifiers provided by middleware toolkits. At least three situations may trigger the move of an agent to a different machine:

When an agent knows where to move to, it can initiate the move. A move comprises three issues: Removing the agent from its old location, transferring the agent, and starting the agent at its new location.

In our architecture, the move and transfer of state is handled by the Epsilon agents. A Delta agent uses a service move_me(state,location) of its Epsilon. Epsilon, in turn, uses the service start_agent(delta,state) of the remote Epsilon, which restarts the moved Delta agent and initializes its state. This protocol emphasizes the role of the Epsilon as the only agent that is capable of addressing another agent at some specific location. Also, Epsilon is the only agent that has to have hold of the byte code of a Delta agent.
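The three steps of a move (removing, transferring, restarting) and the services where_am_i, move_me and start_agent can be sketched as follows. The sketch simulates the network within one program and is based on our own assumptions: the class Epsilon shown here is hypothetical, a static map stands in for the real communication between machines, locations and agent states are plain strings, the Delta is identified by an explicit name argument, and the transfer of byte code is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the move protocol between two Epsilon agents. */
class Epsilon {
    // Assumption: stands in for the real network; maps a location to its Epsilon.
    private static final Map<String, Epsilon> NETWORK = new HashMap<>();

    private final String location;
    private final Map<String, String> running = new HashMap<>();   // Delta name -> state

    Epsilon(String location) {
        this.location = location;
        NETWORK.put(location, this);
    }

    /** where_am_i: tells a local Delta its current location. */
    String whereAmI() {
        return location;
    }

    /** start_agent(delta, state): (re)starts a Delta and initializes its state. */
    void startAgent(String delta, String state) {
        running.put(delta, state);
    }

    /** move_me(state, location), with the calling Delta named explicitly. */
    void moveMe(String delta, String state, String destination) {
        Epsilon remote = NETWORK.get(destination);
        if (remote == null) {
            throw new IllegalArgumentException("unknown location: " + destination);
        }
        running.remove(delta);              // 1. remove the agent from its old location
        remote.startAgent(delta, state);    // 2. transfer and 3. restart with its state
    }

    boolean isRunning(String delta) {
        return running.containsKey(delta);
    }

    String stateOf(String delta) {
        return running.get(delta);
    }
}
```

Note that the Delta never addresses the remote Epsilon itself; it only names a destination, and the two Epsilons carry out the transfer, in line with the role of Epsilon as the only location-aware agent.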

Conclusions

By combining three existing technologies - namely Web, Java, and coordination technology - it is possible to build a flexible platform to support open distributed applications. The necessary mechanisms turn out to be simple and require no extensions to the building blocks used. The protocols needed seem to be simple and scalable. PageSpace adds value to its building blocks by providing a highly abstract coordination mechanism for the development of open distributed applications.

Other approaches to enhancing the Web with coordination technology are WWWinda ([2]) and WULinda ([5]). The first is a Web-accessible interface to a local Linda tuple space; the second uses local Linda-like coordination for modules within one browser and helper applications. The PageSpace approach is different in that it makes use of coordination technology to build distributed applications.

The PageSpace is currently under development and will be installed at a European scale, focusing on workflow management and groupware applications. In a second phase, the project will continue to engineer the platform into a product of commercial interest for Internet-applications support. Further information on PageSpace can be found at http://www.cs.tu-berlin.de/~pagespc.

PageSpace is supported by the EU as ESPRIT Open LTR Project #20179.

References

[1] S. Castellani, P. Ciancarini, and D. Rossi. The ShaPE of ShaDE: a coordination system. Technical Report UBLCS, Dipartimento di Scienze dell'Informazione, Università di Bologna, Italy, 1995.
[2] Y. Gutfreund, J. Nicol, R. Sasnett, and V. Phuah. WWWinda: An Orchestration Service for WWW Browsers and Accessories. In Electronic Proc. of the 2nd Conf. on the WWW: Mosaic and the Web, Chicago, IL, December 1994.
[3] David Gelernter and Nicholas Carriero. Linda in Context. Communications of the ACM, 32(4):444-458, 1989.
[4] David Gelernter and Nicholas Carriero. Coordination Languages and their Significance. Communications of the ACM, 35(2):97-107, 1992.
[5] W. Schoenfeldinger. WWW Meets Linda. In Electronic Proc. 4th Int. World Wide Web Conference: The Web Revolution, Boston, MA, December 1995.
[6] R. Tolksdorf. Coordination in Open Distributed Systems. PhD thesis, Technische Universität Berlin, 1994.