A practical approach towards active hyperlinked documents

Eckhart Köppen and Gustaf Neumann

Information Systems and Software Techniques, University of Essen
University Road 9, 45141 Essen, Germany

Eckhart.Koeppen@uni-essen.de and Gustaf.Neumann@uni-essen.de

Abstract
This paper presents an architectural implementation for Web-based, active documents. Although several approaches for distributed, active documents exist already, we decided to establish a new model which provides more flexibility and interoperability without giving up formality. The model is based mainly on the Extensible Markup Language (XML) and makes use of the Document Object Model, Cascading Style Sheets and the Intrinsic Event Model, which are all open standards defined by the W3 Consortium.

Keywords
XML; DOM; Browsers; Mobile code; Active documents

1. Motivation

The rapid success of the World Wide Web has led to a new class of applications which are constructed using HTML [Raggett (1997)] for the user interface and CGI scripts for the application's logic. They have a more or less strong resemblance to mainframe programs: the user enters data into a form, which is transferred to the Web server and evaluated, and the results are passed back to the user agent. As a result, the computational load and the application logic are located entirely on the server side.

In contrast to server-centered Web applications, a client-centered application model has emerged through the use of scripting languages such as JavaScript [Netscape (1997b)]. Interfaces to the user agent and the current document exist in the form of plug-ins [Netscape (1997a)], Java applets [Gosling and McGilton (1996)] and embedded scripts [Levy (1996)]. However, most of these solutions have a number of problems: plug-ins are strongly tied to the chosen user agent and the client platform; the interface to the document is in all cases either non-existent or allows only the changing of text; and there are still distinctions between client- and server-side application logic, which makes it difficult to design applications that can operate both on- and off-line.

This paper presents an approach for document-centered computing, which promises more flexibility and removes the distinction between server and client. The document-centered approach allows program and data to be embedded within a document (e.g. an HTML document), thus turning the formerly passive text into an application itself. We will refer to these enabled HTML documents as active, hyperlinked documents (AHDs). With this document-centered architecture, a different application model can be implemented. Typical applications range from small programs, such as a bookmark page which controls its own logic and appearance, to workflow management systems which contain mostly independent documents with different states and possible operations on them. More generally speaking, the possible uses of AHDs range from controlling the contents and layout of a single document to supporting coordination and collaboration techniques. The goal of our implementation of the model is to provide a means to develop and evaluate different applications of AHDs.

2. System overview

The general functionality which is needed by this document-centered architecture covers several aspects: an interface to the user agent, flexible structuring and semantic annotation of the document (state, behavior, presentation), introspection through an interface to the document itself, a layout mechanism and a scripting language. Thus, our AHD model contains the following building blocks: Structuring, Layout, External and Internal Access and Scripting. In an implementation, the abstract building blocks have to be replaced by concrete specifications and implementations. Mainly, we will be using two standard proposals which have been introduced recently by the W3 Consortium: the Extensible Markup Language (XML) and the Document Object Model (DOM) (see [Bray et al. (1997)] and [Byrne (1997)]). The DOM provides a clear, programming language independent interface to the document structure. Additionally, it defines interaction with the user and the user agent through the Intrinsic Event Model (IEM). As a means to annotate a document semantically, HTML is too limited. Here, XML will be used as the basis for the semantic annotations and as a structuring language. Finally, the document layout is described using Cascading Style Sheets (CSS, see [Lie and Bos (1996)]).

The techniques used in the basic building blocks of the AHD model are: XML (Structuring), CSS (Layout), DOM/IEM (Access) and Tcl/OTcl (Scripting). Note however that the use of Tcl and OTcl ([Ousterhout (1990)], [Wetherall and Lindblad (1995)]) as the scripting language is not a mandatory characteristic of an AHD. Tcl/OTcl is chosen here because it has a reasonable balance of simplicity and capability.

The proposed model for AHDs is embedded in a system architecture which is a natural extension to the architecture of current Web-based applications. On the client side, the user agent will be extended to incorporate a runtime environment (RE). The RE provides an interface from the user agent to the AHD and vice versa. In addition, it implements introspection and execution mechanisms for the AHDs. Related to the RE are the presentation and networking layers of the user agent, which handle the display and transport of AHDs. On the server side, the RE will be incorporated into the Web server (this step will not be discussed in this paper), so that certain functions in an AHD can be executed prior to the transfer of the AHD to the user agent. An AHD could also reside in the Web server for a longer time, responding to requests itself.

Although this architecture resembles a typical agent meeting place (see [White (1996)]), our model goes further by defining not only state and behavior of an AHD but aiming also at the end user through representation mechanisms like CSS (layout, aural attributes) and incorporation of a Web browser, thus letting the user interact with an AHD. The basic components needed to implement our model of AHDs will now be described.

2.1. XML

The Extensible Markup Language (XML) is very closely related to HTML and is a subset of the Standard Generalized Markup Language (SGML, see [Goldfarb (1990)]). Like SGML documents, XML documents are composed of physical units (entities) and have a logical structure which is formed by elements. Elements are declared in a Document Type Definition (DTD) and marked with start- and end-tags in the document.

For our approach, the main advantage of XML over HTML is the ability to declare elements which are needed to form an active document and to describe state, behavior and presentation separately. In comparison to SGML, XML is explicitly targeted at the World Wide Web and more widespread support from content creators and software developers is expected.

2.2. DOM

The structure of HTML and XML documents can be accessed through the Document Object Model (DOM). The DOM describes the components of a document in terms of nodes which are organized in a tree. The more interesting node types are text, elements and attributes. The interface to the different nodes is described via the Interface Definition Language (IDL) from the Object Management Group [OMG (1997)]. An interface definition contains the attributes and possible operations for the node types. Corresponding definitions can be derived for various languages, e.g. Java. Among the operations defined for the node types are operations to create new elements, manipulate their list of children and modify their attributes.
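The operations named above (creating elements, manipulating the list of children, modifying attributes) can be tried out with any DOM binding. The following minimal sketch uses Python's standard xml.dom.minidom module, which follows a later DOM recommendation rather than the 1997 working draft cited here. The <note> element and its text are made up for illustration.

```python
from xml.dom.minidom import parseString

# Parse a small document into a DOM tree of element, text and attribute nodes.
doc = parseString('<order id="o80"><buyer id="p80">John Doe</buyer></order>')
order = doc.documentElement

# Create a new element and append it to the list of children of <order>.
# The <note> element is a made-up example, not part of the AHD model.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("rush delivery"))
order.appendChild(note)

# Modify an attribute of an existing element.
buyer = order.getElementsByTagName("buyer")[0]
buyer.setAttribute("id", "p81")

# Walk the children of <order> and collect the names of the element nodes.
child_names = [n.tagName for n in order.childNodes
               if n.nodeType == n.ELEMENT_NODE]
print(child_names)
print(order.toxml())
```

The same calls (createElement, appendChild, setAttribute) exist under these names in the IDL-derived bindings of other languages, which is the point of the language-independent interface definition.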

2.3. Intrinsic event model

To map events to behavior, we use the Intrinsic Event Model (IEM) defined in the DOM. Events are tied to the element where they occur. If an element does not process an event, it is propagated to its parent element. Any activity is triggered by external events. They can be grouped into user events and events caused by the runtime environment. User events are pointer events (motion, clicks), keyboard events (key pressed, key released), form-related events (list selections, text changes) and focus changes. Events triggered by the RE are document loading and document unloading.
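The propagation rule (an unprocessed event moves on to the parent element) can be sketched in a few lines. The Element class, the dispatch function and the event name click below are hypothetical names chosen for this illustration; they are not taken from the IEM or from our implementation.

```python
class Element:
    """Hypothetical element node, used only to illustrate propagation."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}  # event name -> callable(element, info)

    def on(self, event, handler):
        self.handlers[event] = handler


def dispatch(target, event, info=None):
    """Deliver an event to target; walk up the parent chain until handled."""
    node = target
    while node is not None:
        handler = node.handlers.get(event)
        if handler is not None:
            return handler(node, info)
        node = node.parent  # not processed here: propagate to the parent
    return None  # no element in the chain processed the event


order = Element("order")
buyer = Element("buyer", parent=order)
order.on("click", lambda el, info: f"{el.name} handled click at {info}")

# <buyer> has no click handler, so the event reaches <order>.
result = dispatch(buyer, "click", (10, 20))
print(result)
```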

2.4. Style sheets

Since XML defines only the structure of a document, a separate layout definition is provided through style sheets. The W3 Consortium proposes Cascading Style Sheets (CSS) to be used together with HTML and XML. Style sheets imply a new set of attributes which are associated with an element. These attributes control the visible aspects of the elements, e.g. margins, borders, text styles. In addition, a style sheet can also describe aural properties for elements to enhance accessibility of a document with text-to-speech programs.

3. Architecture

To implement the infrastructure for AHDs, basic definitions are needed to describe the structure and semantics of an AHD. In addition, a mapping of the event model to the document behavior has to be defined.

3.1. Document structure

The structure of an AHD is defined in a DTD. To allow the RE to process an AHD, two approaches are possible. On the one hand, the DTD itself could declare a set of elements which are recognized by the user agent and contain any code and data required for the AHD. This would require a very detailed description since any specific state and behavior elements would have to be made proper elements. In our model, we chose a more flexible approach. In an AHD, any element can contain code and data. For this purpose, the DTD declares only two specific elements, namely <FUNC> and <VAR>. They will be explained in detail in the next section. Instances of these elements are always associated with their parent element, i.e. their scope is the parent element. This association is established by the RE, which manages the access to code and data for each element through access and modification functions. The following sample illustrates the mechanism. The first part of the example declares a DTD with the elements <FUNC> and <VAR>:


<!ELEMENT ahd o o ANY>
<!ELEMENT func - - CDATA>
<!ATTLIST func
            name     CDATA #REQUIRED
            type     CDATA #IMPLIED>
<!ELEMENT var    - - CDATA>
<!ATTLIST var
            name     CDATA #REQUIRED>

In the second part, two additional elements (<order> and <buyer>) are declared. An instance of the <order> element is created. It contains the function print_header and an instance of the element <buyer> with two functions (print and duplicate) and two variables (name and email):


<!DOCTYPE ahd SYSTEM [
<!ELEMENT order - - (buyer | FUNC | VAR)*>
<!ATTLIST order
            id       CDATA #IMPLIED>
<!ELEMENT buyer - - (#PCDATA | FUNC | VAR)*>
<!ATTLIST buyer
            id       CDATA #IMPLIED>
]>
<order id="o80">
        <func name="print_header"> ... </func>
        <buyer id="p80">
                <func name="print"> ... </func>
                <func name="duplicate"> ... </func>
                <var name="name">John Doe</var>
                <var name="email">doe@uni-essen.de</var>
        </buyer>
</order>

The RE associates both the functions and the variables with the <buyer> element. As a consequence, the variables and the functions can be accessed only through the <buyer> element. In this example, the <buyer> element itself can be addressed using its id attribute. The addressing scheme for elements is defined in more detail in the XML specification [Bray et al. (1997)].

The local scoping of functions and variables requires a method to facilitate access to functions and variables. We will use parent delegation to look up function and variable elements. In order to access an element through its name or id, the parent chain of the current element will be searched. Function and variable elements are associated with an element if they are a direct child of this element. In the example, the variable name can be modified from within the function print since it is a direct child of the <buyer> element. Likewise, the function print_header is visible from the function print of the element <buyer>. Here, the lookup of the function is continued in the parent of <buyer>, <order>. This element contains the wanted script as a direct child.
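The lookup by parent delegation can be sketched as follows. This is an illustration in Python with xml.dom.minidom, not the code of the RE. The helper names direct_child and lookup are assumptions, and the function bodies ("header code", "print code") are placeholders for the code elided in the example above.

```python
from xml.dom.minidom import parseString

# Shortened version of the sample; the function bodies are placeholders.
SAMPLE = """
<order id="o80">
  <func name="print_header">header code</func>
  <buyer id="p80">
    <func name="print">print code</func>
    <var name="name">John Doe</var>
  </buyer>
</order>
"""


def direct_child(element, tag, name):
    """Return the first direct <tag name=...> child of element, or None."""
    for child in element.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.tagName == tag
                and child.getAttribute("name") == name):
            return child
    return None


def lookup(element, tag, name):
    """Parent delegation: search the element, then its parent, and so on."""
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        found = direct_child(node, tag, name)
        if found is not None:
            return found
        node = node.parentNode  # continue in the enclosing element
    return None


doc = parseString(SAMPLE)
order = doc.documentElement
buyer = doc.getElementsByTagName("buyer")[0]

# "name" is a direct child of <buyer>; "print_header" is found in <order>.
print(lookup(buyer, "var", "name").parentNode.tagName)
print(lookup(buyer, "func", "print_header").parentNode.tagName)
```

Note that the delegation works only upwards: starting the lookup of print at <order> finds nothing, because the function is scoped to <buyer>.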

3.2. Structural elements

The two elements which are used to incorporate state and behavior into an AHD are <FUNC> and <VAR>. They contain either function code or variable values and are a logical part of their parent element. It is important that any other element which will contain these elements is defined appropriately by including the <FUNC> and <VAR> elements in the content declaration. Both elements have the CSS display property set to none so that they will not be laid out and displayed.

Following the declaration of the <FUNC> element as seen in the first part of the example, the element declares two attributes: name and type. The name attribute contains the name of the function under which it can be called, and the type attribute describes the MIME-type [Borenstein and Freed (1993)] of the function (which is basically the programming language used). The contents of the <FUNC> element (i.e. the text between the start- and end-tag) are the code of the function. The name of a function has to be locally unique, which means that the parent element does not have any other function element with the same name as an immediate child. If another function with the same name exists, only the first function is considered.

The <VAR> element has only one attribute, name. Like the <FUNC> tag, this contains a locally unique name for the variable. The contents of the <VAR> element is the value of the variable.

3.3. Event model

The events defined in the event model are mapped implicitly to element functions. Any function whose name equals an event name will be called when that event occurs. According to the IEM an event handling function can also be defined in an attribute which has the name of the handled event. Each event handler is passed additional information about the event, e.g. the pressed key or the pointer location.
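A minimal sketch of this implicit mapping is given below. It first looks for a function element named after the event and then for an attribute with that name. The order of the two checks is an assumption of this sketch, as are the helper name find_handler, the event name onclick and the placeholder handler texts.

```python
from xml.dom.minidom import parseString

# Placeholder handler code; onclick is used as an illustrative event name.
doc = parseString(
    '<order onclick="attribute handler">'
    '<buyer><func name="onload">function handler</func></buyer>'
    '</order>'
)


def find_handler(element, event):
    """Return (kind, code) for the handler of event on element, or None."""
    # 1. A direct <func> child whose name equals the event name.
    for child in element.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.tagName == "func"
                and child.getAttribute("name") == event):
            code = "".join(t.data for t in child.childNodes
                           if t.nodeType == t.TEXT_NODE)
            return ("func", code)
    # 2. An attribute which has the name of the handled event.
    if element.hasAttribute(event):
        return ("attribute", element.getAttribute(event))
    return None


order = doc.documentElement
buyer = doc.getElementsByTagName("buyer")[0]
print(find_handler(buyer, "onload"))
print(find_handler(order, "onclick"))
```

An element without a matching function or attribute yields None here; in the model, the event would then be propagated to the parent element.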

3.4. Programming interface

An element consists of five components: an attribute list, child elements, functions, variables and textual contents. The access to the attributes and the child elements from within the RE and an AHD is provided through the functions defined in the DOM. These functions can also be used to access the functions, variables and the textual contents of an element, but to ensure the parent delegation mechanism for function and variable access, convenience functions are implemented as well. These are:

setVar: sets a variable value (the textual contents of a <VAR> tag)

getVar: gets a variable value (the textual contents of a <VAR> tag)

setContents: sets the textual contents of an element. If the element contains other child elements, they will be deleted (to achieve more control over the contents of an element, using the functions of the DOM is recommended).

getContents: gets the textual contents of an element. Note that any child elements in the content are ignored (e.g. the contents of the <p> tag in "<p>A <em>nested</em> tag</p>" is "A tag").

To preserve the state of an AHD, another function is needed:

toText: returns a textual representation of the AHD. The resulting text is an XML document with additional style information. It reflects any modifications made through DOM functions or the functions exported by the RE.

Since the execution of an element function cannot be triggered by DOM functions, the RE exports another utility function:

callFunc: calls an element function
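The data-related convenience functions can be approximated on top of a generic DOM, as in the following sketch. It is not the interface exported by the RE: the Python names (get_contents, set_var, to_text and so on) are assumptions, whitespace is collapsed so that the <p> example yields "A tag", to_text omits the additional style information, and callFunc is left out because it needs a script interpreter.

```python
from xml.dom.minidom import parseString


def get_contents(element):
    """Text of the direct text children; nested elements are ignored."""
    text = "".join(c.data for c in element.childNodes
                   if c.nodeType == c.TEXT_NODE)
    return " ".join(text.split())  # assumption: collapse runs of whitespace


def set_contents(element, text):
    """Replace all children (text and elements) by a single text node."""
    while element.firstChild is not None:
        element.removeChild(element.firstChild)
    element.appendChild(element.ownerDocument.createTextNode(text))


def _find_var(element, name):
    """Find a <var> by name, delegating to the parent chain if needed."""
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for c in node.childNodes:
            if (c.nodeType == c.ELEMENT_NODE and c.tagName == "var"
                    and c.getAttribute("name") == name):
                return c
        node = node.parentNode
    raise KeyError(name)


def get_var(element, name):
    return get_contents(_find_var(element, name))


def set_var(element, name, value):
    set_contents(_find_var(element, name), value)


def to_text(doc):
    """Serialize the current state of the document (without style data)."""
    return doc.documentElement.toxml()


doc = parseString('<buyer id="p80"><var name="name">John Doe</var>'
                  '<p>A <em>nested</em> tag</p></buyer>')
buyer = doc.documentElement
set_var(buyer, "name", "Jane Roe")  # "Jane Roe" is a made-up value
print(get_contents(buyer.getElementsByTagName("p")[0]))
print(to_text(doc))
```

Because to_text serializes the live tree, the modified variable value appears in the saved text, which is what makes it possible to preserve the state of a document.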

4. Implementation

The implementation of the model is mainly tied to the implementation of the user agent. We developed the extensible Web browser Cineast [Köppen et al. (1997)] to develop and evaluate novel approaches like AHDs. The main part of Cineast is written in OTcl, which is also why we chose Tcl/OTcl as the scripting language for AHDs. Cineast is currently running under Unix, but its main parts can be ported to other operating systems and the model for the AHDs is platform independent.

The basis of Cineast is the prototyping environment Wafe [Neumann and Nusser (1993)]. It combines Tcl as a scripting language with different widget sets such as the Athena widget set or the Motif widget set. In addition, other libraries are linked into Wafe, among them SSLeay [Hudson and Young (1997)], LDAP [Howes and Smith (1997)] and OTcl. For the implementation of Cineast, a special purpose widget called Kino [Köppen (1996)] is integrated into Wafe to handle the parsing, layout and display of XML source text. It is implemented in C to achieve high performance. The roles of Wafe and the Kino widget in the different layers of the user agent are discussed below.

4.1. Runtime environment

Most of the functionality required in the RE is provided by the Kino widget. The main task of the Kino widget in the RE is to parse any source text and to maintain an internal representation of the element tree. It also implements all of the DOM functions and the additional functions needed to access the functions and variables of the elements.

The Kino widget is made up of three components: the Parser, the Layouter and the Painter. In the RE, only the Parser is of interest. It produces a tree of parsed data (PData) elements. The Kino widget uses four different types of PData elements: a generic element, a box element which can contain children, a text element and an inset element which can be used to insert other widgets (text entry fields, push buttons, list boxes etc.) and images into the element tree. The most interesting element is the box element. Besides its role as a structuring element to contain other elements, it holds the CSS attributes. The PData box elements correspond directly to the XML elements in the document, i.e. any XML element results in a PData box element. Navigation in the PData tree is possible through the DOM; alternatively, the components of the PData elements can be used directly through their C pointers.

The Kino widget is extensible in two ways: the application can register a tag callback, which is called whenever a tag is encountered during the parsing process. In this callback, the application can for example modify the PData tree. The second callback which is used handles the execution of scripts. The Kino widget calls this script callback whenever a script has to be executed, e.g. when an event occurs or a script is called via callFunc. This makes the Kino widget independent from the chosen scripting language because the execution of a script is handled by the application, in this case Cineast.
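The two extension points can be pictured with the following sketch. MiniWidget is a hypothetical stand-in written in Python, not the C interface of the Kino widget. The method names, the MIME type text/x-tcl and the script text are assumptions, and the "interpreter" registered by the application merely formats a string instead of running code.

```python
class MiniWidget:
    """Hypothetical stand-in for a widget with two extension points."""

    def __init__(self):
        self.tag_callback = None     # called for every tag during parsing
        self.script_callback = None  # called whenever a script must run
        self.functions = {}          # function name -> (MIME type, source)

    def parse_tag(self, tag, attrs, body=""):
        """Simulate the parser encountering one tag."""
        if tag == "func":
            self.functions[attrs["name"]] = (attrs.get("type", ""), body)
        if self.tag_callback is not None:
            self.tag_callback(tag, attrs)

    def call_func(self, name, *args):
        """The widget does not interpret code; the application does."""
        mime_type, source = self.functions[name]
        return self.script_callback(mime_type, source, args)


# The application side: register both callbacks.
seen_tags = []
widget = MiniWidget()
widget.tag_callback = lambda tag, attrs: seen_tags.append(tag)
widget.script_callback = lambda mime, src, args: f"{mime}: {src} {list(args)}"

widget.parse_tag("buyer", {"id": "p80"})
widget.parse_tag("func", {"name": "print", "type": "text/x-tcl"}, "puts hello")
result = widget.call_func("print", "now")
print(seen_tags)
print(result)
```

Replacing the script callback is all that is needed to support a different scripting language; the widget itself only stores the source text and its type.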

4.2. Presentation

The presentation layer for XML documents is implemented in the Layouter and Painter components of the Kino widget. They are not needed for the basic functionality of an AHD and can be omitted if the RE is to be incorporated into a Web server. The Layouter works together with a CSS database. The CSS database is built during the parsing process and contains all style information. Currently, the CSS database is implemented completely in OTcl. The Layouter positions any element so that no calculations are needed for display by the Painter.

Our implementation handles most of HTML 3.2, including important features like tables, images and forms. The internal layout model is box-oriented, so that a PData box results in either a block-level or inline element according to the CSS specification. In contrast to a simple flow-oriented model, boxes can be nested. The box model allows the realization of more complex layouts such as tables. The current implementation supports neither incremental layout nor absolute positioning of boxes within a text flow.

4.3. User agent application logic

The Kino widget is tightly embedded within the infrastructure provided by Wafe and the Cineast browser. The infrastructure requirements for the implementation of a browser are partly generic and partly Web specific. The generic requirements are usable in a wide range of applications. These generic components are provided either by Wafe or by libraries accessible through Wafe. For example, colour and image management (for the image formats xbm, xpm, jpg, png and gif), event dispatching, resource management and module management are provided directly by Wafe. Generic components provided by libraries accessible through Wafe include the widget classes from OSF/Motif, basic networking facilities from Tcl, security algorithms and protocols from SSLeay, compression from zlib, etc.

The browser specific code is exclusively implemented in OTcl and is seen by Wafe as the Cineast source code, which is loaded by Wafe at startup time of the browser. The Cineast source code defines the typical browser semantics:

4.4. Networking

One of the most demanding tasks in implementing a browser is to ensure that the browser does not block event handling. Since most of the components of the browser are not thread safe, multitasking is implemented through a select() based event loop. All interaction and I/O has to be performed asynchronously. If an I/O operation were to block, the browser windows could not be refreshed, for example, or only a single transfer would be possible at a time. Furthermore, it would not be possible to provide incremental loading and display, which are highly useful over rather slow connections.

The logic of the Cineast source code is largely determined by this asynchronous event handling. As a consequence of the asynchronicity, various handlers and callbacks have to be registered, for example to process incoming data from sockets or to process user events in various windows, and to associate the events with the corresponding tasks. Every handler must have enough context information to continue a task in the correct environment. The Cineast browser uses widget IDs for context selection for GUI purposes and OTcl objects in all other cases. For example, for every request that is started, a new OTcl object is created. It registers I/O callbacks on the input side to process the input data incrementally, and it reports important state changes in its life cycle to the presentation objects (typically image or HTML/XML text): the object is created, a MIME type is determined, new data is available, the end of data is reached, or the request was killed.
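The pattern of one object per request, driven by a select() based loop, can be sketched in Python with the standard selectors module. This is an illustration and not the OTcl code of Cineast. The Request class, the state names created, data and end, and the local socket pair that stands in for a network connection are all assumptions; the MIME type and killed states are omitted.

```python
import selectors
import socket


class Request:
    """Hypothetical per-request object that reports life-cycle changes."""

    def __init__(self, sock, on_state):
        self.sock = sock
        self.on_state = on_state  # callback of the presentation object
        self.received = b""
        self.done = False
        on_state("created", None)

    def handle_readable(self, selector):
        """I/O callback: called only when the socket has input pending."""
        chunk = self.sock.recv(4096)
        if chunk:
            self.received += chunk
            self.on_state("data", chunk)  # allows incremental display
        else:
            # An empty read means the peer closed the connection.
            selector.unregister(self.sock)
            self.sock.close()
            self.done = True
            self.on_state("end", None)


def run(selector, requests):
    """Event loop: touch only the sockets that select() reports as ready."""
    while not all(r.done for r in requests):
        for key, _ in selector.select(timeout=1.0):
            key.data.handle_readable(selector)  # key.data carries the context


states = []
server, client = socket.socketpair()  # stands in for a network connection
client.setblocking(False)
sel = selectors.DefaultSelector()
req = Request(client, lambda state, data: states.append(state))
sel.register(client, selectors.EVENT_READ, data=req)

server.sendall(b"<order id='o80'/>")
server.close()
run(sel, [req])
sel.close()
print(states, req.received)
```

The request object registered as data of the selector key plays the role of the context information mentioned above: the loop itself knows nothing about the task it continues.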

5. Applying the model

As stated earlier, the RE incorporated in the browser provides the basic platform for implementing the state and behavior components of AHDs. We will now show how a document-centered application based on AHDs differs from purely server- or client-centered approaches. We will use the selection and purchase of goods over the Internet as an example.

As the central document, the catalog with the order form can be identified and implemented with an AHD. The life cycle of the catalog AHD begins with the transfer of the catalog into the RE of the user's Web browser. The first event the catalog AHD will receive is the onload event defined in the IEM. Upon reception of this event, the catalog AHD can update its contents, e.g. recalculate prices of special offers etc. We assume that the user now disconnects from the Internet and saves the catalog AHD locally as a file which is achieved through the RE's toText function (introspection is necessary for this task). The disconnect is not strictly necessary but emphasizes the strengths of document centered applications.

The catalog AHD can later be loaded from the local file and will reside in the Web browser's RE where it can react to user input like entering amounts or selections of goods. When the user is finished making his/her selection, he/she can save the new state of the catalog AHD and the order form again to a file. If he/she decides to transmit his/her order to the online store, the catalog AHD generates an order AHD. The order AHD is transferred to the server using an HTTP PUT request and saved in a file on the online store's Web server, ready for further processing.
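The off-line part of this life cycle (modify the state, save it as text, reload it, generate the order) can be sketched as follows. The document structure with <catalog>, <item> and <line> elements, the prices and the helper names are invented for this illustration. Serialization with toxml stands in for the RE's toText function, and the final HTTP PUT is not shown.

```python
from xml.dom.minidom import parseString, getDOMImplementation

# Invented catalog structure; only <var> follows the AHD model.
CATALOG = (
    '<catalog>'
    '<item id="a1"><var name="price">10</var><var name="amount">0</var></item>'
    '<item id="b2"><var name="price">25</var><var name="amount">0</var></item>'
    '</catalog>'
)


def find_var(item, name):
    """Return the direct <var> child of item with the given name."""
    for c in item.childNodes:
        if (c.nodeType == c.ELEMENT_NODE and c.tagName == "var"
                and c.getAttribute("name") == name):
            return c
    raise KeyError(name)


def get_var(item, name):
    return find_var(item, name).firstChild.data


def set_var(item, name, value):
    find_var(item, name).firstChild.data = str(value)


def make_order(catalog_doc):
    """Build a separate order document from all items with amount > 0."""
    order_doc = getDOMImplementation().createDocument(None, "order", None)
    for item in catalog_doc.getElementsByTagName("item"):
        amount = int(get_var(item, "amount"))
        if amount > 0:
            total = amount * int(get_var(item, "price"))
            line = order_doc.createElement("line")
            line.setAttribute("item", item.getAttribute("id"))
            line.setAttribute("amount", str(amount))
            line.setAttribute("total", str(total))
            order_doc.documentElement.appendChild(line)
    return order_doc


catalog = parseString(CATALOG)
set_var(catalog.getElementsByTagName("item")[1], "amount", 3)  # user input
saved = catalog.documentElement.toxml()  # state saved as text (to a file)
reloaded = parseString(saved)            # later, possibly off-line
order_xml = make_order(reloaded).documentElement.toxml()
print(order_xml)
```

The order is generated from the reloaded text alone, so no connection to the server is needed between loading the catalog and producing the order.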

6. Conclusion

Ideas for applications which are based on Web techniques such as HTML and CGI were presented early in the history of the Web [Houh et al. (1994)], as well as client-side extensions like [Kaashoek et al. (1994)]. Our proposed model for Active Hyperlinked Documents and the implementation of the overall system differ in two key points from existing approaches:

More important, however, are the next steps that have to be made to prove the potential of AHDs. First, a formal model for the use of AHDs in distributed applications has to be developed. Starting points can be frameworks like [Mühlbacher and Neumann (1996)], where the potential of active documents as tools for collaboration is discussed as an alternative to legacy systems. From the architectural point of view, the distinction between user agents and Web servers is not necessary. We will work towards a clearly defined RE that can be incorporated into classical Web servers or other applications as well. Finally, security issues are not addressed in this paper. We believe that AHDs have a huge potential in electronic commerce (especially when they are combined with technologies like cryptolopes [IBM (1996)]) and in intranet applications in combination with role-based access control [Neumann and Nusser (1997)]. In electronic commerce applications, the presented approach can be used, for example, to implement active, intelligent contracts supporting the negotiation and signing process.

References

[Borenstein and Freed (1993)] N. Borenstein and N. Freed, Multipurpose Internet mail extensions, Standards Track Protocol, RFC 1521, September 1993.

[Bray et al. (1997)] T. Bray, J. Paoli, and C.M. Sperberg-McQueen, Extensible Markup Language (XML), W3C Working Draft, http://www.w3.org/TR/WD-xml, November 1997.

[Byrne (1997)] S. Byrne, Document Object Model (Core) Level 1, W3C Working Draft, http://www.w3.org/TR/WD-DOM/level-one-core-971009, October 1997.

[Gosling and McGilton (1996)] J. Gosling and H. McGilton, The Java language environment, http://java.sun.com/docs/white/langenv/, May 1996.

[Goldfarb (1990)] C.F. Goldfarb, The SGML Handbook, Oxford University Press, Oxford, 1990.

[Houh et al. (1994)] H. Houh, C. Lindblad and D. Wetherall, Active pages: intelligent nodes on the World Wide Web, in: Proc. of the 1st World Wide Web Conference, Geneva, 1994.

[Howes and Smith (1997)] T. Howes and M. Smith, LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol, Macmillan Technical Publishing, New York, NY, 1997.

[IBM (1996)] IBM Corp., IBM Cryptolope Home, http://www.cryptolope.ibm.com/, 1996.

[Kaashoek et al. (1994)] M.F. Kaashoek, T. Pinckney and J.A. Tauber, Dynamic documents: extensibility and adaptability in the WWW, in: Proc. of the 2nd World Wide Web Conference, Chicago, 1994.

[Hudson and Young (1997)] T.J. Hudson and E.A. Young, SSLeay and SSLapps FAQ (draft), http://www.psy.uq.edu.au/~ftp/Crypto/, September 1997.

[Köppen (1996)] E. Köppen, Entwicklung eines erweiterbaren Widgets zur Anzeige von HTML-Texten, Master's Thesis, University of Essen, Germany, 1996.

[Köppen et al. (1997)] E. Köppen, G. Neumann, and S. Nusser, Cineast – An extensible Web browser, in: Proc. of WebNet 97, Toronto, 1997.

[Levy (1996)] J. Levy, A Tk Netscape plugin, in: Proc. of the 4th Annual USENIX Tcl/Tk Workshop, Monterey, 1996.

[Lie and Bos (1996)] H.W. Lie and B. Bos, Cascading style sheets, level 1, W3C Recommendation, http://www.w3.org/TR/REC-CSS1, December 1996.

[Mühlbacher and Neumann (1996)] R. Mühlbacher and G. Neumann, Towards a framework for collaborative software development of business application system, in: Proc. of the 5th Workshops of WET ICE 96, Stanford, 1996.

[Netscape (1997a)] Netscape Communications Corp., Plug-in guide, http://developer.netscape.com/library/documentation/communicator/plugin/index.htm, May 1997.

[Netscape (1997b)] Netscape Communications Corp., JavaScript reference, http://developer.netscape.com/library/documentation/communicator/jsref/index.htm, October 1997.

[Neumann and Nusser (1993)] G. Neumann and S. Nusser, Wafe – An X toolkit based frontend for application programs in various programming languages, in: USENIX Winter Conference, San Diego, January 1993.

[Neumann and Nusser (1997)] G. Neumann and S. Nusser, A framework and prototyping environment for a W3 security architecture, in: Proc. of Communications and Multimedia Security, Joint Working Conference IFIP TC-6 and TC-11, Athens, September 1997.

[OMG (1997)] Object Management Group, The Common Object Request Broker: architecture and specification, ftp://ftp.omg.org/pub/docs/formal/97-10-01.pdf, August 1997.

[Ousterhout (1990)] J.K. Ousterhout, Tcl: an embeddable command language, in: Proc. of the USENIX Winter Conference, January 1990.

[Raggett (1997)] D. Raggett, HTML 3.2 reference specification, W3C Recommendation, http://www.w3.org/TR/REC-html32.html, January 1997.

[Wetherall and Lindblad (1995)] D. Wetherall and C.J. Lindblad, Extending Tcl for dynamic object-oriented programming, in: Proc. of the Tcl/Tk Workshop '95, Toronto, July 1995.

[White (1996)] J. White, Mobile agents, White paper, http://www.genmagic.com/agents/Whitepaper/whitepaper.html, 1996.

Vitae

Eckhart Köppen is currently a Ph.D. student at the Dept. of Information Systems and Software Techniques at the University of Essen, Germany. His working areas are Information Systems Modeling, Internet/Intranet Technologies and Software Engineering. He was born in 1970 in Essen, began studying at the University of Essen in 1991 and received degrees in Information Systems in 1995 and 1996. As part of his Master's thesis, he developed the Kino widget class, which provides an extensible means to display HTML text.

Gustaf Neumann was appointed Professor of Information Systems and Software Techniques at the University of Essen, Germany, in 1995. A native of Vienna, Austria, he graduated from the Vienna University of Economics and Business Administration (WU), Austria, in 1983 and holds a Ph.D. from the same university. He joined the faculty of WU in 1983 as Assistant Professor at the MIS department and served as head of the research group for Logic Programming and Intelligent Information Systems. Before joining the University of Essen, Gustaf Neumann was a visiting scientist at IBM's T.J. Watson Research Center in Yorktown Heights, NY, from 1985–1986 and 1993–1995. In 1987 he was awarded the Heinz-Zemanek award of the Austrian Association of Computer Science (OCG) for best dissertation (Metainterpreter Directed Compilation of Logic Programs into Prolog). Professor Neumann has published books and papers in the areas of program transformation, data modeling, information systems technology and security management. He is the author of several widely used programs that are freely available, such as the TeX-dvi converter dvi2xx and the graphical frontend package Wafe.