An API To Mosaic

Guy Singh
Roger Binns
IXI Ltd

Abstract

The Mosaic browser has been very successful as a standalone product. As people learn about the features that Mosaic has to offer, they wish to combine all or a subset of them into applications specific to their needs.

It is therefore proposed that there should be a standard way of incorporating this technology into existing software. This would allow existing applications to utilise Mosaic and its rich WWW functionality in a relatively easy manner.

The proposed mechanism for implementing this is to use currently available mature technologies that were built for exactly this purpose. There are a number of different issues that need addressing:

The language
The protocol
The transport layer

In addition, the solutions to these issues must be available in a cross-platform manner. This paper presents the issues and the proposed solutions, and demonstrates a working implementation.

Introduction

This document discusses an API to be used in programs that wish to communicate with a World Wide Web browser. The need for this API arose from the requirement to more tightly couple Mosaic (running as a help engine) and other programs running on the same display.

Part of this coupling involved embedding a scripting language in Mosaic itself, so that commands can be sent to the interpreter and their results returned. Our implementation uses the portable TCL interpreter.

Note: This API is only intended to be used to communicate in the context of user actions with a WWW browser. A better interface to the actual resources of the web is to use libwww or to talk to a proxy http server.

We set ourselves five objectives:

Transport independence: The API should hide as much as possible of the underlying method used to communicate with the browser.
Access to the scripting language: It should be possible to send arbitrary scripts to the browser which executes them and returns the results. The API should be independent of any particular interpreter.
Binary communication: This will allow file contents to be transferred across the connection, as well as languages that may be byte compiled.
Platform independence: The API should be as similar as possible on all platforms. The browser and the application talking to it should be able to run on different platforms and/or hosts.
Ease of use: The API should be as simple as possible to use and not require much code support from the API user. The scripting language used should be well documented, well designed and mature. In particular it should have good string handling (strings are used heavily in a web browser - e.g. urls, html).

Architecture

The API should be implemented as a library that can be used by both the application and the browser. Communication consists of three phases.

For the application, this consists of locating the browser and initialising the connection to it. A handle is returned which is then used by the API user to provide the API with a context. In the second phase, the application can send data to the browser to be executed, and take action on the results. Finally, the connection may be closed by either end.

Locate browser
Initialise connection

Send data to be executed
Process results
* ... repeat

Close connection

The browser uses the API in a similar way, but has a different first phase. It consists of advertising itself as providing a service to the application, and waiting for incoming connections. When a connection is made, it is initialised and a handle is returned in the same manner. Incoming requests are then taken from the connection and processed. The result is sent back if required. The connection may then be closed by either end.

Advertise browser
Listen for incoming connections

For each connection
        Initialise connection and request queue
        For each request
                Process request
                Send back result if necessary
        * ... repeat
* ... repeat

Close connection
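The two sequences above can be sketched end to end. The following is an illustration only, written in Python rather than against the API itself. It assumes a space-separated ASCII header terminated by a newline (the paper does not specify the framing), uses a connected socket pair in place of service location, and uses a stand-in interpret function in place of a real interpreter. All function names are hypothetical.

```python
import socket
import threading

# Packet types taken from the tables in the Protocol section.
EXECUTE_REPLY, EXECUTE_NO_REPLY, RESULT, ERROR = 3, 4, 10, 11

def send_packet(sock, serial, ptype, data=b""):
    # Assumed framing: "serial type interpreter length\n" in ASCII, then data.
    sock.sendall(f"{serial} {ptype} 0 {len(data)}\n".encode("ascii") + data)

def recv_packet(rfile):
    header = rfile.readline()
    if not header:                      # other end closed the connection
        return None
    serial, ptype, _interp, length = (int(f) for f in header.decode("ascii").split())
    return serial, ptype, rfile.read(length)

def browser_loop(sock, interpret):
    """Browser side: take requests from one connection until it closes."""
    rfile = sock.makefile("rb")
    while (packet := recv_packet(rfile)) is not None:
        serial, ptype, data = packet
        if ptype not in (EXECUTE_REPLY, EXECUTE_NO_REPLY):
            continue
        try:
            reply_type, reply = RESULT, interpret(data)
        except Exception as exc:        # interpreter error becomes a type 11 packet
            reply_type, reply = ERROR, str(exc).encode("utf-8")
        if ptype == EXECUTE_REPLY:
            send_packet(sock, serial, reply_type, reply)
    rfile.close()
    sock.close()

def run_demo():
    """Application side: connect, send two scripts, collect both replies."""
    app_sock, browser_sock = socket.socketpair()   # stands in for locate + connect

    def interpret(script):                         # stand-in for a real interpreter
        if script == b"currenturl":
            return b"http://example.org/"
        raise ValueError("unknown command")

    browser = threading.Thread(target=browser_loop, args=(browser_sock, interpret))
    browser.start()
    rfile = app_sock.makefile("rb")
    send_packet(app_sock, 1, EXECUTE_REPLY, b"currenturl")
    first = recv_packet(rfile)
    send_packet(app_sock, 2, EXECUTE_REPLY, b"bogus")
    second = recv_packet(rfile)
    rfile.close()
    app_sock.close()                               # browser sees end of stream
    browser.join()
    return first, second
```

Calling run_demo() returns one result packet and one error packet, each carrying the serial number of the request it answers.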

Transport

Communication

The communication method chosen was Transmission Control Protocol/Internet Protocol (TCP/IP). This provides a reliable byte stream over a variety of physical network layers, and is supported by most operating systems and by the World Wide Web browsers running on them.

Service location

The browser that is providing a service for the API will be awaiting connections on a particular port. In order for more than one browser to run on a machine, this port cannot be fixed, but will need to be determined at run time.

Typically the browser and the application will be accessing the same display. A service name is used to identify the browser broadcasting its availability. This can be a simple word which has a default value of WWW_BROWSER.

The API will then handle finding out which machine the browser is located on, and which port it is listening on.
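One way to picture service location is shown below. This is a sketch under stated assumptions, not the mechanism used by the implementation: the registry is a plain dictionary standing in for wherever the browser really publishes its address, and the names advertise and locate are hypothetical. Binding to port 0 asks the operating system for a free port, which gives the run-time port selection described above.

```python
import socket

def advertise(service, registry, host="127.0.0.1"):
    """Listen on a port chosen at run time and publish it under a service name."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))            # port 0: let the OS pick a free port
    listener.listen(5)
    registry[service] = listener.getsockname()   # (host, port) actually bound
    return listener

def locate(service, registry):
    """Return the (host, port) advertised for a service name, or None."""
    return registry.get(service)
```

Two browsers on the same machine would advertise under different service names and receive different ports, so an application can reach either one by name alone.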

Security

Security is partially addressed at the transport level. An implementation can provide security in two ways :-

The first is to only allow connections from known trusted hosts and ports.
The second method, used by our implementation, is to require the application to provide a string or key which it obtained while doing service location. In the UNIX/X Window System implementation, it relies on xauth and xhost.
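The second method might look like the following on the browser side. This is a minimal sketch with assumptions: the key is a random string generated when the browser advertises itself, the application presents it in an identification packet (type 5 in the Protocol section), and the class and function names are hypothetical. How the key is actually published and protected is left to the implementation.

```python
import hmac
import secrets

IDENTIFY = 5   # "Identification of client" packet type from the Protocol section

def make_key():
    """Generate a random key to hand out during service location."""
    return secrets.token_hex(16)

class ConnectionGate:
    """Decides whether packets on one connection may be processed."""

    def __init__(self, expected_key):
        self._expected = expected_key.encode("ascii")
        self._state = "new"

    def accept(self, ptype, data):
        # The first packet must be an identification packet with the right key.
        if self._state == "new":
            ok = ptype == IDENTIFY and hmac.compare_digest(data, self._expected)
            self._state = "trusted" if ok else "rejected"
        return self._state == "trusted"
```

A browser would close the connection as soon as accept returns False. The comparison uses hmac.compare_digest so that checking the key takes the same time whether or not it matches.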

Protocol

The protocol was designed to be as simple as possible. The information is sent as packets which contain two parts: a header, and any accompanying data.

Header

The header contains:-

The packet serial number
A code to represent what type the accompanying information is
The interpreter the information is going to or coming from
The length of the accompanying data.

All the integer values are represented as strings of ASCII digits, thereby avoiding issues such as word size and byte ordering at the protocol level.
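As an illustration of such a header, the sketch below encodes and decodes packets. The field order follows the list above, but the separators are an assumption (single spaces, with a newline ending the header), as is the use of an integer to identify the interpreter. The function names are hypothetical.

```python
def encode_packet(serial, ptype, interp, data=b""):
    """Build a packet: ASCII-digit header fields, then the raw data bytes."""
    header = f"{serial} {ptype} {interp} {len(data)}\n".encode("ascii")
    return header + data

def decode_packet(buf):
    """Return ((serial, ptype, interp, data), rest), or (None, buf) if incomplete."""
    newline = buf.find(b"\n")
    if newline < 0:
        return None, buf                 # header not fully received yet
    fields = buf[:newline].decode("ascii").split(" ")
    serial, ptype, interp, length = (int(f) for f in fields)
    start = newline + 1
    end = start + length
    if len(buf) < end:
        return None, buf                 # data not fully received yet
    return (serial, ptype, interp, buf[start:end]), buf[end:]
```

Because the length is carried in the header, the data part can hold arbitrary bytes, including newlines and zero bytes, without any escaping.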

Packets

A typical communication between the application and the browser consists of the application sending the browser a command string and requesting that the results be returned. When the commands have been processed, the browser will send back a packet containing the results.

It is possible that the stream of commands may have caused the interpreter to generate an error (for example no such variable). Results and errors are distinguished by having different packet types, as shown in the tables below:-

Type      Meaning

0          No operation - ignored by both ends

1          Ping - requests that other end send a ping reply.
           This can be used to check the other end is still present.

2          Ping Reply - sent in response to a ping.

Sent by application

Type      Meaning

3          Process data and send back response.

4          Process data, and don't send back any response.

5          Identification of client (this must be the first packet sent when
           identification is being used)

Sent by browser

Type      Meaning

10         Results of executing script.

11         Script execution generated an error in the interpreter
           - the data is the error message.

12         Asynchronous data - this is sent with a serial of zero, and is not
           in response to any particular packet. An example of its use would
           be to notify the application that the user has clicked on a
           hyperlink.

13         The user cancelled the execution of the script.
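The type codes in the tables above can be collected into a single enumeration. The numeric values come from the tables; the symbolic names and the two helper functions are hypothetical and are given only to show how the codes divide into requests and replies.

```python
from enum import IntEnum

class PacketType(IntEnum):
    # Sent by either end
    NOOP = 0
    PING = 1
    PING_REPLY = 2
    # Sent by the application
    EXECUTE_REPLY = 3        # process data and send back a response
    EXECUTE_NO_REPLY = 4     # process data, no response wanted
    IDENTIFY = 5
    # Sent by the browser
    RESULT = 10
    ERROR = 11
    ASYNC_DATA = 12          # sent with a serial of zero
    CANCELLED = 13

def expects_reply(ptype):
    """True if the sender of this packet type should expect an answer."""
    return ptype in (PacketType.PING, PacketType.EXECUTE_REPLY)

def answers_script(ptype):
    """True if this packet type ends an outstanding script request."""
    return ptype in (PacketType.RESULT, PacketType.ERROR, PacketType.CANCELLED)
```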

Synchronicity

Protocol

There are two methods by which the protocol could be designed: synchronous and asynchronous. With the synchronous method, each packet sent by the application would require the next packet returned by the browser to be a response to that packet.

Using an asynchronous method, the results can be sent back when they are ready. This would be of use if multiple interpreters were present in the browser, or if it was multi-threaded. The synchronous method can be emulated by never having more than one outstanding request.

The asynchronous method was chosen for future extensibility.

API

The API could block in each call until a response was received. However, this would not allow use of the asynchronous communications. It would also prevent the application from updating its own display.

The API is therefore implemented asynchronously, where a user specified function is called when the results of the specified request arrive.
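A callback scheme of this kind can be sketched as a table of outstanding requests keyed by packet serial number. This is an illustration under assumptions, not the API's actual data structure: the class and method names are hypothetical, and serial numbers are assumed to start at 1 because the protocol reserves a serial of zero for asynchronous data.

```python
class PendingRequests:
    """Maps the serial number of each outstanding request to its callback."""

    def __init__(self, on_async=None):
        self._next_serial = 1        # 0 is reserved for asynchronous data
        self._callbacks = {}
        self._on_async = on_async

    def register(self, callback):
        """Allocate a serial for a new request and remember its callback."""
        serial = self._next_serial
        self._next_serial += 1
        self._callbacks[serial] = callback
        return serial

    def dispatch(self, serial, ptype, data):
        """Route an incoming packet; return False if nobody was waiting for it."""
        if serial == 0:
            if self._on_async is None:
                return False
            self._on_async(ptype, data)
            return True
        callback = self._callbacks.pop(serial, None)
        if callback is None:
            return False
        callback(ptype, data)
        return True
```

Replies may arrive in any order, since each one is matched to its request by serial number rather than by position in the stream. Synchronous behaviour falls out as the special case where only one request is registered at a time.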

Language

It was necessary to build some intelligence into the browser when handling requests from the application. A simple example: if url1 is not found, then display url2.

This is done by adding some sort of interpreter to process the incoming requests. Engineering your own is time consuming and takes a lot of resources. Instead we used a publicly available one that met all our requirements (particularly portability and ease of use), Tool Command Language.

TCL was developed in 1988, and now has a large world-wide user base. It provides a very easy interface to the C programming language, and is easy for new users to learn. It is also very extensible from both a scripting and a C programmer's point of view. TCL has been used as an embedded language for several applications, and easily allows users to transfer their skills from one application to another without having to learn a new language and its accompanying idiosyncrasies.

Bindings

The following table shows the areas in which we bound TCL commands to Mosaic features via the C interface. Further commands have been built in TCL using these as building blocks. These commands, added to the wide variety of commands built in to TCL, make for a very powerful scripting capability.

Time

after: do a script after a period of time has elapsed

Urls

cannon: canonicalise a URL (e.g. generate full url from a relative one).
currenturl: return url currently being displayed.

Environment

getresource: gets X resource value.

Documents

gettext: gets html source of currently displayed document.
parsetag: parses an html tag.
reload: reload currently displayed document.
showurl: display this url
saveurltofile: saves contents of a url to a file.
sendurlcontents: sends contents of url down the connection.
settext: sets the body, header or footer html text being displayed.

User interface

miscellaneous: Most of the dialog boxes can be popped up.
wmhints: cause windows to be iconified/deiconified, and return X window information.

API

Design

The API provides a W3BTransportHandle for both the application and the browser to use once a connection has been established. This handle is then specified in all following API calls. Its actual implementation is a pointer to an opaque structure, thereby allowing the underlying implementation to change freely without affecting the API user.

Errors that are detected by the API routines (i.e. not generated by the interpreter) are returned to the caller. All functions return an integer indicating whether the operation was successful, or an error code if it was unsuccessful.

Portability

To ensure portability, all types used by the API are replaced by abstract typedefs. This ensures that the library can use the same source code with a platform specific header file filling in the correct types for that platform.

The only part of the API which is platform specific concerns service location and advertising, although their prototypes should be similar on all platforms. All other functions in the API have the same prototypes on all platforms.

Synchronicity

To handle the asynchronous nature of the API and communications, the API relies heavily on callbacks. These are routines that are called when a certain event has happened. Each routine that sends information can specify a callback that will be called when a response is received. The callback can be different for each piece of information sent.

Miscellaneous

In order to aid the API user, and not require that files get loaded into memory, the API also has the ability to send file contents, and to store results to a file.

Each platform also provides a set of utility functions that help to interface to the operating and windowing systems.

In use

Security

Security is a very important issue, particularly as the browser is taking actions based on what it has been sent. It should be noted that the security is not the most rigorous possible (e.g. there is no authentication).

However, this is not a drawback, as the API is an enabling technology for users, their desktops, their applications and their web browsers. Heavy-handed security could prove very annoying.

The browser implementer can add extra security in the interpreter itself. One method is to modify the language and its built-ins so that it can't do any harm (e.g. Safe-TCL). A second method would be to display a dialog box showing the code about to be executed, allowing the user to choose whether or not it gets executed.

Localisation

It is important that the application and browser can be localised. Therefore, the API should not impose any internationalisation constraints on the API user. This is achieved by the API passing data through in binary mode, and placing no interpretation on the data going through.

It should be noted that most programming languages are not localised (e.g. in C, switch is always spelt like that even if you are a French speaker). The scripting language author should therefore provide access to some form of message catalogue or resources so the user can get messages in their own locale.

Reliability

The API has been found to be reliable. This is mostly due to the simplicity, and also because the networking complexity is handled by the operating environment.

Enablement

It has been very beneficial to put some processing logic into the browser. It allows compound actions and decisions to be taken local to the browser. For instance if a url is not found, it can decide what to do next. The results can also be pre-processed before being returned.
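The fallback decision mentioned above is the kind of logic that runs inside the browser. In the implementation it would be a script in the embedded language; the sketch below only shows the shape of the logic in Python. The showurl argument is a hypothetical stand-in for the browser's display command, assumed here to return True when the document was found.

```python
def show_first_available(urls, showurl):
    """Try each URL in order; return the one displayed, or None if all failed."""
    for url in urls:
        if showurl(url):
            return url
    return None
```

With this logic held in the browser, the application sends one request and receives one answer, instead of exchanging a packet for every URL it tries.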

The API has also allowed us to closely integrate our X.desktop, the IXI Panorama window manager, and Mosaic as a help engine. A typical example: X.desktop can ask Mosaic to display help on a book and topic, and to return the X window id. It then asks IXI Panorama to bring the window corresponding to that id to the top of the stacking order in the user's current view area.

It has also been found that the API can be of use for communication amongst other applications. This is for two reasons :-

Generality
The lack of any requirement for the browser to actually be a World Wide Web browser - it can be any application.

Conclusion

The API has fulfilled all of the original objectives. The sample implementation of this demonstrates :-

Simplicity
Portability
Extensibility of Control
Reliability
Some security

References

Ousterhout, TCL