An API To Mosaic
Guy Singh
Roger Binns
IXI Ltd
Abstract
The Mosaic browser has been very successful as a standalone product. In fact, as
people learn about all
the features that Mosaic has to offer, they wish to combine all or a subset of
them into applications
specific to their needs.
It is therefore proposed that there should be a standard way of incorporating
this technology into
existing software. This would allow existing applications to use Mosaic and
its rich WWW functionality in a relatively easy manner.
The proposed mechanism for implementing this is to use currently available
mature technologies that
were built for exactly this purpose. There are a number of different issues
that need addressing:
- The language
- The protocol
- The transport layer
In addition, the solutions to these issues must be available across platforms.
This paper presents the issues and the proposed solutions as well as
demonstrating a working
implementation.
Introduction
This document discusses an API to be used in programs that wish to communicate
with a World Wide
Web browser. The need for this API arose from the requirement to more tightly
couple Mosaic
(running as a help engine) and other programs running on the same display.
Part of this coupling involved placing a scripting language into Mosaic
itself, and being able to send
commands to the interpreter, and get their results. Our implementation uses
the portable TCL
interpreter.
Note: This API is only intended to be used to communicate in the
context of user actions with
a WWW browser. A better interface to the actual resources of the web is to
use libwww or to talk to a
proxy http server.
We set ourselves five objectives:
- Transport independence
  The API should hide as much as possible of the underlying method used to
  communicate with the browser.
- Access to the scripting language
  It should be possible to send arbitrary scripts to the browser which
  executes them and returns the results. The API should be independent of
  any particular interpreter.
- Binary communication
  This will allow file contents to be transferred across the connection, as
  well as languages that may be byte compiled.
- Platform independence
  The API should be as similar as possible on all platforms. The browser
  and the application talking to it should be able to run on different
  platforms and/or hosts.
- Ease of use
  The API should be as simple as possible to use and not require much code
  support from the API user. The scripting language used should be well
  documented, well designed and mature. In particular it should have good
  string handling (strings are used heavily in a web browser, e.g. URLs
  and HTML).
Architecture
The API should be implemented as a library that can be used by both the
application and the browser.
Communication consists of three phases.
For the application, the first phase consists of locating the browser and
initialising the connection to it. A handle is returned, which the API user
then passes to the API to provide a context. In the second phase, the
application can send data to the browser to be executed, and take action on
the results. Finally, the connection may be closed by either end.
Locate browser
Initialise connection
    Send data to be executed
    Process results
    * ... repeat
Close connection
The browser uses the API in a similar way, but has a different first phase.
It consists of advertising
itself as providing a service to the application, and waiting for incoming
connections. When a
connection is made, it is initialised and a handle returned in the same
manner. Incoming requests are
then taken from the connection and processed. The result is sent back if
required. The connection may
then be closed by either end.
Advertise browser
Listen for incoming connections
For each connection
    Initialise connection and request queue
    For each request
        Process request
        Send back result if necessary
    * ... repeat
* ... repeat
Close connection
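The two outlines above can be sketched end to end in a few lines. The
following Python sketch is illustrative only and is not the paper's
implementation: the function names are hypothetical, the header is simplified
to three ASCII fields (serial, type, length) on one line, and the "interpreter"
is a stand-in that just echoes the script. The packet type numbers 3 and 10
are the ones given later in the Packets section.

```python
import socket
import threading

def send_packet(sock, serial, ptype, data):
    # Assumed framing: "serial type length\n" in ASCII digits, then raw bytes.
    sock.sendall(f"{serial} {ptype} {len(data)}\n".encode("ascii") + data)

def recv_packet(stream):
    header = stream.readline()
    if not header:
        return None                      # the other end closed the connection
    serial, ptype, length = (int(field) for field in header.split())
    return serial, ptype, stream.read(length)

def serve_one_connection(listener, execute):
    """Browser side: accept a connection and process requests until it closes."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rb") as stream:
        while (packet := recv_packet(stream)) is not None:
            serial, ptype, data = packet
            result = execute(data)
            if ptype == 3:               # type 3 asks for the result (type 10)
                send_packet(conn, serial, 10, result)

def run_session(scripts):
    """Application side: connect, send each script, collect results, close."""
    listener = socket.create_server(("127.0.0.1", 0))   # port picked at run time
    port = listener.getsockname()[1]
    execute = lambda script: b"ran: " + script           # stand-in interpreter
    browser = threading.Thread(target=serve_one_connection,
                               args=(listener, execute))
    browser.start()
    results = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("rb") as stream:
            for serial, script in enumerate(scripts, start=1):
                send_packet(sock, serial, 3, script)
                results.append(recv_packet(stream))
    browser.join()
    listener.close()
    return results
```

Here the application waits for each result before sending the next script,
which is the simplest way to drive the connection; both ends run in one
process only to keep the sketch self-contained.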
Transport
Communication
The communication method chosen was Transmission Control Protocol/Internet
Protocol (TCP/IP). This provides a reliable byte stream over a variety of
physical network layers, and is supported by most operating systems and the
World Wide Web browsers running on them.
Service location
The browser that is providing a service for the API will be awaiting
connections on a particular port. In
order for more than one browser to run on a machine, this port cannot be
fixed, but will need to be
determined at run time.
Typically the browser and the application will be accessing the same display.
A service name is used to
identify the browser broadcasting its availability. This can be a simple word
which has a default value
of WWW_BROWSER.
The API will then handle finding out which machine the browser is located on,
and which port it is
listening on.
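A minimal sketch of the idea, in Python, might look as follows. The registry
here is a plain in-process dictionary used purely as a stand-in; the text does
not describe the real advertising mechanism, and the function names
advertise() and locate() are hypothetical. What the sketch does show is the
browser letting the operating system choose a free port at run time and
publishing it under a service name.

```python
import socket

# Stand-in for the real advertising mechanism (an assumption for this sketch).
_registry = {}

def advertise(service_name="WWW_BROWSER"):
    """Browser side: listen on a port chosen at run time and publish it."""
    listener = socket.create_server(("127.0.0.1", 0))   # 0 = any free port
    host, port = listener.getsockname()
    _registry[service_name] = (host, port)
    return listener

def locate(service_name="WWW_BROWSER"):
    """Application side: find the host and port for a service name, if any."""
    return _registry.get(service_name)
```

Because the port is never fixed, a second browser advertised under another
name can run on the same machine without clashing with the first.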
Security
Security is partially addressed at the transport level. An implementation can
provide security in two ways:
- The first is to allow connections only from known trusted hosts and ports.
- The second method, used by our implementation, is to require the application
  to provide a string or key which it obtained while doing service location.
  The UNIX/X Window System implementation relies on xauth and xhost for this.
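Both checks are small. The Python sketch below is an illustration under
assumptions, not the authors' code: the function names are hypothetical, the
key is simply a random string handed out at service location time, and the
identification packet is taken to be type 5 as listed in the Packets section.
It does not model xauth or xhost.

```python
import hmac
import secrets

def make_key():
    """Key the browser publishes alongside its location (assumed format)."""
    return secrets.token_hex(16)

def host_allowed(peer_host, trusted_hosts):
    """First method: only accept connections from known trusted hosts."""
    return peer_host in trusted_hosts

def admit(first_packet_type, presented_key, expected_key):
    """Second method: the first packet must be type 5 and carry the key."""
    if first_packet_type != 5:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```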
Protocol
The protocol was designed to be as simple as possible. The information is
sent as packets
which contain two parts: a header, and any accompanying data.
Header
The header contains:
- The packet serial number
- A code to represent what type the accompanying information is
- The interpreter the information is going to or coming from
- The length of the accompanying data
All the integer values are represented as strings of ASCII digits, thereby
avoiding issues such as word
size and byte ordering at the protocol level.
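As an illustration of a header built this way, the Python sketch below encodes
and decodes a packet. The exact wire layout is not given in the text, so the
layout used here (four space-separated fields ended by a newline, with the
interpreter given as a short name such as "tcl") is an assumption, as are the
function names.

```python
from typing import NamedTuple

class Header(NamedTuple):
    serial: int
    ptype: int
    interpreter: str
    length: int

def encode_packet(serial, ptype, interpreter, data):
    # Integers travel as ASCII digits, so word size and byte order never matter.
    header = f"{serial} {ptype} {interpreter} {len(data)}\n"
    return header.encode("ascii") + data

def decode_packet(raw):
    line, _, rest = raw.partition(b"\n")
    serial, ptype, interpreter, length = line.decode("ascii").split(" ")
    header = Header(int(serial), int(ptype), interpreter, int(length))
    # The length field, not a terminator, delimits the data, so the data
    # may contain any bytes at all, including newlines.
    return header, rest[:header.length]
```

For example, encode_packet(7, 3, "tcl", b"showurl x") yields the bytes
b"7 3 tcl 9\nshowurl x".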
Packets
A typical communication between the application and the browser consists of
the application sending the browser a string of commands and requesting that
the results be returned. When the commands have been processed, the browser
sends back a packet containing the results.
It is possible that the commands may have caused the interpreter to generate
an error (for example, no such variable). Results and errors are distinguished
by having different packet types, as shown in the tables below:
Type  Meaning
0     No operation - ignored by both ends.
1     Ping - requests that the other end send a ping reply.
      This can be used to check the other end is still present.
2     Ping Reply - sent in response to a ping.

Sent by application
Type  Meaning
3     Process data and send back response.
4     Process data, and don't send back any response.
5     Identification of client (this must be the first packet sent when
      identification is being used).

Sent by browser
Type  Meaning
10    Results of executing script.
11    Script execution generated an error in the interpreter - the data is
      the error message.
12    Asynchronous data - this is sent with a serial of zero, and is not in
      response to any particular packet. An example of its use would be to
      notify the application that the user has clicked on a hyperlink.
13    The user cancelled the execution of the script.
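The type codes above map naturally onto an enumeration. In the Python sketch
below the numeric values are the ones from the tables, while the symbolic
names and the two helper functions are hypothetical and chosen only for
illustration.

```python
from enum import IntEnum

class PacketType(IntEnum):
    NOOP = 0              # ignored by both ends
    PING = 1
    PING_REPLY = 2
    PROCESS_REPLY = 3     # process data and send back a response
    PROCESS_NO_REPLY = 4  # process data, no response wanted
    IDENTIFY = 5          # must be first when identification is used
    RESULT = 10
    ERROR = 11            # data is the interpreter's error message
    ASYNC_DATA = 12       # serial is zero, not tied to any request
    CANCELLED = 13

def reply_to(ptype):
    """Type to send back for a control packet, or None if none is needed."""
    return PacketType.PING_REPLY if ptype == PacketType.PING else None

def is_response(ptype):
    """True for the browser packets that finish an outstanding request."""
    return ptype in {PacketType.RESULT, PacketType.ERROR, PacketType.CANCELLED}
```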
Synchronicity
Protocol
There are two methods by which the protocol could be designed: synchronous and
asynchronous. With
the synchronous method, each packet sent by the application would require the
next packet returned by
the browser to be a response to that packet.
Using an asynchronous method, the results can be sent back when they are
ready. This would be of use
if multiple interpreters were present in the browser, or if it was
multi-threaded. The synchronous
method can be emulated by never having more than one outstanding request.
The asynchronous method was chosen for future extensibility.
API
The API could block in each call until a response was received. However, this
wouldn't allow use of the asynchronous communications. It would also prevent
the application from updating its own display.
The API is therefore implemented asynchronously, where a user specified
function is called when the
results of the specified request arrive.
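One way to picture this is a table of outstanding requests keyed by packet
serial number. The Python sketch below is a simplified model rather than the
real API: the class and method names are hypothetical, and a list stands in
for the transport. Responses may arrive in any order, and packets with a
serial of zero go to a separate handler for asynchronous data.

```python
class Connection:
    def __init__(self, on_async=None):
        self._next_serial = 1
        self._pending = {}        # serial -> callback awaiting a response
        self._on_async = on_async
        self.outbox = []          # stand-in for the transport

    def send(self, script, callback):
        """Queue a request (type 3) and return at once without blocking."""
        serial = self._next_serial
        self._next_serial += 1
        self._pending[serial] = callback
        self.outbox.append((serial, 3, script))
        return serial

    def deliver(self, serial, ptype, data):
        """Called when a packet arrives from the browser."""
        if serial == 0:           # asynchronous data, not tied to a request
            if self._on_async is not None:
                self._on_async(ptype, data)
            return
        callback = self._pending.pop(serial, None)
        if callback is not None:
            callback(ptype, data)
```

Synchronous behaviour falls out as a special case: the caller simply never
has more than one request outstanding.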
Language
It was necessary to build some intelligence into the browser for handling
requests from the application. A simple example would be: if url1 is not
found, then display url2.
This is done by adding an interpreter to process the incoming requests.
Engineering your own is time consuming and takes a lot of resources. Instead
we used a publicly available one that met all our requirements (particularly
portability and ease of use): Tool Command Language (TCL).
TCL was developed in 1988, and now has a large world-wide user base. It
provides a very easy interface to the C programming language, and is easy for
new users to learn. It is also very extensible from both a scripting and a C
programmer's point of view. TCL has been used as an embedded language for
several applications, and easily allows users to transfer their skills from
one application to another without having to learn a new language and its
accompanying idiosyncrasies.
Bindings
The following table shows the areas in which we bound TCL commands to Mosaic
features via the C interface. Further commands have been built in TCL using
these as building blocks. Added to the wide variety of commands built in to
TCL, these commands make for a very powerful scripting capability.
Area            Command          Description
Time            after            Do a script after a period of time has
                                 elapsed.
URLs            cannon           Canonicalise a URL (e.g. generate a full URL
                                 from a relative one).
                currenturl       Return the URL currently being displayed.
Environment     getresource      Get an X resource value.
Documents       gettext          Get the HTML source of the currently
                                 displayed document.
                parsetag         Parse an HTML tag.
                reload           Reload the currently displayed document.
                showurl          Display this URL.
                saveurltofile    Save the contents of a URL to a file.
                sendurlcontents  Send the contents of a URL down the
                                 connection.
                settext          Set the body, header or footer HTML text
                                 being displayed.
User interface  miscellaneous    Most of the dialog boxes can be popped up.
                wmhints          Cause windows to be iconified/deiconified,
                                 and return X window information.
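Using these bindings, the earlier example (if url1 is not found, then display
url2) could be sent as a single script. The Python sketch below shows how an
application might compose such a script as a string. It rests on an assumption
not stated in the text: that showurl raises a TCL error when the URL cannot be
displayed, so that the standard TCL catch command can detect the failure. The
helper names are hypothetical.

```python
def brace(word):
    """Quote a word for TCL with braces so no substitution is performed."""
    if any(ch in word for ch in "{}\\"):
        raise ValueError("braces and backslashes are not handled by this sketch")
    return "{" + word + "}"

def fallback_script(url1, url2):
    """TCL script: try to show url1, and show url2 if that fails."""
    # Assumes showurl signals failure as a TCL error that catch can trap.
    return "if {[catch {showurl %s}]} {showurl %s}" % (brace(url1), brace(url2))
```

The decision is then taken inside the browser, and the application needs only
one round trip instead of two.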
API
Design
The API provides a W3BTransportHandle for both the application and the
browser to use once
a connection has been established. This handle is then specified in all
following API calls. Its actual
implementation is a pointer to an opaque structure, thereby allowing the
underlying implementation to
change freely without affecting the API user.
Errors that are detected by the API routines (i.e. not generated by the
interpreter) are returned
to the caller. All functions return an integer indicating whether the
operation was successful, or an error
code if it was unsuccessful.
Portability
To ensure portability, all types used by the API are replaced by abstract
typedefs. This ensures that
the library can use the same source code with a platform specific header file
filling in the correct types
for that platform.
The only part of the API which is platform specific concerns service location
and advertising, although
their prototypes should be similar on all platforms. All other functions in
the API have the same prototypes on all platforms.
Synchronicity
To handle the asynchronous nature of the API and communications, the API
relies heavily on
callbacks. These are routines that are called when a certain event has
happened. Each routine
that sends information can specify a callback that will be called when a
response is received. The
callback can be different for each piece of information sent.
Miscellaneous
In order to aid the API user, and not require that files get loaded into
memory, the API also has the
ability to send file contents, and to store results to a file.
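The point of such helpers is that a file can be moved in fixed-size chunks, so
memory use stays constant whatever the file size. The Python sketch below
illustrates the idea only; the function names are hypothetical, the transport
is represented by plain write and read callables, and the single length line
written before the data is an assumed, simplified framing.

```python
import os

def send_file(path, write, chunk_size=8192):
    """Send a file's contents chunk by chunk; the length is known up front."""
    size = os.path.getsize(path)
    write(f"{size}\n".encode("ascii"))      # assumed framing: length line first
    with open(path, "rb") as source:
        while chunk := source.read(chunk_size):
            write(chunk)
    return size

def store_result(read, length, path, chunk_size=8192):
    """Copy exactly `length` bytes of incoming data straight into a file."""
    remaining = length
    with open(path, "wb") as target:
        while remaining:
            chunk = read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("connection closed before all data arrived")
            target.write(chunk)
            remaining -= len(chunk)
```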
Each platform also has a set of utility functions that help to interface to
the operating system and windowing system.
In use
Security
Security is a very important issue, particularly as the browser is taking
actions based on what it has been
sent. It should be noted that the security is not the most rigorous possible
(e.g. there is no
authentication).
However, this is not a drawback, as the API is an enabling technology for
users, their desktop, their applications and their web browser. Heavy-handed
security could prove very annoying.
The browser implementer can add extra security in the interpreter itself. One
method is to modify the language and its built-ins so that it can't do any
harm (e.g. Safe-TCL). A second method would be to display a dialog box showing
the code about to be executed, allowing the user to choose whether or not it
gets executed.
Localisation
It is important that the application and browser can be localised. Therefore,
the API should not impose
any internationalisation constraints on the API user. This is achieved by the
API passing data through
in binary mode, and placing no interpretation on the data going through.
It should be noted that most programming languages are not localised (e.g. in
C, switch is always
spelt like that even if you are a French speaker). The scripting language
author should therefore
provide access to some form of message catalogue or resources so the user can
get messages in their
own locale.
Reliability
The API has been found to be reliable. This is mostly due to its simplicity,
and also because the networking complexity is handled by the operating
environment.
Enablement
It has been very beneficial to put some processing logic into the browser. It
allows compound actions
and decisions to be taken local to the browser. For instance if a url is not
found, it can decide what to
do next. The results can also be pre-processed before being returned.
The API has also allowed us to closely integrate our X.desktop, the IXI
Panorama window manager,
and Mosaic as a help engine. A typical example: X.desktop can ask Mosaic to
display help on a book
and topic, and to return the X window id. It then asks IXI Panorama to bring
the window
corresponding to that id to the top of the stacking order in the user's
current view area.
It has also been found that the API can be of use for communication amongst
other applications. This is for two reasons:
- Generality
- The lack of any requirement for the browser to actually be a World Wide Web
  browser; it can be any application.
Conclusion
The API has fulfilled all of the original objectives. The sample
implementation demonstrates:
- Simplicity
- Portability
- Extensibility of Control
- Reliability
- Some security
References
Ousterhout, TCL