Mobile GUI On The Web.

Daniel Dardailler

Abstract:
This article presents an architecture allowing network graphical applications, such as X Window clients, to be activated and render themselves directly onto World Wide Web browser screens. The intention is both to offer Web information providers better and finer control over the exact presentation of their documents, and to provide the ability to demonstrate real applications or products over the Web. After presenting some rationales and requirements for this work, the paper explores the set of issues that need to be resolved to implement such a system.
Keywords:
X Window System, Embedding, Remote Execution, RX Document type.

Introduction

One of the major characteristics of the World Wide Web (WWW) system as it operates today is that document providers have very little freedom over the final presentation of their documents. This happens because the large majority of documents presented to the end-user are expressed using a simple SGML-based markup language, the hypertext markup language (HTML), whose only function is to convey structural content information.

As an author of an HTML document, you may say: This is a section title, This is a table, and so on. With HTML Forms in place, you can even say: This is an active option menu or This is a text entry area. However, HTML alone doesn't let you draw simple objects like pie charts or line histograms, with text at a specific location and orientation, or run some animation directly in the browser window.

Putting the emphasis on the logical content and leaving the final presentation details (e.g., exact layout, font, color) to the browser and the user works just fine for certain classes of documents. However, this situation is not well-suited for a whole different set of materials. There is a lot of information out there for which the exact presentation, or "look," matters more than the words: advertisement pages, advanced live GUIs, visual art, structured graphics, active multimedia, etc. (just compare the nature of the graphics found on CD-ROM with your basic GUI environment).

Work has been going on for several years in the SGML community to provide authors with better control over the final presentation using Style Sheet [1] specifications. Such presentation languages attached to HTML would improve the information provider's control over the document visuals, but still have intrinsic limitations due to their declarative approach.

Another path people have taken is to complete or replace HTML with a richer language, changing the once static Web documents into programs that get downloaded and executed at the user's site. This is usually referred to as the Mobile Code [2] paradigm. The problem with this solution is that existing applications have to be rewritten in the "applet" language (Java, for instance), they have to be downloaded, and they might have to be installed and de-installed as well.

This paper presents an architecture that is somewhat similar to the Mobile Code approach, since a real program complements or replaces HTML, but in our case, the code doesn't move, only the graphical layer does. It can therefore be referred to as the Mobile GUI approach.

A low-level UI protocol, such as the X11 Protocol [3], effectively solves the presentation problem by offering providers of information the "pixel-level" functionality they sometimes desire. Of course, X also solves the "remote demo/try before you use" case, since it is already the natural vehicle of hundreds of graphical networked applications. Since X runs over the same Internet transport as the HTTP protocol, namely TCP/IP, it is also a natural fit for this task.

Although X11 is the UI network protocol of choice in the industry today, others, based on Microsoft Windows, OS/2, and NeXT also exist, and one requirement of this design is to allow for their inclusion. That being said, the rest of the document focuses mainly on X and less on the other potential UI protocols.

Technical Issues

The main goal of this article is to break the overall problem into smaller pieces. If we identify the problem as how to let a remote application, such as an X client, render itself directly onto a Web browser screen, then we can break the problem into five subproblems, each covered by one of the sections that follow: expressing the remote application on the Web, embedding, remote execution, security, and performance.

As you can see, this is a very broad set of issues, having to deal with a number of areas, including existing Web protocols and formats, X security and X embedding techniques, and more generic network protocols for remote execution. The rest of this paper describes some of the available options for each of the issues listed above.

The Web

There are basically three different ways of expressing a new kind of datum on the Web: a new URL scheme, a new HTML tag, or a new document type.

Let's look at all three in turn.

In HTML, the language in which most Web documents are written, URLs are usually used in anchors, as with the HREF parameter. From the user's perspective, clicking on the associated anchor text causes the browser to fetch the data to be displayed. One approach for describing a new remote execution paradigm would be to introduce a new URL scheme, RX (for Remote eXecution), that would look like this in an HTML document:

    <A HREF = "rx://www.x.org/programs/pix">
    Click here for a demo of the Pix editor</A>
This has the effect of starting pix remotely, with the display done locally in a separate top-level window on the user's screen. This scheme can also be supported in the embedded case, where the program runs directly in the browser's window, as follows:

    <IMG SRC = "rx://www.x.org/nice_home"
    ALIGN=top  ALT="X">
This code has the effect of starting the program referred to as nice_home remotely, with the display embedded in the local browser viewing window.

The problem with creating a new URL scheme is that it requires creating a new associated protocol and a new scheme registration, which in our opinion changes too much of the current infrastructure and is not necessary. The same is true for extending the HTML syntax to support remote execution through a new RX tag. For example:

    <RX src="some_program" EMBEDDED=YES>
The third option, creating a new RX document type, seems to be the easiest to introduce, and the rest of this article concentrates on this approach.

The purpose of an RX document is to provide the Web browser with information on how to activate the remote program, such as the UI protocol to use, whether embedding is supported, and size hints.

In HTML, RX document pointers can be used in the same context as any type of image file (e.g., .gif, .xbm), namely with the ANCHOR and IMG tags. Using an RX document with ANCHOR results in the program being started in its own top-level window, while using it with IMG results in the program being embedded in the browser window if both the browser and the remote client support embedding. The following HTML code shows the nonembedded case:

    <A HREF = "http://www.x.org/rx/pix.rx">
    Click here for a demo of the Pix editor</A>
In this case, when the user clicks on the anchor, pix appears as a separate application. Here's the embedded case:

    <IMG SRC = "http://www.x.org/rx/ico.rx" ALIGN=top  ALT = "ICO.RX">
If the remote client and the browser support embedding, the user should see an ico animation running in a browser sub-window. If embedding is not supported, the user sees an image/logo that represents ico; when the user clicks on this image, ico is activated in a separate top-level window.

In both cases, the RX document contains information about the type of remote activation supported by this server. The syntax we use consists of simple FIELD_NAME/value lines.

The first line is mandatory and describes the location of the server-side activation script, for instance a Perl script activated through the Common Gateway Interface (CGI) [4]. This line uses the ACTION field name:

    ACTION URL
The ACTION field is followed by an optional logo line:

    LOGO URL
pointing to an image file (.gif, .xbm) used to represent the remote program "inline" if embedding is not supported. A list of data lines for each display protocol supported by the application follows; the UI_PROTOCOL is required, while the GEOM_HINT and EMBEDDABLE fields are optional:

    UI_PROTOCOL x11 | ica | news | etc.       
    GEOM_HINT  width x height       
    EMBEDDABLE no | yes       
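
The line-oriented syntax above is simple enough that an rx agent could parse it in a few lines. The following Python sketch is only an illustration, not the prototype's code: the names `parse_rx`, `RxDocument`, and `ProtocolEntry` are hypothetical, the sample URLs are made up, and two rules are assumptions rather than part of the description above (each UI_PROTOCOL line opens a new group, and EMBEDDABLE defaults to "no" when absent).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProtocolEntry:
    """One UI_PROTOCOL group with its optional hints."""
    name: str
    geom_hint: Optional[tuple] = None   # (width, height)
    embeddable: bool = False            # assumed default when the field is absent


@dataclass
class RxDocument:
    action: str
    logo: Optional[str] = None
    protocols: list = field(default_factory=list)


def parse_rx(text: str) -> RxDocument:
    """Parse FIELD_NAME/value lines into an RxDocument."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The ACTION line is mandatory and must come first.
    if not lines or not lines[0].startswith("ACTION "):
        raise ValueError("first line must be an ACTION line")
    doc = RxDocument(action=lines[0].split(None, 1)[1])
    for line in lines[1:]:
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if name == "LOGO":
            doc.logo = value
        elif name == "UI_PROTOCOL":
            doc.protocols.append(ProtocolEntry(name=value))
        elif name in ("GEOM_HINT", "EMBEDDABLE"):
            if not doc.protocols:
                raise ValueError(name + " must follow a UI_PROTOCOL line")
            entry = doc.protocols[-1]
            if name == "GEOM_HINT":
                # Accept both "300x200" and "300 x 200".
                width, height = value.lower().replace(" ", "").split("x")
                entry.geom_hint = (int(width), int(height))
            else:
                entry.embeddable = (value.lower() == "yes")
        else:
            raise ValueError("unknown field: " + name)
    return doc


sample = """ACTION http://www.example.org/cgi-bin/anim.pl
LOGO http://www.example.org/icons/anim.gif
UI_PROTOCOL x11
GEOM_HINT 300x200
EMBEDDABLE yes
UI_PROTOCOL ica
GEOM_HINT 640 x 480
"""
doc = parse_rx(sample)
print(doc.action, [p.name for p in doc.protocols])
```

Keeping the per-protocol hints attached to their UI_PROTOCOL entry lets the agent read the size hint for whichever protocol it ends up selecting.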

For example, the RX document shown in Figure 1 represents an application that can be activated using either the X11 Window Protocol or WinDD (ica is the name of the network protocol used by WinDD). It provides a default logo and different size hints depending on the type of UI selected. Figure 1 also shows the general architecture of the system.

Figure 1. General RX architecture

The sequence of actions shown in Figure 1 is initiated when a browser, such as Mosaic or Netscape, finds an RX reference in an HTML document. Essentially, the browser sees the RX document link in the HTML file, so it asks the http server for the document. The httpd daemon serves the document and the browser spawns the rx agent. The rx agent then asks the httpd daemon to process the action script. When the script is activated, it uses the information in the RX document to activate the remote client, which is rendered in the browser display window.

In Figure 1, the RX document itself (anim.rx) is represented as a separate entity from the RX script (anim.pl). In our current implementation, we usually carry only one CGI script on the server side, which either generates the RX syntax (if requested via HTTP GET without arguments) or activates the remote program (when requested a second time with arguments for Display, Geometry, etc.). Storing the information about which UI_PROTOCOL is supported and how to activate programs for each protocol in one single file is easier to maintain from an administrative point of view.

Example 1: "Duo" RX script (http://x.org/cgi-bin/xclock.pl)

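   #!/usr/local/bin/perl
   # Called without arguments: emit the RX document.
   # Called with arguments (display, geometry, window ID): start the client.

   if (!@ARGV) {
       # An empty line must separate the CGI header from the body.
       print "Content-type: application/x-rx\n\n";
       print "ACTION http://x.org/cgi-bin/xclock.pl\n";    # this script itself
       print "LOGO http://x.org/icons/xclock.xbm\n";
       print "UI_PROTOCOL x11\n";
       print "GEOM_HINT 200x200\n";
       print "EMBEDDABLE yes\n";
   } else {
       my ($display, $geometry, $window) = @ARGV;
       my @cmd = ('xclock', '-g', $geometry, '-d', $display);
       push @cmd, '-window', $window if defined $window;
       # Run in a child process; the list form of exec bypasses the shell,
       # so request arguments cannot inject shell commands.
       my $pid = fork();
       exec(@cmd) if defined $pid && $pid == 0;
   }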
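
Example 1 reads the display, geometry, and window ID from the script's command-line arguments. Under CGI, a query string that contains no "=" sign is split at "+" characters and handed to the script as arguments, so one plausible way for the rx agent to issue the second request is sketched below in Python. The function name `activation_url`, the sample display name, and the choice to omit the window ID in the nonembedded case are assumptions made for illustration.

```python
from urllib.parse import quote


def activation_url(action, display, geometry, window_id=None):
    """Build the second request to the ACTION script.

    The argument order follows Example 1: display, geometry, window ID.
    """
    args = [display, geometry]
    if window_id is not None:
        # X window IDs are conventionally written in hexadecimal.
        args.append(hex(window_id))
    # Encode each argument on its own so that "+" remains a separator.
    return action + "?" + "+".join(quote(arg, safe="") for arg in args)


url = activation_url("http://x.org/cgi-bin/xclock.pl",
                     "myhost.example.org:0", "200x200", 0x2C00007)
print(url)
```

Because each argument is percent-encoded separately, a display name containing ":" or any other reserved character cannot be confused with an argument separator.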

Embedding

The nonembedding case of handling RX documents, that is, activating the X client in a separate top level, can be implemented without any changes to the browser or the http server code. If separate RX documents are present on the httpd side (versus CGI scripts that generate RX Content-type on the fly--see previous section), one would need to extend the file suffix mapping (i.e. the .mime.types) to bind .rx file extensions to the MIME content type application/rx (or rather application/x-rx since it's not a registered IANA type yet). On the browser side, nonembedding support only requires you to add the following entry into the .mailcap file:

    application/x-rx; rx %s
This binds the RX type to an external program, the rx agent. When the .rx document is passed to this program, it performs the remote activation in its own top-level window, as described in detail below.
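
The suffix-to-type binding described above can be tried out with Python's standard `mimetypes` module, which keeps the same kind of table as a mime.types file. This is only a sketch of the mapping itself, not of any browser or httpd configuration.

```python
import mimetypes

# Register the unofficial RX type for the .rx suffix, as an extended
# suffix table would.
mimetypes.add_type("application/x-rx", ".rx")

content_type, encoding = mimetypes.guess_type("http://www.x.org/rx/pix.rx")
print(content_type)
```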

For the embedding case, there is a clear need for support in a Web browser beyond the current case of remote execution. The problem is that this support just hasn't taken real shape yet. For instance, it would be appropriate to have MPEG animations rendered directly in the browser window, in the context of an IMG tag. This rendering would be handled by a separate process that is local to the browser, i.e., an external viewer, whose output would be redirected to the browser window instead of being displayed in a separate top level window. If that mechanism were in place, then RX would become just another instance of an external-for-embedding viewer handling remote execution.

Recently, new mechanisms called "Plug-Ins" have appeared in popular browsers such as Netscape [5], allowing for embedding of arbitrary external applets via dynamic loading. The RX agent would just have to provide an external plug-in API to be integrated in such a Web browser.

The RX document also contains fields that provide information about embedding. The EMBEDDABLE field tells the browser whether the remote client is capable of being embedded in one of the browser subwindows. For instance, it might tell the browser that the specified X client can be given a top-level window ID at startup time (as a -window ID command-line option). A browser can use this information (directly or indirectly through the applet) and the file pointed to by LOGO to show a press-here-to-activate icon to replace a nonembeddable program in the IMG context. A browser can also elect to defer any activation of an embedded client until the user requests activation, in which case it can also use the LOGO icon by default.
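
The choice between embedding, showing the logo, and opening a separate window can be summarized as a small decision function. The Python sketch below restates the rules given so far; the function name, its parameters, and the three result labels are hypothetical.

```python
def choose_presentation(in_img_context, browser_can_embed, client_embeddable,
                        defer_activation=False):
    """Return "embed", "logo", or "toplevel" for one RX reference."""
    if not in_img_context:
        # ANCHOR reference: the program gets its own top-level window.
        return "toplevel"
    if browser_can_embed and client_embeddable and not defer_activation:
        return "embed"
    # Show the LOGO image; the user clicks on it to activate the program.
    return "logo"


print(choose_presentation(True, True, True))
print(choose_presentation(True, True, False))
```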

GEOM_HINT gives information to the browser about the preferred geometry of the remote client. This is mostly useful in the case of an embedded program, where the browser needs to create an embedded window of the correct size up front. Note that GEOM_HINT is not required with HTML 3.0, where the IMG tag can include WIDTH and HEIGHT attributes. In any case, if the size information is not present, and embedding is still desired, the original size of the embedded window is browser-dependent. The browser can also deny any geometry changes thereafter, or it can allow for dynamic reconfiguration.
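
One way a browser might pick the initial size of the embedded window is sketched below in Python. The precedence order (IMG attributes first, then GEOM_HINT, then a browser default) and the 200x200 default are assumptions; the text above only says that GEOM_HINT is unnecessary when the IMG tag carries a size and that the fallback is browser-dependent.

```python
def embedded_size(img_width=None, img_height=None, geom_hint=None,
                  browser_default=(200, 200)):
    """Pick the initial (width, height) of an embedded window."""
    if img_width is not None and img_height is not None:
        return (img_width, img_height)   # WIDTH/HEIGHT from the IMG tag
    if geom_hint is not None:
        return geom_hint                 # GEOM_HINT from the RX document
    return browser_default               # browser-dependent fallback


print(embedded_size(geom_hint=(600, 600)))
```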

There are sundry technical issues related to designing general embedding in X, such as geometry negotiation, menu inclusion, and focus and session problems. In the Web browser case, though, one easy path would be to consider solving only a subset of the problems. Geometry negotiation, for instance, is not a necessity, given that browsers rarely resize their embedded graphics. The same principle could be enforced for more dynamic graphics, such as X applications. Menubar extension and command support could also be ignored for clients embedded in a Web browser.

Session management, however, is still a real issue that needs to be solved in this context. If you think about going back and forth between pages that provide "live" embedding, you need to consider what should happen to the active programs while you are surfing. This issue arises regardless of our Mobile GUI study: if embedding of MPEG is the goal (i.e., having mpeg_play display its output in a browser subwindow), then a way to stop the rendering and restart it in the same context later is clearly needed.

An embedded architecture is reminiscent of systems like OpenDoc and OLE, so we should not go into the details of solving this particular aspect of the larger problem without considering what is being done by these two systems.

Remote Execution

As mentioned in the previous section, the architecture presented here uses a program, rx, which runs on the browser side and handles all of the logic of activation and security. Depending on the browser, and on whether the result is embedded or not, this program could be activated through the .mailcap file, through a Plug-In API, or through some other mechanism, but in all cases, the new RX MIME content-type will be used to do the binding.

The job of the rx agent, when resolving an RX link, is to remotely activate the specified program in the context of the current GEOM_HINT and EMBEDDABLE states. In order to do so, it uses the ACTION pointer and the UI_PROTOCOL information. ACTION is the activation script itself. It is either a valid URL or a local path, in which case the http server host is assumed. This activation script is common to all UI protocols; it is the sole entry point to the real execution of the remote program.
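
Resolving an ACTION value that may be either a full URL or a local path is ordinary relative-URL resolution against the address the RX document was fetched from. A minimal Python sketch, in which the function name and the sample paths are hypothetical:

```python
from urllib.parse import urljoin


def resolve_action(rx_document_url, action):
    """Resolve ACTION against the URL the RX document came from.

    A full URL is returned unchanged; a local path is completed with
    the scheme and host of the http server that served the document.
    """
    return urljoin(rx_document_url, action)


print(resolve_action("http://www.x.org/rx/pix.rx", "/cgi-bin/pix.pl"))
print(resolve_action("http://www.x.org/rx/pix.rx",
                     "http://fedora:8001/cgi-bin/xm20demo.pl"))
```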

The rx agent, or applet, on the browser side, must send the necessary information to this CGI script for the remote execution to happen. It must first decide which UI protocol to use. This is simple and usually found in the environment (i.e., the browser knows if it is an X browser or a Windows browser).

UI_PROTOCOL specifies the types of remote display protocols the program can speak. It's an indication for the browser side of what kind of information must be passed back in the remote execution protocol communication. If the browser determines that it cannot deal with any of the UI protocols specified, it needs to display an error message, or a specific image in case of embedding, as in the case when the IMG context specifies an incorrect image path.
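
The protocol choice can be sketched as picking the first protocol offered by the RX document that the browser side can also speak. Preferring the document's order is an assumption, as is the function name; the sketch returns `None` to signal the error case described above.

```python
def select_protocol(offered, supported):
    """Return the first offered UI protocol the browser side supports.

    `offered` lists the UI_PROTOCOL values in document order and
    `supported` is the set of protocols available locally. None means
    the browser should show an error message or a placeholder image.
    """
    for name in offered:
        if name in supported:
            return name
    return None


print(select_protocol(["x11", "ica"], {"ica"}))
```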

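In the case of an X11 UI_PROTOCOL connection, typical information to be sent back to the activation script includes the X display to connect to, the geometry of the window, and, in the embedded case, the ID of the browser window to render into (the display, geometry, and window arguments of Example 1).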

It is the responsibility of the ACTION script to determine which program to activate, depending on the UI connection chosen by the browser. The flexibility in the architecture allows it to work with minimal modification of the existing software. In more advanced implementations, the rx agent work could be done by the browser, just as Mosaic and Netscape can treat GIF natively.

The rx agent also has to deal with security (see below). In the case of X, this may involve a firewall proxy server and/or use of the xhost program.

Security

Security here is only a concern for the browser side, not the http server side. It is up to the server host, where the remote X client runs, to prevent programs such as a plain, unrestricted emacs or xterm from being started with access to the server's file system while their display and control are handed to some rude netsurfer somewhere in cyberspace. In our case, we are only worried that a remote X client could "spy" on the user's keyboard and screen in some way, or somehow destroy the user's critical resources.

In the most trivial case, the browser X display can be accessed by the remote X client with no restriction. The browser side has only to authorize the connection for the remote host at large (e.g., xhost +hostname) and pass the X protocol information to the remote host so that it can execute the X client on the browser's display. This simple case probably covers a small part of today's Web usage (LAN internal connections or a company-wide network), but I think it is likely to grow in the future. Note that there is still an issue with using xhost: the X browser might not run on the X server host itself, but as a remote client already, in which case some sort of "transitive" X authentication might be needed.
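
For this trivial case, the host to authorize can be derived from the ACTION URL, under the assumption that the remote X client runs on the same host that serves the activation script. The Python sketch below only builds the xhost command line and leaves running it to the caller; the function name is hypothetical.

```python
from urllib.parse import urlparse


def xhost_command(action_url):
    """Build the xhost command that authorizes the ACTION host."""
    host = urlparse(action_url).hostname
    if not host:
        raise ValueError("ACTION URL has no host: " + action_url)
    # Host-based access control is coarse: it admits every X client
    # started from that host, not only the requested program.
    return ["xhost", "+" + host]


print(xhost_command("http://fedora:8001/cgi-bin/xm20demo.pl"))
```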

In the generic WAN case, direct X connectivity is not usually allowed between two given hosts and a firewall machine may be present. In this case, a firewall proxy X server is probably needed to guarantee a secure X connection between the remote X program and the browser session, where private information usually resides.

The X Consortium security working group is currently studying the X security proxy case. Several companies have already made experiments based on the xscope architecture, and new work is on its way in this area, so I won't elaborate further on that part. The basic idea is to look closely at the X protocol and determine which requests should be blocked, what interclient interchange can be allowed, etc.

Once such an infrastructure exists, the rx agent will just take advantage of it in the process of indicating to which X Display id the remote program must connect.

Performance

One advantage of low-level UI protocols is that they give their users a lot of freedom over the presentation, the obvious price being greater network traffic. In order to address this bandwidth issue with X, one option is to use LBX (Low Bandwidth X) to minimize the flow of X requests and their size over slow connections. LBX could be bundled with new X servers as a new proxy or used together with the secure proxy architecture (i.e., have a single proxy X server that handles security and "decompression" of X requests at the same time). Although LBX was originally designed with serial line X connections in mind, it would also gracefully solve the performance problems that arise over today's Wide Area Networks.

You'll note that the architecture presented doesn't preclude using a higher level UI protocol, such as a UIMS. This is just a matter of having the correct UI server, such as a widget server, in place on the browser side, and specifying the correct UI protocol in the RX document.

Further Considerations

One important issue that needs to be considered is whether or not specialization of the remote X client is acceptable. For example, the client may need to be modified to speak the X proxy server protocol to handle security, or to deal with the embedding case. In other words, is it valid to require that existing X clients that want to be included in Web pages be relinked/recompiled/changed in some ways? Without extending the X server itself, that might be necessary at least to handle embedding (since the -window option is not supported by today's Xt or Motif libraries).


Last, and pragmatically, because they outnumber X Window browsers, we need to think about resolving issues surrounding non-X Window browsers (MS/Windows, Mac) so they may use this architecture with remote X Window clients (which in turn outnumber the non-X network-graphic applications).

The basic idea is to use a PC-XServer as an add-on applet for PC browsers.

For the nonembedding case, there are several PC-XServer software packages on the market that can provide X services integrated with native Windows service and therefore would easily solve the problem of creating top level X windows side by side with the PC browser window.

The embedding case is trickier, since the level of PC-XServer integration would have to move down to the subwindow level, and few, if any, of the existing systems currently allow for it. Another approach would be for the Windows or Mac Web browsers to understand the X protocol natively, and become virtual X servers for their own viewport window. Since the X server code for Intel and Mac is freely available from the X Consortium, this might actually be a viable solution for the browser companies.

Conclusion

Network-based graphic applications should become an integral part of the Internet of tomorrow.

In this paper, we presented an architecture that utilizes the fundamental client-server nature of the X Window System in the context of the World Wide Web. To become truly operational, however, such a system requires the availability and the integration of several independent functionalities.

The X Consortium is highly aware of the opportunity the growth of the Internet and the World Wide Web presents for the X community, and has recently announced a new major release of the X Window System, code name "Broadway," that will provide the necessary enhancements allowing for X to really fly over the Web!

Prototype

During the summer of 1995, a student intern joined the X Consortium team to work on a prototype of the RX system. Figure 2 shows the result of these experiments: an Xt/Motif application running on the http server side and displayed embedded inside a modified XMosaic browser window.

Figure 2. Embedded Xt/Motif application on the Web

In this example, the Motif application is taking charge of the complete browser page content, so the source for the underlying HTML document is quite minimal:

      
    <IMG ALIGN=MIDDLE src="xm20demo.pl"> </A>
The associated xm20demo.rx httpserver-to-browser stream is listed below:

    ACTION http://fedora:8001/cgi-bin/xm20demo.pl
    UI_PROTOCOL x11 
    GEOM_HINT 600x600
    EMBEDDABLE yes

Acknowledgments

I'd like to thank Ellis Cohen, Bob Scheifler, Kaleb Keithley, Ian Jacobs, and Paula Fergusson for the feedback received. I'd also like to thank Helene Veslot, our graduate student working on the implementation prototype, for helping out in the making of this project.

References

0. This paper is at http://www.x.org/people/daniel/mobgui.html

1. Web Consortium site, Style Sheet for HTML, http://www.w3.org/hypertext/WWW/Style.

2. Web Consortium site, Mobile Code model, http://www.w3.org/hypertext/WWW/MobileCode/

3. Bob Scheifler, Jim Gettys, The X Window System, Digital Press, ISBN 1-55558-088-2. (also http://www.x.org)

4. NCSA site, Common Gateway Interface (CGI), http://hoohoo.ncsa.uiuc.edu/cgi/overview.html

5. Netscape site, Netscape Client Plug-In API, http://www.netscape.com/comprod/development_partners/plugin_api

About the Author(s)

Daniel Dardailler [ http://www.x.org/people/daniel ]
X Consortium
daniel@x.org

Daniel Dardailler holds a Ph.D. in CS from the University of Nice Sophia-Antipolis (France). He joined the X Consortium staff in 1994 to act as a software architect for the Common Desktop Environment project (CDE). Prior to that, he spent four years at the OSF as a principal software engineer in the area of the Motif widget set.