Daniel Dardailler
As an author of an HTML document, you may say: This is a section title, This is a table, and so on. With HTML Forms in place, you can even say: This is an active option menu or This is a text entry area. However, HTML alone doesn't let you draw simple objects like pie charts or line histograms, with text at a specific location and orientation, or run some animation directly in the browser window.
Putting the emphasis on the logical content and leaving the final presentation details (e.g., exact layout, font, color) to the browser and the user works just fine for certain classes of documents. However, this approach is not well suited to a whole different set of materials. There is a lot of information out there for which the exact presentation, or "look," matters more than the words: advertisement pages, advanced live GUI, visual art, structured graphics, active multimedia, etc. (just look at the nature of the graphics found on CD-ROM versus your basic GUI environment).
Work has been going on for several years in the SGML community to provide authors with better control over the final presentation using Style Sheet [1] specifications. Such presentation languages attached to HTML would improve the information provider's control over the document visuals, but still have intrinsic limitations due to their declarative approach.
Another path people have taken is to complete or replace HTML with a richer language, changing the once static Web documents into programs that get downloaded and executed at the user's site. This is usually referred to as the Mobile Code [2] paradigm. The problem with this solution is that existing applications have to be rewritten in the "applet" language (Java, for instance), they have to be downloaded, and they might have to be installed and de-installed as well.
This paper presents an architecture that is somewhat similar to the Mobile Code approach, since a real program complements or replaces HTML, but in our case, the code doesn't move, only the graphical layer does. It can therefore be referred to as the Mobile GUI approach.
A low-level UI protocol, such as the X11 Protocol [3], effectively solves the presentation problem by offering providers of information the "pixel-level" functionality they sometimes desire. Of course, X also solves the "remote demo/try before you use" case, since it is already the natural vehicle of hundreds of graphical networked applications. Since X runs over the same Internet transport as HTTP, namely TCP/IP, it is also a natural fit for this task.
Although X11 is the UI network protocol of choice in the industry today, others, based on Microsoft Windows, OS/2, and NeXT also exist, and one requirement of this design is to allow for their inclusion. That being said, the rest of the document focuses mainly on X and less on the other potential UI protocols.
In HTML, the language in which most Web documents are written, URLs are usually used in anchors, as with the HREF parameter. From the user's perspective, clicking on the associated anchor text causes the browser to fetch the data to be displayed. One approach for describing a new remote execution paradigm would be to introduce a new URL scheme, RX (for Remote eXecution), that would look like this in an HTML document:
    <A HREF = "rx://www.x.org/programs/pix"> Click here for a demo of the Pix editor</A>
This has the effect of starting pix remotely, with the display done locally in a separate top-level window on the user's screen. This scheme can also be supported in the embedded case, where the program runs directly in the browser's window, as follows:
    <IMG SRC = "rx://www.x.org/nice_home" ALIGN=top ALT="X">
This code has the effect of starting the program referred to as nice_home remotely, with the display embedded in the local browser viewing window.
The problem with creating a new URL scheme is that it requires creating a new associated protocol and a new scheme registration, which, in our opinion, changes too much of the current infrastructure. It is also not necessary. The same is true for extending the HTML syntax to support remote execution by supporting a new RX tag. For example:
    <RX src="some_program" EMBEDDED=YES>
The third option, creating a new RX document type, seems to be the easiest to introduce, and the rest of this article concentrates only on this approach.
The purpose of an RX document is to provide the Web browser with information on how to activate the remote program, issues like the UI protocol to choose, embedding support, and size hints.
In HTML, RX document pointers can be used in the same context as any type of image file (e.g., .gif, .xbm), namely with the ANCHOR and IMG tags. Using an RX document with ANCHOR results in the program being started in its own top-level window, while using it with IMG results in the program being embedded in the browser window if both the browser and the remote client support embedding. The following HTML code shows the nonembedded case:
    <A HREF = "http://www.x.org/rx/pix.rx"> Click here for a demo of the Pix editor</A>
In this case, when the user clicks on the anchor, pix appears as a separate application. Here's the embedded case:
    <IMG SRC = "http://www.x.org/rx/ico.rx" ALIGN=top ALT = "ICO.RX">
If the remote client and the browser support embedding, the user should see an ico animation running in a browser sub-window. If embedding is not supported, the user sees an image/logo that represents ico; when the user clicks on this image, ico is activated in a separate top-level window.
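The two cases above amount to a small decision rule on the browser side. The following is only a sketch of that rule in Python; the function name and the returned labels are hypothetical and are not part of the RX design itself.

```python
def choose_presentation(tag, browser_embeds, client_embeddable):
    """Return how an RX reference is presented, following the text above.

    tag: "A" for an anchor reference, "IMG" for an inline reference.
    browser_embeds / client_embeddable: whether each side supports embedding.
    """
    if tag == "A":
        # An anchor always starts the program in its own top-level window.
        return "top-level window"
    if tag == "IMG":
        if browser_embeds and client_embeddable:
            return "embedded sub-window"
        # Fallback: show the logo; clicking it opens a top-level window.
        return "logo, top-level window on click"
    raise ValueError("RX documents are referenced from A or IMG tags only")


if __name__ == "__main__":
    print(choose_presentation("IMG", browser_embeds=True, client_embeddable=True))
    print(choose_presentation("IMG", browser_embeds=False, client_embeddable=True))
```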
In both cases, the RX document contains information about the type of remote activation supported by this server. The syntax we use consists of simple FIELD_NAME/value lines.
The first line is mandatory and describes the location of the server-side activation script, for instance a Perl script activated through the Common Gateway Interface (CGI) [4]. This line uses the ACTION field name:
    ACTION URL
The ACTION field is followed by an optional logo line:
    LOGO URL
The LOGO line points to an image file (.gif, .xbm) used to represent the remote program "inline" if embedding is not supported. A list of data lines for each display protocol supported by the application follows; the UI_PROTOCOL field is required, while the GEOM_HINT and EMBEDDABLE fields are optional:
    UI_PROTOCOL x11 | ica | news | etc.
    GEOM_HINT width x height
    EMBEDDABLE no | yes
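The field layout described above lends itself to a simple line-oriented reader. The Python sketch below is one possible reading of the syntax, not the rx agent's actual code: the function name parse_rx is hypothetical, both the GEOM_HINT and GEOM_HINTS spellings are accepted because both occur in this paper, text after "#" is assumed to be a comment, and EMBEDDABLE is assumed to default to "no".

```python
def parse_rx(text):
    """Parse RX text into a dict with ACTION, LOGO and a list of protocols."""
    doc = {"ACTION": None, "LOGO": None, "protocols": []}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: "#" starts a comment
        if not line:
            continue
        name, _, value = line.partition(" ")
        value = value.strip()
        if name in ("ACTION", "LOGO"):
            doc[name] = value
        elif name == "UI_PROTOCOL":
            # Each UI_PROTOCOL line opens a new group of per-protocol fields.
            doc["protocols"].append(
                {"name": value, "geometry": None, "embeddable": False})
        elif name in ("GEOM_HINT", "GEOM_HINTS") and doc["protocols"]:
            width, height = value.lower().replace(" ", "").split("x")
            doc["protocols"][-1]["geometry"] = (int(width), int(height))
        elif name == "EMBEDDABLE" and doc["protocols"]:
            doc["protocols"][-1]["embeddable"] = value.lower() == "yes"
    if doc["ACTION"] is None:
        raise ValueError("an RX document needs an ACTION line")
    return doc


SAMPLE = """ACTION http://x.org/cgi-bin/xclock.pl # myself
UI_PROTOCOL x11
GEOM_HINTS 200x200
EMBEDDABLE yes
LOGO http://x.org/icons/xclock.xbm
"""

if __name__ == "__main__":
    print(parse_rx(SAMPLE))
```

Attaching GEOM_HINT and EMBEDDABLE to the most recent UI_PROTOCOL line is what lets one document carry different size hints per UI protocol.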
For example, the RX document shown in Figure 1 represents an application that can be activated using either the X11 Window Protocol or WinDD (ica is the name of the network protocol used by WinDD). It provides a default logo and different size hints depending on the type of UI selected. Figure 1 also shows the general architecture of the system.
Figure 1. General RX architecture
The sequence of actions shown in Figure 1 is initiated when a browser, such as Mosaic or Netscape, finds an RX reference in an HTML document. Essentially, the browser sees the RX document link in the HTML file, so it asks the http server for the document. The httpd daemon serves the document and the browser spawns the rx agent. The rx agent then asks the httpd daemon to process the action script. When the script is activated, it uses the information in the RX document to activate the remote client, which is rendered in the browser display window.
In the above picture, the RX document itself (anim.rx) is represented as a separate entity from the RX script (anim.pl). In our current implementation, we usually carry only one CGI script on the server side, which either generates the RX syntax (if requested via HTTP GET without arguments) or activates the remote program (when requested a second time with arguments for Display, Geometry, etc.). Storing the information about which UI_PROTOCOL is supported and how to activate programs for each protocol in one single file is easier to maintain from an administrative point of view.
Example 1: "Duo" RX script (http://x.org/cgi-bin/xclock.pl)
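    #!/usr/local/bin/perl
    # No argument: return the RX document. With arguments: start the client.
    if (!$ARGV[0]) {
        # A blank line must separate the CGI header from the document body.
        print "Content-type: application/x-rx\n\n",
              "ACTION http://x.org/cgi-bin/xclock.pl # myself\n",
              "UI_PROTOCOL x11\n",
              "GEOM_HINTS 200x200\n",
              "EMBEDDABLE yes\n",
              "LOGO http://x.org/icons/xclock.xbm\n";
    } else {
        # $ARGV[0] is the display, $ARGV[1] the geometry, $ARGV[2] the window ID.
        # The arguments go to the shell unchecked; a real script must validate them.
        system ('xclock -window '.$ARGV[2].' -g '.$ARGV[1].' -d '.$ARGV[0].' &');
    }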
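The two-mode behavior of such a "duo" script can also be sketched in Python. This is an illustration only, under the same argument order as Example 1 (display, geometry, window ID); the function name handle_request is hypothetical, and the sketch returns the command as a list instead of launching it, so that no shell interprets the arguments.

```python
RX_DOCUMENT = """\
ACTION http://x.org/cgi-bin/xclock.pl
UI_PROTOCOL x11
GEOM_HINTS 200x200
EMBEDDABLE yes
LOGO http://x.org/icons/xclock.xbm
"""


def handle_request(args):
    """Return ("document", text) or ("launch", argv) for one request.

    args is the list of script arguments: empty on the first request,
    [display, geometry, window_id] on the second one.
    """
    if not args:
        # First request: serve the RX document with its CGI header.
        return "document", "Content-type: application/x-rx\n\n" + RX_DOCUMENT
    display, geometry, window_id = args
    # Second request: build the client command line, one list item per word.
    argv = ["xclock", "-window", window_id, "-g", geometry, "-d", display]
    return "launch", argv


if __name__ == "__main__":
    print(handle_request([])[1])
    print(handle_request(["myhost:0", "200x200", "0x2a00007"])[1])
```

Keeping the command as a list means it could be handed to a process launcher without going through a shell, which sidesteps quoting problems in the display, geometry, and window values.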
    application/x-rx; rx %s
This mailcap-style entry binds the RX type to an external program, the rx agent. When the .rx document is passed to this program, it performs the remote activation in its own top-level window, as described in detail below.
For the embedding case, there is a clear need for support in a Web browser beyond the current case of remote execution. The problem is that this support just hasn't taken real shape yet. For instance, it would be appropriate to have MPEG animations rendered directly in the browser window, in the context of an IMG tag. This rendering would be handled by a separate process that is local to the browser, i.e., an external viewer, whose output would be redirected to the browser window instead of being displayed in a separate top level window. If that mechanism were in place, then RX would become just another instance of an external-for-embedding viewer handling remote execution.
Recently, new mechanisms called "Plug-Ins" have appeared in popular browsers such as Netscape [5], allowing for embedding of arbitrary external applets via dynamic loading. The RX agent would just have to provide an external plug-in API to be integrated in such a Web browser.
The RX document also contains fields that provide information about embedding. The EMBEDDABLE field provides information to the browser about whether the remote client is capable of being embedded in one of the browser subwindows. For instance, it might tell the browser that the specified X client can be given a top-level window ID at startup time (as a -window ID command-line option). A browser can use this information (directly or indirectly through the applet) and the file pointed to by LOGO to show a press-here-to-activate icon to replace a nonembeddable program in the IMG context. A browser can also elect to defer any activation of an embedded client until the user requests activation, in which case it can also use the LOGO icon by default.
GEOM_HINT gives information to the browser about the preferred geometry of the remote client. This is mostly useful in the case of an embedded program, where the browser needs to create an embedded window of the correct size up front. Note that GEOM_HINT is not required with HTML 3.0, where the IMG tag can include WIDTH and HEIGHT attributes. In any case, if the size information is not present, and embedding is still desired, the original size of the embedded window is browser-dependent. The browser can also deny any geometry changes thereafter, or it can allow for dynamic reconfiguration.
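The size choice described above can be sketched as a short fallback chain. The Python below is only an illustration: the function name and the default size are hypothetical, and giving the IMG WIDTH and HEIGHT attributes priority over GEOM_HINT is an assumption, since the text only says that GEOM_HINT is not required when they are present.

```python
# Hypothetical browser default, used when no size information is available.
DEFAULT_SIZE = (300, 200)


def embedded_size(img_width=None, img_height=None, geom_hint=None,
                  default=DEFAULT_SIZE):
    """Return (width, height) for the embedded window."""
    if img_width is not None and img_height is not None:
        # Assumption: sizes given by the HTML author take priority.
        return (img_width, img_height)
    if geom_hint is not None:
        # Otherwise use the size hint from the RX document.
        return geom_hint
    # No size information at all: the result is browser-dependent.
    return default


if __name__ == "__main__":
    print(embedded_size(geom_hint=(200, 200)))
    print(embedded_size(img_width=600, img_height=600, geom_hint=(200, 200)))
```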
There are sundry technical issues related to designing general embedding in X, such as geometry negotiation, menu inclusion, and focus and session problems. In the Web browser case, though, one easy path would be to consider solving only a subset of the problems. Geometry negotiation, for instance, is not a necessity, given that browsers rarely resize their embedded graphics. The same principle could be enforced for more dynamic graphics, such as X applications. Menubar extension and command support could also be ignored for clients embedded in a Web browser.
Session management, however, is still a real issue that needs to be solved in this context. If you think about going back and forth between pages that provide "live" embedding, you need to consider what should happen to the active programs while you are surfing. This issue arises regardless of our Mobile GUI study: if embedding of MPEG is the goal (i.e., having mpeg_play display its output in a browser subwindow), then a way to stop the rendering and restart it in the same context later is clearly needed.
An embedded architecture is reminiscent of systems like OpenDoc and OLE, so we should not go into the details of solving this particular aspect of the larger problem without considering what is being done by these two systems.
The job of the rx agent, when resolving an RX link, is to remotely activate the specified program in the context of the current GEOM_HINT and EMBEDDABLE states. In order to do so, it uses the ACTION pointer and the UI_PROTOCOL information. ACTION is the activation script itself. It is either a valid URL or a local path, in which case the http server host is assumed. This activation script is common to all UI protocols; it is the sole entry point to the real execution of the remote program.
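Resolving ACTION against the server that delivered the RX document can be sketched with standard URL joining. This Python fragment is an illustration, not the rx agent's code; the function name and the /cgi-bin/pix.pl path are hypothetical, and treating a local path as a URL reference relative to the RX document's own URL is an assumption.

```python
from urllib.parse import urljoin


def resolve_action(rx_document_url, action):
    """Return the absolute URL of the activation script.

    A full URL is returned unchanged; a local path is resolved against
    the host that served the RX document.
    """
    return urljoin(rx_document_url, action)


if __name__ == "__main__":
    base = "http://www.x.org/rx/pix.rx"
    print(resolve_action(base, "/cgi-bin/pix.pl"))
    print(resolve_action(base, "http://x.org/cgi-bin/xclock.pl"))
```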
The rx agent, or applet, on the browser side, must send the necessary information to this CGI script for the remote execution to happen. It must first decide which UI protocol to use. This is simple and usually found in the environment (i.e., the browser knows if it is an X browser or a Windows browser).
UI_PROTOCOL specifies the types of remote display protocols the program can speak. It's an indication for the browser side of what kind of information must be passed back in the remote execution protocol communication. If the browser determines that it cannot deal with any of the UI protocols specified, it needs to display an error message, or a specific image in case of embedding, as in the case when the IMG context specifies an incorrect image path.
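The protocol choice described above can be sketched as a simple match between what the RX document offers and what the browser side can speak. The Python below is illustrative: the function name is hypothetical, and preferring protocols in the order the RX document lists them is an assumption.

```python
def pick_protocol(offered, supported):
    """Return the first offered UI protocol the browser side supports.

    offered: protocol names from the RX document, in document order.
    supported: protocol names the browser environment can handle.
    Returns None when there is no match, in which case the browser shows
    an error message (or a specific image in the embedded case).
    """
    for name in offered:
        if name in supported:
            return name
    return None


if __name__ == "__main__":
    print(pick_protocol(["x11", "ica"], {"ica"}))
    print(pick_protocol(["news"], {"x11"}))
```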
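In the case of an X11 UI_PROTOCOL connection, typical information to be sent back to the activation script includes the X display to connect to, the geometry, and, for the embedded case, the ID of the window to use (the three arguments read by the script in Example 1).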
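One way the rx agent could hand this information to the activation script is sketched below in Python. The paper does not specify the wire format, so this is an assumption: the values are sent as a CGI query string without "=" signs, joined by "+", which a CGI server passes to the script as command-line arguments (the order display, geometry, window ID follows Example 1). The function name and the sample values are hypothetical.

```python
from urllib.parse import quote


def activation_url(action, display, geometry, window_id=None):
    """Build the second request URL that triggers the remote program."""
    args = [display, geometry]
    if window_id is not None:
        # The window ID is only needed in the embedded case.
        args.append(window_id)
    # Percent-encode each argument, then join them with "+".
    return action + "?" + "+".join(quote(arg, safe="") for arg in args)


if __name__ == "__main__":
    print(activation_url("http://x.org/cgi-bin/xclock.pl",
                         "myhost:0", "200x200", "0x2a00007"))
```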
The rx agent also has to deal with security (see below). In the case of X, this may involve a firewall proxy server and/or use of the xhost program.
In the most trivial case, the browser X display can be accessed by the remote X client with no restriction. The browser side has only to authorize the connection for the remote host at large (e.g., xhost +hostname) and pass the X protocol information to the remote host so that it can execute the X client on the browser's display. This simple case probably covers a small part of today's Web usage (LAN internal connections or a company-wide network), but I think it is likely to grow in the future. Note that there is still an issue with using xhost: the X browser might not be running on the X server host itself, but might already be a remote client. Some sort of "transitive" X authentication might be needed in that case.
In the generic WAN case, direct X connectivity is not usually allowed between two given hosts and a firewall machine may be present. In this case, a firewall proxy X server is probably needed to guarantee a secure X connection between the remote X program and the browser session, where private information usually resides.
The X Consortium security working group is currently studying the X security proxy case. Several companies have already made experiments based on the xscope architecture, and new work is on its way in this area, so I won't elaborate further on that part. The basic idea is to look closely at the X protocol and determine which requests should be blocked, what interclient interchange can be allowed, etc.
Once such an infrastructure exists, the rx agent will just take advantage of it in the process of indicating to which X Display id the remote program must connect.
You'll note that the architecture presented doesn't preclude using a higher level UI protocol, such as a UIMS. This is just a matter of having the correct UI server, such as a widget server, in place on the browser side, and specifying the correct UI protocol in the RX document.
Last, and pragmatically, we need to think about resolving the issues surrounding non-X Window browsers (MS/Windows, Mac), which outnumber X Window browsers, so that they may use this architecture with remote X Window clients (which in turn outnumber the non-X network-graphic applications).
The basic idea is to use PC-XServer as add-on applets for PC browsers.
For the nonembedding case, there are several PC-XServer software packages on the market that can provide X services integrated with native Windows services and therefore would easily solve the problem of creating top-level X windows side by side with the PC browser window.
The embedding case is trickier, since the level of PC-XServer integration would have to move down to the subwindow level, and few, if any, of the existing systems currently allow for it. Another approach would be for the Windows or Mac Web browsers to understand the X protocol natively, and become virtual X servers for their own viewport window. Since the X server code for Intel and Mac is freely available from the X Consortium, this might actually be a viable solution for the browser companies.
In this paper, we presented an architecture that utilizes the fundamentally client-server nature of the X Window System in the context of the World Wide Web. To become truly operational, however, such a system requires the availability and the integration of several independent functionalities:
Figure 2. Embedded Xt/Motif application on the Web
In this example, the Motif application is taking charge of the complete browser page content, so the source for the underlying HTML document is quite minimal:
    <IMG ALIGN=MIDDLE src="xm20demo.pl"> </A>
The associated xm20demo.rx httpserver-to-browser stream is listed below:
    ACTION http://fedora:8001/cgi-bin/xm20demo.pl
    UI_PROTOCOL x11
    GEOM_HINTS 600x600
    EMBEDDABLE yes
0. This paper is at http://www.x.org/people/daniel/mobgui.html
1. Web Consortium site, Style Sheet for HTML, http://www.w3.org/hypertext/WWW/Style.
2. Web Consortium site, Mobile Code model, http://www.w3.org/hypertext/WWW/MobileCode/
3. Bob Scheifler, Jim Gettys, The X Window System, Digital Press, ISBN 1-55558-088-2. (also http://www.x.org)
4. NCSA site, Common Gateway Interface (CGI), http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
5. Netscape site, Netscape Client Plug-In API, http://www.netscape.com/comprod/development_partners/plugin_api
Daniel Dardailler [http://www.x.org/people/daniel]
X Consortium
daniel@x.org
Daniel Dardailler holds a Ph.D. in CS from the University of Nice Sophia-Antipolis (France). He joined the X Consortium staff in 1994 to act as a software architect for the Common Desktop Environment project (CDE). Prior to that, he spent four years at the OSF as a principal software engineer in the area of the Motif widget set.