David Chancogne
INRIA - Rocquencourt - France
ISR - University of Maryland - USA
doc@isr.umd.edu
Mark Austin
ISR - University of Maryland - USA
austin@isr.umd.edu
Authoring on the Web with HTML allows for the creation of content-rich documents containing text, images, sounds, and so forth. The traditional approach for creating documents that can be dynamically adapted to a user's specific needs is Common Gateway Interface (CGI) programming [6]. By using CGI programs written in Perl, for example, you can program `on-the-fly' documents that respond to user demands. Generally speaking, CGI programming requires a detailed knowledge of at least one computer language, so it is not everybody's cup of tea. A second, closely related problem faced by many system administrators is how to create and maintain a Web server containing a large number of multilingual documents. This paper describes a family of `DietWeb' tools that simplify the creation of dynamic documents by moving the programming components of traditional development to the authoring of HTML pages with an extended tag language.
Keywords: CGI programming, HTML, Dynamic documents, Interactive documents, virtual environments.
The World Wide Web [1] is growing faster every day, and the time when universities and research labs were the only ones with a Web server is long gone. Today retail outlets see a need for web sites for on-line shopping, museums want to install virtual tour web servers for their fast-changing collections, and schools look for interactive on-line class materials. In each of these web application developments, there is a strong need for sites having: (a) rich graphical content (including animated images and Java applets), and (b) mechanisms that let users become actively involved in the way they visit a Web site, including what they see, and how and when they see it. A good example of the latter requirement is the choice of language within a single document (e.g. English, French or Spanish).
This paper describes a family of DietWeb tools currently in development. The development objective is easy creation of dynamic documents by moving the programming components of traditional CGI development to the authoring of HTML pages with an extended tag language.
The DietWeb tools project dates back to March 1995, when the first author was working on the development of Siddhartha (http://ottawa.ambafrance.org/Siddhartha), a virtual game environment hosted at the French Embassy in Ottawa. Siddhartha was developed entirely in Perl [5], and interfaced to the Web using CGI [6]. So-called `on-the-fly' HTML documents, corresponding to a mixture of static information (fragments of HTML) and computed information, are sent to the user's browser. Because no easy-to-use tools were available to create the programs and static HTML pages at that time, we slowly built our own. The resulting product is a set of editors, Perl programs and libraries that facilitate fast and easy development and integration of interactive and dynamic environments on the Web, using a browser (like Netscape) as the graphical interface.
Traditional approaches to dynamic document creation require HTML authoring and CGI programming development, where a specific program to handle a problem is written and connected to the Web through a CGI interface [2,6]. People who are familiar with computers, but do not have the skills to write programs in scripting languages such as Perl, are locked out of this development pathway because they cannot write CGI program components. Our goal is to mitigate this problem by transferring most of the process of creating interactive and dynamic documents to the authoring phase, but in such a way that a writer's ability to process data is not compromised.
The DietWeb development philosophy is based on extensions to basic HTML [3], with a set of new tags that follow the familiar markup syntax but provide users with the ability to control and process data inside the HTML document. The HTML tag extensions referred to in this paper are <VAR>, <COND> (with its <DEFAULT> branch), <LOOP>, and <LANG>.
The benefits of this approach include: (1) the extended HTML syntax is very easy to understand, and is therefore accessible by people already familiar with HTML; (2) development teams do not need to be composed of authoring and CGI groups, and (3) the need to install CGI scripts is bypassed completely. A potential long-term benefit is that these HTML tag extensions can also be incorporated into next-generation browsers containing DietWeb libraries.
So how does DietWeb compare to other development environments? While the MAWL [4] (http://www.cs.utexas.edu/users/cpg/mawl/) environment certainly allows for the development of more powerful applications, the details of implementation are really quite complicated. It is therefore less accessible by general HTML authoring audiences. For the development of most dynamic and interactive documents, DietWeb has enough capability, and it is easy to use.
Figure 1 is a schematic of the current DietWeb architecture.
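The DietWeb toolkit contains the Jiva editor, the Jivapas parser, the DietWeb compiler, and the cgi-www script, each of which is described below.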
Collectively, the DietWeb system components allow developers to focus on authoring, a process that passes through three key steps: (1) editing, (2) parsing, and (3) compiling or interpreting.
DietWeb comes with an editor called Jiva (written in Java) for easy insertion of standard HTML tags, DietWeb HTML extensions, and variables. The primary purpose of Jiva is management of variables (the concept of variables need not be known to Jiva users, however). Variables are needed whenever you use the notation %variable%, and the <COND>, <LOOP> and <LANG> tags. The editor will ensure that when a variable is used for the first time, its declaration is properly included in the output file (i.e. by including a <VAR> tag). The Jiva editor creates a native DietWeb file called a diet file, marked by the file extension .dt.
Here is an example of a diet file. Note that the user only sees the part that is not hidden by HTML comments (<!-- -->).
    <!--
    # File Name
    # Author
    -->
    <!-- Variables : -->
    <VAR NAME="%document_title%">
    <VAR NAME="%color%">
    <VAR NAME="%username%">
    <VAR NAME="%lang%">
    <!-- -->
    <HTML>
    <HEAD>
    <TITLE>%document_title%</TITLE>
    </HEAD>
    <BODY>
    <H1 ALIGN=center>%document_title%</H1>
    <H2 ALIGN=left>Welcome %username% !</H2>
    <COND NAME="%variable%" VALUE="red">
    Red is a nice color !
    <DEFAULT>
    Why didn't you choose red? I like red ...
    </COND>
    <LANG=fr>
    Cette partie est affichée pour les documents Français.
    </LANG>
    <LANG=us>
    This part is printed for English documents.
    </LANG>
    </BODY>
    </HTML>
    <!-- # End of diet file -->
Control in the authoring process is achieved with the group of tags
<COND NAME="" VALUE=""> <DEFAULT> </COND>,
which allow the value of variables to be tested. In most applications, these variables will come from user input provided via HTML FORMs. The tag syntax
<LANG=lang_code> </LANG>
simply provides a way for blocks of text in a variety of languages to be included in the same document.
Some variables, like %document_title%, are available in every diet file.
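The bookkeeping that Jiva does for variable declarations can be pictured with a short sketch. It is written in Python for illustration only (Jiva itself is a Java program), and the function name `declare_variables` is hypothetical. The variable pattern is the one listed in Table 1; everything else is an assumption about how such a scan might work, not a description of Jiva's internals.

```python
import re

# Pattern for %name% variables, as listed in Table 1 of this paper.
VARIABLE = re.compile(r"%[a-zA-Z0-9]+_*[a-zA-Z0-9]*%")


def declare_variables(body):
    """Return one <VAR> declaration per variable used in body, in first-use order.

    Hypothetical helper: it only illustrates the idea that every variable
    used in a diet file gets exactly one declaration.
    """
    seen = []
    for match in VARIABLE.finditer(body):
        name = match.group(0)
        if name not in seen:
            seen.append(name)
    return "\n".join('<VAR NAME="{}">'.format(name) for name in seen)


body = (
    "<TITLE>%document_title%</TITLE>\n"
    "<H2>Welcome %username% !</H2>\n"
    "<H1>%document_title%</H1>"
)
declarations = declare_variables(body)
print(declarations)
```

A variable that appears several times, such as %document_title% here, is declared only once.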
The Jivapas parser (a Perl program) transforms the tags <COND>, <LOOP> and <LANG> into Perl instructions embedded within an HTML document. The transformation process is fully automated, so the developer of DietWeb files doesn't need to know how it works. The substitutions that occur in the parsing process can be seen by comparing the diet file above with its parsed form.
After the parsing phase the previous diet file would become:
    <!--
    # File Name
    # Author
    -->
    <!-- Variables : -->
    <!-- . local(%document_title%)=\$document{document_title}; -->
    <!-- . local(%color%)=\$document{color}; -->
    <!-- . local(%username%)=\$document{username}; -->
    <!-- . local(%lang%)=\$document{lang}; -->
    <!-- -->
    <HTML>
    <HEAD>
    <TITLE>%document_title%</TITLE>
    </HEAD>
    <BODY>
    <H1 ALIGN=center>%document_title%</H1>
    <H2 ALIGN=left>Welcome %username% !</H2>
    <!-- . if (%variable% eq "red") { -->
    Red is a nice color !
    <!-- . } else { -->
    Why didn't you choose red? I like red ...
    <!-- . } -->
    <!-- . if (%lang% eq "fr") { -->
    Cette partie est affichée pour les documents Français.
    <!-- . } -->
    <!-- . if (%lang% eq "us") { -->
    This part is printed for English documents.
    <!-- . } -->
    </BODY>
    </HTML>
    <!-- # End of diet file -->
The Jivapas parser transforms the diet files into light files, marked by the extension .lt. Light files are a mixture of HTML and Perl. In the next step they will be transformed into a Perl script. Readers should observe that Perl lines are prefixed with the character '.', and are usually embedded within HTML comments (<!-- -->) so that the resulting file is regular HTML and can be viewed by any browser.
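The tag substitutions visible in the example above can be sketched as a small line-by-line translation. The sketch is in Python for illustration (Jivapas itself is a Perl program), the function name `translate_tag` is hypothetical, and it assumes each DietWeb tag sits alone on its line. The <LOOP> and <VAR> tags are left out because their translated forms are not fully specified here.

```python
import re


def translate_tag(line):
    """Translate one DietWeb tag line into its light-file form.

    Hypothetical sketch: lines that are not a recognised tag pass through
    unchanged, and each tag is assumed to be alone on its line.
    """
    stripped = line.strip()
    cond = re.fullmatch(r'<COND NAME="(%\w+%)" VALUE="([^"]*)">', stripped)
    if cond:
        return '<!-- . if ({} eq "{}") {{ -->'.format(cond.group(1), cond.group(2))
    lang = re.fullmatch(r"<LANG=(\w+)>", stripped)
    if lang:
        # A language block becomes a test on the %lang% variable.
        return '<!-- . if (%lang% eq "{}") {{ -->'.format(lang.group(1))
    if stripped == "<DEFAULT>":
        return "<!-- . } else { -->"
    if stripped in ("</COND>", "</LANG>"):
        return "<!-- . } -->"
    return line


for source_line in ['<COND NAME="%color%" VALUE="red">', "<DEFAULT>", "</COND>"]:
    print(translate_tag(source_line))
```

Each tag becomes a Perl line wrapped in an HTML comment, so a browser that opens the light file directly simply ignores it.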
Depending on the type of server being developed, at this point light files can be either compiled or interpreted.
The DietWeb compiler transforms light files into one or more static HTML files. The principal application for compiled light files is multilingual documents. The compiler will create one HTML file for each language represented inside the original light file. The resulting language-specific HTML files would then be available to the Web.
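The one-file-per-language idea can be sketched as a static filter over a light file. This is a simplification written in Python for illustration: the actual compiler is a Perl program that evaluates the embedded Perl lines. The sketch assumes the only Perl lines present are non-nested language tests of the form shown earlier, and the name `compile_languages` is hypothetical.

```python
import re

# Opening and closing lines of a language block in a light file.
LANG_OPEN = re.compile(r'^\s*<!--\s\.\s*if \(%lang% eq "(\w+)"\s*\) \{\s*-->\s*$')
BLOCK_CLOSE = re.compile(r'^\s*<!--\s\.\s*\}\s*-->\s*$')


def compile_languages(light_text):
    """Return a dict mapping each language code to its static HTML text.

    Hypothetical sketch: assumes non-nested language blocks and no other
    Perl lines in the light file.
    """
    lines = light_text.splitlines()
    languages = []
    for line in lines:
        opened = LANG_OPEN.match(line)
        if opened and opened.group(1) not in languages:
            languages.append(opened.group(1))

    pages = {}
    for lang in languages:
        kept = []
        current = None  # language of the enclosing block, None outside any block
        for line in lines:
            opened = LANG_OPEN.match(line)
            if opened:
                current = opened.group(1)
            elif BLOCK_CLOSE.match(line):
                current = None
            elif current is None or current == lang:
                kept.append(line)
        pages[lang] = "\n".join(kept)
    return pages


light = "\n".join([
    "<H1>Title</H1>",
    '<!-- . if (%lang% eq "fr") { -->',
    "Bonjour",
    "<!-- . } -->",
    '<!-- . if (%lang% eq "us") { -->',
    "Hello",
    "<!-- . } -->",
    "</BODY>",
])
pages = compile_languages(light)
```

Lines outside any language block are shared by every generated page, while each block appears only in the page for its own language.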
The second approach, similar to the classical CGI method, is used for dynamic and interactive documents. Links to light files are made through a CGI Perl script called cgi-www, as shown in Figure 1. This script will create a Perl program from the light file and interpret it. The interpreted result will be an HTML document that is sent back to the Web browser through the CGI mechanism [6].
The syntax for an interpreted call is:
<A href="/cgi-bin/cgi-www?file=next.lt">Click here for next</A>
Table 1 shows the lexical elements that you will find in a light file. The Perl regular expression is the one used by the compiler and cgi-www to process the files.
What | Syntax | Perl regular expression | Use |
Variables | %name% | %[a-zA-Z0-9]+_*[a-zA-Z0-9]*% | Expanded by Perl |
Comments | # text, or <!-- # text --> | ^\s*#.*$, or ^\s*<!--\s*#.*-->\s*$ | Ignored |
Perl lines | . perl instruction, or <!-- . perl instruction --> | ^\s*\..*$, or ^\s*<!--\s\..*-->\s*$ | Evaluated by Perl |
HTML lines | <!-- text -->, or anything else | Anything else | Printed as HTML |
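The classification in Table 1 can be tried out with a few lines of code. The sketch below is in Python for illustration (the tools themselves use Perl regular expressions), writes the comment delimiter as <!-- in the wrapped forms, and uses the hypothetical function name `classify`. Comments are tested first and HTML is the fallback, which is an assumption about ordering that is consistent with the "anything else" row.

```python
import re

# Line patterns from Table 1, with the comment delimiter written as "<!--".
COMMENT = re.compile(r"^\s*#.*$|^\s*<!--\s*#.*-->\s*$")
PERL_LINE = re.compile(r"^\s*\..*$|^\s*<!--\s\..*-->\s*$")


def classify(line):
    """Return 'comment', 'perl' or 'html' for one line of a light file."""
    if COMMENT.match(line):
        return "comment"
    if PERL_LINE.match(line):
        return "perl"
    return "html"


for sample in ["# File Name", '<!-- . if (%lang% eq "fr") { -->', "<H1>Hi</H1>"]:
    print(classify(sample), "|", sample)
```

An ordinary HTML comment such as `<!-- text -->` falls through to the HTML case, because it starts with neither `#` nor `.` after the opening delimiter.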
Both the compiler and cgi-www Perl script use the following algorithm to read the light files.
    foreach line of light file :
        if "comment" then
            null
        end if
        if "perl line" then
            suppress leading .   or   suppress <!-- . and -->
            transform "variables" (%name%) into $name
            add line to perl_stack
        end if
        if "HTML line" then
            transform "variables" (%name%) into $name
            # This is the Perl protection mechanism for special characters
            transform " into \"
            transform @ into \@
            line = print "line\n"
            add line to perl_stack
        end if
    end for
    eval perl_stack
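The reading algorithm above can be sketched as a function that builds the list of Perl statements without evaluating them. It is written in Python for illustration; the real compiler and cgi-www are Perl programs that finish by evaluating the stack. The names `build_perl_stack` and `expand` are hypothetical, and the line patterns follow Table 1 with the comment delimiter written as <!--.

```python
import re

COMMENT = re.compile(r"^\s*#.*$|^\s*<!--\s*#.*-->\s*$")
PERL_LINE = re.compile(r"^\s*\..*$|^\s*<!--\s\..*-->\s*$")
WRAPPED = re.compile(r"^\s*<!--\s(\..*?)\s*-->\s*$")
VARIABLE = re.compile(r"%([a-zA-Z0-9]+_*[a-zA-Z0-9]*)%")


def expand(text):
    """Rewrite %name% as the Perl scalar $name."""
    return VARIABLE.sub(r"$\1", text)


def build_perl_stack(light_lines):
    """Turn light-file lines into a list of Perl source lines (not evaluated)."""
    stack = []
    for line in light_lines:
        if COMMENT.match(line):
            continue  # comments are ignored
        if PERL_LINE.match(line):
            wrapped = WRAPPED.match(line)
            code = wrapped.group(1) if wrapped else line.strip()
            code = code[1:].strip()  # suppress the leading '.'
            stack.append(expand(code))
        else:
            # HTML line: expand variables, protect " and @, wrap in a print.
            text = expand(line).replace('"', '\\"').replace("@", "\\@")
            stack.append('print "{}\\n";'.format(text))
    return stack


stack = build_perl_stack([
    "# comment",
    '<!-- . if (%lang% eq "fr") { -->',
    '<H1 ALIGN="center">%title%</H1>',
    "<!-- . } -->",
])
print("\n".join(stack))
```

Joining the returned lines gives the Perl program text that the final `eval perl_stack` step of the algorithm would execute.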
In a typical user session, information will be provided to HTML forms located on a Web browser. After the CGI interface has passed the value of all fields of the HTML form to the general purpose cgi-www script, the value of %variables% will be available for light file use. This relationship of components is established in the FORM:
<FORM ACTION="/cgi-bin/cgi-www?file=result.lt" METHOD="POST"> <INPUT TYPE="text" NAME="username"> <INPUT TYPE="submit"> </FORM>
We assume in this example that result.lt will use the variable %username%, whose value will be specified via user input. Both the POST and GET methods are supported by cgi-www. We also note that the cgi-www program has been designed to take advantage of Netscape Navigator Cookies, thereby allowing information to be stored for future sessions.
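How form fields could end up as variable values can be sketched with a small decoding step. The sketch is in Python for illustration (cgi-www is a Perl script) and uses the standard `urllib.parse.parse_qs` function. The function name `form_variables` and the sample user name are hypothetical, and the table name `document` is chosen only to echo the `$document{...}` hash seen in the light file example. Letting POST fields override GET fields of the same name is an assumption.

```python
from urllib.parse import parse_qs


def form_variables(query_string, post_body=""):
    """Merge GET and POST fields into one name -> value table.

    Hypothetical sketch: when a field appears more than once, the last
    value wins, and POST fields override GET fields of the same name.
    """
    document = {}
    for source in (query_string, post_body):
        for name, values in parse_qs(source, keep_blank_values=True).items():
            document[name] = values[-1]
    return document


# The FORM above sends file=result.lt in the URL and username in the POST body.
document = form_variables("file=result.lt", "username=Marie+Curie")
print(document["username"])
```

With such a table in place, a light file line that refers to %username% has a value to expand.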
One of the earliest applications written with DietWeb is Alf (http://ottawa.ambafrance.org/ALF), an interactive French course where the user travels on a small adventure through Paris. In this environment, the time left for the user to accomplish the mission is always displayed on the interface. The hardest part of this project was developing the scenario. Once the scenario was written, however, it took one person five days to build the application using DietWeb (at that time we only had a preliminary version of DietWeb, meaning that today the required development time would be even shorter).
A list of on-line demos can be found at http://www.isr.umd.edu/~doc/Diet
DietWeb can also be used for the creation of multilingual documents (e.g. English and French).
    <!-- File news.dt -->
    <HTML>
    <HEAD>
    <TITLE>
    <LANG=fr>Nouvelles</LANG>
    <LANG=us>News</LANG>
    </TITLE>
    </HEAD>
    <BODY>
    <A href="http://www.sponsor.com">
    <H4 ALIGN=right>
    <LANG=fr>Notre Sponsor</LANG>
    <LANG=us>Our Sponsor</LANG>
    </H4>
    </A>
    <HR ALIGN=left>
    <LANG=fr>
    <H1 ALIGN=center>Nouvelles de France</H1>
    <P ALIGN=right>
    <A LANG="us" href="news.lt">English version</A>
    </P>
    </LANG>
    <LANG=us>
    <H1 ALIGN=center>News from France</H1>
    <P ALIGN=right>
    <A LANG="fr" href="news.lt">Version Française</A>
    </P>
    </LANG>
    <HR ALIGN=center SIZE=5 WIDTH="50%">
    <LANG=fr>
    <P ALIGN=left>
    <B>Voici quelques nouvelles de France :</B><BR>
    Vous trouverez ici la <A href="list.lt">liste</A> des journaux Français on-line
    </P>
    </LANG>
    <LANG=us>
    <P ALIGN=left>
    <B>Here is some news from France :</B><BR>
    Look at our <A href="list.lt">list</A> of on-line French newspapers.
    </P>
    </LANG>
    </BODY>
    </HTML>
    <!-- File : list.dt -->
    <HTML>
    <HEAD>
    <TITLE>
    <LANG=fr> Liste </LANG>
    <LANG=us> List </LANG>
    </TITLE>
    </HEAD>
    <BODY>
    <LANG=fr>
    <H1 ALIGN=center> Liste des journaux Français : </H1>
    </LANG>
    <LANG=us>
    <H1 ALIGN=center> French newspapers </H1>
    </LANG>
    <HR ALIGN=center>
    <P ALIGN=left>
    <OL>
    <LI><A href="http://www.lemonde.fr">Le monde</A></LI>
    <LI><A href="http://www.lefigaro.fr">Le Figaro</A></LI>
    </OL>
    </P>
    </BODY>
    </HTML>
This example shows how blocks in French can link to other diet files for information in French and, when specifically indicated, to information in another language such as English.
The DietWeb toolkit development is work in progress; the latest information may be found at http://www.isr.umd.edu/~doc/Diet. Our future work will address two concerns. First, we will closely examine `speed of the process'. We are concerned that the cost of running a CGI script for every document could slow down the browsing process. One potential solution to this problem lies in servers like Apache [7], which come with modules that allow you to load some Perl source at server startup (Module mod_fastcgi at http://www.fastcgi.com/servers/apache/apache-fastcgi/mod_fastcgi.html ). By pre-loading cgi-www we can reduce the response time of the server.
A second problem is `security of the process'. In our prototype implementation, users are allowed to include virtually any piece of Perl code in a light file, possibly forcing cgi-www to execute unwanted operations. A potential solution to this problem lies with the Safe module of Perl5 [5], which restricts the user's access to certain functions. By including this module in cgi-www we can define the set of Perl functions available to the user.
The first author would like to acknowledge financial support from INRIA, FRANCE. We also acknowledge use of computer equipment in the Systems Engineering and Integration Laboratory, Institute for Systems Research, University of Maryland.
David Chancogne, INRIA, Rocquencourt, Paris, FRANCE. Visiting Research Scientist, Institute for Systems Research, University of Maryland, USA. E-mail : doc@isr.umd.edu
Mark Austin, Associate Professor, Institute for Systems Research, University of Maryland, College Park, MD 20742, USA. E-mail : austin@isr.umd.edu