Providing Data on the Web: From Examples to Programs
Carlos A. Varela,
Caroline C. Hayes
Department of Computer Science
University of Illinois at Urbana-Champaign
cvarela@cs.uiuc.edu, hayes@cs.uiuc.edu
URL: http://fiaker.ncsa.uiuc.edu:8080/WWW94-2/paper.html
Abstract
The World-Wide Web provides access to a global information universe
using available technology [Berners-Lee et al. 1992].
In order to fully realize the benefits of this information system, we are
developing a system, Zelig, to provide on-the-fly access to databases
and dynamic information through effective user interfaces
[Varela and Hayes 1994].
In this paper, we have extended Zelig to generate code for
performing conversions from fixed data formats into hypertext.
Consequently, information providers only need to give
examples of their current database reports and the desired hypertext
to be generated for those particular examples. Zelig produces
the program to extract relevant data from the reports and the
schemata to drive the hypertext generation process. As an example, we
include an interface to ph/qi, the CCSO nameserver software providing data
for academic institutions around the world.
1. Introduction
The World-Wide Web offers easy access to a universe of information by
providing links to documents stored on a world wide network of machines
in a very simple and understandable fashion. Much of its
success is due to the simplicity with which it allows users to provide,
use and refer to information distributed geographically around the globe.
Another
important feature is its compatibility with other existing protocols,
such as gopher, ftp, netnews and telnet. Furthermore, it provides users
with the ability to browse multimedia documents independently
of the computer hardware being used.
The World-Wide Web is based on the HyperText Transfer Protocol, HTTP, and
the HyperText Markup Language, HTML. HTTP is a generic object-oriented
stateless protocol to transmit information between servers and clients
[Berners-Lee 1992].
HTML is a simple, yet powerful platform-independent document language
[Berners-Lee and Connolly 1993].
When the documents to be published are dynamic, like those resulting
from queries to databases, the hypertext needs to be generated. For this
purpose, there are scripts, which are programs that perform conversions
from different data formats into HTML on-the-fly. Even though these
scripts may be simple for fixed data formats, providers still need to
write them before they can publish their data on the Web. Furthermore,
even basic changes to the data formats or to the generated HTML imply
changes to these scripts.
To overcome these problems, Zelig generates scripts that base their HTML
generation on schemata [Varela 1994]. In this research, we extended
Zelig to additionally produce code for performing conversions from
fixed data formats into HTML. There are two main stages in this
conversion process: extracting database, record and field information
from a traditional database report; and instantiating that categorized
information, along with the query information, into a particular schema.
Using Zelig to provide access to dynamic information in a fixed format,
providers only need to give examples of their current text
reports and the desired hypertext
to be generated for those particular examples. Our system, Zelig,
produces the program to extract relevant data from the reports and
the schemata to drive the hypertext generation process.
Thus, it becomes easier to provide effective user interfaces to dynamic
information in the World-Wide Web.
In the next section, we elaborate more on the server-client model used
by the World-Wide Web, and the functionality of scripts.
In section 3, we highlight the problems faced when providing WWW access
to dynamic data. In section 4, we explain the architecture of Zelig, our
system that performs schema-based HTML generation. In section 5, we
demonstrate the ideas presented with a gateway to the CCSO ph/qi
nameserver databases.
Finally, in section 6, we give some conclusions and results.
2. Background
2.1. The World-Wide Web: A Server-Client Model
The World-Wide Web consists of a network of computers which can act in two
roles: as servers, providing information; or as clients,
requesting information.
Fig 2.1. Server-Client Architecture [Berners-Lee and Cailliau 1992].
This communication is performed under the stateless HTTP protocol.
In a stateless protocol, connections are created, processed and
closed without keeping state information. The server actions depend
on predefined methods such as GET, POST, PUT, and DELETE.
The resulting information can be served in different
format types and it is the client's responsibility to present it in a
consistent and clear manner. The most common format is HTML, which
contains information and its logical structure, but leaves out those
details particular to specific browser implementations.
It is important to note that a server can provide static documents
to the clients, but it could also provide transparent access to
databases or other information sources. In other words, the clients
can also request specific queries that should be processed by the
server. Scripts or gateways take care of this processing. These are
programs that communicate with the WWW server software under a predefined
interface. The interface most commonly used at present is NCSA's Common
Gateway Interface, CGI [McCool 1993].
2.2. Scripts: WWW Gateways to Databases
Scripts are CGI compliant programs that act as clients to the applications
owning the data and produce the corresponding hypertext for the requested
information. They communicate with the WWW servers through an
interface, in this case CGI, which establishes how to pass the
information from the WWW client to the script and from the script
back to the WWW server and subsequently to the WWW client.
Fig 2.2. Purpose of a script and a WWW server [Berners-Lee and Cailliau 1992].
These scripts can be written in any programming language (such as C, C++ or
Perl) and their main functions, together with the WWW server, are:
- To receive the information from the WWW client under
the hypertext transfer protocol, HTTP.
- To perform a query for the database server, allowing the
WWW server to act as a database client.
- To parse the database server results.
- To generate an HTML document and send it to the WWW client.
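The four functions listed above can be sketched in a few lines. This is a minimal illustration and not Zelig's own code: it assumes a GET request whose query arrives in the CGI variable QUERY_STRING, and `run_query`, the `name` form variable and the sample report text are hypothetical stand-ins for a real database client.

```python
import html
from urllib.parse import parse_qs


def run_query(name):
    """Hypothetical stand-in for the database client; returns a fixed-format report."""
    return f"name: {name}\nphone: 555-0100\n"


def parse_report(report):
    """Split each 'field: value' line of the report into a dictionary entry."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def render(fields):
    """Generate an HTML list, escaping values so report text cannot inject markup."""
    items = "".join(
        f"<LI>{html.escape(key)}: {html.escape(value)}\n"
        for key, value in fields.items()
    )
    return f"<UL>\n{items}</UL>\n"


def handle(environ):
    """Run the four steps for one request described by CGI environment variables."""
    form = parse_qs(environ.get("QUERY_STRING", ""))  # 1. receive the request
    name = form.get("name", [""])[0]
    report = run_query(name)                          # 2. query the database
    fields = parse_report(report)                     # 3. parse the results
    return "Content-Type: text/html\n\n" + render(fields)  # 4. generate HTML
```

A script installed under the server's CGI directory would print the result of `handle(os.environ)` to standard output, and the WWW server would relay it to the client.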
3. Providing Dynamic Data on the Web
After the short introduction in the previous section to the mechanisms
under the Web, let's see why we want to automate the script creation
process:
- To provide access to many information sources that are currently
using non-hypertext formats. There are many fixed data formats
for which we would need to create different scripts. For example:
phone information,
bibliographic databases (BibTeX), papers (LaTeX), electronic mail,
server administration usage statistics (logs), UNIX manual pages,
file directories.
- To more easily design hypertext interfaces to databases
by making changes at the level of schemata, as opposed to
modifying, recompiling and retesting scripts.
- To create evolving user interfaces, by instantiating
schemata to the most commonly accessed fields for queries, and changing
the order or level of the different user interface actions.
- To increase the functionality of the data management system.
For our phone example, in section 5, we have included sorting
records, which was not in the original ph/qi functionality.
- To reuse the schemata across different databases to provide
similar look-and-feels for users.
4. Zelig: From Examples to Programs
Figure 4.1 shows a general framework for Zelig. Scripts generate
HTML reports based on instantiating schemata to the query info
and the categorized database output. The schemata can be taken
from a library, or generated from HTML report examples. The query
info is created by the HTML Query Form, which is provided by the
application designer. This information is given to the traditional
database management system, which returns a report in a fixed
format. This report is parsed and relevant information is
extracted and categorized. The resulting HTML Report can contain
links to more information on particular records, or even additional
HTML Query Forms for more database processing.
Fig 4.1. A General Framework for Zelig.
In section 4.1, we see how
the HTML instantiation process takes place. In section 4.2, we see how
to extract Query Info from the HTML Query Form, how to extract Categorized
Output from the traditional DB Report, and how to abstract a schema from
user-given HTML Examples.
4.1. Schema-Based HTML Generation
In traditional HTML generation, user interfaces are created by scripts
directly. This implies that changes to interfaces have to be performed
at the level of the source code of the script. We present a methodology
based on schemata, to allow designers to debug and maintain the user
interfaces without directly changing the scripts.
In this section, we will explain the information that is provided by the
schemata to the scripts for document generation. Then, we will give a
description of ZHTML, the language to write these schemata, which is an
enhancement to HTML incorporating directives for database interface
generation.
4.1.1. Instantiating schemata
The scripts base their hypertext generation not only on the current parsed
database query results, but also on existing ZHTML schemata.
We can further categorize this information as:
- Current database transaction results.
These are the results generated by a query to a database. We will see
in section 4.2 how to interact with the database server, parse its
results, and standardize the query information to the following taxonomy:
- On an application level.
The most general database information. For example: name, number
of tables, default query table.
- On an object level.
The information for a specific object of our application. For
example the table PEOPLE may have: name, number of records,
number of fields, default field for queries.
- On a field level.
The most specific information about a query for a given
record. For example: field names, default field values, current field
values, length, type, and access policy.
- Existing ZHTML schemata.
These are generated either from HTML report examples given by the
interface designer, or taken from a library. Some library examples are:
- ISINDEX-based user interface schema.
This schema is based on the HTML <ISINDEX> tag. It is important
to note that there are many indexed databases in the Web that use WAIS
as the basis for database search.
- Forms-based user interface schema.
This schema makes use of the forms-based WWW browsers to provide a more
friendly user interface with menus and widgets to perform the most
common database operations.
- Application-specific user interface schema.
This schema is usually an advanced interface tailored to the specific
needs of a database user interface. It still allows evolution, in the
sense that certain constructs don't imply any order or level of access in
the generated HTML documents.
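One possible in-memory representation of the application, object and field levels described above is sketched here. The class and attribute names are assumptions made for illustration; the paper does not specify Zelig's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Field level: one value of one record."""
    name: str
    value: str = ""
    default: str = ""


@dataclass
class Table:
    """Object level: a table such as PEOPLE and the records matching a query."""
    name: str
    records: list = field(default_factory=list)  # each record is a list of Field
    default_field: str = ""

    @property
    def num_records(self):
        return len(self.records)


@dataclass
class Application:
    """Application level: the most general database information."""
    name: str
    tables: list = field(default_factory=list)
    default_table: str = ""

    @property
    def num_tables(self):
        return len(self.tables)
```

Counts such as the number of records are derived from the stored lists rather than stored separately, so they cannot drift out of step with the parsed report.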
4.1.2. ZHTML Language Description
A ZHTML schema is an HTML document which has been annotated with
comments, which are used as directives for the script. These comments
are parsed and executed by the script, and the resulting text replaces
the original comment. This is performed at run-time,
using the current database query results. Future work includes writing
script code generators that start from these schemata.
The ZHTML comments are similar to HTML constructs. They are generally
of the form:
<!ZTAG>
ZHTML body
<!/ZTAG>
There are several Zelig directives with different functionality, including:
print a variable value, run an external function printing its
output, conditionally include the ZHTML body, traverse all the
current database records invoking Zelig recursively on the ZHTML body
and traverse all the fields in a specific object.
The following are the main constructs of this document type; a
formal Document Type Definition (like the DTD shown for HTML in [Berners-Lee
and Connolly 1993]) is still in progress:
<!ZPRINT variable>
returns the current value of an HTML form variable, or an
application-level variable. If the variable is object-level
or field-level, then it needs to be in the scope of a ZFOR tag.
<!ZRUN external_fn>
runs the script function external_fn and returns its output.
<!ZIF cond-expr>
ZHTML body
<!/ZIF>
returns the output from ZHTML body if cond-expr is true.
It returns the null string otherwise.
<!ZFOR TYPE=traversal_type>
ZHTML body
<!/ZFOR>
traverses all the tables, records or fields of the current query,
depending on the traversal_type (which can be the value
TABLES, RECORDS, or FIELDS) and returns the output from ZHTML body
instantiated to each of the loop elements in the query.
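To make the directive semantics concrete, here is a small sketch of an instantiation routine. It is an illustration under simplifying assumptions, not Zelig's implementation: it handles only ZPRINT, ZIF and ZFOR TYPE=RECORDS, treats the ZIF condition as a single variable name that is true when non-empty, and does not support nesting a directive inside another of the same kind.

```python
import html
import re

ZFOR = re.compile(r"<!ZFOR TYPE=RECORDS>(.*?)<!/ZFOR>", re.DOTALL)
ZIF = re.compile(r"<!ZIF (\w+)>(.*?)<!/ZIF>", re.DOTALL)
ZPRINT = re.compile(r"<!ZPRINT (\w+)>")


def instantiate(schema, variables, records=()):
    """Replace each directive in the schema with the text it produces."""

    def expand_for(match):
        body = match.group(1)
        # Inside the loop, record-level variables shadow the outer ones.
        return "".join(
            instantiate(body, {**variables, **record}) for record in records
        )

    def expand_if(match):
        name, body = match.groups()
        return body if variables.get(name) else ""

    def expand_print(match):
        # Escape values so database text cannot be read as markup.
        return html.escape(str(variables.get(match.group(1), "")))

    text = ZFOR.sub(expand_for, schema)
    text = ZIF.sub(expand_if, text)
    return ZPRINT.sub(expand_print, text)
```

Because the loop body is instantiated recursively, one schema produces a report of any length; only the records passed in change between queries.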
4.2. Automating the Database Report Extraction
4.2.1. Database Output Categorization
We concentrate on database management systems that produce reports
with a fixed format. These reports usually contain tabular information,
where application-level data appears at the beginning and end of the
report: for example, the directory being listed, or the
university being accessed for phone information. In the middle, we
often find repetitive information in a fixed structure. It is
repetitive because there is one entry for each record matching the
original query. These entries are usually separated by a record
separator, which allows us to differentiate among records. Finally,
there is also a field separator, which allows us to divide record
information into yet more specific detail.
In a file listing example, the first line has application-level
information: the total space occupied by the directory. Then come the
records (files), which in turn can be divided into fields (name, size,
owner, date...). We guess where these separators lie
and confirm them with the user, prompting her for any unknown
information. Then, we proceed to generate the data structure necessary
to instantiate the ZHTML file when new queries are requested.
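The categorization of a file listing can be sketched as follows. The heuristic is an assumption made for illustration, not the method used by Zelig: it takes one record per line, and picks the candidate field separator that splits every record into the same, largest number of fields. In Zelig the guess would then be confirmed with the user. The field names and the sample listing are hypothetical.

```python
def guess_field_separator(lines, candidates=("\t", "|", ",", ":", None)):
    """Return (separator, field_count); None stands for runs of whitespace.

    A field_count of 1 means that no candidate split the lines consistently.
    """
    best, best_count = None, 1
    for sep in candidates:
        counts = {len(line.split(sep)) for line in lines}
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = sep, count
    return best, best_count


def categorize(report, field_names, header_lines=1):
    """Split a report into application-level lines and field-level records."""
    lines = [line for line in report.splitlines() if line.strip()]
    header, body = lines[:header_lines], lines[header_lines:]
    sep, count = guess_field_separator(body)
    if count != len(field_names):
        raise ValueError("separator guess failed; ask the user for the format")
    records = [dict(zip(field_names, line.split(sep))) for line in body]
    return {"application": header, "records": records}
```

Raising an error when the guess does not match the expected number of fields marks the point where an interactive tool would prompt for the unknown information.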
4.2.2. Generating the Query Info from the HTML Form
Each variable in an HTML form is a name and value pair. For example,
a form may contain three variables: directory, mask, and
sort-by, which have default values and are instantiated to the
user-given values when the form is submitted.
Note that the query info described above can contain
information that is not processed by the DBMS, but instead
drives functionality provided additionally by Zelig, such as sorting
by a specific field.
In this subsection, we generate the database query
from the form variable bindings and the given query example.
In our examples, we mainly have to create:
- % ls <directory>/<mask> or
- % ph <name> return <fields>
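Filling such a query example with form bindings can be sketched as below. The function name and the return convention are assumptions for illustration: the query is returned as an argument list, together with the bindings that the query did not use (such as sort-by), which would be left for Zelig to handle.

```python
import re

PLACEHOLDER = re.compile(r"<([\w-]+)>")


def build_query(template, bindings):
    """Fill each <variable> in the template; return (argv, unused bindings)."""
    used = set()

    def fill(match):
        used.add(match.group(1))
        return str(bindings[match.group(1)])

    argv = []
    for token in template.split():
        filled = PLACEHOLDER.sub(fill, token)
        if PLACEHOLDER.fullmatch(token):
            # A lone placeholder bound to several words yields several arguments.
            argv.extend(filled.split())
        else:
            argv.append(filled)
    extras = {name: value for name, value in bindings.items() if name not in used}
    return argv, extras
```

Returning an argument list instead of a command string keeps form values away from shell interpretation. A mask such as `*.html` would then need to be expanded explicitly before running ls.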
4.2.3. Generating a Schema from a Given Hypertext Example
Once we know how we want our hypertext to look for a particular
example, i.e. we have HTML files for specific queries, we can
abstract ZHTML schemata from them to be instantiated to other queries
as well.
We do this by querying the user when we are not sure whether the information
parsed is relevant (and needs to be categorized for subsequent use by
the schema instantiation algorithm) or is just a separator.
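A rough sketch of this abstraction step follows. It is a simplification under stated assumptions and not Zelig's algorithm: the example HTML shows one record per line, all values are distinct non-empty strings, and consecutive record lines that generalize to the same text collapse into one ZFOR loop. Cases that break these assumptions are where the user would be consulted.

```python
def generalize(line, bindings):
    """Replace each known value in the line with a ZPRINT directive for its name."""
    # Longer values go first so a short value cannot break up a longer one.
    for name, value in sorted(bindings.items(), key=lambda item: -len(item[1])):
        if value:
            line = line.replace(value, f"<!ZPRINT {name}>")
    return line


def abstract_schema(example_html, app_vars, records):
    """Turn one HTML example and its categorized data into a ZHTML schema."""
    schema = []
    pending = list(records)
    for line in example_html.splitlines():
        values = [v for v in pending[0].values() if v] if pending else []
        if values and all(v in line for v in values):
            # The line shows the next record: abstract it into a loop body.
            body = generalize(line, pending.pop(0))
            loop = f"<!ZFOR TYPE=RECORDS>{body}\n<!/ZFOR>"
            if not schema or schema[-1] != loop:
                schema.append(loop)
        else:
            schema.append(generalize(line, app_vars))
    return "\n".join(schema)
```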
In the following section, we show an example illustrating a schema,
and a couple of its possible instantiations depending on the database
query.
5. A Running Example: The CCSO Phone Nameserver Database
The CCSO nameserver software provides a server-client model for
accessing phone directory information from academic institutions
[Dorner 1992]. The figures in this section
have been created browsing HTML files in NCSA Mosaic for X
[NCSA 1993].
The following is an
HTML Query Form to access
those databases:
The following link contains a schema
for instantiating the categorized database information, once a query
has been made. We will show two different
instantiations depending on two different user queries for this
same schema:
6. Conclusions
The success of a distributed information system depends heavily on the
simplicity of generating, providing, using and referring to
information. The World-Wide Web is composed of excellent
protocols, tools and languages to perform these actions for
static information. We have designed an extension to this
technology to easily provide access to dynamic information,
such as the result of queries to existing databases.
The functionality of our system, Zelig, was described in this
paper. Its main improvements over previous technology include:
- providing code generation for converting fixed data formats
into hypertext;
- allowing evolving user interfaces for more effective
human-computer interaction;
- increasing the functionality of applications owning the data,
by offering additional operations such as sorting; and
- reusing HTML schemata to provide similar look-and-feel
interfaces across different applications.
We provided a hyperlinked example giving WWW access to the CCSO
ph/qi nameserver software. As of September 1994, this gateway, running
at NCSA, provides phone directory information for about
250 academic institutions around the world, and receives more than a
thousand queries per day.
Ultimately, Zelig offers the user an effective way to generate
fully customized interfaces to dynamic data, further closing
the gap between information generation, provision and use.
Acknowledgements
Thanks to the NCSA Software Development Group for their helpful comments
on this paper and their excellent research and working environment.
Additional thanks to Professor Dershowitz, for his comments and
motivating research [Dershowitz 1983].
References
[Berners-Lee 1992]
T. Berners-Lee.
Hypertext Transfer Protocol Requirements.
Internet Working Draft. CERN. Work in progress.
http://info.cern.ch/hypertext/WWW/Protocols/HTTP.html
[Berners-Lee et al. 1992]
T. Berners-Lee, R. Cailliau, J. Groff, B. Pollermann.
World-Wide Web: The Information Universe.
Electronic Networking: Research, Applications and Policy, 2(1),
pp. 52-58, Meckler Publications, Westport CT, Spring 1992.
ftp://info.cern.ch/pub/www/doc/ENRAP_9202.ps
[Berners-Lee and Cailliau 1992]
T. Berners-Lee, R. Cailliau.
World-Wide Web. Submitted to Computing in High Energy Physics 1992.
ftp://info.cern.ch/pub/www/doc/chep92www.ps
[Berners-Lee and Connolly 1993]
T. Berners-Lee, D. Connolly.
Hypertext Markup Language: A Representation of Textual Information and
Metainformation for Retrieval and Interchange.
Internet Working Draft. CERN, Atrium Technology Inc. Work in progress.
http://info.cern.ch/hypertext/WWW/MarkUp/HTML.html
[Dershowitz 1983]
N. Dershowitz.
The Evolution of Programs.
Birkhauser, Boston, 1983.
[Dorner 1992]
S. Dorner.
The CCSO Nameserver, Server-Client Protocol.
Computing and Communications Services Office. University of Illinois at
Urbana-Champaign. July 1992.
[McCool 1993]
R. McCool.
Common Gateway Interface Overview.
National Center for Supercomputing Applications, University of
Illinois at Urbana-Champaign. Work in progress.
http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
[NCSA 1993]
National Center for Supercomputing Applications, University of
Illinois at Urbana-Champaign.
NCSA Mosaic. A WWW Browser. Work in progress.
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/help-about.html
[Varela 1994]
C. Varela.
Zelig: Automating Database Provision for the World-Wide Web.
Ninth International Symposium on Information Systems,
Kobe, Japan, Oct 11-13, 1994. Invited talk.
http://fiaker.ncsa.uiuc.edu:8080/IT94.html
[Varela and Hayes 1994]
C. Varela, C. Hayes.
Zelig: Schema-Based Generation of Soft
WWW Database Applications. First International Conference on the World
Wide Web, Geneva, Switzerland, May 25-29, 1994.
http://fiaker.ncsa.uiuc.edu:8080/WWW94.html
Carlos A. Varela (cvarela@cs.uiuc.edu)
Received his B.S. in Computer Science
(CS) at the University of Illinois at
Urbana-Champaign, where he is currently an M.S./Ph.D. student.
His research interests include integrating formal methods of artificial
intelligence in software engineering, especially information systems.
Carlos has also been a research assistant at the National Center for
Supercomputing Applications (NCSA)
since 1991. At NCSA he has worked
on different projects including an alpha shapes visualizer (NCSA
Walvis), a World-Wide Web browser (NCSA Mosaic for X/Windows), and a
World-Wide Web server (NCSA httpd for Unix).
In the past, Carlos has been a Math and Computer Science teaching
assistant for classes up to differential equations and information
systems at the
University of Los Andes, Bogota, Colombia. He has
also been a consultant for Arthur Andersen & Co., and an Artificial
Intelligence fellow at the
Beckman Institute for the Advancement of Science and Technology.
Caroline C. Hayes (hayes@cs.uiuc.edu)
Received her B.S. in Math, M.S. in Knowledge-Based Systems, and Ph.D.
in Robotics, all from Carnegie Mellon
University.
Currently she is an assistant professor at the
Department of Computer
Science and at the Beckman Institute of the University of Illinois
at Urbana-Champaign.
Her research interests include artificial
intelligence, especially planning, design, abstraction, and
knowledge-based systems; as well as computer-aided manufacturing
and design. Professor Hayes is particularly interested in
tools for evaluating, critiquing and optimizing designs in areas
ranging from machined parts, intersection design and roofing design to
software design.