Plume Project,
Australian National University
Steve.Ball@tcltk.anu.edu.au
Scripting languages, for example Tcl and Perl, generally process XML using regular expressions. Regular expressions are used in order to improve the performance of the scripting language by processing the text "in bulk". This is necessary since using a scripting language to write a lexical analyzer which iterates over the characters of the text would be extremely slow. However, the bulk-processing technique makes it difficult to cope with certain irregularities in the language, and some nesting constructs. The regularity of XML will determine how easily it can be processed by a scripting language using regular expression based techniques. This poster presents how the TclXML package is able to support access to XML documents using only the Tcl scripting language, and assesses how successful XML is in its aim of achieving ease-of-processing by scripting languages.
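The bulk-processing idea can be illustrated with a minimal sketch. It is written in the spirit of the technique and is not TclXML's actual code; the names bulkParse and handleTag are hypothetical. A single regsub call rewrites every tag into a Tcl command invocation, and the resulting script is then evaluated, so no Tcl loop ever visits individual characters.

```tcl
# Hypothetical callback: called once per tag with the tag name, a "/" flag
# for closing tags, the raw attribute text, and the character data that
# follows the tag. Here it simply records what it was given.
proc handleTag {name closing attributes followingText} {
    global events
    lappend events [list $name $closing $attributes $followingText]
}

# Hypothetical driver: converts the whole document to a script in bulk.
proc bulkParse {xml} {
    global events
    set events [list]
    # Replace literal braces with placeholder entities so the text can be
    # embedded safely inside braced arguments of the generated script.
    set xml [string map {\{ &ob; \} &cb;} $xml]
    # One pattern covers both start tags and end tags.
    set exp {<(/?)([^\s>]+)\s*([^>]*)>}
    # Each tag closes the previous text argument and opens the next one.
    set sub "\}\nhandleTag {\\2} {\\1} {\\3} \{"
    regsub -all $exp $xml $sub script
    # The first call carries any text that precedes the first tag.
    eval "handleTag {} {} {} \{$script\}"
    return $events
}

foreach event [bulkParse {<TO>All WWW7 Attendees</TO>}] {
    puts $event
}
```

The sketch also shows where irregularities cause trouble: a comment or attribute value containing ">" would be split at the wrong place, and an empty-element tag such as <BR/> would be reported with the name "BR/". Handling such cases needs extra passes over the text.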
TclXML parses documents into a format known as XAPI-Tcl, which a Tcl application can then use to process the parsed document. XAPI-Tcl presents a tree-structured hierarchical list (a grove), which the Tcl application can process using Tcl list commands. For example, the command:
    xml::parse {<MEMO PRIORITY="Important"><TO>All WWW7 Attendees</TO><FROM>Steve Ball</FROM> <MESSAGE>XML is terrific!</MESSAGE></MEMO>}
returns the following Tcl list:
    parse:element MEMO {PRIORITY Important} {
        parse:element TO {} { parse:text {All WWW7 Attendees} {} {} }
        parse:element FROM {} { parse:text {Steve Ball} {} {} }
        parse:element MESSAGE {} { parse:text {XML is terrific!} {} {} }
    }
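As an illustration of processing the grove with ordinary list commands, the following sketch gathers the character data of the example document. The grove is written out literally so the code runs without TclXML, and collectText is a hypothetical helper, not part of the package. It assumes that every node occupies four consecutive list elements, as in the list above.

```tcl
# The grove from the example above, written out literally.
set grove {
    parse:element MEMO {PRIORITY Important} {
        parse:element TO {} { parse:text {All WWW7 Attendees} {} {} }
        parse:element FROM {} { parse:text {Steve Ball} {} {} }
        parse:element MESSAGE {} { parse:text {XML is terrific!} {} {} }
    }
}

# Hypothetical helper: returns a list of all text nodes in document order.
# Assumption: each node is four list elements, namely the marker, the
# element name or text, the attribute list, and the child list.
proc collectText {grove} {
    set result [list]
    foreach {marker value attributes children} $grove {
        switch -- $marker {
            parse:text {
                lappend result $value
            }
            parse:element {
                set result [concat $result [collectText $children]]
            }
        }
    }
    return $result
}

puts [join [collectText $grove] "\n"]
```

Because the attribute list is a flat sequence of name and value pairs, a single attribute could be read in a similar way, for example with array set.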
The parser accepts options to configure how it constructs the grove. For example, the markers "parse:element" or "parse:text" may be configured to be other values. Normally the parser transparently takes care of removing comments and expanding character entity references, but instead they may be included in the returned structure by using the -commentcommand or -entitycommand option, respectively. In other words, the configuration options given to the parser are its grove plan.
The handling of errors in the document is under the control of the application. By default, TclXML will attempt to recover from document errors and produce a best-guess approximation of the document structure. However, an application callback script may be specified to modify this behaviour by using the -errorcommand option.
The Tcl application may manipulate the parsed structure as a list, or it can evaluate the parsed structure as a script to provide an alternate, convenient processing method. By applying appropriate special character quoting, TclXML ensures that this is safe. To process a document using the evaluation method, the application simply defines a procedure for each of the markers used in the XAPI-Tcl structure, such as parse:element.
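A minimal sketch of the evaluation method follows. It prints an indented outline of the example memo. The procedure bodies, the outline and depth variables, and the three-argument signature for each marker are assumptions made for illustration. The grove is written out literally with each sibling node on its own line, so that it is a valid script as well as a valid list.

```tcl
# The parsed memo, one sibling per line so it can be evaluated as a script.
set grove {
    parse:element MEMO {PRIORITY Important} {
        parse:element TO {} { parse:text {All WWW7 Attendees} {} {} }
        parse:element FROM {} { parse:text {Steve Ball} {} {} }
        parse:element MESSAGE {} { parse:text {XML is terrific!} {} {} }
    }
}

set depth 0          ;# current nesting level
set outline [list]   ;# output lines collected so far

# Called for each element: record its name, then evaluate its children,
# which are themselves a script of marker commands.
proc parse:element {name attributes children} {
    global depth outline
    lappend outline "[string repeat {  } $depth]$name"
    incr depth
    eval $children
    incr depth -1
}

# Called for each run of character data. The last two arguments are
# unused in this sketch.
proc parse:text {text unused1 unused2} {
    global depth outline
    lappend outline "[string repeat {  } $depth]$text"
}

eval $grove
puts [join $outline "\n"]
```

The text arrives inside braces, so characters such as "$" or "[" in the document are not substituted when the structure is evaluated. This is the reason the quoting applied by the parser matters.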
The TclXML document generation facility allows Tcl scripts to emit XML documents in a convenient manner, with an optional feature of on-the-fly validation of the document as it is generated. This facility works by parsing the document's DTD and then creating Tcl commands for each element and entity that is defined in the DTD. The Tcl application can then create a document by invoking these commands. Element attributes are given as arguments to the commands. Say goodbye to angle brackets!
For example, to generate the XML document given above, the following commands would be used:
    xml::generate [xml::parseDTD $dtd]
    set xmlDocument [MEMO PRIORITY Important {
        TO {xml::text "All WWW7 Attendees"}
        FROM {xml::text "Steve Ball"}
        MESSAGE {xml::text "XML is terrific!"}
    }]
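To show how one command per element can work, here is a simplified sketch. It is not the TclXML implementation: the sketch namespace and its defineElement, text, and escape procedures are hypothetical. Element names are supplied directly instead of being read from a DTD, no validation is performed, and the output is accumulated in a buffer variable instead of being returned by the outermost command.

```tcl
namespace eval sketch {
    variable buffer {}   ;# XML text generated so far
}

# Escape the characters that are special in XML content and attributes.
proc sketch::escape {text} {
    return [string map {& &amp; < &lt; > &gt; \" &quot;} $text]
}

# Append escaped character data to the document being built.
proc sketch::text {data} {
    variable buffer
    append buffer [escape $data]
    return
}

# Create a global command named after the element. Its arguments are
# attribute name and value pairs followed by a script that generates
# the content of the element.
proc sketch::defineElement {name} {
    proc ::$name {args} [string map [list %NAME% $name] {
        set body [lindex $args end]
        append ::sketch::buffer "<%NAME%"
        foreach {attr value} [lrange $args 0 end-1] {
            append ::sketch::buffer " $attr=\"[::sketch::escape $value]\""
        }
        append ::sketch::buffer ">"
        uplevel 1 $body
        append ::sketch::buffer "</%NAME%>"
        return
    }]
}

foreach element {MEMO TO FROM MESSAGE} {
    sketch::defineElement $element
}

MEMO PRIORITY Important {
    TO      {sketch::text "All WWW7 Attendees"}
    FROM    {sketch::text "Steve Ball"}
    MESSAGE {sketch::text "XML is terrific!"}
}
puts $sketch::buffer
```

Evaluating the content script with uplevel keeps the caller's variables visible inside nested elements, so document content can be computed with ordinary Tcl code.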
Apart from tracking the development of the XML language specification, further work will extend the definition of XAPI-Tcl and add functions for searching and manipulating the parsed document structure, in order to support standards such as XLL [2] and DOM [3].
The TclXML package will feature in the next release of the Plume World Wide Web browser, since XML is now used in many Web standards, such as RDF [5] and SMIL [6], as well as Microsoft's CDF. Plume [Ball] will also support the display of XML documents, and there is progress on the use of stylesheets, such as XSL [4], as well as document scripting using Tcl/Tk.