This poster describes DQL, a new XML data manipulation language. DQL extends OQL to manipulate tree-like and forest-like data. It integrates XPath location paths and provides the classical SQL operators together with specific tree transformation operators. The first version of DQL has been implemented in Java using the DOM and SAX APIs.
Keywords: semi-structured data, XML, XPath, query language.
XML can serve either as a simple exchange format or as a data model. In both cases, a manipulation language is needed. Developing such languages is hard, because the structures involved are complex (e.g. trees and graphs) and no schema can be relied upon. Over the past years, numerous data manipulation languages have been developed for SGML, HTML, XML or semi-structured data ([1], [2], [3], [4], [5], [6], [7]). Recently, a W3C working group has started to specify a new XML query language, XQuery [9]. XQuery is derived from Quilt [2], which in turn borrowed features from several of the languages cited above. Despite their intrinsic qualities, none of these languages has been unanimously accepted, since none of them has yet been recommended by the W3C.
In this poster, we propose a new XML data manipulation language, DQL, which extends OQL to manipulate tree-like and forest-like data. DQL integrates XPath location paths [8] and provides the classical SQL operators (select…from…where, group…by, sort, etc.) as well as specific tree transformation operators that give DQL the expressive power of XSLT [10].
DQL works with two kinds of values: trees and forests. A tree can be of type text, number, boolean, attribute or element; a forest is a list of trees. A query is a functional expression composed of predefined operators and user-defined functions. DQL includes four kinds of operators: (1) extraction of fragments from documents, where the location of each fragment is given by an XPath expression; (2) construction of elements; (3) the select...from...where, group...by and sort...by operators; (4) transformation of document fragments by adding or removing sub-components.
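The value model above can be pictured as a small type hierarchy. The following Python sketch is only an illustration of trees and forests as described here; the names `Attribute`, `Element`, `Tree`, `Forest`, `kind` and `author` are hypothetical and do not come from the DQL implementation, which is written in Java.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Attribute:
    """An attribute tree: a name/value pair (hypothetical class)."""
    name: str
    value: str


@dataclass
class Element:
    """An element tree: a tag, its attributes and its child trees (hypothetical class)."""
    tag: str
    attributes: List[Attribute] = field(default_factory=list)
    children: List["Tree"] = field(default_factory=list)


# A tree is text, a number, a boolean, an attribute or an element.
Tree = Union[str, int, float, bool, Attribute, Element]
# A forest is simply a list of trees.
Forest = List[Tree]


def kind(t: Tree) -> str:
    """Return the type name of a tree."""
    if isinstance(t, bool):            # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(t, (int, float)):
        return "number"
    if isinstance(t, str):
        return "text"
    if isinstance(t, Attribute):
        return "attribute"
    return "element"


def author(last: str, first: str) -> Element:
    """Build <author><last>...</last><first>...</first></author> as in Figure 1."""
    return Element("author", children=[Element("last", children=[last]),
                                       Element("first", children=[first])])


# The authors of "Data on the Web" form a forest of three element trees.
authors: Forest = [author("Abiteboul", "Serge"), author("Buneman", "Peter"),
                   author("Suciu", "Dan")]
# A forest may also mix trees of different types.
mixed: Forest = [Attribute("year", "2000"), "Data on the Web", 39.95, True]

print([kind(t) for t in authors])  # ['element', 'element', 'element']
print([kind(t) for t in mixed])    # ['attribute', 'text', 'number', 'boolean']
```

Modelling a forest as a plain list keeps the order of its trees, which matters for a language that works on ordered documents.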
Local and global variables can also be defined. The two original features of DQL are the use of XPath to extract fragments from documents and the transformation operators. The example queries below are evaluated on the document of Figure 1, which is bound to the variable $mybib.
<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
Figure 1: A well-formed XML document
<catalog count = count($mybib//book)>$mybib//book/title</>
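This query constructs a catalog element whose count attribute holds the number of books and whose content is the forest of book titles. As a rough illustration of its meaning (not of DQL's own evaluator), the sketch below emulates it with Python's standard `xml.etree.ElementTree`; the names `BIB` and `catalog` are hypothetical, and `BIB` simply holds the document of Figure 1.

```python
import copy
import xml.etree.ElementTree as ET

# Figure 1; parsing it plays the role of binding $mybib
BIB = """<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""


def catalog(bib: ET.Element) -> ET.Element:
    """Emulate <catalog count = count($mybib//book)>$mybib//book/title</>."""
    books = bib.findall(".//book")                    # $mybib//book
    cat = ET.Element("catalog", {"count": str(len(books))})
    for title in bib.findall(".//book/title"):        # titles in document order
        t = copy.deepcopy(title)                      # construct new nodes, source untouched
        t.tail = None                                 # drop the source indentation
        cat.append(t)
    return cat


result = catalog(ET.fromstring(BIB))
print(ET.tostring(result, encoding="unicode"))
```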
select [$a//last/*, $a//first/*] from $a in distinct($mybib//book[@year = 2000]/author) where count($mybib//book[author = $a]) > 1
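This select…from…where query returns the last and first names of each distinct author of a book published in 2000 who has written more than one book in the collection. On Figure 1 it returns nothing, since no author appears twice. The Python sketch below emulates the query under stated assumptions: equality between author elements is taken as equality of their XPath string values, `distinct` keeps the first occurrence, and the function name `authors_with_several_books` and its `year` and `more_than` parameters are hypothetical (`more_than` only exists so the filter can be exercised on this small document).

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""


def string_value(el: ET.Element) -> str:
    """XPath-style string value: all descendant text, concatenated."""
    return "".join(el.itertext())


def authors_with_several_books(bib, year="2000", more_than=1):
    # from $a in distinct($mybib//book[@year = 2000]/author)
    # (the year is compared as a string here, not as a number)
    seen, candidates = set(), []
    for a in bib.findall(f".//book[@year='{year}']/author"):
        if string_value(a) not in seen:
            seen.add(string_value(a))
            candidates.append(a)
    result = []
    for a in candidates:
        # where count($mybib//book[author = $a]) > 1
        n = sum(1 for b in bib.iter("book")
                if any(string_value(x) == string_value(a) for x in b.findall("author")))
        if n > more_than:
            # select [$a//last/*, $a//first/*]
            result.append((a.findtext(".//last"), a.findtext(".//first")))
    return result


bib = ET.fromstring(BIB)
print(authors_with_several_books(bib))               # []
print(authors_with_several_books(bib, more_than=0))  # [('Abiteboul', 'Serge'), ('Buneman', 'Peter'), ('Suciu', 'Dan')]
```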
select $b/title from $b in $mybib//book[author/last/preceding-sibling::first]
The predicate last/preceding-sibling::first is true if a first element precedes a last element and both are siblings. This kind of query is useful for detecting irregularities in XML documents that are not valid.
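Python's `xml.etree.ElementTree` only supports a small XPath subset and has no preceding-sibling axis, so the sketch below checks the predicate by hand: a book qualifies if one of its authors has a first child that appears before a last child. The helper names `first_before_last` and `irregular_titles` are hypothetical, and the extra book appended at the end is a made-up record used only to exercise the predicate, since every author in Figure 1 lists last before first.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""


def first_before_last(author: ET.Element) -> bool:
    """True if some <last> child has a <first> among its preceding siblings."""
    children = list(author)
    for i, child in enumerate(children):
        if child.tag == "last" and any(c.tag == "first" for c in children[:i]):
            return True
    return False


def irregular_titles(bib: ET.Element) -> list:
    """Emulate: select $b/title from $b in $mybib//book[author/last/preceding-sibling::first]."""
    return [b.findtext("title") for b in bib.iter("book")
            if any(first_before_last(a) for a in b.findall("author"))]


bib = ET.fromstring(BIB)
print(irregular_titles(bib))  # []
# Hypothetical irregular record, added only to exercise the predicate
bib.append(ET.fromstring("<book><title>Hypothetical irregular book</title>"
                         "<author><first>A.</first><last>Author</last></author></book>"))
print(irregular_titles(bib))  # ['Hypothetical irregular book']
```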
replace //name as $n by <name>[$n/last/*, " ", $n/first/*]</> from $mybib
Without the replace operator, this query would require reconstructing each book in full, which becomes complex when books have irregular structures.
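The point of replace is that only the matched elements are rebuilt, while everything around them is kept as is. The sketch below approximates that behaviour with a generic, hypothetical `replace_all(root, tag, build)` helper over Python's `xml.etree.ElementTree`. Figure 1 contains no name elements, so the sketch assumes a small hypothetical document in which person names are wrapped in name elements; `flatten_name` mirrors the constructor `<name>[$n/last/*, " ", $n/first/*]</>` by producing text of the form "last first". This is an illustration, not DQL's actual rewriting algorithm.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical input: Figure 1 has no <name> elements
DOC = ("<bib><book><title>Data on the Web</title>"
       "<name><last>Abiteboul</last><first>Serge</first></name>"
       "<name><last>Suciu</last><first>Dan</first></name></book></bib>")


def replace_all(root: ET.Element, tag: str,
                build: Callable[[ET.Element], ET.Element]) -> ET.Element:
    """Replace every descendant <tag> element by build(old), at the same position."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for old in list(root.iter(tag)):   # snapshot before mutating the tree
        parent = parent_of.get(old)
        if parent is None:             # the root itself is never replaced in this sketch
            continue
        new = build(old)
        new.tail = old.tail            # keep the text that followed the old element
        parent[list(parent).index(old)] = new
    return root


def flatten_name(n: ET.Element) -> ET.Element:
    """Mirror <name>[$n/last/*, " ", $n/first/*]</>."""
    new = ET.Element("name")
    new.text = f"{n.findtext('last', '')} {n.findtext('first', '')}"
    return new


root = replace_all(ET.fromstring(DOC), "name", flatten_name)
print(ET.tostring(root, encoding="unicode"))
```

The title element is left untouched, which is exactly what a full reconstruction of each book would have to reproduce by hand.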
remove //affiliation from $mybib
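The remove operator deletes every subtree matched by the path and keeps the rest of the document. A minimal Python sketch of that behaviour, assuming the operator yields the document without the matched elements, is shown below; `remove_all` is a hypothetical helper built on the standard `xml.etree.ElementTree`, and it also preserves any text that followed a removed element.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""


def remove_all(root: ET.Element, tag: str) -> ET.Element:
    """Emulate: remove //tag from root."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in list(root.iter(tag)):
        parent = parent_of.get(el)
        if parent is None:             # never remove the root itself
            continue
        if el.tail:                    # keep the text that followed the removed element
            idx = list(parent).index(el)
            if idx > 0:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + el.tail
            else:
                parent.text = (parent.text or "") + el.tail
        parent.remove(el)
    return root


bib = remove_all(ET.fromstring(BIB), "affiliation")
print(ET.tostring(bib.find(".//editor"), encoding="unicode"))
```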
replace //book as $b by <book> <year>$b/@year</> content($b) </> from $mybib
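This query turns the year attribute of each book into a year child element placed before the original content. The Python sketch below emulates it under one labelled assumption: content($b) is taken to mean the text and child nodes of $b, without its attributes, so the rebuilt book no longer carries a year attribute. The helpers `replace_all` and `year_as_element` are hypothetical and rely only on the standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Callable

BIB = """<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""


def replace_all(root: ET.Element, tag: str,
                build: Callable[[ET.Element], ET.Element]) -> ET.Element:
    """Replace every descendant <tag> element by build(old), at the same position."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for old in list(root.iter(tag)):
        parent = parent_of.get(old)
        if parent is None:
            continue
        new = build(old)
        new.tail = old.tail
        parent[list(parent).index(old)] = new
    return root


def year_as_element(book: ET.Element) -> ET.Element:
    """Mirror <book><year>$b/@year</> content($b)</>."""
    new = ET.Element("book")          # attributes of $b are not copied
    year = ET.SubElement(new, "year")
    year.text = book.get("year")      # <year>$b/@year</>
    year.tail = book.text             # content($b), assumed = text + child nodes
    new.extend(list(book))
    return new


bib = replace_all(ET.fromstring(BIB), "book", year_as_element)
print([(b.findtext("year"), b.findtext("title")) for b in bib.findall("book")])
```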
replace //name as $n by <name>[$n/first, $n/last]</> //affiliation by void //book as $b by <book><year>$b/@year</> content($b)</> from $mybib
let $bks_MK = $mybib//book[publisher="Morgan Kaufmann Publishers"] in <stat count = count($bks_MK) average_price = avg($bks_MK//price) />
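The let query binds a local variable to the books of one publisher and builds a stat element holding their count and average price. The Python sketch below emulates it with `xml.etree.ElementTree`; the function name `stat` is hypothetical. Because XPath's = compares full string values, the sketch filters on the complete publisher name as it appears in Figure 1, "Morgan Kaufmann Publishers"; a shorter string such as "Morgan Kaufmann" matches no book. Omitting the average when no book matches is a choice made for this sketch, not documented DQL behaviour.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Tech. and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""


def stat(bib: ET.Element, publisher: str) -> ET.Element:
    # let $bks = $mybib//book[publisher = publisher]  (string-value equality)
    bks = [b for b in bib.iter("book")
           if any("".join(p.itertext()) == publisher for p in b.findall("publisher"))]
    prices = [float(p.text) for b in bks for p in b.iter("price")]   # $bks//price
    attrs = {"count": str(len(bks))}                                 # count($bks)
    if prices:                                                       # avg($bks//price)
        attrs["average_price"] = str(sum(prices) / len(prices))
    return ET.Element("stat", attrs)


bib = ET.fromstring(BIB)
print(ET.tostring(stat(bib, "Morgan Kaufmann Publishers"), encoding="unicode"))
print(ET.tostring(stat(bib, "Morgan Kaufmann"), encoding="unicode"))
```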
Over the past years, numerous query languages for semi-structured or XML data have been proposed. Four of them stand out: Lorel [1], XML-QL [5], YATL [4] and XQuery [9]. They are complete and powerful, they have a functional semantics, and they use tree or path pattern matching to locate fragments in documents. In this poster, we compare DQL with these languages and highlight its original features.
For this comparison, we consider the following features: (1) the select…from…where operator, (2) pattern matching of document fragments, (3) preservation of document order and (4) transformation operators.
DQL is an SQL-like functional language that integrates (1) XPath location paths, (2) traditional SQL operators and (3) powerful tree transformation operators. The first version of DQL has been implemented in Java using the DOM and SAX APIs. In the future, we will focus on two points: (i) making DQL fully compliant with the XML Query requirements and (ii) query optimisation.