Although the Semantic Web is meant to help machines interpret data on the Web, end-users still need to view and query the data. Ideally, also non-experts should be empowered to formulate queries. State of the art retrieval systems for semistructured data such as [1] that are targeted towards end-users still use keyword-based search interfaces. We believe that there is a need for graphical interfaces that enable non-expert users to formulate complex queries over semistructured data.
Specifying a graphical notation for a query language involves a trade-off between user interface complexity and language expressiveness. At one end of the spectrum are systems such as Magnet [2] and multi-faceted browsing and navigation systems such as Flamenco [3]. In both system one uses so-called facets as restrictions to filter the data set. These systems are limited in that they do not allow object nesting; only filters pertaining to one item can be specified. In addition they need domain specific customizations tailored towards the schema definition of the data set.
On the other end of the spectrum is Query-by-Example (QbE) [4] known from databases. QbE is a complete query language for the relational calculus, can express a wide range of relational queries, and includes insert and update operations. There is no need to customize that language to a specific domain, since the queries are constructed based on the database schema. However, since semistructured data typically comes without a fixed schema, it is not possible to directly apply the QbE paradigm here.
The goal of this poster is to present a graphical notation for queries over semistructured data that identifies a middle ground between the two paradigms which we believe is useful and easy to use -- and provides sufficient expressiveness to cover a wide range of queries.
We introduce our graphical notation by means of examples. The exemplary data set is expressed in RDF (Resource Description Framework) using Dublin Core and FOAF vocabularies.
In the following, we introduce the basic ideas for the representation. We assume that the reader has rudimentary knowledge of RDF, a language to represent semistructured data on the Web.
The atomic unit of RDF are RDF triples, which consist of subject, predicate, and object.
We use the Notation3 (N3) syntax that introduces variables to the RDF data model to be able to specify queries.
The basic notion of an RDF query is an RDF triple pattern.
A triple pattern is a triple where subject, predicate, or object can be a variable.
An N3 query consists of a ql:where
clause with one or more triple patterns specifying the selection criteria, and a ql:select
clause which specifies the format of a query result.
In the following, we define the notion of an RDF facet which is the main element of a graphically constructed query.
A facet can be seen as a filter condition over an RDF graph. Multiple facets can be combined on subject and object positions by using the same variable, which amounts to a join.
In this section, we introduce the different building blocks of our graphical notation by means of examples. We begin each example with a textual description of the query, then show the corresponding graphical representation, and finally present the query expressed in N3 query syntax. In the queries we omit namespace declarations for brevity.
Single Facet.
foaf:name
and object "Andreas Harth"
.
<> ql:select {
?subject1 ?p ?o .
}; ql:where {
?subject1 foaf:name "Andreas Harth" .
?subject1 ?p ?o .
} .
Keyword Search.
dc:title
the keyword computer
.
<> ql:select {
?subject1 ?p ?o .
}; ql:where {
?subject1 dc:title ?keyword .
?keyword yars:keyword "computer" .
?subject1 ?p ?o .
} .
Multiple Facets.
RDF
in dc:title
and dc:subject
equals "Query formulation"
.
<> ql:select {
?subject1 ?p ?o .
}; ql:where {
?subject1 dc:title ?keyword .
?keyword yars:keyword "RDF" .
?subject1 dc:subject "Query formulation" .
?subject1 ?p ?o .
} .
Specifying Results.
"Andreas Harth"
knows.
<> ql:select {
?object1 ?p ?o .
}; ql:where {
?subject1 foaf:name "Andreas Harth" .
?subject1 foaf:knows ?object1 .
?object1 ?p ?o .
} .
Object Nesting.
"Andreas Harth"
knows and that are interested in DERI
.
<> ql:select {
?object1 ?p ?o .
}; ql:where {
?subject1 foaf:name ``Andreas Harth'' .
?subject1 foaf:knows ?object1 .
?object1 foaf:interest ?keyword .
?keyword yars:keyword ``DERI'' .
?object1 ?p ?o .
} .
Please observe that the queries return all RDF triples, that is, the language has closure in a sense that results of one query can serve as input of another query. In addition, only dealing with RDF graphs facilitates displaying the results.
We have identified a subset of RDF queries that can be expressed in graphical notation. We believe that the notation is simple enough to enable end-users to graphically construct sufficiently complex queries.
We intend to verify our notation by implementing a user interface that allows users to formulate the queries presented in the poster. The graphical representations shown here are encoded in Scalable Vector Graphics (SVG) format. We plan to include in the system client-side scripts to be able to interactively compose queries within a Web browser.