Integrating multiple applications into the ANSWER system with XML

Junbiao Zhang, Maximilian Ott
C&C Research Lab, NEC, Princeton, NJ 08540, USA
{junzhang, max}@ccrl.nj.nec.com


Introduction

ANSWER [1] is an information system built on active networks [2] and capable of semantic routing. Information content and user queries are injected into the ANSWER system and encapsulated in active packets, which the ANSWER backbone then processes and routes according to their content or interest specifications. These packets may carry customized routing-decision code tailored to individual applications. In most cases this code consists of calls to library functions pre-installed on the ANSWER network nodes, and is therefore quite succinct. ANSWER is envisioned as an enhancement to the current information discovery model of the World Wide Web. The system can support a wide spectrum of applications, ranging from e-commerce transactions to software distribution, and new applications can easily be added. The strength and flexibility of the system come from the programmability of the underlying network and from the ontology-based semantic structuring of the application data.

Simply put, an ontology [3] defines a common vocabulary that lets a group of software agents communicate with each other in a consistent way without sharing a common knowledge base. In the ANSWER system, each application has its own ontology structure, which is shared among all instances of the application. The ANSWER system integrates and manages a multitude of application ontologies by maintaining ontology trees at its network nodes. Ontology trees from different interfaces of a network node may be merged and forwarded to its neighbors. The system provides a generic framework to query and process the tree nodes. However, it is up to each application to define the attributes inside each node and to specify how these attributes can be operated upon. Once defined, these rules should apply to all instances of the application, i.e., the ANSWER system will process the application's data set consistently across all instances. Further, these instances should be shielded from the details of the ANSWER system and need only be concerned with a uniform, easy-to-use format specific to the application.

Given these constraints, an important issue in the system design is how applications specify their data sets to the ANSWER system. We chose XML [4] as the interface language for this purpose. In this poster, we explain how XML is used to seamlessly integrate multiple applications into the ANSWER system.

ANSWER semantics and XML

We use XML as the data integration language in ANSWER based on the following considerations:

To clarify the last point, we first explain what we mean by ANSWER semantics. As noted earlier, ontology trees are the core of the ANSWER operation model. Although there are generic rules for searching and merging ontology trees, special rules are required for handling application-specific data attributes in the ontology tree nodes. For example, when merging several ontology trees belonging to an e-commerce application, do we create a new node when the same type of node appears in one of the trees, any of the trees, or all of the trees? Suppose we are creating a new node from two nodes: how do we initialize the attributes in the new node? Do we add the attribute values together, take the maximum of the two, or take their average? These are the kinds of semantic information that an ANSWER application must provide for its data to be handled properly. Such information could be expressed through XSL. However, since the rules are simple enough, our prototype implementation instead embeds them in the application's DTD through hidden attributes, i.e., attributes declared with the #FIXED default. When users inject application-specific XML data into the ANSWER system, the ANSWER processing module can extract these hidden attributes and analyze the semantic information they contain. To better explain this process, we use an example e-commerce application and show its DTD as well as a simple XML data instance. For lack of space, we show only the small portion of the DTD that describes TV products:

.....
<!ELEMENT TV (SET+) >
.....
<!ELEMENT SET EMPTY >
<!ATTLIST SET BRAND      CDATA #REQUIRED
              SIZE       CDATA #REQUIRED
              PRICE      CDATA #REQUIRED
              COUNT      CDATA #REQUIRED
              MERGESTYLE CDATA #FIXED "ANY"
              ATTRDEF    CDATA #FIXED
                         "BRAND:KEY;SIZE:IGNORED;PRICE:MINMAX;COUNT:SUM" >
....
and a sample XML instance:
    .....
    <TV>
       <SET BRAND="RCA" SIZE="27" PRICE="299" COUNT="10"/>
       <SET BRAND="SONY" SIZE="20" PRICE="259" COUNT="15"/>
       <SET BRAND="ZENITH" SIZE="50" PRICE="1299" COUNT="4"/>
       <!-- etc. -->
    </TV>
    .....
Let us look at the attributes of the "SET" element and explain some of the semantic definitions used there. All but two of the attributes are regular attributes used in the XML data. The two hidden attributes, "MERGESTYLE" and "ATTRDEF", define the semantic behavior of the node and of all the attributes in the node. For example, they specify that "BRAND" is the primary key of the ontology node. When ontology trees are merged, a new node is created in the resulting tree as long as any of the trees contains a "SET" node with a given key. Also, when several "SET" nodes with the same "BRAND" field are merged, their "COUNT" fields are aggregated additively, while the "SIZE" field is omitted from the new node. The "PRICE" fields are aggregated quite differently: informally, the lower and upper bounds of the two "PRICE" fields become the new price range. The ANSWER system defines a set of aggregation methods, and applications can flexibly extend this set through active instructions of the underlying active network.
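The merge semantics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ANSWER implementation: the function names (parse_attrdef, merge_nodes, merge_trees), the representation of a tree as a list of dictionaries, the representation of a merged price as a (low, high) pair, and the data in the second tree are all hypothetical. Only the "ANY" merge style shown in the DTD is handled.

```python
from typing import Dict, List

ATTRDEF = "BRAND:KEY;SIZE:IGNORED;PRICE:MINMAX;COUNT:SUM"  # taken from the DTD


def parse_attrdef(attrdef: str) -> Dict[str, str]:
    """Turn "BRAND:KEY;SIZE:IGNORED" into {"BRAND": "KEY", "SIZE": "IGNORED"}."""
    return dict(item.split(":") for item in attrdef.split(";"))


def as_range(value):
    """Assumption: a single price v is treated as the degenerate range (v, v)."""
    return value if isinstance(value, tuple) else (value, value)


def merge_nodes(nodes: List[dict], rules: Dict[str, str]) -> dict:
    """Merge nodes that share the same key, one attribute at a time."""
    merged = {}
    for name, rule in rules.items():
        values = [node[name] for node in nodes]
        if rule == "KEY":
            merged[name] = values[0]          # identical in every node
        elif rule == "SUM":
            merged[name] = sum(values)        # aggregated additively
        elif rule == "MINMAX":
            ranges = [as_range(v) for v in values]
            merged[name] = (min(lo for lo, _ in ranges),
                            max(hi for _, hi in ranges))
        elif rule == "IGNORED":
            continue                          # omitted from the new node
        else:
            raise ValueError(f"unknown aggregation rule: {rule}")
    return merged


def merge_trees(trees: List[List[dict]], attrdef: str,
                mergestyle: str = "ANY") -> Dict[str, dict]:
    """With MERGESTYLE "ANY", a key present in any tree yields a merged node."""
    if mergestyle != "ANY":
        raise NotImplementedError("only the ANY style from the DTD is sketched")
    rules = parse_attrdef(attrdef)
    key = next(name for name, rule in rules.items() if rule == "KEY")
    groups: Dict[str, List[dict]] = {}
    for tree in trees:
        for node in tree:
            groups.setdefault(node[key], []).append(node)
    return {k: merge_nodes(nodes, rules) for k, nodes in groups.items()}


# First tree: two entries from the sample XML instance.
tree_a = [{"BRAND": "RCA", "SIZE": 27, "PRICE": 299, "COUNT": 10},
          {"BRAND": "SONY", "SIZE": 20, "PRICE": 259, "COUNT": 15}]
# Second tree: hypothetical data, as if received on another interface.
tree_b = [{"BRAND": "RCA", "SIZE": 32, "PRICE": 399, "COUNT": 5}]

merged = merge_trees([tree_a, tree_b], ATTRDEF)
print(merged["RCA"])
```

In this sketch the two "RCA" nodes collapse into one node whose "COUNT" is the sum of both counts and whose "PRICE" is the range spanned by both prices, while "SONY" survives because it appears in at least one tree.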

When an XML data set such as the one above is submitted to the ANSWER system, the attribute information is used to translate the data into ontology nodes, and some of it is stored in the resulting nodes to control how they are processed. We have explained how these attributes control tree merging. They can also control the search behavior of query packets. For example, at an ANSWER network node, a query packet looking for a "ZENITH" television set priced under $400 will search the "BRAND" fields of the "SET" tree nodes and may use "COUNT" as a metric in making its routing decisions.
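As a rough illustration of this translation and lookup, the following Python sketch parses the sample instance with the standard expat bindings, which report attribute defaults declared in the internal DTD subset, so each "SET" node arrives with its hidden attributes attached. Several things here are assumptions made for the example rather than details of ANSWER: wrapping the DTD fragment in an internal subset with "TV" as the document element, the helper names xml_to_nodes and route, the per-interface data for "if1" and "if2", and the routing policy of preferring the interface that advertises the largest matching "COUNT".

```python
import xml.parsers.expat

# The DTD fragment and instance from the text, combined into one document.
DOCUMENT = """<!DOCTYPE TV [
<!ELEMENT TV (SET+) >
<!ELEMENT SET EMPTY >
<!ATTLIST SET BRAND      CDATA #REQUIRED
              SIZE       CDATA #REQUIRED
              PRICE      CDATA #REQUIRED
              COUNT      CDATA #REQUIRED
              MERGESTYLE CDATA #FIXED "ANY"
              ATTRDEF    CDATA #FIXED
                         "BRAND:KEY;SIZE:IGNORED;PRICE:MINMAX;COUNT:SUM" >
]>
<TV>
   <SET BRAND="RCA" SIZE="27" PRICE="299" COUNT="10"/>
   <SET BRAND="SONY" SIZE="20" PRICE="259" COUNT="15"/>
   <SET BRAND="ZENITH" SIZE="50" PRICE="1299" COUNT="4"/>
</TV>
"""


def xml_to_nodes(document):
    """Translate each SET element into a dictionary-based ontology node."""
    nodes = []

    def on_start(name, attrs):
        if name != "SET":
            return
        # attrs includes the #FIXED defaults (MERGESTYLE, ATTRDEF) from the DTD.
        # Assumption: purely numeric attribute values are integers.
        nodes.append({k: int(v) if v.isdigit() else v for k, v in attrs.items()})

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return nodes


def route(brand, max_price, interfaces):
    """Hypothetical policy: pick the interface whose matching node has the
    largest COUNT; return None when no interface has a match."""
    best_iface, best_count = None, 0
    for iface, nodes in interfaces.items():
        for node in nodes:
            if (node["BRAND"] == brand and node["PRICE"] < max_price
                    and node["COUNT"] > best_count):
                best_iface, best_count = iface, node["COUNT"]
    return best_iface


nodes = xml_to_nodes(DOCUMENT)

# Ontology nodes as seen on three interfaces; if1 and if2 are made-up data.
interfaces = {
    "if0": nodes,
    "if1": [{"BRAND": "ZENITH", "SIZE": 20, "PRICE": 379, "COUNT": 3}],
    "if2": [{"BRAND": "ZENITH", "SIZE": 19, "PRICE": 349, "COUNT": 7}],
}
next_hop = route("ZENITH", 400, interfaces)
print(nodes[0]["MERGESTYLE"], next_hop)
```

The "ZENITH" set in the sample instance costs more than $400 and therefore does not match on "if0"; under the assumed policy the query is forwarded toward the interface advertising the most matching sets.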

Status and future work

Currently, we are finalizing the details of the semantic definitions used inside the ANSWER DTDs. The ANSWER system core has been completed, and work is underway on the ANSWER XML processing module. We plan to test various applications under the new ANSWER/XML framework, including web content query and distribution, e-commerce applications, technical paper search, and software distribution.

References

[1]    Junbiao Zhang, Maximilian Ott, "ANSWER: Information Routing Based on Active Networks",
              The 5th Asia Pacific Conference on Communications, Beijing, China, October 1999

[2]    David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, Gary J. Minden,
              "A Survey of Active Network Research", IEEE Communications Magazine, Vol. 35, No. 1, pp. 80-86, January 1997

[3]    T.R. Gruber, "Toward principles for the design of ontologies used for knowledge sharing", Padova workshop on
              Formal Ontology, March 1993

[4]    W3C, "Extensible Markup Language (XML) 1.0"