A Service-Based Approach to Semantic Mapping and
Data Access Across Diverse Resources

Kenneth J. Laskey
Science Applications International Corporation (SAIC)
4001 Fairfax Drive, Suite 300
Arlington, Virginia USA 22203
01-703-276-4804
kenneth.j.laskey@saic.com

ABSTRACT

A set of services is defined to facilitate data access and semantic negotiation between data sources based on independent vocabularies. The services separate the mapping function between vocabularies from the retrieval function for a data source, and separate both of these from the actual transformation mechanism between sets of data elements from the vocabularies. The services are designed to leverage existing efforts and products generated as part of a typical integration. The goal is to maximize the visibility of semantic relationships, and the flexibility to substitute between them, while minimizing the additional work needed to implement such a system.

Keywords

data access, semantic mapping, service architecture

1. INTRODUCTION

An application developer creates the algorithms and data processing necessary to accomplish a task. The next-level developer or user must then satisfy the implied or documented interfaces through which the capabilities of the application can be utilized. This paper concentrates on the data access process, although the concepts can be extended to data write and to direct communications between components. We refer to the data user as the target, and to the resource that supplies information which, when processed, will satisfy the target as the source. To access information that supports an application, a data user must define

o the data needs in the target vocabulary,

o a source which can provide values to meet the data needs,

o the specific source data elements (in the source vocabulary) whose values are needed,

o the means to retrieve these data element values from the source,

o the processing which the source values must undergo to generate the required target values.
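To make the five manual steps concrete, the following is a minimal sketch, assuming a hypothetical `ManualIntegration` record whose fields mirror the list above one-to-one. The source name `weather_db` and the element names `temp_f`/`temp_c` are illustrative only. Note that everything is hard-wired into a single record. That coupling is what the architecture described below is meant to pull apart.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Values = Dict[str, float]


@dataclass
class ManualIntegration:
    """Hypothetical record of what a data user pins down by hand for one source."""
    target_needs: List[str]                      # data needs in the target vocabulary
    source_id: str                               # a source that can provide the values
    source_elements: List[str]                   # needed elements, in the source vocabulary
    retrieve: Callable[[List[str]], Values]      # means to retrieve the source values
    process: Callable[[Values], Values]          # source values -> target values

    def run(self) -> Values:
        source_values = self.retrieve(self.source_elements)
        target_values = self.process(source_values)
        # Return only what the target actually asked for.
        return {name: target_values[name] for name in self.target_needs}


# Illustrative case: the target wants Celsius, the source stores Fahrenheit.
spec = ManualIntegration(
    target_needs=["temp_c"],
    source_id="weather_db",
    source_elements=["temp_f"],
    retrieve=lambda names: {n: {"temp_f": 212.0}[n] for n in names},
    process=lambda v: {"temp_c": (v["temp_f"] - 32.0) * 5.0 / 9.0},
)
print(spec.run())  # {'temp_c': 100.0}
```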

In theory, a common vocabulary makes this process easier. In practice, if the source and target were developed independently, the negotiation is initially done manually to ensure that the semantic details are consistent at both ends. The following describes an architecture that is designed to enable integration of components and data sources described in terms of multiple, overlapping vocabularies, and to demonstrate a modular use within a service-based paradigm. The goals of the architecture are (1) to enable transparent access to content without requiring the user to process the semantic interchange or the specifics of content access, (2) to accomplish the access using mechanisms which are externally visible to inference engines, and (3) to implement the access in a way which provides value to the user with a minimum of additional effort.

2. DATA ACCESS AND SEMANTIC MAPPING SERVICES

The following assumes that the integration steps outlined above are routinely done. The goals of an improved process are to minimize the effort needed to make data connections, and to maximize the amount of interface detail which can be consistently reused and captured for use by external inference engines. The components and their corresponding data flows are shown in Figure 1 and described in the following subsections.


Figure 1. Process flow for simple data access

2.1 Definitions

As a precursor to the component descriptions, several terms are defined to establish concepts which are used to describe the workings of the basic system shown in Figure 1.

Data user: an entity (human, machine, or software) which has a need for data and a context in which that data will be used. Data users may share common vocabularies or portions of vocabularies, or may have independent vocabularies whose data elements are related through semantic associations.

Semantic Association: a relationship between named data elements, especially data elements from different vocabularies. In its simplest form (and the one illustrated in Figure 1), a semantic association can define arithmetic and/or logical operations to be performed on the values corresponding to one set of names in order to generate values corresponding to a second set of names. The processing which makes up a semantic association is not constrained to one-to-one equivalences between data elements and may include algebraic, conditional, and/or probabilistic relationships. The association is the single place within the architecture where domain semantic relationships are explicitly defined. Note that in the following, "association" refers to semantic association unless otherwise specified.
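As an illustration of an association that is not a one-to-one equivalence, the following minimal sketch assumes a hypothetical `SemanticAssociation` class. The class holds the source element names, the target element names, and a function from source values to target values. Here two source elements feed one algebraic target and one conditional target. All element names are illustrative, and the paper does not prescribe this representation.

```python
from typing import Callable, Dict, List

Values = Dict[str, float]


class SemanticAssociation:
    """Hypothetical form: named source elements in, named target elements out."""

    def __init__(self, source_elements: List[str], target_elements: List[str],
                 fn: Callable[[Values], Values]) -> None:
        self.source_elements = source_elements
        self.target_elements = target_elements
        self.fn = fn

    def apply(self, source_values: Values) -> Values:
        missing = [n for n in self.source_elements if n not in source_values]
        if missing:
            raise KeyError(f"missing source values: {missing}")
        result = self.fn(source_values)
        # Expose only the declared target elements.
        return {n: result[n] for n in self.target_elements}


# Many-to-many mapping: the source vocabulary stores unit price and quantity,
# the target vocabulary wants a total cost (algebraic) and a flag (conditional).
assoc = SemanticAssociation(
    source_elements=["unit_price", "qty"],
    target_elements=["total_cost", "high_cost"],
    fn=lambda v: {
        "total_cost": v["unit_price"] * v["qty"],
        "high_cost": 1.0 if v["unit_price"] * v["qty"] > 1000 else 0.0,
    },
)
print(assoc.apply({"unit_price": 25.0, "qty": 50.0}))
# {'total_cost': 1250.0, 'high_cost': 1.0}
```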

Metadata: that set of descriptive properties which (1) uniquely characterizes an object and allows a user (human or machine) to discriminate between one object and another, and (2) describes how the object and its contents can be accessed in either a read or write mode. Metadata includes what the object is, where it is located, and how to make use of it. It may include the calling arguments to methods which act on the content of an instance of the object, including methods that access it from its native storage format.
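The three parts of this definition are what the object is, where it is, and how to use it. They can be captured in a small record. The following sketch assumes a hypothetical `Metadata` dataclass. The URN, the location string, and the method name `get_values` are made-up examples, not identifiers from the original system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata record: what an object is, where it is, how to use it."""
    object_id: str            # what: unique identifier used to discriminate objects
    description: str          # what: human-readable characterization
    location: str             # where: e.g. a URI for the native storage
    access_modes: List[str] = field(default_factory=lambda: ["read"])
    # how: method name -> names of that method's calling arguments
    methods: Dict[str, List[str]] = field(default_factory=dict)

    def can(self, mode: str) -> bool:
        return mode in self.access_modes


md = Metadata(
    object_id="urn:example:inventory-db",            # hypothetical identifier
    description="Parts inventory, one record per part number",
    location="sqlite:///inventory.db",                # hypothetical location
    access_modes=["read"],
    methods={"get_values": ["element_names"]},
)
print(md.can("read"), md.can("write"))  # True False
```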

2.2 Components

The Data Access Service is the central component in mediating and coordinating a data access request. The service (1) takes as input a list of names corresponding to target entities for which values are needed, (2) calls upon the Semantic Mapping Service to identify one or more associations to one or more data sources, and the corresponding data source elements needed to generate these values, (3) retrieves data element values from the data source, (4) invokes the processing of the associations (including passing the data element values retrieved from the data source) to generate values for each name on the target request list, and (5) returns the target values to the entity which invoked the Data Access Service. The Data Access Service invokes Data Access Methods to extract values from the corresponding data sources for use as input to the associations.
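The five numbered steps can be sketched as a single request loop. This is a minimal, in-memory sketch under stated assumptions. A plain dictionary keyed by `(target_id, target_element)` stands in for the Semantic Mapping Service, and a dictionary of callables keyed by source ID stands in for the Data Access Methods. The class name `DataAccessService`, the vocabulary ID `ops_model`, and the `weather_db` source are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Values = Dict[str, float]
Association = Callable[[Values], Values]
# Hypothetical map entry: (source_id, source_elements, association, target_elements)
MapEntry = Tuple[str, List[str], Association, List[str]]


class DataAccessService:
    def __init__(self, semantic_map: Dict[Tuple[str, str], MapEntry],
                 access_methods: Dict[str, Callable[[List[str]], Values]]) -> None:
        self.semantic_map = semantic_map      # stand-in for the Semantic Mapping Service
        self.access_methods = access_methods  # Data Access Methods, keyed by source_id

    def request(self, target_id: str, names: List[str]) -> Values:
        results: Values = {}
        for name in names:                                        # (1) target names
            if name in results:                                   # already produced
                continue
            source_id, src_elems, assoc, tgt_elems = self.semantic_map[(target_id, name)]  # (2)
            src_vals = self.access_methods[source_id](src_elems)  # (3) retrieve
            produced = assoc(src_vals)                            # (4) run association
            results.update({n: produced[n] for n in tgt_elems})
        return {n: results[n] for n in names}                     # (5) return


def weather_db(names: List[str]) -> Values:  # hypothetical Data Access Method
    store = {"temp_f": 68.0, "wind_kts": 10.0}
    return {n: store[n] for n in names}


def to_metric(v: Values) -> Values:  # hypothetical association
    return {"temp_c": (v["temp_f"] - 32) * 5 / 9, "wind_ms": v["wind_kts"] * 0.514444}


entry = ("weather_db", ["temp_f", "wind_kts"], to_metric, ["temp_c", "wind_ms"])
svc = DataAccessService(
    semantic_map={("ops_model", "temp_c"): entry, ("ops_model", "wind_ms"): entry},
    access_methods={"weather_db": weather_db},
)
print(svc.request("ops_model", ["temp_c"]))  # {'temp_c': 20.0}
```

The requester only names its own vocabulary and elements. Which source is consulted, and how its values are converted, stays hidden behind the map and the association.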

The Semantic Map is a repository of semantic associations between named data elements. It is assumed that values from an information source (identified by a source identifier and the appropriate data elements at that source) are processed by an association to generate values for some target set of data elements (identified by a target identifier and the appropriate data elements of the target). In general, this can be represented as

                           association
(source:source_elements) --------------> (target:target_elements)

For the simple data access case currently being described, the target and target_elements are the search criteria, and the source, source_elements, and association are the return values. In other uses of the Semantic Map information, other combinations of entities in this relation can be supplied as search criteria, with the remaining entities comprising the return information. The data element names themselves are not required to convey any semantic content, and the Semantic Map knows no details of any association other than that one exists. Associations (or pointers to associations) are information inputs to the Semantic Map, not a part of its structure or implementation. Thus, associations and the associated data sources can be created or modified without requiring modification of the Semantic Map infrastructure itself.
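Two properties are described above. First, any combination of fields in the relation can serve as search criteria. Second, associations are opaque to the map. The following minimal sketch shows both. It assumes hypothetical `MapRecord` and `SemanticMap` classes, in which an association is held only as an uninterpreted identifier string. The IDs such as `assoc:f_to_c` and all source and target names are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class MapRecord:
    source: str
    source_elements: FrozenSet[str]
    association: str        # opaque pointer (e.g. an ID or URI); never interpreted here
    target: str
    target_elements: FrozenSet[str]


class SemanticMap:
    def __init__(self) -> None:
        self._records: List[MapRecord] = []

    def add(self, record: MapRecord) -> None:
        # New or changed associations are just new data; this class is untouched.
        self._records.append(record)

    def find(self, source: Optional[str] = None, target: Optional[str] = None,
             source_element: Optional[str] = None,
             target_element: Optional[str] = None) -> List[MapRecord]:
        """Return records matching every criterion that is supplied."""
        return [r for r in self._records
                if (source is None or r.source == source)
                and (target is None or r.target == target)
                and (source_element is None or source_element in r.source_elements)
                and (target_element is None or target_element in r.target_elements)]


smap = SemanticMap()
smap.add(MapRecord("weather_db", frozenset({"temp_f"}), "assoc:f_to_c",
                   "ops_model", frozenset({"temp_c"})))
smap.add(MapRecord("sensor_feed", frozenset({"temp_k"}), "assoc:k_to_c",
                   "ops_model", frozenset({"temp_c"})))

# Simple data access: search on target and target element.
print([r.source for r in smap.find(target="ops_model", target_element="temp_c")])
# ['weather_db', 'sensor_feed']
# Another use of the same relation: which targets draw on a given source?
print([r.target for r in smap.find(source="sensor_feed")])  # ['ops_model']
```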

The use of the Semantic Map allows each data user and each data source to maintain its own vocabulary as best suited to its own objectives. By identifying "who I am" (see Figure 1), the data user specifies which vocabulary (or, for XML purposes, which namespace) is of interest, and by specifying "what I need", the data user identifies specific data elements within that vocabulary (namespace). At this point, the Semantic Map does not contain information about a data source's structure, only the names of the data elements which map to other data sources and the corresponding names of data elements at those other sources. (See below for discussion of the Data Access Method as it relates to knowledge of the data source structure.) For the Semantic Map, it makes no difference whether the mapping is to a single object model or to a dozen. The association maintains considerable flexibility in relating different vocabularies and encapsulates the changes needed to capture modifications to vocabularies or data field relationships.
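The "who I am" / "what I need" split amounts to qualifying every element name by its vocabulary. The following small sketch assumes two hypothetical namespace URIs under `example.org` and a flat dictionary keyed by `(namespace, element)`. It shows the same local name `speed` resolving to different source elements depending on the caller's vocabulary. As described above, the map holds only names and carries no knowledge of how either source stores its data.

```python
from typing import Dict, List, Tuple

# Hypothetical namespace URIs for two independently developed vocabularies.
NAV = "http://example.org/vocab/navigation"
LOG = "http://example.org/vocab/logistics"

# (namespace, element) -> (source, source element); names only, no source structure.
name_map: Dict[Tuple[str, str], Tuple[str, str]] = {
    (NAV, "speed"): ("ais_feed", "sog_knots"),
    (LOG, "speed"): ("fleet_db", "avg_transit_kph"),
}


def resolve(who_i_am: str, what_i_need: List[str]) -> Dict[str, Tuple[str, str]]:
    """Resolve each needed element within the caller's own vocabulary."""
    return {name: name_map[(who_i_am, name)] for name in what_i_need}


print(resolve(NAV, ["speed"]))  # {'speed': ('ais_feed', 'sog_knots')}
print(resolve(LOG, ["speed"]))  # {'speed': ('fleet_db', 'avg_transit_kph')}
```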

A Data Access Method is an executable which corresponds to a data source and extracts data values from it. A Data Access Method understands the means by which data values are stored and/or generated at the data source, and it can access values for a given list of data elements known to the data source. It is assumed that the Data Access Method takes as its argument a list of data element names and returns the values corresponding to those names. The actual processing may be a simple SQL call to a database, or it may involve significant processing and authentication to satisfy access privileges. In any case, the access is accomplished without any knowledge of how the accessed values will eventually be used, unless this is required as part of user authorization.
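For the "simple SQL call" case, the following is a minimal sketch of a Data Access Method built on Python's standard `sqlite3` module. It assumes a hypothetical factory, `make_sql_access_method`, that binds the method to one table and one record key. Column names cannot be bound as SQL parameters, so requested names are checked against the table's actual columns before they are quoted into the query. The table name itself is treated as trusted configuration. The `parts` table and its contents are invented for the example.

```python
import sqlite3
from typing import Callable, Dict, List


def make_sql_access_method(conn: sqlite3.Connection, table: str, key_col: str,
                           key: object) -> Callable[[List[str]], Dict[str, object]]:
    """Hypothetical factory: a Data Access Method bound to one record of one table."""
    # `table` is trusted configuration; PRAGMA rows are (cid, name, type, ...).
    known = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

    def access(names: List[str]) -> Dict[str, object]:
        unknown = [n for n in names if n not in known]
        if unknown:
            raise KeyError(f"unknown data elements: {unknown}")
        cols = ", ".join(f'"{n}"' for n in names)  # validated names, quoted
        row = conn.execute(
            f'SELECT {cols} FROM "{table}" WHERE "{key_col}" = ?', (key,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no record with {key_col} = {key!r}")
        # Names in, values out; nothing here knows how the values will be used.
        return dict(zip(names, row))

    return access


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no TEXT, unit_price REAL, qty INTEGER)")
conn.execute("INSERT INTO parts VALUES ('A-100', 25.0, 50)")
get_values = make_sql_access_method(conn, "parts", "part_no", "A-100")
print(get_values(["unit_price", "qty"]))  # {'unit_price': 25.0, 'qty': 50}
```

Binding the record key inside the factory is only one choice. A caller-supplied key would be an equally plausible design.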

3. CONTINUING WORK

An initial prototype of the Data Access Service and Semantic Map has been built, and it adequately demonstrates the feasibility of the concepts. Design has proceeded for dealing with multiple mappings, multiple Data Access Methods, and multiple associations, and the concepts have also been extended to data write scenarios. A next-generation prototype is being planned. It will incorporate independent variables for differentiating specific data element values within the data source (e.g., which database record), and it will use IDs to indicate user preferences among data sources, semantic mappings, and Data Access Methods. Work is also planned to investigate whether the architecture has any limitations when dealing with complex data types. Finally, preliminary design has looked at using the Semantic Map to facilitate component-to-component communications when the components have their own respective vocabularies.