Metadata Mediation: Representation and Protocol
Tsuyoshi Sakata, Hiroyuki Tada, Tomohisa Ohtake
Digital Vision Laboratories
7-3-37, Akasaka, Minato, Tokyo, Japan
sakata@dvl.co.jp,
tadaa@dvl.co.jp,
otaket@dvl.co.jp
Abstract
We are developing an electronic commerce mediator (ECM), a system that enables a consumer to retrieve data about merchandise sold on the WWW by specifying the features of the merchandise.
In this paper we present
(1) the Multi-Schema Metadata Format (MMF),
a logical structure for metadata sets,
and
(2) the Metadata Mediation Protocol (MMP),
a protocol set for interchanging metadata, both of which are designed for the ECM.
1. Introduction
When a consumer is looking for merchandise on the WWW,
the consumer has the characteristics or constraints of the desired item in
mind,
such as "the color should be red", "the size should be XL",
or "the price should be less than 50 dollars".
This suggests that a retrieval service based on characteristics is
desirable.
In fact, many electronic commerce groups are taking this approach.
For example, the Electronic Commerce Promotion Council (ECOM)[1] of Japan has a working group
to establish a set of merchandise characteristics
for electronic commerce.
However,
there is not, and will not be, a standard metadata schema that covers
all sorts of merchandise, although several schemata will be designed
and used by groups of industries and manufacturers.
Moreover, shops may add their own attributes to the original schema
in order to appeal to their customers.
A retrieval system must therefore transform retrieval formulae
from one schema to another.
To accomplish such inter-transformability of metadata,
we have designed a logical form and syntax of metadata,
Multi-Schema Metadata Format (MMF).
We have also designed a metadata-exchange protocol for agents,
Metadata Mediation Protocol (MMP), which enables distributed agents
to store metadata, to resolve retrieval formulae cooperatively,
and to maintain consistency of metadata among agents.
We are planning to start our commerce mediation service
based on MMF and MMP by April 1997.
We will also provide a metadata input system that simplifies
writing MMF metadata for merchandise pages.
In this paper, the Multi-Schema Metadata Format will be described in
Chapter 2, the Metadata Mediation Protocol will be described in
Chapter 3 and the structure of the Commerce Mediator that we are
implementing now will be described in Chapter 4.
2. Multi-Schema Metadata Format
In this chapter, we describe the structure and syntax of the Multi-Schema Metadata Format (MMF).
In online shopping, the metadata schemata for merchandise will be diverse.
It must therefore be possible to transform metadata and retrieval formulae from one schema into other schemata.
To accomplish this inter-transformability, we attach an ontology of schema attributes and a machine-readable definition of each schema to the metadata instance.
We propose the following metadata structure, the Multi-Schema Metadata Format (MMF). MMF consists of four parts: "metadata instance", "schema definition", "schema ontology", and "core ontology", as shown in Fig. 1.
Figure 1. The structure of MMF.
The parts may be distributed across the network and are linked to one another.
2.1 Metadata Instance
The metadata instance contains the metadata of an object such as an item of merchandise.
Metadata is represented by a set of attributes and their
values.
Note that a value may have a structure and/or may be coded according to
an existing coding system.
A structured value is represented by a set of values with
"aspects".
A value coded according to an existing coding system,
such as a currency unit or RFC1738,
is represented with a "system" identifier.
Fig. 2 shows an example of the metadata instance in HTML format.
In this example, the value of the "maker" attribute has two aspects,
"name" (the name of the company) and "tel" (its telephone number).
The "price" attribute of the first item has the value 55 with the system "USD".
The metadata instance is described in an HTML header with META tags. Several
formats for the metadata of Internet resources have been proposed,
such as the IAFA template[2], URC[3] and SOIF[4]. Recently, workshops
supported by OCLC proposed the Dublin Core[5] and the Warwick
Framework[6].
Since the Warwick Framework deals with metadata in
multiple schemata, we adopted it as the syntax of the
metadata instance.
The extensions we have made are
- the delimiter-line, and
- the aspect and system described above.
When two or more items of merchandise are described in an HTML page,
the metadata of each item is separated by delimiter-lines.
"Aspects" and "systems" are introduced in the same manner as qualifiers in the Dublin Core.
The syntax of MMF is available at
http://www.dvl.co.jp/mediator/index.html.
<meta name="-----" content="begin">
<link rel=SCHEMA.CN href=http://www.dvl.co.jp/tags/cn.scm>
<meta name="CN.category" content="()Game Playstation">
<meta name="CN.name" content="()Formula 1">
<meta name="CN.purchase_page" content="(system=RFC1738) http://www.dvl.co.jp/purchase/aaa.htm">
<meta name="CN.sample" content="(system=RFC1738) http://www.dvl.co.jp/sample/aaa.htm">
<meta name="CN.maker" content="(aspect=name)DVL">
<meta name="CN.maker" content="(aspect=tel)+81-3-5411-9800">
<meta name="CN.price" content="(system=USD)55">
<meta name="-----" content="separate">
<link rel=SCHEMA.CN href=http://www.dvl.co.jp/tags/cn.scm>
<meta name="CN.category" content="()Game Playstation">
<meta name="CN.name" content="()Rage Racer">
<meta name="CN.purchase_page" content="(system=RFC1738) http://www.dvl.co.jp/purchase/bbb.htm">
<meta name="CN.sample-1" content="(system=RFC1738) http://www.dvl.co.jp/sample/scene1.htm">
<meta name="CN.sample-2" content="(system=RFC1738) http://www.dvl.co.jp/sample/scene2.htm">
<meta name="CN.maker" content="(aspect=name)DVL">
<meta name="CN.maker" content="(aspect=tel)+81-3-5411-9800">
<meta name="CN.price" content="(system=JPY)55000">
<meta name="-----" content="end">
Figure 2. An example of the metadata instance in an HTML header.
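To make the role of delimiter-lines and qualifiers concrete, the following is a minimal sketch of how a reader of Fig. 2 might split an HTML header into one record per item. It uses Python's standard html.parser module. The class name MMFParser, the function parse_mmf and the dictionary layout of each entry are hypothetical and are not part of the MMF specification; the sketch also ignores the LINK elements that name the schema.

```python
from html.parser import HTMLParser
import re

# Qualifier prefix of a content value: "(system=USD)", "(aspect=name)" or "()".
QUALIFIER = re.compile(r"^\((?:(aspect|system)=([^)]*))?\)\s*(.*)$", re.S)


class MMFParser(HTMLParser):
    """Collects one record (a list of attribute entries) per merchandise item."""

    def __init__(self):
        super().__init__()
        self.items = []      # completed records
        self.current = None  # record being filled, or None outside a block

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        content = a.get("content") or ""
        if name == "-----":  # delimiter-line: begin, separate or end
            if content in ("separate", "end") and self.current is not None:
                self.items.append(self.current)
            self.current = [] if content in ("begin", "separate") else None
            return
        if self.current is None:
            return  # META tag outside any begin/end block
        m = QUALIFIER.match(content)
        if m is None:
            return  # content does not start with a qualifier
        kind, qualifier, value = m.groups()
        self.current.append({
            "attribute": name,
            "aspect": qualifier if kind == "aspect" else None,
            "system": qualifier if kind == "system" else None,
            "value": value,
        })


def parse_mmf(html):
    """Return a list of records, one per item between delimiter-lines."""
    parser = MMFParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<meta name="-----" content="begin">
<meta name="CN.name" content="()Formula 1">
<meta name="CN.maker" content="(aspect=name)DVL">
<meta name="CN.price" content="(system=USD)55">
<meta name="-----" content="separate">
<meta name="CN.name" content="()Rage Racer">
<meta name="CN.price" content="(system=JPY)55000">
<meta name="-----" content="end">
"""

items = parse_mmf(SAMPLE)
```

Here items holds two records, one for each item of merchandise, and each entry keeps the aspect or system separately from the value so that a later step can compare values within the same coding system.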
2.2 Schema Definition
The schema definition gives a frame for metadata, consisting of
a set of attributes and their aspects.
Restrictions on "systems" may be declared in the schema definition
when necessary.
The schema definition is described in
SOIF (the Summary Object Interchange Format)[4].
An example of the schema definition is shown in Fig. 3.
We assume that the schema definition is written by a schema designer,
who may be different from the metadata author.
For example, an expert in an industry, a mall or a shop designs schemata,
and metadata authors select an appropriate schema from among them.
Hence, the schema definition need not employ the same syntax as the
metadata instance.
@SCHEMADEFINITION { http://cm.dvl.co.jp/schema/default.scm
Schema-ontology{x}: http://cm.dvl.co.jp/ontology/default.sot
Number-of-entries{x}: 6
Attribute-1{x}: name
Description-1{x}: name of the merchandise
Attribute-2{x}: purchase_page
Description-2{x}: url of the page selling the merchandise
System-2{x}: RFC1738
Attribute-3{x}: price
Description-3{x}: price of the merchandise. it should be JPY or USD.
System-3{x}: JPY
System-3{x}: USD
Attribute-4{x}: maker
Description-4{x}: data of the maker of the merchandise
Number-of-aspects-4{x}: 2
Aspect-5{x}: name
Parent-attribute-5{x}: maker
Description-5{x}: name of the maker of the merchandise
Aspect-6{x}: tel
Parent-attribute-6{x}: maker
Description-6{x}: telephone number of the maker of the merchandise
}
Figure 3. An example of the schema definition.
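As an illustration of how a schema definition constrains metadata instances, the sketch below transcribes the attributes of Fig. 3 by hand into a Python dictionary and checks a single entry against it. The dictionary layout, the function name check_entry and the rule that an attribute without a declared system is unrestricted are all assumptions of this sketch, not part of the MMF specification.

```python
# Hand-transcribed from Fig. 3. The dictionary layout is an assumption of this
# sketch; the paper stores the schema definition in SOIF.
SCHEMA = {
    "name": {"systems": set(), "aspects": set()},
    "purchase_page": {"systems": {"RFC1738"}, "aspects": set()},
    "price": {"systems": {"JPY", "USD"}, "aspects": set()},
    "maker": {"systems": set(), "aspects": {"name", "tel"}},
}


def check_entry(attribute, aspect=None, system=None):
    """Return a list of problems; an empty list means the entry fits SCHEMA."""
    rule = SCHEMA.get(attribute)
    if rule is None:
        return ["unknown attribute: " + attribute]
    problems = []
    # Assumption: an empty "systems" set means the system is unrestricted.
    if rule["systems"] and system not in rule["systems"]:
        problems.append("system %r not allowed for %s" % (system, attribute))
    if aspect is not None and aspect not in rule["aspects"]:
        problems.append("aspect %r not defined for %s" % (aspect, attribute))
    return problems
```

For example, check_entry("price", system="USD") reports no problems, while a price given in a system other than JPY or USD is rejected.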
2.3 Schema Ontology
The schema ontology contains conceptual relations between attributes
in different schemata, which make metadata and retrieval formulae
transformable among schemata.
For example, suppose there are two video rental shops.
The first, shop A, uses schema A. The second, shop B, uses schema B.
The attribute
"cast" is defined in schema A.
The value of "cast" is a list of
the names of the people appearing in the movie. The
attribute "leading_actress", which is more specific than "cast", is defined in
schema B.
A movie in which Ms. Maedchen Amick stars can be retrieved by
the retrieval formula "A.cast == Maedchen Amick" at shop A,
but that formula is not valid at shop B.
Conversely, the movie can be retrieved by the retrieval formula
"B.leading_actress == Maedchen Amick" at shop B,
but that formula is not valid at shop A.
This suggests that it is desirable to
transform a retrieval formula in one schema into valid formulae in other
schemata.
When searching with the retrieval formula
"A.cast == Maedchen Amick" based on schema A, it is desirable to
rephrase the formula as "B.leading_actress ==
Maedchen Amick" to search in shop B.
Conversely, when searching with the retrieval formula
"B.leading_actress == Maedchen Amick" based on schema B,
it may be desirable to stretch the formula to "A.cast ==
Maedchen Amick" to search in shop A.
The conceptual relation between "A.cast" and "B.leading_actress"
is described in the schema ontology to enable these transformations.
The schema ontology uses the following two relations.
- Equality
Two attributes denote the same relation between the object and the value.
Ex.: "manufacturer" and "maker", described as "manufacturer = maker".
- Inclusion
The relation denoted by attribute A is a special case included in the relation denoted by attribute B.
Ex.: "leading_actress" is included in "cast", described as "cast > leading_actress".
A schema ontology can relate attributes from more than two schemata.
An example of the schema ontology is shown in Fig. 4.
@SCHEMAONTOLOGY { http://cm.dvl.co.jp/ontology/movie.sot
Last-modified{x}: Wed, 11 Dec 1996 17:26:00 GMT
MMF-version{x}: 1.0
Description-of-schema{x}: Ontology for the movie schema
Schema-definition-1{x}: http://cm.dvl.co.jp/schema/movie.scm
Id-of-schema-1{x}: MVS
Schema-definition-2{x}: http://cm.dvl.co.jp/schema/image.scm
Id-of-schema-2{x}: PCS
Schema-definition-3{x}: http://cm.dvl.co.jp/schema/defaults.scm
Id-of-schema-3{x}: CMS
Parent-attribute-4{x}: PCS.cast
Child-attribute-4{x}: MVS.leading_actress
Equal-attribute-5{x}: PCS.title
Equal-attribute-5{x}: CMS.name
}
Figure 4. An example of the schema ontology.
There are two processes for transforming a retrieval formula:
"Rephrase" and "Stretch". "Rephrase" transforms an
attribute in the formula into another attribute of equal or narrower meaning. "Stretch"
transforms an attribute into a wider attribute that might hold
the value specified in the formula.
The conceptual relation for the preceding example is
"A.cast > B.leading_actress".
An item of merchandise
whose metadata has the record "B.leading_actress : Maedchen Amick" in
shop B satisfies the retrieval formula "A.cast == Maedchen
Amick".
This means that the formula "attribute1 == value" can be transformed into
the formula "attribute2 == value" when "attribute1 > attribute2" or
"attribute1 = attribute2".
This transformation is called "Rephrase".
On the other hand, it is unclear whether an
item whose metadata has the record "A.cast : Maedchen Amick"
in shop A satisfies the formula "B.leading_actress ==
Maedchen Amick".
Transforming the formula
"attribute1 == value" into "attribute2 == value" when "attribute1 < attribute2"
is called "Stretch".
"Stretch" is useful for a search
service to obtain more results when few items satisfy
the retrieval formula specified by the user.
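The two transformations can be sketched in a few lines of Python. The sketch assumes that the schema ontology has already been loaded into two sets of attribute pairs and that only direct relations are followed, with no transitive closure. The names INCLUSIONS, EQUALITIES, rephrase, stretch and transform are hypothetical, and the schema prefixes on "manufacturer" and "maker" are added only for illustration.

```python
# Inclusion relations as (wider, narrower) pairs, i.e. "wider > narrower".
INCLUSIONS = {("A.cast", "B.leading_actress")}
# Equality relations as unordered pairs, i.e. "left = right".
EQUALITIES = {("A.manufacturer", "B.maker")}


def rephrase(attribute):
    """Attributes of equal or narrower meaning than the given attribute."""
    result = {narrow for wide, narrow in INCLUSIONS if wide == attribute}
    result |= {right for left, right in EQUALITIES if left == attribute}
    result |= {left for left, right in EQUALITIES if right == attribute}
    return result


def stretch(attribute):
    """Wider attributes that might hold the value of the given attribute."""
    return {wide for wide, narrow in INCLUSIONS if narrow == attribute}


def transform(formula, target_schema, allow_stretch=False):
    """Rewrite an (attribute, value) equality formula for another schema.

    Rephrased formulae are always returned. Stretched formulae are returned
    only on request, because their results are not guaranteed to be relevant.
    """
    attribute, value = formula
    candidates = rephrase(attribute)
    if allow_stretch:
        candidates |= stretch(attribute)
    prefix = target_schema + "."
    return sorted((c, value) for c in candidates if c.startswith(prefix))
```

With these definitions, transform(("A.cast", "Maedchen Amick"), "B") yields the rephrased formula on "B.leading_actress", whereas the formula on "B.leading_actress" can reach "A.cast" in schema A only when allow_stretch is set.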
2.4 Core Ontology
When the number of schemata is small, the relations
between every pair of schemata can be defined.
As the number of schemata increases, this becomes impractical because
the number of relations grows with the square of the number of schemata.
The number of schemata increases because
shops extend attributes or design their own schemata to draw
attention to their own shop among the many other shops.
Therefore we propose the core ontology as a central, standard
ontology of attributes.
An author who designs a new schema has only to define
a set of relations from the new schema to the core ontology.
This avoids having to define relations from the new
schema to every other schema.
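The scaling argument can be checked with a little arithmetic. The sketch below assumes one set of relations per unordered pair of schemata in the pairwise case and one set of relations per schema in the core-ontology case; both function names are hypothetical.

```python
def pairwise_mappings(n):
    """Schema-to-schema mappings needed when every pair is related directly."""
    return n * (n - 1) // 2


def core_mappings(n):
    """Mappings needed when each schema is related only to the core ontology."""
    return n


for n in (2, 5, 10, 50):
    print(n, pairwise_mappings(n), core_mappings(n))
```

For 10 schemata this gives 45 pairwise mappings against 10 mappings to the core ontology, and for 50 schemata 1225 against 50.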
3. Metadata Mediation Protocol
3.1 Overview
We propose the metadata mediator as a means of
storing metadata and supplying it upon request.
The Metadata Mediation Protocol (MMP) is used for communication
between metadata mediators. A metadata mediator will be provided as a
stand-alone unit, a part of a WWW server, or a part of a commerce
management database. It responds to requests for the registration,
revision and deletion of metadata, and also replies to queries. As shown in
Fig. 5, metadata mediators cooperate to reply to queries and maintain
metadata. To cooperate, a metadata mediator needs
to inform other mediators of its own capabilities and to request
other mediators to transfer certain kinds of metadata to it. In this
chapter, we describe MMP, which
enables the metadata mediators to perform these functions.
Figure 5. Cooperation between metadata mediators.
An example of a system in which metadata repositories cooperate to
manage metadata is Harvest[7]. In Harvest, gatherers collect
metadata, and a broker compiles the collected metadata.
SOIF[4] is proposed as the format of the metadata exchanged between the
broker and each gatherer. Netscape also proposes RDM[8], Resource
Description Messages, based on Harvest. Using RDM, a broker can
exchange metadata, schema descriptions and taxonomy descriptions with
the gatherers or with other brokers.
However, since our metadata mediator
works not only as a manager of metadata but also as a distributed
problem solver,
MMP needs more functions than RDM, such as telling other mediators
about a mediator's own abilities or interests.
KQML[9] is proposed as an inter-agent communication language. KQML
has a protocol set that enables agents to converse and thereby
cooperate to solve tasks. We referred to KQML while designing the
Metadata Mediation Protocol. In the syntax of the protocol, a message is
described in a format almost identical to SOIF and transmitted
wrapped in HTTP. The MMP we have
designed is based on the following scenarios:
Table 1. MMP Design Scenarios
Scenario | Purpose
Metadata Retrieval | To retrieve from the receiver the metadata that accords with the given retrieval formula.
Metadata Manipulation | To request the receiver to manipulate the metadata (registration, deletion, revision).
Metadata Maintenance | To request the receiver to transmit past or future changes in the metadata.
Mediator Cooperation | To inform the receiver of the features of the metadata which the sender knows.
3.2 Messages in MMP
Messages used in each scenario are described in the following tables.
Table 2. Metadata Retrieval
Request message | Function | Response message
query-request | To send the retrieval formula and request the receiver to send the corresponding metadata back to the sender. A lifetime can be given to the query-request. | query-response
query-cancel-request | To cancel the query-request. | query-cancel-response
reply-request | To send back the results of the retrieval requested by the query-request. | reply-response
Table 3. Metadata Manipulation
Request message | Function | Response message
metadata-registry-request | To request the receiver to register the metadata enclosed. | metadata-registry-response
metadata-deletion-request | To delete any metadata that accords with the given retrieval formula. | metadata-deletion-response
metadata-revise-request | To make the designated alteration to any metadata that accords with the retrieval formula. | metadata-revise-response
Table 4. Metadata Maintenance
Request message | Function | Response message
update-request | To request the receiver to inform the sender of any registration that the receiver has accepted since the designated past time and that accords with the retrieval formula. | update-response
maintenance-request | To request the receiver to inform the sender of any future changes in any metadata that the receiver has and that accords with the retrieval formula. | maintenance-response
maintenance-cancel-request | To cancel the maintenance. | maintenance-cancel-response
subscription-request | To request the receiver to register with the sender the contents of any metadata that the receiver has or will have and that accords with the retrieval formula. | subscription-response
unsubscription-request | To cancel the subscription. | unsubscription-response
Table 5. Mediator Cooperation
Request message | Function | Response message
advertisement-request | To inform the receiver of the features of the metadata the sender knows. | advertisement-response
unadvertise-request | To cancel the advertisement. | unadvertise-response
3.3 Query Brokering
Mediator A, shown in Fig. 6, may need metadata that have
specific features. In this case, mediator A sends a query-request to
mediator B, a mediator that mediator A knows to exist. Mediator B
assigns a query-id to the query-request and sends a query-response
back to mediator A as an acknowledgement of receipt. If mediator B
has no metadata with the specified features and the number of
allowable hops written in the query-request is positive,
mediator B forwards the query-request to mediator C, based on the
knowledge, learnt from mediator C's advertisement message, that
mediator C may have metadata suitable for the query. The
query-request has an attribute "from", which indicates the mediator to
which the response must be sent. The value of "from" in
the forwarded query-request depends on the value of the
attribute "transport-policy" of the original query-request. If this
value is "broker", mediator B enters its own address as the value
of "from" in the forwarded query-request. If an answer is found in
mediator C, the messages flow as illustrated in Fig. 6. If the
attribute "transport-policy" of the original query-request has the
value "recruit", the "from" value of the original
query-request, i.e., mediator A, is copied into "from" of the forwarded
query-request, and the messages flow as depicted in Fig. 7.
Figure 6. A route of messages in "broker."
Figure 7. A route of messages in "recruit."
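The forwarding rule for the "from" attribute can be summarized in a short Python sketch. It models a query-request as a dictionary. The attribute names "from" and "transport-policy" come from the text, while the key "hops", the function name forward_query and the decrement of the hop count on each forward are assumptions of this sketch.

```python
def forward_query(request, forwarder_address):
    """Return the query-request that a mediator passes on, or None.

    request is a dict with the keys "from", "transport-policy" and "hops".
    """
    hops = request["hops"]
    if hops <= 0:
        return None  # no allowable hops left, so the request is not forwarded
    # Assumption: each forward consumes one allowable hop.
    forwarded = dict(request, hops=hops - 1)
    policy = request["transport-policy"]
    if policy == "broker":
        # Replies return through the forwarding mediator (Fig. 6).
        forwarded["from"] = forwarder_address
    elif policy == "recruit":
        # Replies go directly to the original requester (Fig. 7).
        forwarded["from"] = request["from"]
    else:
        raise ValueError("unknown transport-policy: %r" % policy)
    return forwarded
```

For a request from mediator A with two allowable hops, mediator B forwards a copy whose "from" is B under "broker" and A under "recruit".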
3.4 Transmission of Changes in Metadata
Each mediator can tell any other mediator what kind of metadata it
is interested in, in the form of a subscription-request. For instance, while
mediator A is collecting metadata with a particular feature
(e.g., monochrome movies made in the 1940s), it can request mediator
B to transfer such metadata to it by using the subscription message.
Upon receiving the subscription-request, mediator B registers with mediator
A the matching metadata it has, by using the registry message. If a
"time-to-kill" date is set in the subscription message, then until that date
mediator B also transfers newly registered metadata that has the feature
to mediator A.
Using a maintenance
message, a mediator can request another to report any changes that happen
to some metadata item. Assume that mediator A has a metadata item
transferred from mediator B and needs to know of any changes
to the original metadata item on mediator B. Mediator A can request
mediator B to report them by sending the maintenance message. If the
metadata designated by a maintenance-request is revised or
deleted by the date set in "time-to-kill", the data representing
this change will be transferred from mediator B to mediator A.
3.5 Implementation
MMP is implemented on HTTP since it has an affinity with the
request/response model of HTTP. A message is written in the body of an
HTTP message with the Content-type application/x-mmp, and is thereby
transmitted wrapped in HTTP. Each of the messages constituting MMP has
three layers, as shown in Fig. 8. A message relating to metadata
manipulation, e.g., a registry message, can have multiple
content layers to transfer metadata in bulk. Each layer of
a message is written in SOIF. An example is shown in Fig. 8.
@COMMUNICATION{ -
Sender{x}: 192.168.1.1:8000
Receiver{x}: 205.120.1.1:8000
Date{x}:
}
@MESSAGE{ -
Type{x}:metadata-registry-request
Content-number{x}: 2
Expire{x}:
}
@METADATAREGISTRYBODY{ -
Schema{x}:CN http://www.dvl.co.jp/tags/cn.scm *
Content{x}:
CN.category{x}:()Game Playstation
CN.name{x}:()Formula 1
}
@METADATAREGISTRYBODY{ -
Schema{x}:CN http://www.dvl.co.jp/tags/cn.scm *
Content{x}:
CN.category{x}:()Game Playstation
CN.name{x}:()Rage Racer
}
Figure 8. An example of the message "metadata-registry."
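The layered layout of Fig. 8 can be produced with a small serializer. The sketch below is an illustration only. It assumes, following SOIF, that the {x} placeholder stands for the byte length of the value and that a tab follows the colon. It flattens the content layer into plain attribute lines, and it omits the Date and Expire fields, whose values Fig. 8 leaves blank. The function names soif_layer and registry_message are hypothetical.

```python
def soif_layer(template, fields, url="-"):
    """Serialize one layer as '@TEMPLATE{ url', attribute lines and '}'."""
    lines = ["@%s{ %s" % (template, url)]
    for name, value in fields:
        size = len(value.encode("utf-8"))  # assumed meaning of the {x} slot
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines)


def registry_message(sender, receiver, items):
    """Build a metadata-registry-request with one content layer per item."""
    layers = [
        soif_layer("COMMUNICATION",
                   [("Sender", sender), ("Receiver", receiver)]),
        soif_layer("MESSAGE",
                   [("Type", "metadata-registry-request"),
                    ("Content-number", str(len(items)))]),
    ]
    for fields in items:
        layers.append(soif_layer("METADATAREGISTRYBODY", fields))
    return "\n".join(layers)


message = registry_message(
    "192.168.1.1:8000",
    "205.120.1.1:8000",
    [
        [("CN.category", "()Game Playstation"), ("CN.name", "()Formula 1")],
        [("CN.category", "()Game Playstation"), ("CN.name", "()Rage Racer")],
    ],
)
```

The resulting string would form the body of an HTTP request whose Content-type is application/x-mmp, as described above.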
4. Future Plan
4.1 Commerce Mediator
Commerce Mediator enables
consumers to retrieve data on the merchandise available at online
shopping sites, based on the characteristics of the merchandise
written in MMF.
Commerce Mediator will come into service in April 1997.
A schematic view of Commerce Mediator is shown in Figure 9.
Figure 9. The flow in Commerce Mediator
Commerce Mediator comprises, in its simplest form, a metadata
mediator and a commerce mediation proxy. The commerce mediation proxy
functions as the gateway between the metadata mediator and a WWW
browser.
The Commerce Mediator system has three ways to gather metadata.
The first is a WWW page for inputting metadata, managed by the commerce
mediation proxy.
The second is the Metadata-Input System,
an authoring tool that simplifies
the work of a seller in adding metadata to
the HTML documents that describe the seller's merchandise
and in preparing files that describe the metadata in MMF.
The Metadata-Input System will be distributed free of charge.
The third is the web robot on
the Commerce Mediator site. The Metadata-Description-Rule
Extracting System extracts the pattern of positions in an HTML
document where the features of merchandise are described.
The Automatic Metadata-Collecting System works as
a web robot and collects metadata from shops' HTML pages,
based on the metadata-description rules
extracted by the Metadata-Description-Rule Extracting System.
5. Conclusion
We have designed a metadata format (MMF) and a metadata mediation
protocol (MMP).
Both are used in Commerce Mediator, which enables consumers to
retrieve data about the merchandise available at online shopping
sites on the WWW.
MMF holds the metadata together with the description of the relationships
between attributes and the description of the attribute ontology,
which ensures
inter-transformability between different schemata.
MMP enables data communication among distributed
metadata mediators. The protocol makes it possible to register,
modify, delete and retrieve metadata at any one of the metadata
mediators. The metadata can therefore be managed in the same way at
all metadata mediators. As a result, the metadata mediators can
cooperate with one another.
The detailed specifications of MMF and MMP can be obtained from http://www.dvl.co.jp/mediator/index.html.
6. Acknowledgment
We would like to thank Hiroyuki Suzuki-san for his invaluable comments
on earlier drafts of this paper.
7. References
[1] Electronic Commerce Promotion Council (ECOM), (homepage), http://www.ecom.or.jp/eng/index.htm
[2] Deutsch P., Emtage A., "Publishing Information on the Internet with Anonymous FTP", http://info.webcrawler.com/mak/projects/iafa/iafa.txt
[3] Daniel R., "An SGML-based URC Service", http://www.nlc-bnc.ca/ifla/documents/libraries/cataloging/metadata/urc3.txt
[4] Hardy D., Schwartz M., Wessels D., "Harvest User's Manual", http://harvest.transarc.com/afs/transarc.com/public/trg/Harvest/user-manual/
[5] Weibel S., Godby J., et al., "OCLC/NCSA Metadata Workshop Report", http://www.oclc.org:5046/conferences/metadata/dublin_core_report.html
[6] Lagoze C., Lynch C., et al., "The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata", http://cs-tr.cs.cornell.edu:80/Dienst/Repository/2.0/Body/ncstrl.cornell%2fTR96-1593/html
[7] Bowman C., Danzig P., et al., "Harvest: A Scalable, Customizable Discovery and Access System", Technical Report CU-CS-732-94, Univ. Colorado (1995)
[8] Hardy D., "Resource Description Messages (RDM)", http://www.netscape.com/people/dhardy/rdm.html
[9] Finin T., Weber J., "Draft Specification of the KQML Agent-Communication Language", http://www.cs.umbc.edu/kqml/kqmlspec/spec.html