The explosive growth in the number of Web pages has also made countless multimedia objects accessible. Their abundance makes the Internet an interesting application area for multimedia retrieval systems. Many search engines are starting to offer some functionality for retrieving these objects independently. However, most of these multimedia search engines target a fixed set of multimedia index attributes. The Acoi system [1] provides an extensible framework for retrieving multimedia objects of any type on the basis of their content (both low-level features and high-level concepts) and their context.
The following sections describe different aspects of the system and use this grammar as a running example:
%atom str url, content_type, title, section, word, alt;

%detector web_header(url);
%detector page_type select true from web_object where content_type = "text/html";
%detector web_page(url);

%start web_object;

web_object : url web_header web_body?;
web_header : content_type;
web_body : page_type web_page;
web_page : title? anchor*;
anchor : web_object section? alt? word*;
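To make the rule notation concrete, here is a minimal Python sketch that reads the production rules of the example grammar into a small data structure. It is not part of Acoi: the names `Symbol`, `parse_rules`, and `EXAMPLE_RULES` are hypothetical, and the sketch assumes that `?` marks an optional symbol and `*` a symbol repeated zero or more times, as in common grammar notations.

```python
import re
from typing import NamedTuple


class Symbol(NamedTuple):
    name: str
    optional: bool  # True for a '?' or '*' suffix (assumed meaning)
    repeated: bool  # True for a '*' suffix (assumed meaning)


def parse_rules(text: str) -> dict[str, list[list[Symbol]]]:
    """Map each non-terminal to its alternatives, one per rule in the text."""
    rules: dict[str, list[list[Symbol]]] = {}
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement or statement.startswith("%"):
            continue  # skip empty chunks and %atom/%detector/%start lines
        head, body = statement.split(":", 1)
        symbols = []
        for token in body.split():
            match = re.fullmatch(r"(\w+)([?*]?)", token)
            if match is None:
                raise ValueError(f"unexpected token {token!r}")
            name, suffix = match.groups()
            symbols.append(Symbol(name, suffix in ("?", "*"), suffix == "*"))
        rules.setdefault(head.strip(), []).append(symbols)
    return rules


EXAMPLE_RULES = """
web_object : url web_header web_body?;
web_header : content_type;
web_body : page_type web_page;
web_page : title? anchor*;
anchor : web_object section? alt? word*;
"""

rules = parse_rules(EXAMPLE_RULES)
```

Storing a list of alternatives per non-terminal means that a second rule with the same left-hand side simply adds another alternative instead of overwriting the first.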
Feature detectors are used to build a semantically rich index entry for the original multimedia object. They do this on two different levels:
The web_header detector sends an HTTP HEAD request to the specified HTTP server and extracts the content type from the response.
The page_type detector uses the content type to determine whether an object is a page.
In the general case, blackbox detectors derive low-level feature data, e.g., the color distribution of an image. But they can also be used for more complex tasks, like finding a face in an image. Whitebox detectors relate low-level features to concepts, e.g., an image is a portrait because its color distribution classifies it as a photo and it contains exactly one face.
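The two detector levels can be sketched in Python as follows. This is an illustration only, not Acoi's implementation: it assumes that web_header plays the blackbox role and page_type the whitebox role (as their declarations in the example grammar suggest), and the function signatures, the injectable `head` parameter, and `is_portrait` are hypothetical.

```python
from typing import Callable, Mapping, Optional
from urllib.request import Request, urlopen


def http_head(url: str) -> Mapping[str, str]:
    """Send an HTTP HEAD request and return the headers with lowercased names."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def web_header(url: str,
               head: Callable[[str], Mapping[str, str]] = http_head) -> Optional[str]:
    """Blackbox-style detector (assumed role): extract the content type.

    Returns the media type without parameters, or None if the header is absent.
    """
    value = head(url).get("content-type")
    if value is None:
        return None
    return value.split(";", 1)[0].strip().lower()


def page_type(content_type: Optional[str]) -> bool:
    """Whitebox-style detector: mirrors the predicate content_type = "text/html"."""
    return content_type == "text/html"


def is_portrait(image_class: str, face_count: int) -> bool:
    """Whitebox-style rule from the text: a photo with exactly one face."""
    return image_class == "photo" and face_count == 1
```

Passing the HEAD function as a parameter keeps the detector testable without network access, e.g. `web_header(url, head=lambda u: {"content-type": "text/html"})`.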
Feature Grammars
The foundation of the whole Acoi system is formed by the concept of feature grammars. A feature grammar is basically a context-free grammar extended with active non-terminals, i.e., the different types of detectors. The grammar plays the following roles in the system:
Detector dependencies: the page_type detector depends on the web_header detector.
Context: web_objects are related to a web_page in the context of anchors.
Database schema: the web_object table will contain a url attribute and two foreign keys: one to a web_header and, optionally, one to a web_body.

Multimedia retrieval is not yet a solved problem (and may never be), so the index should be easy to extend with new feature detectors. Feature grammars are easy to extend: just add new rules. The parser can then do an incremental parse: it takes a persistently stored parse tree and calls the new detectors to extend its branches. For example, the grammar above could be extended with these rules to support content-based retrieval of images:
%atom int width, height, depth, color, frequency;

%detector image_type select true from web_object where content_type = "image/gif";
%detector web_image(url);

web_body : image_type web_image;
web_image : width height depth histogram;
histogram : color* frequency;
The incremental parse would try to prove the validity of this new web_body alternative and, on success, add new indexing information for images to the database.
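The idea of an incremental parse can be sketched in Python under strong simplifying assumptions: the stored parse tree is modeled as nested dictionaries, only nodes without a web_body are revisited, and the image detector is passed in as a function. The layout of `stored` and the names `extend_tree` and `image_detector` are hypothetical and do not reflect how Acoi stores or extends its trees.

```python
from typing import Callable


def image_type(node: dict) -> bool:
    """Mirror of the grammar predicate content_type = "image/gif"."""
    return node["web_header"]["content_type"] == "image/gif"


def extend_tree(node: dict, image_detector: Callable[[str], dict]) -> int:
    """Try the new web_body alternative on every web_object without a web_body.

    Returns the number of nodes that gained a web_body.
    """
    extended = 0
    if "web_body" not in node:
        # The earlier parse found no valid web_body; try the new alternative.
        if image_type(node):
            node["web_body"] = {
                "image_type": True,
                "web_image": image_detector(node["url"]),
            }
            extended += 1
    else:
        # Existing branches stay as they are; only descend into linked objects.
        page = node["web_body"].get("web_page", {})
        for anchor in page.get("anchors", []):
            extended += extend_tree(anchor["web_object"], image_detector)
    return extended


# A stored tree for a page that links to one GIF image (illustrative data).
stored = {
    "url": "http://example.org/",
    "web_header": {"content_type": "text/html"},
    "web_body": {
        "page_type": True,
        "web_page": {
            "title": "Home",
            "anchors": [
                {"web_object": {
                    "url": "http://example.org/logo.gif",
                    "web_header": {"content_type": "image/gif"},
                }},
            ],
        },
    },
}
```

Calling `extend_tree(stored, detector)` with any function that maps a URL to image features extends only the image node; a second call finds nothing left to extend, which is the point of parsing incrementally.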
System Architecture
The following image shows the current Acoi system architecture.
The Feature Detector Engine is the parser generated from the feature grammar. The parse trees it produces are stored in Monet, an extensible main-memory database system. Queries, written in an SQL-like syntax, are processed by the Feature Query Engine. XML is used as the exchange format between the different tools, and XSL(T) is used to transform an XML document when another (proprietary) format is needed.
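As an illustration of XML as an exchange format, the following Python sketch serializes a small parse tree to XML with the standard library. The element layout (one element per grammar symbol, repeated symbols as sibling elements) is an assumption made for this example, not Acoi's actual document format, and `to_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def to_xml(name: str, value) -> ET.Element:
    """Turn a nested dict into an element tree; lists become repeated siblings."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for child_name, child_value in value.items():
            children = child_value if isinstance(child_value, list) else [child_value]
            for child in children:
                element.append(to_xml(child_name, child))
    else:
        element.text = str(value)  # atoms are stored as element text
    return element


tree = {
    "url": "http://example.org/",
    "web_header": {"content_type": "text/html"},
}
document = ET.tostring(to_xml("web_object", tree), encoding="unicode")
```

A document in this shape could then be handed to an XSLT processor when a tool expects a different format.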
References