Towards a Multimedia World Wide Web Information Retrieval Engine

Sougata Mukherjea, Kyoji Hirata and Yoshinori Hara
C&C Research Laboratories,
NEC USA Inc.
sougata@ccrl.sj.nec.com, hirata@ccrl.sj.nec.com, hara@ccrl.sj.nec.com

Abstract

Search engines are useful because they allow the user to find information of interest from the World Wide Web. However, most of the popular search engines today are textual; they do not allow the user to find images from the Web. This paper describes a search engine that integrates text and image search. One or more Web sites can be indexed for both textual and image information, allowing the user to search based on keywords or images or both. Another problem with the current search engines is that they show the results as pages of scrolled list; this is not very user-friendly. Therefore our search engine allows the user to visualize the results in various ways. This paper explains the indexing and searching techniques of the search engine and highlights several features of the querying interface to make the retrieval process more efficient. Examples are used to show the usefulness of the technology.

1. Introduction

With the explosive growth of information that is available through the World Wide Web, it is becoming increasingly difficult for the users to find the information of interest. Therefore, various searching mechanisms that allow the user to retrieve documents of interest are becoming very popular and useful. However, most of the popular search engines today have several limitations:

The search engines are textual; given one or more keywords they can retrieve Web documents that have those keywords. Although most Web pages have images, there is no easy way for the user to specify an image and retrieve Web documents containing images similar to it.
The retrieved documents are shown as pages of scrolled lists. If lots of documents are retrieved, this is not very user-friendly.
Although the search engines retrieve the documents that match the user-specified keywords, there is no indication about the position of the documents in the Web site from which they were retrieved.

We have implemented a search engine, Advanced Multimedia Oriented Retrieval Engine (AMORE), that tries to remove these limitations. First, it integrates image and text search. Web sites can be indexed for both textual information and images contained in and referenced from the HTML documents. Like other search engines, our engine allows the user to search for Web documents based on keywords. Additionally, the user can also retrieve Web documents containing images similar to an user-specified image. The image and text search can be also combined by and or or. The retrieved documents are displayed to the user sorted by the number of keyword-matched lines in a scrolled list. The user can also view the retrieved images sorted by their similarity to the selected images. To enable the users to better understand the search results, we also provide various types of visualizations of the results. Moreover, we have developed a new kind of visualization called the focus+context view of Web nodes which helps the user to understand the position of the node in the Web site from which it was retrieved.

Our overall objective is to develop a multimedia information retrieval engine for the World Wide Web. The engine should integrate browsing and querying, allowing users to find the information they want using querying by successive refinement. The user should be able to search for documents and other media objects of interest. If lots of information is retrieved, browsing through the results using the visualizations will be helpful. On the other hand, if very little information is retrieved, views like the focus+context view will allow her to look at other related information. Then the user may issue a more appropriate query.

Section 2 surveys related work. Section 3 explains the indexing and querying mechanisms. Section 4 presents examples to illustrate the use of the search engine. Section 5 shows how visualization can be utilized to better understand the retrieved results. It also describes the focus+context view of Web nodes. Finally, section 6 is the conclusion.

2. Related Work

There are many popular Web search engines like Lycos [18] and Alta Vista [13]. These engines gather textual information about resources on the Web and build up index databases. The indices allow the retrieval of documents containing user-specified keywords. Another method of searching for information on the Web is subject-based directories which provide a useful browsable organization of information. The most popular one is Yahoo [20] which classifies documents manually and supports content-based access to the collection of documents gathered from either users' submission or Web robots. However, none of these systems allows image search.

Searching for similar images from an image database is a developing research area. Some popular image retrieval systems like QBIC [4] and Virage [1] have demonstrations running on the World Wide Web. These systems retrieve all images from an internal database that are visually similar to a user-specified image. The similarity is determined by properties like color and shape. QBIC integrates keyword search also. However, finding similar images from any arbitrary Web site is not possible. Yahoo's Image Surfer [17] has built a collection of images that are available on the Web. The collection is divided into categories (like automotive, sports, etc), allowing the users to find images that match the category keywords. However, HTML document search is not integrated. Moreover, although the user can retrieve images that are similar to a selected image, the image-searching mechanism is quite simplistic - it is based on just color histogram. Webseer [5] is a crawler that combines visual routines with textual heuristics to identify and index images on the Web. The resulting database is then accessed using a text-based search engine that allows users to describe the image that they want by using keywords. This approach of retrieving images is complementary to our approach. In AMORE the user can specify an image and retrieve all images similar to it. Of course, the user can also specify keywords to retrieve Web documents (and the images in them).

While none of the current search engines support visualization of the results, several visualization systems for information retrieval have been developed in recent years. Examples include Infocrystal [12], an extension of the Venn-diagram paradigm, which allows the visualization of all possible relations among N user-specified Boolean keywords, Tilebars [6], a visualization method for term distribution in Full Text Information Access and Butterfly [8], an information visualization application for accessing Citation databases across the Internet. These systems show the usefulness of visualization for better understanding of the results of search engines. Another interesting approach is the WebBook [3], which potentially allows the results of the search to be organized and manipulated in various ways in a 3D space.

3. Implementation Details

Implementing a search engine involves two phases: indexing and querying. Several systems exist that allow full-text indexing of Web sites. The indices allow the user to search the indexed sites using any combination of keywords. As stated in the previous section, systems that can index images in a database based on various image characteristics like shape and color, thus allowing the retrieval of images in the database that are similar to a user-specified image are also being developed. In AMORE we have integrated these into two kind of systems in a unique way. This allows the retrieval of Web documents containing one or more keywords and/or images similar to an user-specified image. While we use the Harvest Information Discovery and Access System [2] for text indexing and searching, we use the content-oriented image retrieval (COIR) library [7] for image retrieval. This section explains the indexing and querying techniques.

Figure 1. The indexing mechanism in AMORE. Harvest is used for indexing the data servers, a Web mirroring software downloads the images and then COIR indexes the images.

3.1 Indexing

Figure 1 describes the indexing mechanism. In the directory server, a Harvest gatherer is used to gather and index textual information from the user-specified sites. The user is allowed to specify one or more sites by listing their URLs. Moreover, Harvest allows the indexing of only a section of a site by various mechanisms (for example, limiting the URLs that are indexed by using regular expressions). After Harvest has finished gathering, information on the images that are contained in and referenced from each HTML page is available from the gatherer's database in the directory server. A PERL program extracts this information and creates a URL-image table. The images are then downloaded to the directory server using a Web mirroring software. Then COIR is used to index all these images. COIR uses a region-based approach and uses attributes like color, texture, size and position for indexing. The whole indexing procedure is automated requiring the user to just enter some required information (like the URLs of the sites that are to be indexed). Note that while the Harvest gatherer may take a long time for indexing large sites, creating the URL-image table and indexing the images take a comparatively shorter time. COIR requires about 5 seconds to index an image. Also note that the images can be removed from the directory server after the indexing if there are space constraints.

Figure 2. The querying mechanism in AMORE. The query may contain both keywords and an image. Harvest and COIR are used for retrieving the documents matching the specified query. Visualization of the search results are possible.

3.2 Querying

After the text and image indices have been built into the server, users on any client can issue queries. Figure 2 explains the querying phase. The queries can contain one or more keywords or an image. The text and image search can also be combined by and or or. The user can specify the image in various ways: for example, by specifying its URL or by drawing a rough sketch. The Harvest broker is used to retrieve documents that contain the user-specified keywords. The broker provides an indexed query interface to gathered information. To accommodate diverse indexing and searching needs, Harvest defines a general broker-indexer interface that can accommodate a variety of search engines. We are using glimpse as our search engine.

COIR is used to retrieve images available on the indexed Web sites that are similar to the user-specified image. Similarity is determined by both color (the general color impression of the image) and shape (the number of regions in the images that correspond to each other and the similarity of shape and position between the matched regions). Each indexed image is given a match rate which is a combination of its shape and its color match rate. Images with a total match rate that is greater than a threshold value are selected and the documents containing them are retrieved. The user has the option of changing the threshold value to control the number of images that are retrieved. She can also change the color-shape ratio which determines the importance that is given to color and shape similarity.

If the text and image search are combined by and, only documents that are retrieved by both the text and image search are displayed to the user; otherwise all retrieved documents are displayed. The retrieved documents are displayed sorted by the number of keyword matched lines. The user can also view the retrieved images sorted by the total match rate. Various ways of visualizing the search results are possible. These issues are discussed in the next two sections. Note that although a precise performance analysis of the querying phase is not possible because of potential network delays, we have found that image search is faster than text search and the overall performance is quite satisfactory.

4. Retrieval Scenarios

We have indexed several Web sites allowing us to query these sites based on both images and keywords. This section presents some sample retrieval scenarios from two Web sites to show the usefulness of the technology.

4.1.1 Georgia Tech College of Computing Web Site [16]

Figure 3. The right hand side shows the interface to the search engine. The user can specify keywords and the URL of an image. Various search options can also be changed. Here the user is trying to retrieve the homepage of the first author from the Georgia Tech College of Computing Web server. The search result is shown on the right.

4.1.1 Searching for a particular document. One common use of a search engine is to find someone's homepage. If we just use the person's name as keyword, sometimes lots of other pages are also retrieved. The person's photo can be used to reduce the number of irrelevant documents. The left-hand side of Figure 3 shows the interface to the search engine. The user is trying to retrieve the homepage of the first author of this paper from the Web site (he was a student there) using his name"sougata'' as the keyword and the URL of his photograph which is stored in the client site. Notice that there are various search options (like the color-shape ratio) that can be changed. The query retrieves the required homepage as shown in the right-hand side of Figure 3.

4.1.2 Searching for a class of documents. AMORE is also useful for gaining an understanding on a particular topic from a Web site. For example, suppose the user wants to find information about all information visualization projects in the college. Since many project descriptions may not explicitly use the words "information" and "visualization," using only the keywords may not be enough. Moreover, the user may want to also retrieve images of the screen dumps of the projects if they exist on the Web. An image of a window dump of an information visualization project may be helpful in this case. The left-hand side of Figure 4 shows the result of searching the Web server combining a text search (by the keywords "information visualization") and an image search (with an image of a window dump of an information visualization project) by "or." The retrieved documents are sorted by the number of keyword-matched lines. The user can also look at the retrieved images as shown in the right-hand side of Figure 4. They are sorted by their similarity to the given image. All of the images are window dumps of similar projects. (Note that only the first few documents and images are shown in this figure).

Figure 4: The left-hand side lists the documents retrieved by a text search with keywords "information visualization" combined by "or" with an image search of an image of a window dump of an information visualization project. The right-hand side shows the retrieved images.

4.2 Walt Disney Web Site [19]

4.2.1 Media-based Navigation: The user may select a sample image from the site and retrieve all other images similar to it. The interface for choosing the image is shown in the left-hand side of Figure 5; some images from the site are randomly selected and shown to the user. Here the user is trying to retrieve pictures of scenes from Walt Disney movies. Therefore the keyword "movie" and a similar image, that of a movie scene, is used. Some movie clips are retrieved as shown in the middle of Figure 5. The user can click on one of these images and retrieve more similar images. The images retrieved by the second query are shown on the right-hand side of Figure 5. This type of navigation where the user uses the same type of media for a query as the media they want to retrieve is known as media-based navigation [7]. Note that in this case the use of keywords in combination with the media-based navigation allowed the filtering of pictures similar to the query image from the image-processing point of view but which do not have anything to do with movies. (Although some irrelevant pictures may still be retrieved because the documents containing them haver the query keyword, the combination of text and image search helps to reduce the number of irrelevant documents that are retrieved).

Figure 5. Example of media-based navigation. The user is searching for movie clips from the Walt Disney Web site by using a similar image. Text search is also combined with the image search (by and).

4.2.2 Rough Sketch Input. Suppose an user wants to retrieve images of Mickey Mouse from the Disney Web site. If no picture of Mickey Mouse is available, the user can draw a rough sketch to retrieve the pictures. The rough sketch interface is implemented using Java and shown in the left-hand side of Figure 6. The right-hand side of the figure shows the images that are retrieved for the given rough sketch. Some images of Mickey Mouse are retrieved as the user wanted.

Figure 6. The left-hand side shows the interface for inputting a rough sketch. The right-hand side shows some images that are retrieved for the given rough sketch.

5. Visualizations of the Search Results

Unfortunately, most popular Web search engines do not provide any visualization; the retrieved records are displayed as one or more pages of scrolled lists. If many records are retrieved, scrolling through such lists is tedious. Visualization may be useful in this case. Visualization is especially useful for multimedia search because the searching is based on various criteria like the number of keyword-matched lines for text search and shape and color similarity for image search. Showing how the retrieved documents compare based on the various search criteria in a single screen is useful. Therefore, we provide various kinds of visualizations of the results of the search. The visualizations are generated from the search results on the fly by a perl program which generates Virtual Reality Modeling Language (VRML) code. They can be viewed in a browser that supports VRML. We have used SGI's Cosmo Player [15] both on a SGI workstation and a PC as the VRML browser. Various kinds of visualizations are available and some significant ones are discussed in this section.

Figure 7. A scatter-plot visualization of the search results. Shape and color similarity are mapped to the X and Y axis. Documents with same images are arranged in the Z axes. Size is mapped to keywords. The left-hand side is the overview. In the right-hand side the user has zoomed in to the document with the maximum keywords.

5.1 Scatterplot Visualization

The left-hand side of Figure 7 shows a scatter-plot visualization of the results of a search of the Georgia Tech College of Computing server. The search criterion was the one used for Figure 4. In this visualization a cube represents each retrieved image. The shape-match rate (how well the image matches the selected image based on just shape) has been mapped to the X axis and the color-match rate to the Y axis. Documents with no similar image are also shown; they are assumed to have the minimum values for x and y and are shown at the bottom right. In many cases, the same picture (for example logos of organizations) exists in many Web pages. Since all these images will have the same values for x and y, they are arranged in the Z dimension. The sizes of the cubes indicate the number of keywords in the corresponding documents and the colors represent the topic of the pages (the topic is determined from the URL; for example, a document with URL www.cc.gatech.edu/gvu/.... has topic gvu). The cubes also serve as links; moving the cursor over them displays the URLs of the corresponding documents and clicking on them retrieves the actual documents. The user can change the bindings between the visual and information attributes in a Forms interface.

The visualization gives the user a better understanding of the search results. The user can see how the retrieved documents compare based on the various criteria in the same screen without having to switch between two screens, one sorted by the text search criteria and one by image search criteria (as shown in Figure 4). Moreover, how well the retrieved images match the given image based on just shape or color is evident. Obviously, documents on the top-left corner are very similar images having large values for both x and y. The user can use the various tools provided by the VRML browser to navigate through the space. Various viewpoints are also provided to allow the user to zoom in on different important positions in the space. For example in the right hand side of Figure 7 the user has zoomed in to the document with the maximum number of keywords and is navigating through the 3D neighborhood. Note that the actual image which the cube represents is shown only when the user comes close to a cube (since texture mapping is expensive if it is not supported by the hardware).

Figure 8. A perspective wall visualization. Each wall contains documents with similar numbers of keywords. The sizes of the cubes represent the similarity of the images in the documents to the user-specified image.

5.2 Perspective Wall Visualization

Figure 8 shows a perspective wall [9] visualization of the results of a search. Here the Walt Disney Web site was searched for documents with the keyword "toy story" or images similar to a scene from that movie. The retrieved documents are represented by cubes and then grouped by the number of keyword-matched lines and arranged on walls. The walls are sorted by the number of keywords in the documents on them. One wall is in focus at a time and the user can click on any wall to bring it in to focus. Smooth animation is used for the state changes (using Javascript in combination with VRML 2.0). The sizes of the cubes are mapped to the total match-rate (that is, the overall similarity) of the image contained in the document and shown as a texture map. If there are no images, the cubes have minimum size. On the other hand, if there are n images, n cubes are used to represent the document.

5.3 Focus+Context Views of Web Nodes

While the current search engines show the Web pages that match the user's queries, they don't give any indication of the actual position of the page in the Web site from which it was retrieved. Therefore, we have developed a technique to show focus+context views of Web documents. The view shows the details of a particular node; nodes in the immediate neighborhood, those that can directly reach and can be reached from the document are shown. While the local detail is useful, the user also needs to understand the global context. Since any real-world Web site has a large and complicated network structure, to simplify the view the paths to and from the important (landmark) nodes in the Web site are only shown. This is similar to a common geographical navigation strategy: a lost person will try to find where she is using her immediate neighborhood and important geographical landmarks. The landmarks are determined using various heuristics like connectivity (how well a node is connected to other nodes) and access frequency (how many times the node has been accessed in recent times). The procedure for determining landmark nodes and developing the focus+context views is explained in detail in [11].

Figure 9. Focus+Context view of the homepage of Sougata Mukherjea in the College of Computing Web server. It is useful in understanding the first author's position in the college.

Figure 9 shows the focus+context view of the homepage of the first author of this paper which was retrieved by the query of Figure 3. This view shows the immediate neighborhood of the node as well as the shortest paths from several landmark nodes in the College of Computing Web site (depicted by red) like gvutop. Size indicates the importance of the node in the Web locality. The view is quite useful in understanding the position of Sougata in the College. Since there is a path from alumni, it shows that he is an alumnus of the College; a path from index.grad indicates that he was a graduate student and the path from gvutop means that his research was in the gvu (Graphics, Visualization & Usability) area. Moreover, some related nodes (like the homepage of his advisor, James.D.Foley, and pages describing some projects that he had worked on, Nvb and visdebug) can be seen. Note that clicking on the node whose view is being shown (depicted by a sphere) retrieves the actual page while clicking on some other node (depicted by cubes) retrieves the focus+context view of that page. This allows the user to quickly gain an understanding of the section of the Web that is of interest.

6. Conclusion and Future Work

In this paper we have presented AMORE, a Web search engine that provides several unique features not present in the current Web search engines.

We allow the automatic indexing of both text and image from one or more Web sites. This integration of text and image search is useful in various retrieval scenarios:
- When the user is searching for Web documents, the combination of text and image search by and can reduce the number of irrelevant documents that are retrieved.
- On the other hand, the combination of text and image search by or can increase the number of retrieved relevant documents.
- The user can also search for images on the Web by specifying a similar image. By combining text search also, the user may reduce the number of irrelevant images retrieved.
Visualizations of the search results allow the user to understand the results better.
The focus+context view of a Web node helps in understanding the position of the node in the Web site from which it was retrieved.

We are currently working on the first publicly available version of AMORE [14]. In this version, several Web sites containing various categories of images will be indexed. Our technique will allow the searching of these sites based on both images and keywords.

Future work is planned in various directions. The visualizations need to be improved based on usability studies. Then we plan to make the visualizations publicly available. At present we are also working on video retrieval. The architecture of AMORE will allow the integration of video indexing and searching with minimal effort. We also want to integrate a browsing environment like the one provided by Navigational View Builder [10] into our search engine. This will allow the user to browse through a Web site of interest to see what is available. We believe that a multimedia information retrieval engine will enhance the usefulness of the World Wide Web by reducing the lost in hyperspace problem.

References

1.: J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Jain, and C. Shu. The Virage Image Search Engine: An Open Framework for Image Management. In Proceedings of the SPIE - The International Society for Optical Engineering: Storage and Retrieval for Still Image and Video Databases IV, San Jose, CA, USA, Feb. 1996.
2.: C. Bowman, P. Danzig, D. Hardy, U. Manber, and M. Schwartz. The Harvest Information Discovery and Access System. In Proceedings of the Second International World Wide Web Conference, Chicago, Illinois, October 1994.
3.: S. Card, G. Robertson, and W. York. The WebBook and the Web Forager: An Information Workspace for the World Wide Web. In Proceedings of the ACM SIGCHI '96 Conference on Human Factors in Computing Systems, pages 112-117, Vancouver, Canada, April 1996.
4.: M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by Image and Video Content: The QBIC System. IEEE Computer, 28(9):23-48, September 1995.
5.: C. Frankel, M. Swain, and V. Athitsos. WebSeer: An Image Search Engine for the World Wide Web. Technical Report 96-14, University of Chicago, Computer Science Department, August 1996.
6.: M. Hearst. Visualization of Term Distribution Information in Full Text Information Access. In Proceedings of the ACM SIGCHI '95 Conference on Human Factors in Computing Systems, pages 59-66, Denver, CO, May 1995.
7.: K. Hirata, Y. Hara, N. Shibata, and F. Hirabayashi. Media-based Navigation for Hypermedia Systems. In Proceedings of ACM Hypertext '93 Conference, pages 159-173, Seattle, WA, November 1993.
8.: J. Mackinlay, R. Rao, and S. Card. An Organic User Interface for Searching Citation Links. In Proceedings of the ACM SIGCHI '95 Conference on Human Factors in Computing Systems, pages 67-73, Denver, CO, May 1995.
9.: J. D. Mackinlay, S. Card, and G. Robertson. Perspective Wall: Detail and Context Smoothly Integrated. In Proceedings of the ACM SIGCHI '91 Conference on Human Factors in Computing Systems, pages 173-179, New Orleans, LA, April 1991.
10.: S. Mukherjea and J. Foley. Visualizing the World Wide Web with the Navigational View Builder. Computer Networks and ISDN Systems. Special Issue on the Third International World Wide Web Conference, Darmstadt, Germany, 27(6):1075-1087, April 1995.
11.: S. Mukherjea and Y. Hara. Focus+Context Views of World Wide Web Nodes. To appear in Proceedings of ACM Hypertext '97 Conference, Southampton, England, April 1997.
12.: A. Spoerri. InfoCrystal: A Visual Tool for Information Retrieval and Management. In Proceedings of Information and Knowledge Management '93, Washington, D.C., November 1993.

Relevant URLs

13.: Alta Vista http://www.altavista.com
14.: AMORE http://www.ccrl.neclab.com/amore
15.: Cosmo Player http://vrml.sgi.com
16.: Georgia Institute of Technology College of Computing http://www.cc.gatech.edu
17.: Image Surfer http://isurf.yahoo.com
18.: Lycos http://www.lycos.com
19.: Walt Disney http://www.disney.com
20.: Yahoo http://www.yahoo.com

Return to Top of Page
Return to Technical Papers Index