Dynamic Construction of Federated Digital Libraries
M. Zubair, K. Maly, I. Ameerally and M. Nelson
Department of Computer Science, Old Dominion University, Norfolk VA 23592, USA
Introduction
Digital libraries (DLs) have gained acceptance in many scientific and technical disciplines. However, most of these DLs are implemented in systems and protocols specific to the discipline they support. As such, interoperability between DLs has yet to be achieved on a large scale. The challenges to interoperability are: (a) the integration should be flexible enough to allow individual digital libraries to add/modify features and at the same time give the user an impression of a single library, and (b) relocation of individual digital libraries should be transparent to users.
Federation can be achieved in three ways: modifying the existing DLs to interoperate; extracting metadata from each of the DLs and indexing it as a separate DL; or treating each DL as a separate entity and performing distributed searches. The first approach requires digital libraries to use the same DL protocol or software suite. However, there are enough significant DL systems in use to assume that the DL community will continue to support a collection of heterogeneous systems and protocols. The second approach, metadata extraction, has advantages, but it assumes that metadata can be extracted and reindexed with no technical or legal barriers; in our experience, neither assumption often holds. The third approach creates more work for the provider of the federated digital libraries (FDLs), but allows for the inclusion of a greater number of DLs.
Several researchers have looked into the issue of interoperability among heterogeneous digital libraries, for example GlOSS [3], Harvest [1], and STARTS [2]. These approaches fall either in the first category, where an existing DL needs to be modified to interoperate, or in the second category, which is based on extracting metadata from different DLs. An approach similar to ours is described in [4], which defines a Searchable Database Markup Language (SearchDB-ML) based on XML. That approach differs from ours in that it is targeted at Web sites that support simple search interfaces rather than libraries with support for clustering and advanced searches, and it does not support dynamic discovery and integration of a digital library into the federation. Furthermore, it does little to present the search results in a uniform manner.
In this work we propose a data-driven approach in which each DL is registered with a central service that stores an XML description of its capabilities. Users of the federation can then choose which DLs they would like to search, and the interface, syntax, and presentation of results are dynamically constructed depending on the profile of the target DLs. We describe our experiences in building such an FDL.
Digital Library Definition Language
The digital library specification is written in a Digital Library Definition Language (DLDL), an XML-based language capable of describing the APIs of a large number of digital libraries. The richness of the mark-up tags in a DLDL is determined by what users need or expect from a federated digital library. The proposed XML specification is divided into three sections:
1. The information that the digital library contains.
2. The access methods of the digital library.
3. The information to be retrieved from the digital library.
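As a rough illustration of this three-part layout, the sketch below parses a hypothetical library description with Python's standard library. The element and attribute names (library, content, access, results, and so on), the library name, and the URL are all invented for this example; the actual DLDL tag set is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of one library; every tag, attribute,
# and URL below is an assumption made for illustration only.
DLDL_EXAMPLE = """
<library name="ExampleDL">
  <content>
    <field name="author"/>
    <field name="title"/>
  </content>
  <access method="GET" url="http://dl.example.org/search">
    <param generic="author" native="au"/>
    <param generic="title" native="ti"/>
  </access>
  <results>
    <item name="title"/>
    <item name="url"/>
  </results>
</library>
"""


def load_description(xml_text):
    """Split a library description into its three sections."""
    root = ET.fromstring(xml_text)
    access = root.find("access")
    return {
        "name": root.get("name"),
        # 1. The information that the library contains.
        "fields": [f.get("name") for f in root.findall("content/field")],
        # 2. The access method of the library.
        "method": access.get("method"),
        "url": access.get("url"),
        "params": {p.get("generic"): p.get("native")
                   for p in access.findall("param")},
        # 3. The information to be retrieved for each hit.
        "result_items": [i.get("name") for i in root.findall("results/item")],
    }


description = load_description(DLDL_EXAMPLE)
print(description["fields"])
print(description["params"])
```

Keeping the three sections as separate keys means the interface builder, the query generator, and the result formatter can each read only the part of the description they need.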
Query Mapping
A major challenge in any federation approach is to present users with a single interface that takes a user's selections for the various fields in the search criteria and maps them to the search queries of heterogeneous libraries. In general these libraries will have different metadata, in terms of the number of fields and their syntax and semantics. What is the most flexible way of specifying this mapping consistent with our data-centered architecture approach? In our approach the XML specification of a digital library contains the mapping information. It is also necessary to have a generic meta-tag specification in XML. The FDL user interface is directly related to this specification. The generic specification, along with a participating digital library's behavior, is used to generate the digital library XML specification that includes the mapping information. In general, the FDL will generate a query to a specific library based on the user input, the generic meta-tag specification, and the XML specification of the digital library. The objective of also using XML for the generic meta-tag specification is to keep the user interface flexible. It is possible that in the future we might need to federate a digital library with more levels in the subject filter than are currently supported, and we would like to support that at the user interface level. Note that, if we do not want the extra levels to be shown in the user interface, we can still integrate the digital library without any modification to the generic meta-tag specification by simply ignoring those levels.
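A minimal sketch of this mapping step follows. It assumes the mapping has already been read from each library's XML specification into a plain dictionary; the library names, URLs, generic field names, and native parameter names are hypothetical and serve only to show how one generic query might be translated into library-specific queries.

```python
from urllib.parse import urlencode

# Generic meta-tag specification: the fields the FDL interface offers.
# (Assumed field names, for illustration.)
GENERIC_FIELDS = ["author", "title", "subject"]

# Per-library mapping, as it might be read from each XML specification.
# Library names, URLs, and native parameter names are hypothetical.
LIBRARY_SPECS = {
    "LibraryA": {
        "url": "http://a.example.org/search",
        "params": {"author": "au", "title": "ti", "subject": "cat"},
    },
    "LibraryB": {
        "url": "http://b.example.org/query",
        "params": {"author": "creator", "title": "name"},  # no subject filter
    },
}


def build_query(library, user_input):
    """Translate generic user input into one library's native query URL."""
    spec = LIBRARY_SPECS[library]
    native = {}
    for field in GENERIC_FIELDS:
        value = user_input.get(field)
        native_name = spec["params"].get(field)
        # Skip empty fields and fields this library cannot search on.
        if value and native_name:
            native[native_name] = value
    return spec["url"] + "?" + urlencode(native)


user_input = {"author": "Maly", "subject": "Networks"}
for name in LIBRARY_SPECS:
    print(name, build_query(name, user_input))
```

Because the translation is driven entirely by data, adding a library in this sketch means adding one entry to the mapping rather than changing the query-building code.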
Existing Prototype
We have built a prototype implementation of the federated DL and specified four libraries in XML: ACM, IEEE, NCSTRL, and NEEDS. Initial user feedback has been very encouraging and motivates further research on making the FDL richer in functionality and usability. The FDL prototype is a Java-based application with three main components: an XML Parser, a Retrieval Engine, and a Merge Result and Presentation Engine. The prototype uses the Sun XML Parser. The Retrieval Engine is responsible for retrieving documents that satisfy the search criteria from the selected digital libraries. Please visit the Interop Demo Site to try out the FDL prototype. (To see the complete working of the prototype you will need a user name and password. Please use "interop" for the user name and "interop" for the password.)
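The division of labour between the Retrieval Engine and the Merge Result and Presentation Engine can be illustrated with a small sketch. It does not contact any real library and is not the prototype's code: the hits are hard-coded stand-ins for what a retrieval step might return, and the library names, field names, and sort-by-title rule are assumptions chosen for illustration.

```python
# Hypothetical mapping from shared result fields to each library's native names.
FIELD_MAPS = {
    "LibraryA": {"title": "ti", "author": "au"},
    "LibraryB": {"title": "name", "author": "creator"},
}

# Stand-in for retrieval output: raw hits keyed by library (invented data).
RAW_RESULTS = {
    "LibraryA": [{"ti": "Web Caching", "au": "Author One"}],
    "LibraryB": [{"name": "Digital Archives", "creator": "Author Two"}],
}


def normalize(library, hit, field_map):
    """Rename one library's native result fields to the shared names."""
    record = {generic: hit.get(native, "")
              for generic, native in field_map.items()}
    record["source"] = library
    return record


def merge_results(results_by_library, field_maps):
    """Combine hits from all libraries into one list of uniform records."""
    merged = []
    for library, hits in results_by_library.items():
        for hit in hits:
            merged.append(normalize(library, hit, field_maps[library]))
    # Sort by title so the presentation layer shows one consistent ordering.
    return sorted(merged, key=lambda r: r["title"].lower())


merged = merge_results(RAW_RESULTS, FIELD_MAPS)
for record in merged:
    print(record["source"], "|", record["title"], "|", record["author"])
```

Normalizing each hit before merging lets the presentation step treat every record the same way, regardless of which library produced it.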
References
- C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, M. F. Schwartz, and D. P. Wessels (1994), "Harvest: A Scalable, Customizable Discovery and Access System," University of Colorado Computer Science Technical Report CU-CS-732-94, August 1994.
- L. Gravano, C.-C. K. Chang, H. Garcia-Molina and A. Paepcke (1997), "STARTS: Stanford Proposal for Internet Meta-Searching," Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pp. 207-218.
- L. Gravano, H. Garcia-Molina and A. Tomasic (1994), "The Effectiveness of GlOSS for the Text-Database Discovery Problem," Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pp. 126-137.
- J. Powell and E. A. Fox (1998), "Multilingual Federated Searching Across Heterogeneous Collections," D-Lib Magazine, September 1998.