Syracuse University,
School of Information Studies
Syracuse, NY, U.S.A.
ssutton@ericir.syr.edu
The four major objectives addressed by the Gateway to Educational Materials (GEM) project were to: (1) define a semantically rich metadata profile and the domain-specific controlled vocabularies needed to describe educational materials on the WWW; (2) develop a concrete syntax, and well-specified practices for applying it, using current HTML specifications; (3) design and implement a set of harvesting tools for retrieving metadata stored as HTML meta tags; and (4) encourage the design of a number of prototype interfaces to GEM metadata.
From the outset, GEM developed around emerging standards for networked information discovery and retrieval (NIDR). The Dublin Core Element Set (DC) became the base referent for the GEM element set. One of the underlying assumptions of the DC founders was that it would be extensible in two fundamental ways: (1) additional elements could be added to meet the needs of particular domains, and (2) its elements could be enriched through the use of a broad range of qualifying schemes and types (the Canberra Qualifiers [6]). The GEM element set is an example of both of these extensions.
A GEM package of 8 elements was added to the 15-element DC package: (1) Audience, (2) Cataloging Agency, (3) Duration, (4) Essential Resources, (5) Educational Level, (6) Pedagogy, (7) Quality Assessments, and (8) Academic Standards. In addition, a number of GEM controlled vocabularies and form schemes were defined. Many of the elements have an array of types that modify element semantics. The Quality and Standards elements may exist independently of a resource's descriptive metadata, thus permitting third-party agencies with appropriate expertise to handle quality assessments and standards mappings.
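A minimal Python sketch of the combined element set as a data structure may make the extension concrete. The DC names follow the lower-case form used in tags such as DC.subject; the GEM identifiers are hypothetical renderings of the element names listed above, not GEM's official tag names. The helper is_third_party_record (also hypothetical) captures the point that Quality and Standards records can stand apart from a resource's descriptive metadata.

```python
# The 15 Dublin Core elements, in the lower-case form used in tags such as
# DC.subject.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# The 8-element GEM package. These identifiers are hypothetical renderings of
# the element names in the text, not GEM's official tag names.
GEM_ELEMENTS = frozenset({
    "audience", "catalogingAgency", "duration", "essentialResources",
    "educationalLevel", "pedagogy", "qualityAssessments", "academicStandards",
})

# Elements a third-party agency may supply without descriptive metadata.
THIRD_PARTY_ELEMENTS = frozenset({"qualityAssessments", "academicStandards"})


def package_of(element: str) -> str:
    """Return the package ("DC" or "GEM") that defines an element."""
    if element in DC_ELEMENTS:
        return "DC"
    if element in GEM_ELEMENTS:
        return "GEM"
    raise ValueError(f"unknown element: {element}")


def is_third_party_record(record: dict) -> bool:
    """True if a non-empty record holds only Quality/Standards elements, so it
    can exist independently of the resource's descriptive metadata."""
    return bool(record) and set(record) <= THIRD_PARTY_ELEMENTS


print(len(DC_ELEMENTS | GEM_ELEMENTS))  # prints 23
```

Keeping the two packages as separate sets mirrors the "extend, don't modify" relationship between GEM and DC: the DC set is untouched and the GEM package sits beside it.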
2. Creating GEM metadata
While GEM metadata can be created with any text editor, a publicly available metadata-generating module called GEMCat was developed to ease creation by letting the cataloger focus solely on content. GEMCat is currently implemented for Windows 95/NT; a cross-platform Java version is nearing completion.
GEM metadata is stored in one of two ways. First, where the resource being described is an HTML document, the GEM metadata can be saved within the resource itself as meta tags. Second, where internal storage of the metadata is impossible or undesirable, the metadata can be saved as meta tags in a separate HTML document that references the resource being described.
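The two storage options can be sketched in Python as follows, assuming meta tags with quoted attribute values and a SCHEME attribute (the HTML 4.0 form). The function names are hypothetical, the HEAD lookup is deliberately naive, and the use of a DC.identifier tag to point from the external record to the resource is an assumption made for illustration, not a documented GEM convention.

```python
import re
from html import escape
from typing import Optional


def meta_tag(name: str, content: str, scheme: Optional[str] = None) -> str:
    """Render one META tag with quoted, HTML-escaped attribute values."""
    scheme_attr = f' SCHEME="{escape(scheme)}"' if scheme else ""
    return f'<META NAME="{escape(name)}"{scheme_attr} CONTENT="{escape(content)}">'


def embed_in_document(html_doc: str, tags: list) -> str:
    """Internal storage: insert the tags right after the document's <HEAD> tag.
    The lookup is naive: case-insensitive, but no attributes on HEAD."""
    match = re.search(r"<head>", html_doc, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("document has no <HEAD> tag")
    end = match.end()
    return html_doc[:end] + "\n" + "\n".join(tags) + html_doc[end:]


def external_record(resource_url: str, tags: list) -> str:
    """External storage: a separate HTML document whose DC.identifier tag
    (an assumed convention) points at the resource being described."""
    head = [meta_tag("DC.identifier", resource_url)] + list(tags)
    return "<HTML><HEAD>\n" + "\n".join(head) + "\n</HEAD></HTML>\n"
```

Escaping attribute values matters in either mode: a title containing a double quote would otherwise terminate the CONTENT attribute early.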
3. Syntax
Since HTML currently rests at the heart of the web, the GEM Working Group focused both on its evolution and on other relevant initiatives of the World Wide Web Consortium. The changes in HTML's ability to accommodate richly structured metadata through meta tags have been chronicled elsewhere [6]. From the beginning of the DC dialog, it has been recognized that only the simplest implementations of metadata can be accommodated effectively by HTML 2.0 meta tags (see [6]). Because HTML 2.0 limits the relevant META attributes to NAME and CONTENT, the only means of carrying additional information (e.g., schemes and types) is what is called content overloading [6], of the form: <META NAME="GEM.subject" CONTENT="(SCHEME=GEM) (TYPE=levelOne) Science">. The first-generation syntax of GEMCat was based on content overloading.
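The cost of content overloading is that every consumer of the metadata must parse the qualifiers back out of the CONTENT string. A minimal Python sketch of that parsing follows, assuming qualifiers always precede the value as parenthesized KEY=value groups, as in the example above; parse_overloaded is a hypothetical name, not part of GEMCat.

```python
import re

# One leading parenthesized qualifier such as "(SCHEME=GEM) ".
_QUALIFIER = re.compile(r"\(\s*([A-Za-z]+)\s*=\s*([^)]*?)\s*\)\s*")


def parse_overloaded(content: str):
    """Split an overloaded CONTENT value into (qualifiers, bare value).

    Assumes qualifiers precede the value as (KEY=value) groups; keys are
    upper-cased so "(type=...)" and "(TYPE=...)" are treated alike.
    """
    content = content.strip()
    qualifiers = {}
    pos = 0
    match = _QUALIFIER.match(content, pos)
    while match:
        qualifiers[match.group(1).upper()] = match.group(2)
        pos = match.end()
        match = _QUALIFIER.match(content, pos)
    return qualifiers, content[pos:].strip()


print(parse_overloaded("(SCHEME=GEM) (TYPE=levelOne) Science"))
# prints ({'SCHEME': 'GEM', 'TYPE': 'levelOne'}, 'Science')
```

The fragility is visible in the assumptions: a value that itself begins with a parenthesized "KEY=..." string would be misread as a qualifier, which is one reason to move scheme and type out of CONTENT altogether.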
HTML 4.0 comes close to eliminating the content overload problem by adding a SCHEME attribute to the META element. When this attribute is combined with appending type information to the NAME value, content overload is eliminated. The following GEM metadata example illustrates how scheme and type information are integrated in HTML 4.0 meta tags:
<META NAME="DC.subject.levelOne.1" SCHEME="GEM" CONTENT="Science">
<META NAME="DC.subject.levelTwo.1" SCHEME="GEM" CONTENT="Biological sciences">
<META NAME="DC.subject.levelTwo.1" SCHEME="GEM" CONTENT="Life sciences">
<META NAME="DC.subject.levelTwo.1" SCHEME="GEM" CONTENT="Technology">
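With the HTML 4.0 form, a harvester can recover element, type and scheme from attributes alone, without parsing CONTENT. The Python sketch below uses the standard html.parser module; it assumes NAME values follow the prefix.element.type.number pattern of the example and simply records the trailing number as an index without interpreting it. MetaCollector and parse_meta are hypothetical names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect DC. and GEM. meta tags, splitting NAME into its parts."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prefix, _, rest = (attributes.get("name") or "").partition(".")
        if prefix not in ("DC", "GEM") or not rest:
            return
        parts = rest.split(".")
        # Assumed pattern: element[.type[.number]]; the number is kept as an
        # index without interpreting it.
        index = parts[2] if len(parts) > 2 else None
        self.records.append({
            "element": f"{prefix}.{parts[0]}",
            "type": parts[1] if len(parts) > 1 else None,
            "index": int(index) if index and index.isdigit() else None,
            "scheme": attributes.get("scheme"),
            "content": attributes.get("content"),
        })


def parse_meta(html: str) -> list:
    """Return one dictionary per DC./GEM. meta tag found in an HTML string."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.records
```

Meta tags with other NAME values (for example, a page's own "author" tag) are ignored, so the same parser can run over arbitrary pages.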
The GEM Working Group is closely watching the World Wide Web Consortium's work on both the Resource Description Framework (RDF) (see [4]) and the Extensible Markup Language (XML) (see [2]). Both initiatives promise a rich structural environment for GEM metadata and mark the migration path for GEMCat.
4. Distributing GEM metadata
As the web matures as a publishing environment, and as generally accepted metadata schemes serving specific subject and practice domains evolve, existing (and future) WWW crawling programs such as Alta Vista, Excite, InfoSeek, Lycos and Webcrawler will provide increasingly efficient and effective access to information. In addition to these general retrieval services, a number of services fashioned to meet the needs of specific domains are also emerging [3]. Readily available tools such as Harvest make local harvesting of metadata possible, and the extension of such harvesting to multiple web sites serving a specific community has been demonstrated [1]. Based on these models, GEM metadata can be distributed through two mechanisms: (1) future harvesting by general-purpose web crawlers, and (2) harvesting of selected repositories by a GEM harvester. The second mechanism yields the GEM Union Catalog (GUC), which provides access to the collections of a consortium of high-integrity repositories.
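The second distribution mechanism, harvesting only selected repositories into a union catalog, might be sketched in Python as below. It is not the GEM harvester: the host allow-list standing in for consortium membership, the function names, and the choice to keep only pages that carry DC. or GEM. meta tags are all assumptions. Pages are passed in as a URL-to-HTML mapping so the selection logic can be exercised without network access; fetch shows how a single page could be retrieved with urllib.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen


class _MetaPairs(HTMLParser):
    """Gather (NAME, CONTENT) pairs for DC. and GEM. meta tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name") or ""
            if name.startswith(("DC.", "GEM.")):
                self.pairs.append((name, attributes.get("content") or ""))


def extract_pairs(html: str) -> list:
    parser = _MetaPairs()
    parser.feed(html)
    parser.close()
    return parser.pairs


def fetch(url: str) -> str:
    """Retrieve one page over the network (not exercised in the tests)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def build_union_catalog(pages: dict, consortium_hosts: set) -> dict:
    """Keep only pages from consortium hosts that carry DC./GEM. metadata,
    keyed by URL."""
    catalog = {}
    for url, html in pages.items():
        if urlparse(url).hostname not in consortium_hosts:
            continue  # not a member repository: skip
        pairs = extract_pairs(html)
        if pairs:
            catalog[url] = pairs
    return catalog
```

In use, pages would be assembled as {url: fetch(url) for url in seed_urls}, where seed_urls is a hypothetical list of pages in member repositories. Restricting harvesting to known hosts is what distinguishes this model from a general-purpose crawler.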
The rationale for the GUC can be found in the following observation by Lagoze, Lynch and Daniel in their exploration of issues surrounding the Dublin Core (1996, p. 6) [5]:
[T]he use of the Dublin Core in a limited context might produce very positive results. For example, assume a set of high-integrity sites. Administrators at such sites might tag their documents . . . with Dublin Core metadata elements using a set of well-specified practices that include relatively controlled vocabularies and regular syntax. Retrieval effectiveness across these high-integrity sites would probably be significantly better (assuming harvesting and retrieval tools that make use of the metadata) than the unstructured searches available now through Lycos and Alta Vista.
The growing GEM Consortium is just such a set of high-integrity sites.
5. Resource discovery
In the project's first phase, two prototype interfaces to a test database of GEM metadata were developed. At Syracuse, a search-and-browse environment called GemAccess was built using PLWeb, a full-text, relevance-ranking search engine from Personal Library Software. At the University of Washington, an interface driven by a relational database was developed.
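The relational approach can be illustrated with a hypothetical schema that stores one row per metadata statement, which turns subject browsing into a simple indexed query. This is a sketch using Python's built-in sqlite3 module; the table layout and the function names are assumptions, not a description of the University of Washington interface.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with one row per metadata statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        " resource TEXT NOT NULL,"  # URL of the described resource
        " element TEXT NOT NULL,"   # e.g. 'DC.subject'
        " qualifier TEXT,"          # e.g. 'levelOne'; NULL if unqualified
        " scheme TEXT,"             # e.g. 'GEM'
        " value TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_statement ON metadata(element, qualifier, value)")
    return conn


def add(conn, resource, element, value, qualifier=None, scheme=None):
    """Store one statement about a resource."""
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
                 (resource, element, qualifier, scheme, value))


def browse_subject(conn, term, level="levelOne"):
    """Return resources whose GEM subject term at the given level equals term."""
    rows = conn.execute(
        "SELECT DISTINCT resource FROM metadata"
        " WHERE element = 'DC.subject' AND scheme = 'GEM'"
        " AND qualifier = ? AND value = ? ORDER BY resource",
        (level, term))
    return [row[0] for row in rows]
```

A statement-per-row table accepts any mix of DC and GEM elements without schema changes, at the cost of more joins when a query combines several elements.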
6. Conclusion
As the World Wide Web grows exponentially, the discovery and retrieval of useful educational materials become more problematic. The GEM project seeks to meet the needs of educators, students and parents by developing and widely deploying the GEM standard: a metadata element set, an accompanying array of controlled vocabularies, and a well-defined set of practices for their application. The developmental work of the project's first phase is largely complete, and full-scale application of GEM by Consortium members has begun.
References
[1] Beckett, D. and N. Smith, The ACademic DireCtory: AC/DC, Ariadne, 1996, http://www.ariadne.ac.uk/issue3/acdc/
[2] Bray, T., J. Paoli, and C. Sperberg-McQueen, Extensible Markup Language (XML): W3C Working Draft 07-Aug-97, 1997, http://www.w3.org/TR/WD-xml-lang
[3] Dempsey, L., Meta detectors, Ariadne, 1996, http://www.ariadne.ac.uk/issue3/metadata/
[4] Iannella, R., Application of RDF for extensible Dublin Core metadata, 1997, http://www.dstc.edu.au/RDU/RDF/rdf-dc-app-19970808.html
[5] Lagoze, C., C. Lynch, and R. Daniel, Jr., The Warwick framework: a container architecture for aggregating sets of metadata, TR96-1593, 1996, http://www.nlc-bnc.ca/ifla/documents/libraries/cataloging/metadata/tr961593.pdf
[6] Weibel, S. and R. Iannella, The 4th Dublin Core Metadata Workshop Report, D-Lib Magazine, 1997, http://www.dlib.org/dlib/june97/metadata/06weibel.html