Note: this document contains an HTML table; as of the time of writing, not all available Web clients are capable of formatting this properly.
A forms-based entry tool, Apprentice,® allows an information provider to remotely create a structured description of an information source. A human cataloger then adds index information using the NLM MeSH® vocabulary as well as the UMLS semantic network, and adds the description to the Information Sources Map (ISM) database.
Sourcerer is a Common Gateway Interface (CGI) application that acts as a client to four different types of servers:
A user query is interactively refined with the assistance of the UMLS Knowledge Source servers, which are also used to map query terms into high-level concepts from the UMLS semantic network. These high-level concepts are used to retrieve a list of potentially useful information sources from the database on the ISM server, returned in the form of URNs. URNs are then resolved to URLs by another server. The user can then screen the proposed sources or connect to them automatically. In one instance, the original user query is passed along to the TIS using TIS-specific syntax.
The primary constituency of the NLM, health care workers and biomedical research workers, generally approach MEDLINE with crisply defined information needs and tight time constraints. Accrued experience at the NLM suggests that retrieval systems based on the use of defined vocabularies are more successful in this setting than systems based on the natural language contents of documents [Lancaster & Warner, 1993].
The use of defined vocabularies has spread widely within medicine; in fact, the proliferation of multiple defined vocabularies is regarded by some workers as an impediment to the creation of universally usable computerized medical records. To counter the centrifugal effects of multiple vocabularies, the NLM has been engaged in a major multi-year research and development effort known as the Unified Medical Language System (UMLS) Project [Humphreys & Lindberg, 1993]. The UMLS has three principal components, known collectively as the UMLS Knowledge Sources (the numbers below refer to the 1994 edition):
The UMLS Knowledge Sources are updated annually, and released on CD-ROM, under the terms of a beta test agreement, for purposes of research. A network-accessible server that provides access to UMLS data is under development, and is used by Sourcerer.
Abbrev. | Name | Owner/Originator | Purpose |
---|---|---|---|
ACR92 | Index for Radiological Diagnoses | American College of Radiology (1986) | Radiology and ultrasound |
AIR92 | AI/RHEUM | NLM (1992) | Rheumatology |
COS89 | COSTAR (COmputer-Stored Ambulatory Records) | Massachusetts General Hospital (1989) | Outpatient records |
COS92 | COSTAR (COmputer-Stored Ambulatory Records) | Massachusetts General Hospital (1992) | Outpatient records |
CSP93 | CRISP Thesaurus | NIH (1993) | Biomedical research |
CPT89 | Physicians' Current Procedural Terminology (CPT) | AMA (1989) | Physicians' billing |
CST93 | COSTART: Coding Symbols for Thesaurus of Adverse Reaction Terms | FDA (1993) | Drug reactions |
DOR27 | Dorland's Medical Dictionary, 27th Ed. | Saunders | General medical terminology |
DSM3R | Diagnostic and Statistical Manual of Mental Disorders (DSM) | American Psychiatric Association (1987) | Psychiatry |
DXP92 | DXplain, an expert diagnosis program | Massachusetts General Hospital (1992) | General medicine |
HHC93 | Home Health Care Classification of Nursing Diagnoses and Interventions (HHC) | Georgetown Univ. (1993) | Nursing |
ICD89 | International Classification of Diseases, 9th Rev., 3rd Ed. (ICD-9) | HCFA (1989) | General medicine |
ICD91 | International Classification of Diseases, 9th Rev., 4th Ed. | HCFA (1991) | General medicine |
INS94 | Thesaurus Biomedical Français/Anglais (French MeSH) | INSERM (1993) | Biomedical research and clinical |
LCH90 | Library of Congress Subject Headings (LCSH) | LOC (1989) | General knowledge |
MCM92 | List of Epidemiology Terms | McMaster University (1992) | Epidemiology |
MIM93 | Online Mendelian Inheritance in Man | Johns Hopkins Univ. (1993) | Human genetics |
MSH94 | Medical Subject Headings (MeSH) | NLM (1994) | Biomedical research and clinical |
MSH94 | Medical Subject Headings (Supplementary Chemical Terms) | NLM (1994) | Biomedical research and clinical |
MTH | Metathesaurus | NLM (1994) | Unites biomedical vocabularies |
NAN92 | Classification of Nursing Diagnoses | 9th Conference on the Classification of Nursing Diagnoses (1992) | Nursing |
NEU | Neuronames Brain Hierarchy | Univ. of Washington | Brain Anatomy |
NIC93 | Nursing Interventions Classification | Iowa Intervention Project (1993) | Nursing |
PDQ93 | Physician Data Query Online System | NCI (1993) | Oncology |
SNM2 | Systematized Nomenclature of Medicine (SNOMED II) | College of American Pathologists (1979, 82) | Human pathology |
SNMI | SNOMED International | College of American Pathologists (1979, 82) | Human pathology |
SNM3 | Systematized Nomenclature of Human and Veterinary Medicine | College of American Pathologists, American Veterinary Medicine Association (1993) | Human and veterinary pathology |
UMS94 | Universal Medical Device Nomenclature System (UMDNS): Product Category Thesaurus | ECRI (1994) | Medical devices |
CRISP | U.S. P.H.S. Thesaurus for indexing scientific projects | U.S. Public Health Service | Biomedical research |
UWA92 | Primate Information Center Data | Univ. Washington (1992) | General medical |
The goal of a good search is to maximize precision and recall. The burgeoning number of biomedical information sources available via the Internet has broadened the search problem beyond that of maximizing precision and recall within a specific database; the problem now becomes two-tiered: first, identify appropriate information sources; second, search those sources effectively. The browsing paradigm and word-based indexing schemes offered by WWW, gopher, and WAIS in their currently most commonly encountered forms do not suffice to address the retrieval needs of clinicians and biomedical researchers. The Sourcerer Project is attempting to address this issue through the application of UMLS-based tools and appropriate Internet-based standards.
Sourcerer Architecture
Sourcerer is a Common Gateway Interface (CGI) application,
implemented behind the NCSA Web server, httpd.
It is written primarily in the Perl scripting language,
with some modules written in the C programming language,
and runs on a Sun workstation under the Solaris 2.3 (UNIX) operating system.
The user interacts with it via any forms-capable Web client.
Sourcerer acts as a client to four different types of network-based
servers, as diagrammed in Figure 2:
Prototype Walk-Through
The capabilities of the current Sourcerer prototype are best illustrated
by stepping through an actual search.
The prototype is intentionally didactic,
illustrating the intermediate stages of the search-and-retrieval process.
It is not presented as an example of optimal user interface design.
Figure 3 illustrates the top-level access document for Sourcerer, which allows access to servers for the ISM and semantic network, in addition to the Sourcerer prototype.
In Stage 1 of the process of using Sourcerer, illustrated in Figure 4, the user has entered a query into the Sourcerer search form. Striving for simplicity and following other design precedents within NLM, the form presents only three text subwindows, each of which can contain a multi-word string describing a single biomedical concept. Although the concept windows are connected by radio buttons for the Boolean operations AND and OR, the current prototype employs only AND in its searches.
The forms are designed so that they are used in a top-down fashion. The upper part of the form includes instructions to the user, and various form-based widgets in which information can be entered. This is followed by a section in which the user can specify what action is to be taken with the supplied information; in the example of Figure 4, the default action is to proceed with a search of the UMLS Metathesaurus using the user-entered concepts.
In the example, the word "aspirin" has been entered into the first concept window, and the phrase "bleeding time" into the second. The bleeding time [Rodgers & Levin, 1990] is a medical diagnostic procedure in which a superficial incision of controlled dimensions is made on the forearm of a subject, and the time to cessation of bleeding is recorded. It is thought to reflect both capillary physiology and the function of platelets, small cell products in the blood that participate in the early stages of clot formation. Platelet function is inhibited by aspirin; the user is trying to learn more about this interaction.
Figure 5 presents Stage 2 in the use of Sourcerer, in which the user reviews the results of having consulted the UMLS Metathesaurus. In this instance, Sourcerer has found one and only one concept matching each of the two character strings entered in Stage 1. The Metathesaurus server returns matching concepts, as well as the semantic types associated with them (none of this is shown to the viewer at this stage; we will examine the information that came back from this search later in the walk-through). In requesting to proceed to Stage 3, the user triggers a consultation with the UMLS semantic network.
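The Stage 2 lookup can be pictured with a minimal Python sketch. It is not the Sourcerer implementation (which is written in Perl and C and talks to a network server); the `Concept` class, the `TOY_METATHESAURUS` table, and the function names are hypothetical, and the table holds only the two concepts and semantic types reported later in this walk-through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A Metathesaurus concept reduced to the two fields used here."""
    name: str
    semantic_types: tuple


# Hypothetical in-memory stand-in for the Metathesaurus server.
# The entries mirror the results described in the walk-through.
TOY_METATHESAURUS = {
    "bleeding time": Concept("bleeding time", ("Diagnostic Procedure",)),
    "aspirin": Concept("Aspirin", ("Organic Chemical", "Pharmacological Substance")),
}


def normalize(text):
    """Lower-case the string and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def lookup_concepts(query_strings):
    """Map each non-empty query string to a list of matching concepts.

    An empty list means the string matched nothing; empty concept
    windows are skipped entirely.
    """
    results = {}
    for raw in query_strings:
        key = normalize(raw)
        if not key:
            continue
        concept = TOY_METATHESAURUS.get(key)
        results[raw] = [concept] if concept is not None else []
    return results


if __name__ == "__main__":
    # Three concept windows, the third left empty as in the example query.
    for text, matches in lookup_concepts(["aspirin", "bleeding time", ""]).items():
        print(text, "->", [(c.name, c.semantic_types) for c in matches])
```

A real lookup would also have to handle strings that match several concepts; the sketch returns a list per string so that case fits the same shape.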
In Stage 3 (Figure 6), Sourcerer has used the semantic types obtained in Stage 2 to consult the UMLS semantic network. The user is presented with a list of noun-verb-noun triples (known as semantic type relationships), based on allowed relationships between the semantic types associated with the original query concepts. For maximal clarity to the user, these are presented in terms of the original query strings rather than the underlying semantic types. The user selects any of the triples that are deemed applicable to the query. In this case, the phrase "bleeding time assesses_effect_of aspirin" has been selected. Because a variant of the bleeding time test determines the bleeding time both before and after a challenge dose of aspirin, the phrase "bleeding time uses aspirin" has also been selected. The default action is to proceed to Stage 4 after a search of the ISM database.
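The derivation of these triples can be sketched in Python as follows. This is only an illustration under stated assumptions: `ALLOWED_RELATIONS` is a hypothetical two-entry excerpt standing in for the semantic network, chosen to reproduce the two phrases quoted above, and `candidate_triples` is an invented name rather than part of Sourcerer.

```python
from itertools import product

# Hypothetical excerpt of the semantic network: allowed
# (subject semantic type, relation, object semantic type) links.
ALLOWED_RELATIONS = {
    ("Diagnostic Procedure", "assesses_effect_of", "Pharmacological Substance"),
    ("Diagnostic Procedure", "uses", "Pharmacological Substance"),
}


def candidate_triples(query_types):
    """Build noun-verb-noun triples phrased with the user's own strings.

    query_types maps each query string to the semantic types of its
    matching concept. Every ordered pair of distinct query strings is
    checked against the allowed type-level relationships.
    """
    triples = set()
    pairs = product(query_types.items(), repeat=2)
    for (subj, subj_types), (obj, obj_types) in pairs:
        if subj == obj:
            continue
        for subj_type, obj_type in product(subj_types, obj_types):
            for left, relation, right in ALLOWED_RELATIONS:
                if left == subj_type and right == obj_type:
                    # Report the relation using the query strings,
                    # not the underlying semantic types.
                    triples.add((subj, relation, obj))
    return sorted(triples)


if __name__ == "__main__":
    types = {
        "aspirin": ["Organic Chemical", "Pharmacological Substance"],
        "bleeding time": ["Diagnostic Procedure"],
    }
    for subj, relation, obj in candidate_triples(types):
        print(subj, relation, obj)
```

Collecting the triples in a set avoids duplicates when a concept has several semantic types that license the same relation.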
In Stage 4 (Figure 7), the user is presented with a list of information resources which have been deemed appropriate to the original query. This list is derived from a search of the ISM database using the semantic types, semantic type relationships, and MeSH headings derived from the user query in earlier stages.
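A minimal Python sketch of this step, combined with the URN-to-URL resolution mentioned in the overview, is given below. The record layout, the URNs, the URL, and the rule that a source must cover every semantic type in the query (an AND match) are all assumptions made for illustration; the actual ISM schema and matching logic are not described here.

```python
# Hypothetical ISM records: each information source carries a URN and
# the semantic types under which a cataloger indexed it.
ISM_RECORDS = [
    {
        "urn": "urn:x-example:medline",
        "name": "MEDLINE",
        "semantic_types": {"Diagnostic Procedure", "Pharmacological Substance"},
    },
    {
        "urn": "urn:x-example:genetics-source",
        "name": "Example genetics source",
        "semantic_types": {"Gene or Genome"},
    },
]

# Hypothetical resolver table; in Sourcerer a separate server does this.
URN_TO_URLS = {
    "urn:x-example:medline": ["http://medline.example/search"],
}


def search_ism(query_types):
    """Return the URNs of sources indexed under every queried semantic type."""
    wanted = set(query_types)
    return [rec["urn"] for rec in ISM_RECORDS if wanted <= rec["semantic_types"]]


def resolve(urn):
    """Resolve a URN to zero or more URLs."""
    return URN_TO_URLS.get(urn, [])


if __name__ == "__main__":
    for urn in search_ism(["Diagnostic Procedure", "Pharmacological Substance"]):
        print(urn, "->", resolve(urn))
```

Keeping the ISM search and the resolver separate mirrors the division of labor in the text: the database returns location-independent names, and only the final step needs to know where a source currently lives.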
In Figure 8, the user has selected one of the sources from the list presented in Figure 7, MEDLINE. This results in a formatted display of part of the ISM database record for MEDLINE.
Figure 9 shows a continuation of the formatted display of the partial ISM database record for MEDLINE (in this case, the free-text Definition and General Description fields; most of the fields that occur in an ISM record are not shown in this display).
We now return to exploring the information that was obtained in earlier stages through consulting the UMLS Metathesaurus and semantic network. Employing the "Back" button of the Web client to return to the list of sources first shown in Figure 7, then scrolling down to the bottom of the document, selecting "View Search Information" as the action to be performed (see Figure 10), and then clicking on the "Execute Action" button produces the display shown in Figure 11.
This results in the display (Figure 11) of the information that was accrued in the earlier stages of Sourcerer, through consulting the Metathesaurus and semantic network. The user-entered string "bleeding time" matched one and only one Metathesaurus concept, the concept name for which is also "bleeding time." This display also shows the Metathesaurus unique identifier for this concept, its associated Semantic Type ("Diagnostic Procedure"), MeSH tree number (MeSH is organized as a set of topical hierarchical trees), and MeSH definition.
Scrolling down in the display shown in Figure 11 (see Figure 12) reveals the UMLS information that was retrieved in connection with the user-entered string "aspirin." The Metathesaurus returned one and only one concept (the name for which is "Aspirin"); an extensive list of synonyms; two associated Semantic Types ("Organic Chemical" and "Pharmacological Substance"); and multiple MeSH tree numbers (this part of the output is truncated in Figure 12).
The listed semantic types are associated with Web anchors. Selecting "Organic Chemical" from the display shown in Figure 12 leads to a display of information from the semantic network (including a unique identifier, name, tree number, and definition), as shown in Figure 13.
Clicking on the "Back" button of the display shown in Figure 13 returns us to the summary of accrued search information already visited in Figures 11 and 12. Scrolling down further in this document (see Figure 14) reveals not only the complete list of MeSH headings truncated in Figure 12, but also the list of potentially valid semantic relationships that was determined from consulting the UMLS semantic network. The semantic types shown here were mapped into the corresponding user-specific strings for the display shown in Figure 6.
Clicking on the "Back" button returns us to the display shown in Figures 7 and 10. Selecting the MEDLINE anchor (see Figure 7) returns to the display shown in Figure 8. Selecting the first of the two URLs thus displayed results in the display of Figure 15.
This is an experimental Web-based front-end to MEDLINE, the premier NLM bibliographic database. Known provisionally as NetCoach [Kingsland, Syed & Harbourt, 1994], this CGI application utilizes artificial intelligence methodologies developed for an earlier PC-based system, known as Coach [Harbourt, Syed, & Kingsland, 1993]. NetCoach uses UMLS and MeSH to provide assistance in searching the MEDLINE database. Selecting the button in Figure 15 leads to the initial NetCoach form (Figure 16), which requires entry of the user's account identifier and password.
Upon entering a user identification and password and selecting the "Proceed" button, the NetCoach query form (Figure 17) appears. Like Sourcerer, NetCoach employs three concept entry windows. The original user query has been automatically entered into the NetCoach form; the use of HTML forms-based hidden fields to pass query information enables this simple interprocess communication.
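The hidden-field handoff can be illustrated with a short Python sketch that renders such a form. The field names (`concept1`, `concept2`, ...), the action path, and the function name are hypothetical; the actual field names used by Sourcerer and NetCoach are not given here.

```python
from html import escape


def render_handoff_form(action_url, concepts):
    """Render an HTML form that carries the query concepts in hidden fields.

    Values are escaped so that quotes or angle brackets in a user query
    cannot break out of the attribute.
    """
    lines = ['<form method="POST" action="%s">' % escape(action_url, quote=True)]
    for index, concept in enumerate(concepts, start=1):
        lines.append(
            '  <input type="hidden" name="concept%d" value="%s">'
            % (index, escape(concept, quote=True))
        )
    lines.append('  <input type="submit" value="Proceed">')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical CGI path for the receiving application.
    print(render_handoff_form("/cgi-bin/netcoach", ["aspirin", "bleeding time"]))
```

When the user submits this form, the receiving CGI program sees the hidden values exactly as if they had been typed in, which is what lets one application pre-fill the query form of another without any direct connection between the two programs.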
Selecting the "Perform MEDLINE search" button results in the return of a summary of search results (Figure 18).
Entry of the number of articles to be displayed in the appropriate field in the form of Figure 18 results in the display of matching bibliographic citations (with abstracts when available).
The first such article returned is shown in Figure 19. All 41 of the retrieved articles were found to be appropriate to the original user query.
Supporting extended interactions between user and service of the sort demonstrated above, in which the current interaction builds upon the results of past ones, requires that the server maintain a record of prior transactions with each user. This was achieved by the creation of a simple server-side state engine, based upon earlier experience with a large Web-based catalogued image archive [Rodgers & Srinivasan, 1994]. State is maintained in two files: a session database file and a search component file. The former contains the client IP address, a unique session identification number, the time of the last communication from the client, and pointers to active search components, which are stored in the search component file. Each document returned to a client contains the session identification number in a forms-based hidden field. This allows Sourcerer to check for expired sessions when a document is returned, and to obtain the state information for a session.
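The state engine described above can be sketched in Python as follows. This is an in-memory illustration, not the file-based implementation: the two dictionaries stand in for the session database file and the search component file, and the class name, method names, timeout value, and sequential session numbers are all assumptions.

```python
import itertools
import time

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed value; the real timeout is not stated


class SessionStore:
    """In-memory stand-in for the two state files."""

    def __init__(self, timeout=SESSION_TIMEOUT_SECONDS, clock=time.time):
        self._timeout = timeout
        self._clock = clock
        self._next_id = itertools.count(1)
        self.sessions = {}    # session id -> record ("session database file")
        self.components = {}  # pointer -> search data ("search component file")

    def open_session(self, client_ip):
        """Create a session and return the id to embed in a hidden field."""
        session_id = next(self._next_id)
        self.sessions[session_id] = {
            "client_ip": client_ip,
            "last_seen": self._clock(),
            "components": [],
        }
        return session_id

    def add_component(self, session_id, data):
        """Store a search component and record a pointer to it in the session."""
        record = self.sessions[session_id]
        pointer = (session_id, len(record["components"]))
        self.components[pointer] = data
        record["components"].append(pointer)
        return pointer

    def touch(self, session_id, client_ip):
        """Return the session record, or None if it is unknown, comes from
        a different client address, or has expired."""
        record = self.sessions.get(session_id)
        if record is None or record["client_ip"] != client_ip:
            return None
        now = self._clock()
        if now - record["last_seen"] > self._timeout:
            self._expire(session_id)
            return None
        record["last_seen"] = now
        return record

    def _expire(self, session_id):
        """Drop a session together with its search components."""
        record = self.sessions.pop(session_id)
        for pointer in record["components"]:
            self.components.pop(pointer, None)
```

A CGI handler built on this sketch would call `touch` with the session number read from the hidden field of each submitted form, and start a fresh session whenever it returns `None`. Sequential numbers keep the example short; identifiers that are hard to guess would be a safer choice in practice.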
A number of challenging problems remain to be addressed in the next stage of this project:
Acknowledgments
The authors thank their colleagues from the UMLS team,
and particularly
Anna Harbourt,
Bill Hole,
Betsy Humphreys,
Lawrence Kingsland,
Dan Masys,
Alexa McCray,
and
Edmund Syed,
for their collaboration and critical comments.