Associative Concept Navigation in MEDLINE and other NLM Databases via a Mosaic - Forms - WWW Interface Combining Natural Language Processing, Expert Systems and (un)Conventional Information Retrieval Techniques

Tamas E. Doszkocs, Seth B. Widoff, Bruno M. Vasta

National Library of Medicine

Specialized Information Services Division


We have implemented a dynamic associative hypertext reSEARCH prototype, WEBLINE, to search MEDLINE, TOXLINE and AIDSLINE, the National Library of Medicine's premier information retrieval databases on biomedicine, environmental health and AIDS-related information. MEDLINE and TOXLINE each contain approximately one and a quarter million journal article references in their respective multidisciplinary domains, while AIDSLINE provides access to approximately 90,000 citations.

WEBLINE is the scientific literature component of NLM's experimental Concept Map Interface to TOXNET and its diverse factual databases on toxicology and environmental health.

WEBLINE builds on earlier work in automated information retrieval [Doszkocs et al., 1980].

The WEBLINE prototype search interface uses Mosaic-Forms and PERL scripts to translate both structured user input (e.g. authors, chemical names, journal titles) and free-form natural language queries [Doszkocs, 1983] into appropriate search statements and commands to be processed by ELHILL, NLM's conventional inverted-file, Boolean-logic search engine.
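The translation step can be made concrete with a minimal Python sketch that turns structured fields plus a free-text question into a single Boolean search statement. It is not WEBLINE's PERL code and does not reproduce ELHILL's command syntax. The field tags (`AU`, `NM`, `TA`), the stopword list, and the choice to OR the free-text words together are all assumptions made for illustration.

```python
import re

# Hypothetical field tags; ELHILL's real qualifier syntax is not reproduced here.
FIELD_TAGS = {"author": "AU", "chemical": "NM", "journal": "TA"}

# Tiny illustrative stopword list (an assumption, not NLM's list).
STOPWORDS = {"a", "an", "and", "are", "by", "do", "does", "for", "from", "how",
             "in", "is", "of", "on", "or", "the", "to", "what", "with"}


def free_text_terms(query: str) -> list[str]:
    """Lower-case the query, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in STOPWORDS]


def build_search_statement(fields: dict[str, str], query: str = "") -> str:
    """Translate structured fields plus free text into one Boolean statement.

    Non-empty structured fields are ANDed together. The free-text content
    words are ORed into one group, which is then ANDed with the fields.
    """
    clauses = [f"{value.strip()} ({FIELD_TAGS[name]})"
               for name, value in fields.items() if value.strip()]
    terms = free_text_terms(query)
    if terms:
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)
```

For example, `build_search_statement({"author": "Doszkocs TE"}, "neural networks for information retrieval")` yields `Doszkocs TE (AU) AND (neural OR networks OR information OR retrieval)`.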

Other PERL scripts parse each retrieved record, and each meaningful data element within it, into HTML documents that facilitate full hypertext browsing throughout the database via authors, title phrases, medical subject headings, chemical names and registry numbers, and journal names.
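The parsing step can be pictured as turning each tagged data element into a link that re-queries the database on that value. The sketch below assumes a record that has already been split into fields (a dict mapping a tag to its values). The endpoint `/cgi-bin/webline-search`, its `field`/`term` parameters, and the tag labels are hypothetical.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical CGI endpoint that re-runs a search on a single field value.
SEARCH_URL = "/cgi-bin/webline-search"

# Illustrative mapping from record field tags to display labels.
LABELS = {"AU": "Authors", "TI": "Title", "MH": "MeSH headings",
          "RN": "Registry numbers", "TA": "Journal"}


def hot_link(field: str, value: str) -> str:
    """Return an anchor whose URL searches the database for this field value."""
    query = urlencode({"field": field, "term": value})
    # escape() turns the '&' between parameters into '&amp;' inside the attribute.
    return f'<a href="{escape(SEARCH_URL + "?" + query)}">{escape(value)}</a>'


def record_to_html(record: dict[str, list[str]]) -> str:
    """Render one retrieved record as an HTML definition list of hot links."""
    rows = []
    for field, values in record.items():
        label = LABELS.get(field, field)
        links = "; ".join(hot_link(field, v) for v in values)
        rows.append(f"<dt>{escape(label)}</dt><dd>{links}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"
```

Escaping both the URL (`urlencode`, then `html.escape`) and the link text keeps values containing spaces, `&`, or `<` safe in the generated page.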

Meaningful noun phrases, such as "multiple disorder diagnosis" and "adaptive competitive neural networks", are automatically generated from document titles and abstracts using natural language processing techniques [Doszkocs, 1986].
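The phrase generation above relies on natural language processing [Doszkocs, 1986]. As a much cruder stand-in, the sketch below treats every maximal run of two or more non-stopwords between punctuation marks as a candidate phrase. The technique is stopword-delimited chunking, not the authors' method, and the stopword list and two-word minimum are assumptions.

```python
import re

# Small illustrative stopword list; a real system would use a larger list
# and proper linguistic analysis.
STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "from", "in", "into",
             "is", "of", "on", "or", "the", "to", "using", "via", "with"}


def candidate_phrases(text: str, min_words: int = 2) -> list[str]:
    """Return maximal runs of non-stopwords that are at least min_words long."""
    phrases = []
    # Punctuation ends a phrase, just as a stopword does.
    for segment in re.split(r'[.,;:!?()\[\]"]', text.lower()):
        current: list[str] = []
        for word in re.findall(r"[a-z0-9][a-z0-9-]*", segment):
            if word in STOPWORDS:
                if len(current) >= min_words:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if len(current) >= min_words:
            phrases.append(" ".join(current))
    return phrases
```

On the title "Multiple disorder diagnosis using adaptive competitive neural networks" it returns `["multiple disorder diagnosis", "adaptive competitive neural networks"]`, matching the two example phrases above.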

Users can also mark relevant records and freely construct a new or modified query by selecting multiple entries, e.g. medical subject headings, from a scrolling window listing all of the automatically generated "hot links".
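One plausible way to combine entries picked from the hot-link window is to OR alternatives chosen within the same field and AND the resulting groups across fields. This grouping rule and the field tags (`MH`, `AU`) are assumptions for illustration, not a description of WEBLINE's behavior.

```python
from collections import defaultdict


def query_from_selections(selections: list[tuple[str, str]]) -> str:
    """Combine (field, term) pairs selected from the hot-link window.

    Terms from the same field are ORed as alternatives; the per-field
    groups are then ANDed together. Duplicate selections are ignored.
    """
    by_field: dict[str, list[str]] = defaultdict(list)
    for field, term in selections:
        if term not in by_field[field]:
            by_field[field].append(term)
    groups = []
    for field, terms in by_field.items():
        ored = " OR ".join(f"{t} ({field})" for t in terms)
        # Parenthesize only when the group has more than one alternative.
        groups.append(f"({ored})" if len(terms) > 1 else ored)
    return " AND ".join(groups)
```

Selecting the headings Benzene and Leukemia plus the author Smith J gives `(Benzene (MH) OR Leukemia (MH)) AND Smith J (AU)`. ANDing headings within a field instead would narrow rather than broaden the search; which default suits users better is a design choice.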

Expert system techniques emulate the expertise of highly proficient, trained human searchers by automatically invoking appropriate search strategies, which are then passed to the ELHILL search engine.
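A minimal way to encode such searcher expertise is an ordered list of production rules, where the first rule whose condition matches a term decides how that term is searched. The rules, the CAS-style registry-number pattern, the three-entry vocabulary sample, and the field tags below are hypothetical and are not taken from WEBLINE's knowledge base.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    applies: Callable[[str], bool]   # condition part of the rule
    rewrite: Callable[[str], str]    # action part: build a search statement


# Hypothetical sample of controlled-vocabulary headings (lower-cased).
MESH_SAMPLE = {"benzene", "leukemia", "neural networks (computer)"}

# CAS registry numbers look like 71-43-2: digits, two digits, one check digit.
CAS_RN = re.compile(r"^\d{2,7}-\d{2}-\d$")

# Rule order encodes priority: specific strategies before general fallbacks.
RULES = [
    Rule("registry number", lambda t: bool(CAS_RN.match(t)), lambda t: f"{t} (RN)"),
    Rule("controlled vocabulary", lambda t: t.lower() in MESH_SAMPLE, lambda t: f"{t} (MH)"),
    Rule("multi-word phrase", lambda t: len(t.split()) > 1, lambda t: f'"{t}" (TW)'),
    Rule("single text word", lambda t: True, lambda t: f"{t} (TW)"),
]


def choose_strategy(term: str) -> tuple[str, str]:
    """Fire the first matching rule and return (rule name, search statement)."""
    term = term.strip()
    for rule in RULES:
        if rule.applies(term):
            return rule.name, rule.rewrite(term)
    raise ValueError("no rule matched")  # unreachable: the last rule always applies
```

`choose_strategy("71-43-2")` fires the registry-number rule and returns `("registry number", "71-43-2 (RN)")`, while an unrecognized multi-word phrase falls through to a quoted text-word search. Keeping the rules in a list makes their priority explicit and easy to extend.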

The WEBLINE prototype represents an extension of the WWW information processing paradigm to information retrieval systems in general.

As such, WEBLINE can serve as a useful model for implementing sophisticated value-added hypertext interfaces to large commercial retrieval databases on the Internet and the World Wide Web [Williams, 1994].

Future R&D enhancements include the use of a neural network [Doszkocs et al., 1990] and a large, automatically generated associative database of some 4 million natural language phrases linked to some 16,000 medical subject headings to support associative database navigation. We also plan to integrate the Unified Medical Language System [Lindberg et al., 1993] and to develop intelligent information agents that draw on these and other knowledge bases.
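Associative navigation over phrase-to-heading links could be driven by spreading activation, a common connectionist retrieval technique: activation starts at the phrases a user selects and flows along weighted links to related headings and phrases. The sketch below is one such model under stated assumptions (co-occurrence counts as link weights, per-node normalization, a fixed decay and step count, made-up sample data). It is not the neural network planned for WEBLINE.

```python
from collections import defaultdict


def build_graph(cooccurrence: dict[tuple[str, str], int]) -> dict[str, dict[str, float]]:
    """Turn (phrase, heading) co-occurrence counts into a symmetric graph
    whose outgoing edge weights at each node sum to 1."""
    raw: dict[str, dict[str, float]] = defaultdict(dict)
    for (phrase, heading), count in cooccurrence.items():
        if count > 0:
            raw[phrase][heading] = float(count)
            raw[heading][phrase] = float(count)
    graph: dict[str, dict[str, float]] = {}
    for node, edges in raw.items():
        total = sum(edges.values())
        graph[node] = {nbr: w / total for nbr, w in edges.items()}
    return graph


def spread_activation(graph: dict[str, dict[str, float]], seeds: list[str],
                      decay: float = 0.5, steps: int = 2) -> dict[str, float]:
    """Propagate activation from the seed nodes for a fixed number of steps.

    Each step, every newly activated node passes decay * activation * weight
    to its neighbours; activation accumulates across steps.
    """
    activation: dict[str, float] = defaultdict(float)
    frontier = {s: 1.0 for s in seeds if s in graph}
    for node, a in frontier.items():
        activation[node] += a
    for _ in range(steps):
        nxt: dict[str, float] = defaultdict(float)
        for node, a in frontier.items():
            for nbr, w in graph[node].items():
                nxt[nbr] += decay * a * w
        for node, a in nxt.items():
            activation[node] += a
        frontier = nxt
    return dict(activation)
```

With hypothetical counts linking "benzene exposure" to Benzene (3) and Leukemia (1), seeding "benzene exposure" activates Benzene (0.375) ahead of Leukemia (0.125), so Benzene would be suggested first.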


References

Doszkocs, T.E., Rapp, B.A., Schoolman, H.M., "Automated Information Retrieval in Science and Technology", Science, 208:25-30, 1980

Doszkocs, T.E., "CITE NLM: Natural Language Searching in an Online Catalog", Information Technology and Libraries, 2(4):345-476, December 1983

Doszkocs, T.E., "Natural Language Processing in Information Retrieval", Journal of the American Society for Information Science, 37(4):191-196, 1986

Williams, M.E., "The Internet: Implications for the Information Industry and Database Providers", Online & CDROM Review, 18(3):149-156, June 1994

Doszkocs, T.E., Reggia, J., Lin, X., "Connectionist Models and Information Retrieval", In: Annual Review of Information Science and Technology (ARIST), 25:209-260, 1990

Lindberg, D.A., Humphreys, B.L., McCray, A.T., "The Unified Medical Language System", Methods of Information in Medicine, 32(4):281-291, August 1993