AnswerBus is an open-domain question answering system based on sentence-level Web information retrieval. It accepts users' natural-language questions in English, German, French, Spanish, Italian and Portuguese and provides answers in English. Five search engines and directories are used to retrieve Web pages that are relevant to user questions. From these Web pages, AnswerBus extracts sentences that are determined to contain answers. It currently answers 70.5% of the 200 TREC-8 questions correctly within its top five answers, with an average response time of about seven seconds. In both accuracy and response time, AnswerBus outperforms the other similar online systems evaluated.
question answering, open-domain, QA-specific dictionary, information retrieval
Research on automated Question Answering (QA) as a mechanism of information retrieval has been undertaken since the 1960s. In the early stages, the systems developed were often confined to specific domains ([6], [3]). Recently, researchers have been drawn to the task of developing open-domain QA systems based on collections of real-world documents, especially the World Wide Web. Examples of such systems include LCC ([2]), QuASM, IONAUT ([1]), Mulder and Webclopedia ([4]). QA technologies are rapidly approaching practical application.
As an open-domain question answering system based on the Web, AnswerBus seeks both to enhance existing techniques and to adopt new ones in order to improve the accuracy and speed of question answering. After receiving a user's question in natural language, it uses five search engines and directories (Google, Yahoo, WiseNut, AltaVista, and Yahoo News) to retrieve Web pages that potentially contain answers. From these Web pages, AnswerBus extracts sentences that are determined to contain answers.
Figure 1 describes the working process of AnswerBus. A simple language recognition module first determines whether the question is in English or in one of the other five supported languages. If the question is not in English, the module sends the original question and its language information to AltaVista's translation tool BabelFish and obtains an English translation of the question.
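The paper does not describe how the language recognition module works internally. As a minimal sketch, assuming a simple stopword-counting heuristic, the routing step might look like the following. The word lists, `detect_language` and `prepare_question` are hypothetical, and `translate` is a placeholder callable standing in for a translation service such as BabelFish.

```python
import re
from typing import Callable

# HYPOTHETICAL stopword lists; AnswerBus's actual recognition method is not given.
STOPWORDS = {
    "en": {"what", "who", "where", "when", "how", "is", "the", "of", "was", "did"},
    "de": {"was", "wer", "wo", "wann", "wie", "ist", "der", "die", "das", "von"},
    "fr": {"que", "qui", "où", "quand", "comment", "est", "le", "la", "les", "de"},
    "es": {"qué", "quién", "dónde", "cuándo", "cómo", "es", "el", "la", "los", "de"},
    "it": {"che", "chi", "dove", "quando", "come", "è", "il", "la", "di", "del"},
    "pt": {"que", "quem", "onde", "quando", "como", "é", "o", "a", "os", "do"},
}


def detect_language(question: str) -> str:
    """Return the language whose stopwords occur most often in the question."""
    tokens = re.findall(r"\w+", question.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)  # ties go to the earlier entry ("en" first)
    # Fall back to English when no stopword matched at all.
    return best if scores[best] > 0 else "en"


def prepare_question(question: str, translate: Callable[..., str]) -> str:
    """Pass English questions through; send others to the (hypothetical) translator."""
    lang = detect_language(question)
    if lang == "en":
        return question
    return translate(question, source=lang, target="en")
```

A real module would need larger word lists or character n-gram statistics, since short questions share many function words across these languages.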
Figure 1 Working process of AnswerBus
The rest of the process consists mainly of four steps: 1) select two or three of the five search engines for information retrieval and form engine-specific queries based on the question; 2) contact the search engines and retrieve the documents listed at the top of their hit lists; 3) extract sentences that potentially contain answers from the documents; 4) rank the answers and return the top choices with contextual URL links.
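The four steps above can be sketched as a small driver that delegates each step to a pluggable function. All names here (`answer_question`, `Answer`, and the callables) are hypothetical; the sketch only shows how the steps compose, not how AnswerBus implements any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Answer:
    sentence: str
    url: str
    score: float


def answer_question(
    question: str,
    select_engines: Callable[[str], List[str]],
    build_query: Callable[[str, str], str],
    search: Callable[[str, str], List[Tuple[str, str]]],  # returns (url, text) pairs
    extract: Callable[[str, str], List[str]],
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Answer]:
    candidates: List[Answer] = []
    # Step 1: pick engines and form an engine-specific query for each.
    for engine in select_engines(question):
        query = build_query(question, engine)
        # Step 2: fetch the documents at the top of that engine's hit list.
        for url, text in search(engine, query):
            # Step 3: keep only sentences that may contain an answer.
            for sentence in extract(question, text):
                candidates.append(Answer(sentence, url, score(question, sentence)))
    # Step 4: rank candidates and return the top choices with their source URLs.
    candidates.sort(key=lambda a: a.score, reverse=True)
    return candidates[:top_k]
```

Passing the steps in as parameters keeps the driver testable with stub search functions instead of live search engines.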
AnswerBus returns sentences as answers to user questions. As listed in Table 1, different QA systems return answers in different forms.
QA System | Output |
AnswerBus | Sentences |
AskJeeves | Documents |
IONAUT | Passages |
LCC | Sentences |
Mulder | Extracted answers |
QuASM | Document blocks |
START | Mixture |
Webclopedia | Sentences |
START returns a mixture of sentences, links, and sometimes images. This form may suit different user needs; however, it requires the support of a special knowledge base, and building and maintaining a large knowledge base requires tremendous effort. Moreover, the knowledge base appears to limit START's ability to answer questions outside its domain.
Mulder tries to extract exact answers, and it achieved a 34% correct rate in top-1 answers for the TREC-8 questions. However, 34% may still fall far short of real users' expectations. Extracting exact answers can also lead to a loss of precision and leaves out contextual information that users need to judge whether an answer is correct.
IONAUT returns a list of passages. This provides rich contextual information but demands considerable user effort to dig the answer out of a whole passage.
QuASM returns large blocks of text whose boundaries are not clearly defined. AskJeeves returns a list of documents, as general search engines do. As with passages, users of these two systems must extract the answers themselves.
Based on these observations, AnswerBus returns sentences, with the goal of minimizing the effort users need to extract answers while still providing enough context for them to judge quickly whether an answer is valid. Several other QA systems, including Webclopedia and LCC, use the same output form. A more detailed description of the sentence segmentation and extraction modules used in AnswerBus can be found in [7].
Most QA systems listed in Table 1 try to take advantage of the wealth of information on the Web. The most effective way to use Web resources for question answering may be to index the whole Web specifically for QA tasks, but so far no one appears to have accomplished this.
START uses several selected Web sites as part of its knowledge base, and Webclopedia uses a locally stored TREC corpus. Other QA systems use one or several search engines to retrieve related documents. AnswerBus currently uses Google, Yahoo, Yahoo News, AltaVista and WiseNut. Using multiple search engines covers more knowledge domains and also makes the system more tolerant of errors. Among the five, Google, AltaVista and WiseNut are general-purpose search engines used for general questions. Yahoo News indexes current news from around the world and is intended for time-sensitive questions. Yahoo is a human-edited Web directory, which helps ensure the quality of the sites it lists; it is also used for general questions.
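How AnswerBus chooses two or three of its five engines is not specified here. A minimal sketch, assuming a hypothetical keyword heuristic that routes time-sensitive questions to Yahoo News and other questions to Yahoo plus general-purpose engines, might look like this. `select_engines`, `TIME_CUES` and the cue words are illustrative, not AnswerBus's actual rules.

```python
import re
from typing import List

GENERAL_ENGINES = ["Google", "AltaVista", "WiseNut"]
# HYPOTHETICAL cue words signalling a time-sensitive question.
TIME_CUES = {"today", "yesterday", "latest", "current", "currently", "now", "recent", "recently"}


def select_engines(question: str) -> List[str]:
    tokens = set(re.findall(r"\w+", question.lower()))
    if tokens & TIME_CUES:
        # Time-sensitive: the news index first, backed by one general engine.
        return ["Yahoo News", "Google"]
    # General question: the human-edited directory plus two general engines.
    return ["Yahoo"] + GENERAL_ENGINES[:2]
```

Returning two or three engines rather than all five mirrors step 1 above, trading some coverage for lower response time.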
To cover more specialized domains, AnswerBus is considering adding MedlinePlus as an additional search engine. In addition, because Yahoo News does not index every new news story, a modified version of International News Connection ([8]) is being considered as another Web resource.
In any language, a sentence is not a simple combination of a set of words; it is a string of words with logical and lexical structure. Thus, in real life, a person often does not need to hear every word of a sentence to capture its meaning. AnswerBus uses the following experimental matching rule to determine whether a sentence is potentially an answer to the question:
It is not necessary for a sentence to match every word in the question in order to be an answer candidate.
More specifically, AnswerBus uses the following formula to select sentences that may contain an answer, where Q is the number of words in the query and q is the number of those words matched in a retrieved sentence.
A more detailed explanation of this formula can be found in [7].
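Because the formula itself is given in [7] and is not reproduced here, the sketch below leaves the threshold on q as a parameter. It assumes a naive regex sentence splitter and counts distinct query words. The default `majority` threshold (more than half of the query words must match) is a HYPOTHETICAL placeholder, not the formula from [7]; `split_sentences` and `candidate_sentences` are likewise illustrative names.

```python
import re
from typing import Callable, List


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def majority(Q: int) -> int:
    # HYPOTHETICAL placeholder threshold; NOT the formula from [7].
    return Q // 2 + 1


def candidate_sentences(
    query: str, text: str, threshold: Callable[[int], int] = majority
) -> List[str]:
    query_words = set(words(query))
    Q = len(query_words)  # number of (distinct) query words
    result = []
    for sentence in split_sentences(text):
        q = len(query_words & set(words(sentence)))  # matched query words
        # A sentence need not match every query word, only enough of them.
        if q >= threshold(Q):
            result.append(sentence)
    return result
```

For the query "Who invented the telephone" (Q = 4, placeholder threshold 3), the sentence "Alexander Graham Bell invented the telephone in 1876." matches three words and is kept even though "who" does not appear in it.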
The 200 TREC-8 questions were used to evaluate the question answering performance of AnswerBus and to compare it with that of four other similar systems. Table 2 presents the results for AnswerBus and the other four systems. For each system it gives the number of questions answered correctly within the top five answers and by the top-ranked answer, the standard NIST score, the maximum, minimum and mean response times in seconds, the standard deviation of the response times, and the mean length of the returned answers in bytes.
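The NIST scores in Table 2 appear consistent with the mean reciprocal rank (MRR) used in the TREC-8 QA track, where each question scores 1/r for the rank r (up to 5) of its first correct answer and 0 otherwise. For START, whose 29 correct answers were all at rank 1, this gives 29/200 = 14.50%. A minimal sketch, assuming that scoring rule (`mean_reciprocal_rank` is an illustrative name):

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]], cutoff: int = 5) -> float:
    """ranks[i] is the 1-based rank of the first correct answer to question i, or None."""
    if not ranks:
        raise ValueError("no questions to score")
    total = 0.0
    for r in ranks:
        if r is not None and r <= cutoff:
            total += 1.0 / r  # answers ranked below the cutoff score 0
    return total / len(ranks)
```

As a consistency check on the AnswerBus row: 120 answers at rank 1 contribute 60%, and the remaining 21 top-5 answers can add between 21 × (1/5) / 200 = 2.10% and 21 × (1/2) / 200 = 5.25%, so the reported 64.18% falls inside the possible range of 62.10% to 65.25%.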
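Table 2 shows that, among the systems whose accuracy was measured, AnswerBus answered the most questions correctly (141 within its top five answers, against 97 for LCC), and that it had the shortest mean response time of all five systems (7.20 s, against 9.84 s to 44.24 s for the others). Its answers are also shorter on average than those of every other system whose answer length was measured.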
Table 2 Performance of Online Question Answering Systems
System | Correct (top 5) | Correct (top 1) | NIST score | Tmax (s) | Tmin (s) | Tmean (s) | Tstd dev (s) | Lmean (bytes) |
AnswerBus | 141 | 120 | 64.18% | 15.06 | 3.79 | 7.20 | 3.07 | 141 |
IONAUT | | | | 44.88 | 2.78 | 12.51 | 6.81 | 1312 |
LCC | 97 | 75 | 41.73% | 342.52 | 4.30 | 44.24 | 32.63 | 178 |
QuASM | 13 | 7 | 4.45% | 284.29 | 2.61 | 20.72 | 33.92 | 1766 |
START | 29 | 29 | 14.50% | 62.07 | 2.02 | 9.84 | 7.45 | |
No comparable data could be obtained for two other well-known QA systems, Mulder and Webclopedia, because they were offline when this evaluation was conducted.
In conclusion, compared with other Web-based QA systems that are currently accessible online, AnswerBus demonstrates higher question answering accuracy and faster response times. This improved performance can be attributed mainly to three features of the system.