A closer examination shows why the more expensive two-pass system performs worse than the one-pass system. Consider, for example, the query ``horizon blue cross blue shield''. The one-pass system retrieves the relevant page, www.bcbsnj.com, as the top ranked page. However, the first pass also retrieves many health insurance and health care pages in the top ten. In the query expansion step, terms drawn from these pages shift the query away from ``horizon blue cross blue shield'' and turn it into a general health insurance query, so the second pass fails to retrieve the desired page. We observe this loss of focus for many other queries in our set.
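The loss of focus described above can be reproduced with a minimal sketch of two-pass retrieval. It is not the TREC algorithm used in our experiments: the term-overlap scoring, the frequency-based choice of expansion terms, and every page text below are invented for illustration, and only the query and the URL www.bcbsnj.com come from the example above.

```python
from collections import Counter

# Toy page texts, invented for illustration only.
PAGES = {
    "www.bcbsnj.com": "horizon blue cross blue shield new jersey",
    "generic-insurance-1": "health insurance plans coverage blue cross",
    "generic-insurance-2": "health insurance coverage quotes",
    "generic-insurance-3": "health care insurance providers coverage",
}


def overlap(query_terms, text):
    """Toy score: number of distinct query terms that occur in the page."""
    words = set(text.split())
    return sum(1 for term in set(query_terms) if term in words)


def rank(query_terms, pages):
    """Rank page names by descending overlap; ties keep insertion order."""
    return sorted(pages, key=lambda name: -overlap(query_terms, pages[name]))


def expand(query_terms, feedback_texts, num_new_terms=3):
    """Append the most frequent non-query terms found in the feedback pages."""
    counts = Counter()
    for text in feedback_texts:
        counts.update(w for w in text.split() if w not in query_terms)
    return list(query_terms) + [w for w, _ in counts.most_common(num_new_terms)]


query = ["horizon", "blue", "cross", "shield"]

# First pass: rank with the original query.
first_pass = rank(query, PAGES)

# Expansion: treat the top three pages as if they were relevant.
expanded = expand(query, [PAGES[name] for name in first_pass[:3]])

# Second pass: rank again with the expanded query.
second_pass = rank(expanded, PAGES)

print("first pass: ", first_pass)
print("expanded:   ", expanded)
print("second pass:", second_pass)
```

In this toy setting the expansion step adds only generic health insurance terms, and a general insurance page overtakes www.bcbsnj.com in the second pass. Real expansion schemes weight terms rather than count them, so the sketch shows the direction of the effect, not its size.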
It is worth noting that many pages retrieved by the TREC algorithms are quite relevant to the topic at hand; they are simply not the page the user was looking for in our experiments. Under the TREC criteria for judging relevance, many of these pages are ``on topic'' and would be judged relevant. This would explain why, under the TREC measurements, the commercial engines do not show any advantage over the TREC algorithms.
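The difference between the two judging criteria can be made concrete with a small sketch. The two rankings, the page names, and the choice of precision at a cutoff as the topical measure are all assumptions made for illustration; they are not data or measurements from our experiments.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top k pages judged on topic (topical criterion)."""
    return sum(1 for page in ranking[:k] if page in relevant) / k


def target_found_at_k(ranking, target, k):
    """True if the one page the user wanted is in the top k."""
    return target in ranking[:k]


# Hypothetical judgments: one desired page plus four other on-topic pages.
target = "desired-page"
on_topic = {target, "topic-1", "topic-2", "topic-3", "topic-4"}

# Hypothetical rankings from two systems for the same query.
system_a = ["desired-page", "off-topic-1", "topic-1", "off-topic-2"]
system_b = ["topic-1", "topic-2", "topic-3", "topic-4"]

for name, ranking in (("A", system_a), ("B", system_b)):
    print(
        name,
        "precision@4 =", precision_at_k(ranking, on_topic, 4),
        "target in top 4 =", target_found_at_k(ranking, target, 4),
    )
```

Here the hypothetical system B looks better under the topical criterion because every page it returns is on topic, yet it never returns the desired page, while system A returns that page at the top rank. A measure based on topical relevance alone cannot separate the two cases in the way our experiments do.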