Since just 217.5 gigabytes of web data will not contain all the pages indexed by the commercial search engines, the TREC algorithms might be at a disadvantage because of the poor coverage of our crawl. To eliminate this problem, for every query in our test set we add to our crawl all missing pages that any of the commercial search engines retrieves in its top ten ranks. We ran these queries on the commercial engines on October 17, 2000 and gathered the first ten results from each. We then fetched the pages that were not in our crawl and added them to our collection. This inclusion ensures that the TREC algorithms have access to every page retrieved by a commercial engine and are not at a disadvantage due to our small crawl. Though quite unlikely, it is possible that we crawled some pages that are not indexed by the commercial engines; this gives a slight advantage to the TREC ad-hoc algorithm in its ability to find such pages.
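The augmentation step above can be sketched as follows. This is an illustrative reconstruction, not the code used in the study: the function name, the data-structure choices, and the `fetch_page` callback are all assumptions introduced here for clarity.

```python
def augment_collection(crawl, engine_results, fetch_page):
    """Add to `crawl` every page ranked in the top ten by any
    commercial engine for any test query but missing from the crawl.

    crawl: dict mapping URL -> page content (the local crawl)
    engine_results: dict mapping query -> {engine: [ranked URLs]}
    fetch_page: callable URL -> page content (e.g., an HTTP fetch)
    Returns the list of URLs that were fetched and added.
    """
    added = []
    for per_engine in engine_results.values():
        for urls in per_engine.values():
            for url in urls[:10]:  # only the top ten ranks matter
                if url not in crawl:
                    crawl[url] = fetch_page(url)
                    added.append(url)
    return added
```

After this pass, every page a commercial engine returned in its top ten is guaranteed to be present in the local collection, so the TREC algorithms can in principle retrieve it.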