Our goal here was to construct a large database that includes tables of as many different
varieties as possible. At the same time, we needed to ensure that the database
contained a significant number of genuine tables. For this practical
reason, we biased the data collection towards web pages that are more likely
to contain genuine tables [...]
Even in this somewhat biased collection, genuine tables account for only about
15% of all leaf table elements.