(1)
The first group is the layout features. The first few features in this group
are rather straightforward ….. The last feature, cumulative length consistency,
or CLC, is a more complex one. It belongs to a type of feature called
consistency features, which we think play a very important role in table
detection. The underlying observation is that a genuine table usually shows
some kind of consistency, whether in physical layout, content type, syntax, or
semantics, along either the row dimension or the column dimension. Such
consistency is usually much weaker, or does not
exist, in a non-genuine table. This kind of feature was explored in previous
work in a very narrow way: for example, some researchers looked for rows or
columns that consist entirely of dates or numbers. We attempted to explore
this feature type more systematically and in a broader sense; CLC is one
instance of it.
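
To make the narrow form of consistency checking concrete, the following is a minimal sketch of a test for columns that consist entirely of numbers. It is not the CLC feature and not taken from any cited work; the regular expression, the function names `is_numeric` and `all_numeric_columns`, and the list-of-lists table representation are all assumptions made for illustration.

```python
import re

# Assumed notion of "number": optional sign, digits with optional
# thousands separators, optional decimal part, optional percent sign.
NUMBER = re.compile(r"[+-]?\d+(?:,\d{3})*(?:\.\d+)?%?")


def is_numeric(cell: str) -> bool:
    """Return True if the whole cell text looks like a number."""
    return NUMBER.fullmatch(cell.strip()) is not None


def all_numeric_columns(table):
    """Return the indices of columns in which every cell is numeric.

    `table` is assumed to be a list of rows, each a list of cell strings.
    Only columns present in every row are considered.
    """
    if not table:
        return []
    width = min(len(row) for row in table)
    return [
        j for j in range(width)
        if all(is_numeric(row[j]) for row in table)
    ]


# Hypothetical example: columns 0 and 2 hold only numbers.
rows = [
    ["3", "apples", "1.50"],
    ["12", "pears", "2,000"],
]
numeric_cols = all_numeric_columns(rows)
```

A check like this gives only a yes/no answer per column and would fail on a column with a header cell, which is one way to see why such narrow tests capture only a small part of the consistency described above.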