|
The first group consists of the layout features. The first few features in
this group are rather straightforward ….. The last feature, cumulative length
consistency (CLC), is more complex. It belongs to a class of features called
consistency features, which we think play a very important role in table
detection. The
observation behind this is that in a genuine table, one can usually detect
some kind of consistency, whether in physical layout, content type, syntax,
or semantics, along either the row dimension or the column dimension. Such
consistency is usually much weaker, or absent altogether, in a
non-genuine table. This kind of feature was explored in previous work only in
a very narrow way: for example, some researchers looked for rows or columns
that consist entirely of dates or numbers. We attempted to explore this
feature type more systematically and in a broader sense, and CLC is one
instance of it.
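
To make the idea of a consistency feature concrete, here is a minimal sketch of a content-type consistency score along the column dimension. It is not the CLC feature itself, whose definition is not given in this passage. It only generalizes the narrow "all dates or all numbers" check mentioned above into a graded score. The names `cell_type`, `type_consistency`, and `column_consistency`, and the simple regular expressions for numbers and dates, are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns; real cell typing would need more cases.
NUMBER_RE = re.compile(r"^[+-]?\d[\d,]*(\.\d+)?%?$")
DATE_RE = re.compile(r"^\d{1,4}[-/]\d{1,2}[-/]\d{1,4}$")


def cell_type(text):
    """Classify a cell as 'empty', 'date', 'number', or 'text'."""
    text = text.strip()
    if not text:
        return "empty"
    if DATE_RE.match(text):
        return "date"
    if NUMBER_RE.match(text):
        return "number"
    return "text"


def type_consistency(cells):
    """Fraction of non-empty cells that share the most common content type."""
    types = [t for t in (cell_type(c) for c in cells) if t != "empty"]
    if not types:
        return 0.0
    most_common_count = Counter(types).most_common(1)[0][1]
    return most_common_count / len(types)


def column_consistency(table):
    """Average type consistency over all columns of a table (list of rows)."""
    if not table:
        return 0.0
    n_cols = max(len(row) for row in table)
    if n_cols == 0:
        return 0.0
    scores = []
    for j in range(n_cols):
        # Ragged rows are tolerated: a short row simply contributes no cell.
        column = [row[j] for row in table if j < len(row)]
        scores.append(type_consistency(column))
    return sum(scores) / len(scores)
```

Under these assumptions, a table whose columns each hold a single content type scores close to 1, while a layout table that mixes types within its columns scores lower. The same function applied to the transposed table would measure consistency along the row dimension.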
|