Features: Layout
• Average # columns and standard deviation
• Average # rows and standard deviation
• Average overall cell length
• Cumulative Length Consistency (CLC)
  – (1)
The first group is the layout features. The first few features in this group are rather straightforward … The last feature, cumulative length consistency (CLC), is more complex. It belongs to a type of feature called consistency features, which we think play a very important role in table detection. The underlying observation is that a genuine table usually exhibits some kind of consistency, in physical layout, content type, syntax, or semantics, along either the row dimension or the column dimension. In a non-genuine table, such consistency is usually much weaker or absent. Previous work explored this kind of feature only narrowly: for example, some researchers looked for rows or columns that consist entirely of dates or numbers. We attempted to explore this feature type more systematically and in a broader sense, and CLC is one instance of it.
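To make the layout features concrete, here is a minimal Python sketch. It assumes a table is given as a list of rows, each a list of cell strings, and that "# columns" is counted per row and "# rows" per column; those conventions, and the names `layout_features`, `length_consistency`, and `table_length_consistency`, are illustrative assumptions. The consistency score in particular is only one plausible way to measure how uniform cell lengths are along a row or column. It is not taken from the CLC definition in equation (1), which these notes do not reproduce.

```python
from statistics import mean, pstdev


def layout_features(table):
    """Straightforward layout features for a non-empty table.

    `table` is a list of rows; each row is a list of cell strings.
    Rows may have different lengths (assumed representation).
    """
    cols_per_row = [len(row) for row in table]
    n_cols = max(cols_per_row)
    # Number of rows that actually have a cell in column j.
    rows_per_col = [sum(1 for row in table if len(row) > j) for j in range(n_cols)]
    cell_lengths = [len(cell) for row in table for cell in row]
    return {
        "avg_cols": mean(cols_per_row),
        "std_cols": pstdev(cols_per_row),
        "avg_rows": mean(rows_per_col),
        "std_rows": pstdev(rows_per_col),
        "avg_cell_len": mean(cell_lengths),
    }


def length_consistency(groups):
    """Hypothetical length-consistency score (not the source's formula).

    `groups` is a list of rows or columns, each a list of cell lengths.
    Every cell contributes 0.5 minus its relative deviation from the
    group's mean length (deviation capped at 1.0); contributions are
    summed within a group and averaged over groups.
    """
    totals = []
    for lengths in groups:
        m = mean(lengths)
        total = 0.0
        for length in lengths:
            deviation = min(abs(length - m) / m, 1.0) if m > 0 else 0.0
            total += 0.5 - deviation
        totals.append(total)
    return mean(totals)


def table_length_consistency(table):
    """Score rows and columns separately and keep the more consistent one."""
    n_cols = max(len(row) for row in table)
    rows = [[len(cell) for cell in row] for row in table]
    cols = [[len(row[j]) for row in table if len(row) > j] for j in range(n_cols)]
    return max(length_consistency(rows), length_consistency(cols))


example = [["Name", "Age"], ["Alice", "30"], ["Bob", "41"]]
features = layout_features(example)
score = table_length_consistency(example)
```

In this example the cells within each column have similar lengths while the cells within each row do not, so the column-wise score is the larger of the two. That matches the intuition in the notes that a genuine table shows consistency along at least one dimension.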