Results: Feature Groups
Feature group   P (%)   R (%)   F (%)
LTW             97.50   94.25   95.88
LT              97.27   94.20   95.73
T               95.70   90.80   93.25
L               88.15   87.24   87.70
L:    Layout only
T:    Content type only
LT:   Layout and content type
LTW:  Layout, content type and word group
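The F values are consistent with the standard balanced F-measure (F1), the harmonic mean of precision and recall, F = 2PR / (P + R). The slide does not state the formula, so treating F as F1 is an assumption. This minimal sketch recomputes F from the reported P and R; the recomputed values agree with the reported F column to within 0.1 percentage point, and the small gaps may come from how the scores were aggregated, which the slide does not say.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) as reported on this slide, per feature group.
results = {
    "LTW": (97.50, 94.25),
    "LT": (97.27, 94.20),
    "T": (95.70, 90.80),
    "L": (88.15, 87.24),
}

for group, (p, r) in results.items():
    print(f"{group:>3}: F = {f_measure(p, r):.2f}")
```

Because the harmonic mean is dominated by the smaller of P and R, the L group's low recall and low precision both pull its F well below the other groups.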
• Using decision tree classifier:
  ▪ Best result: all three feature groups (LTW)
  ▪ Word group feature contributes only a small improvement
(1.5)
As we can see, the best result was achieved … although adding the word group feature yielded only a very small improvement. We think the reason is that genuine tables and non-genuine tables are very broad classes, each containing samples that cover many different subjects. Simply using a bag of words therefore results in a feature that is too spread out and thus not very discriminative. One possible solution, still to be explored, is to use word categories instead of raw words for this feature.
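To illustrate why word categories might help, here is a minimal sketch contrasting a raw bag of words with a bag of word categories. The lexicon `WORD_CATEGORIES`, its category names, and the example token lists are all hypothetical; the slides only propose the idea and do not say which categories would be used. With raw words the two example tables share a single feature (`usd`); after mapping words to categories they share `MONEY` and `DATE`, so tables about different subjects fall onto fewer, more populated features.

```python
from collections import Counter

# Hypothetical word-to-category lexicon (illustrative only).
WORD_CATEGORIES = {
    "price": "MONEY", "cost": "MONEY", "usd": "MONEY",
    "jan": "DATE", "feb": "DATE", "monday": "DATE",
    "km": "UNIT", "kg": "UNIT", "mph": "UNIT",
}


def bag_of_words(tokens):
    """Raw bag of words: one feature per distinct lowercased token."""
    return Counter(t.lower() for t in tokens)


def bag_of_categories(tokens, lexicon=WORD_CATEGORIES, unknown="OTHER"):
    """Map each token to its category first, so different words with the
    same role share one feature; unknown words map to `unknown`."""
    return Counter(lexicon.get(t.lower(), unknown) for t in tokens)


def shared_features(a, b):
    """Features (dictionary keys) present in both bags."""
    return set(a) & set(b)


# Hypothetical tokens from two tables on different subjects.
table_a = ["Price", "USD", "Jan", "Feb"]
table_b = ["Cost", "usd", "Monday"]

print(sorted(shared_features(bag_of_words(table_a), bag_of_words(table_b))))
# prints ['usd']
print(sorted(shared_features(bag_of_categories(table_a), bag_of_categories(table_b))))
# prints ['DATE', 'MONEY']
```

A real system would need a much larger lexicon (or a learned clustering of words); the design point is only that the category mapping collapses many sparse word features into a few dense ones.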