T: Content type only
LTW: Layout, content type and word group
§Best result: all three feature groups
§Contribution of word group feature small
§Using decision tree classifier:
(1.5)
As we can see,
the best results were achieved … although the addition of the word group
feature contributed only a very small improvement. The reason, we think, is
that genuine and non-genuine tables are very broad classes, each containing
samples covering many different subjects. Simply using a bag of words
therefore results in a feature that is too spread out and thus not very
discriminative. One possible solution is to use word categories instead of
raw words for this feature; this remains to be explored further.
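
The contrast between raw words and word categories can be illustrated with a small sketch. It is not the method used in this work: the category map `WORD_CATEGORY`, the helper names, and the two sample strings are all hypothetical, chosen only to show how two tables on different subjects can share no raw words yet look identical at the category level.

```python
from collections import Counter

# Hypothetical word-to-category map; the categories and entries are
# illustrative assumptions, not taken from the original work.
WORD_CATEGORY = {
    "price": "money", "cost": "money", "fee": "money",
    "monday": "time", "friday": "time", "june": "time",
    "paris": "place", "boston": "place",
}


def bag_of_words(text):
    """Count raw lowercase tokens."""
    return Counter(text.lower().split())


def bag_of_categories(text):
    """Count word categories; words without a category map to 'other'."""
    return Counter(WORD_CATEGORY.get(w, "other") for w in text.lower().split())


def overlap(a, b):
    """Number of feature occurrences shared by two count vectors."""
    return sum((a & b).values())


# Two made-up cell contents from tables about different subjects.
cell_a = "price monday paris"
cell_b = "fee friday boston"

raw_overlap = overlap(bag_of_words(cell_a), bag_of_words(cell_b))
cat_overlap = overlap(bag_of_categories(cell_a), bag_of_categories(cell_b))
print(raw_overlap, cat_overlap)
```

With raw words the two samples have nothing in common, while the category counts coincide, which is the kind of concentration a category-based feature might provide. Whether this helps on real data would depend on how the categories are defined.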