5
Comparison to Previous Work
§Previous Work
–Mostly based on heuristics
–Limited testing
•Tens to hundreds of samples
•Domain specific (airlines, news, etc.)
–
§Our approach
–Machine learning based, trainable
–Large scale testing (over 10,000 samples)
(1.5)
There has been a limited amount of previous work on this subject. However, most previous approaches are based on heuristics. And the algorithms were tested on either a very small database containing up to hundreds of samples, or a domain specific databases such as … The main goal of our study was to design a more generic and scalable table detection algorithm, therefore we decided to explore machine learning based algorithms that are automatically trainable and thus extendable. And the algorithms were trained and tested on a large database containing over 10000 samples.