Features: Text Content
• Treat it as a text classification problem
• Special characteristics:
–Dramatic length differences
–Highly skewed distribution
–Continuous score for incorporation with other features
Since most tables contain a lot of text, we decided to explore deriving a feature by treating the task as a text classification problem. Text classification is a well-studied problem in information retrieval (IR), and many algorithms have been proposed for it. However, our particular application has several special characteristics.
…. Finally, since this is only one of many features, we need a continuous score, rather than a binary decision, so that it can be combined with the other features. Considering all of these, we designed a feature based on the vector space model, which we call the word group feature.
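The notes here do not spell out how the word group feature is computed, so the following is only a minimal sketch of one way a vector space model can yield a continuous score. It assumes whitespace tokenization, raw term-frequency weights, and a score defined as the difference of cosine similarities to two class vectors. All function names (`term_vector`, `class_vector`, `continuous_score`) and the sample texts are hypothetical, not taken from the original work.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_vector(texts):
    """Sum the term vectors of all training texts of one class."""
    total = Counter()
    for text in texts:
        total.update(term_vector(text))
    return total


def continuous_score(text, pos_vec, neg_vec):
    """Assumed scoring rule: similarity to the positive class minus
    similarity to the negative class, a real value in [-1, 1]."""
    vec = term_vector(text)
    return cosine(vec, pos_vec) - cosine(vec, neg_vec)


# Hypothetical training texts for the two classes.
pos_vec = class_vector(["price quantity total", "name price total"])
neg_vec = class_vector(["home about contact", "login home search"])
score = continuous_score("price total", pos_vec, neg_vec)
```

Because cosine similarity normalizes by vector length, repeating a text does not change its score, which is one simple way to cope with dramatic length differences. The sketch does nothing about the skewed class distribution; that would need separate handling.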