Classifiers
§Decision Tree
–Handles our highly non-homogeneous feature set
–Widely used in text classification
§Support Vector Machines (SVM)
–Strong theoretical foundation
–Best performer in a recent text categorization test
(1)
Now that we have the features, we need to decide which classifiers to use. We picked two popular classifiers to experiment with. Decision trees were selected because, as you might have noticed, our feature set is highly non-homogeneous: some features are integers, some are floating-point numbers. … Decision trees can handle that quite well. Secondly, decision trees have been widely used … and have proved to perform very well in that context. We also wanted to experiment with Support Vector Machines because they have a strong theoretical foundation based on structural risk minimization, and they were shown to be the best performer in a recent text categorization test.
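To make the point about non-homogeneous features concrete, here is a minimal sketch of a single decision-tree split (a "stump") in plain Python. It is not the classifier used in our experiments. The data, the feature meanings, and the function names (`gini`, `best_split`, `partition`) are all hypothetical and chosen only for illustration. The idea it shows is that a tree compares each feature against its own threshold, so an integer count and a floating-point ratio can sit side by side without normalization, and rescaling a feature leaves the resulting partition unchanged.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best single split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def partition(rows, split):
    """Which rows fall on the left side (feature value <= threshold) of a split."""
    f, t, _ = split
    return [r[f] <= t for r in rows]


# Hypothetical data: feature 0 is an integer count, feature 1 is a float in [0, 1].
rows = [(3, 0.10), (12, 0.15), (7, 0.80), (1, 0.90), (9, 0.20), (4, 0.75)]
labels = ["a", "a", "b", "b", "a", "b"]

split = best_split(rows, labels)

# Rescale the float feature by 1000: the same feature is chosen and the
# rows are partitioned in the same way, so no normalization was needed.
scaled = [(count, ratio * 1000) for count, ratio in rows]
scaled_split = best_split(scaled, labels)
same_partition = partition(rows, split) == partition(scaled, scaled_split)
```

In this toy data the float feature separates the two classes cleanly, so the stump picks it regardless of its scale. A margin-based classifier such as an SVM generally benefits from putting features on comparable scales first, which is one practical difference between the two choices.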