Refereed Papers
Track: Data Mining: Learning
Paper Title:
FloatCascade Learning for Fast Imbalanced Web Mining
Authors:
- Xiaoxun Zhang (IBM China Research Lab)
- Xueying Wang (Peking University)
- Honglei Guo (IBM China Research Lab)
- Zhili Guo (IBM China Research Lab)
- Xian Wu (IBM China Research Lab)
- Zhong Su (IBM China Research Lab)
Abstract:
This paper is concerned with the problem of Imbalanced
Classification (IC) in web mining, which often arises on the web
due to the “Matthew Effect”. As web IC applications usually need
to provide online services for users and deal with large volumes of data,
classification speed emerges as an important issue to be addressed.
In face detection, Asymmetric Cascade is used to speed up
imbalanced classification by building a cascade structure of simple
classifiers, but it often causes a loss of classification accuracy due to
the iterative feature addition in its learning procedure. In this paper,
we adopt the idea of cascade classifiers in imbalanced web mining
for fast classification and propose a novel asymmetric cascade
learning method called FloatCascade to improve the accuracy. To
this end, FloatCascade selects fewer yet more effective features at
each stage of the cascade classifier. In addition, a decision-tree
scheme is adopted to enhance feature diversity and discrimination
capability for FloatCascade learning. We evaluate FloatCascade
through two typical IC applications in web mining: web page
categorization and citation matching. Experimental results
demonstrate the effectiveness and efficiency of FloatCascade
compared with state-of-the-art IC methods such as Asymmetric
Cascade, Asymmetric AdaBoost and Weighted SVM.
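
To illustrate why a cascade of simple classifiers can speed up imbalanced
classification, the following is a minimal sketch of generic cascade
prediction with early rejection. It is not the FloatCascade learning
procedure: the stage score functions, thresholds, and sample values are
hypothetical, chosen only to show that a sample rejected at an early stage
never reaches the later, more expensive stages.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (score function, threshold) pair. Both are hypothetical
# placeholders for illustration, not the classifiers learned in the paper.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (is_positive, number_of_stages_evaluated) for sample x."""
    evaluated = 0
    for score, threshold in stages:
        evaluated += 1
        if score(x) < threshold:
            # Early rejection: later stages are never evaluated.
            return False, evaluated
    # Only samples that pass every stage are labeled positive.
    return True, evaluated


# Toy stages: each later stage looks at one more feature than the previous.
stages: List[Stage] = [
    (lambda x: x[0], 0.5),
    (lambda x: x[0] + x[1], 1.2),
    (lambda x: x[0] + x[1] + x[2], 2.0),
]

samples = [
    [0.1, 0.9, 0.9],  # fails stage 1
    [0.6, 0.2, 0.9],  # passes stage 1, fails stage 2
    [0.9, 0.8, 0.7],  # passes all three stages
]

results = [cascade_predict(stages, s) for s in samples]
for sample, (label, n) in zip(samples, results):
    print(sample, "positive" if label else "negative", f"after {n} stage(s)")
```

When negatives dominate, as in the imbalanced setting the abstract describes,
most samples exit at the first stages, so the average cost per sample stays
close to that of the cheapest classifiers.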