Refereed Papers
Track: Security I: Misc
Paper Title:
Detecting Image Spam using Visual Features and Near Duplicate Detection
Authors:
- Bhaskar Mehta (Google Inc.)
- Saurabh Nangia (IIT Guwahati)
- Manish Gupta (IIT Guwahati)
- Wolfgang Nejdl (L3S Forschungszentrum)
Abstract:
Email spam is a much-studied topic. Although current spam-detection
software has been gaining a competitive edge against text-based email
spam, new advances in spam generation have posed a new challenge:
image-based spam. Image-based spam is email that carries the spam
message in embedded images, that is, as binary image data rather than as text. In this paper, we
study the characteristics of image spam to propose two solutions for
detecting image-based spam, while drawing a comparison with the existing
techniques. The first solution, which uses visual features for
classification, offers an accuracy of about 98%, i.e., an improvement of at
least 6% compared to existing solutions. Support Vector Machine (SVM)
classifiers are trained on carefully selected color, texture, and shape
features. The second solution offers a novel approach to near-duplicate
detection in images. It involves clustering image GMMs (Gaussian Mixture
Models) based on the Agglomerative Information Bottleneck (AIB) principle,
using Jensen-Shannon (JS) divergence as the distance measure.
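The clustering idea in the second solution can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each image is summarized by a discrete histogram (a stand-in for a GMM, whose JS divergence has no closed form) and uses the standard AIB merge cost, i.e. the combined prior of two clusters times their prior-weighted Jensen-Shannon divergence. The function names (`kl`, `weighted_js`, `merge_cost`, `agglomerate`) and the toy data are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_js(p, q, w_p=0.5, w_q=0.5):
    """Jensen-Shannon divergence between p and q with mixing weights w_p + w_q = 1."""
    m = [w_p * a + w_q * b for a, b in zip(p, q)]
    return w_p * kl(p, m) + w_q * kl(q, m)


def merge_cost(prior_i, dist_i, prior_j, dist_j):
    """AIB cost of merging two clusters: combined prior times prior-weighted JS."""
    total = prior_i + prior_j
    return total * weighted_js(dist_i, dist_j, prior_i / total, prior_j / total)


def agglomerate(priors, dists, n_clusters):
    """Greedily merge the cheapest pair until n_clusters remain.

    Returns a list of clusters, each a sorted list of input indices.
    """
    # Each cluster is (member indices, prior mass, distribution).
    clusters = [([i], priors[i], list(dists[i])) for i in range(len(priors))]
    while len(clusters) > n_clusters:
        pairs = [
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        a, b = min(
            pairs,
            key=lambda ab: merge_cost(
                clusters[ab[0]][1], clusters[ab[0]][2],
                clusters[ab[1]][1], clusters[ab[1]][2],
            ),
        )
        ids_a, pa, da = clusters[a]
        ids_b, pb, db = clusters[b]
        total = pa + pb
        # The merged distribution is the prior-weighted average of the two.
        merged_dist = [(pa * x + pb * y) / total for x, y in zip(da, db)]
        merged = (ids_a + ids_b, total, merged_dist)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(ids) for ids, _, _ in clusters]


# Toy data (assumed): four image histograms forming two near-duplicate pairs.
histograms = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
    [0.12, 0.18, 0.70],
]
groups = agglomerate([0.25] * 4, histograms, n_clusters=2)
```

In this toy run, the two pairs of similar histograms should end up in separate clusters. A real system would stop merging based on a cost threshold rather than a fixed cluster count, and would need an approximation of the divergence between GMMs.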