Image spam filtering through multi-features analysis /

Bibliographic Details
Main Author: Al Hazza, Zubaidah Muataz Hazza
Format: Thesis
Language: English
Published: Kuala Lumpur : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2015
Online Access: http://studentrepo.iium.edu.my/handle/123456789/5407
Description
Summary: Email spam is a long-standing internet security threat. It has evolved into several forms, including text-based, URL-based and image-based spam. By using image spam, spammers have managed to bypass the defences of most anti-spam solutions: the images display text messages to end users, but anti-spam software sees only pixels and cannot read the text. At present, image classification offers an effective way to filter image spam. It requires extracting and analyzing selected features of the image. In this thesis, a lightweight image spam filter is proposed and implemented. It uses text, colour, URL and header features. A simple and effective method was selected for each feature to reduce computing cost and keep the anti-image-spam solution lightweight. One of the main features is the detection of text components in the image, which is challenging because spammers apply obfuscation methods. The major contributions of this thesis are the research and implementation of: i) a text detection method for obfuscated text, named Accumulated Text Extraction (ATE); ii) a colour analysis method, named Saturation-Lightness classification (SLC), for differentiating between computer-generated and natural images; and iii) an image spam filter, named Lightweight Image Spam Filter (LISF), that uses lightweight methods to extract features. The proposed image spam filter consists of four phases: text analyzer, colour analyzer, URL analyzer and header analyzer. The proposed methods give high detection rates compared to other methods. ATE gives an F-measure of 86%, SLC gives an F-measure of 85%, and LISF gives an F-measure of 88% for detecting image spam. The results indicate that the proposed methods for text and colour features are effective and can be applied to image spam filtering.
Physical Description: xvii, 165 leaves : ill. ; 30 cm.
Bibliography: Includes bibliographical references (leaves 132-140).
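
Note on the F-measure figures in the summary: the record does not say which variant of the F-measure the thesis reports. The sketch below assumes the common balanced form (F1, the harmonic mean of precision and recall) and shows how such a score is computed from classification counts. The function name `f_measure` and the counts are hypothetical and are not taken from the thesis.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true positives, false positives, false negatives.

    Assumption: the balanced F1 variant; the thesis may use a different weighting.
    """
    if tp == 0:
        # No true positives: precision and recall are zero or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)  # share of flagged images that really are spam
    recall = tp / (tp + fn)     # share of spam images that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the thesis).
score = f_measure(tp=80, fp=10, fn=20)
print(f"F-measure: {score:.2%}")
```

A single score of this kind balances missed spam (low recall) against legitimate images wrongly flagged (low precision), which is why it is a common summary metric for spam filters.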