E-Mail Filtering Using Bayesian Network

E-mail is important today. It is used in many areas, including education, business, and personal communication. Mailboxes often receive too many messages, most of them unwanted e-mail, called spam. Spam is a costly problem. At Prince of Songkhla University (PSU), there are around 5,000 e-...

Bibliographic Details
Main Author: Kanakorn, Horsiritham
Format: Thesis
Language: eng
Published: 2004
Subjects: TK5101-6720 Telecommunication
Online Access: https://etd.uum.edu.my/1242/1/KANAKORN_HORSIRITHAM.pdf
https://etd.uum.edu.my/1242/2/1.KANAKORN_HORSIRITHAM.pdf
id my-uum-etd.1242
record_format uketd_dc
institution Universiti Utara Malaysia
collection UUM ETD
language eng
topic TK5101-6720 Telecommunication
spellingShingle TK5101-6720 Telecommunication
Kanakorn, Horsiritham
E-Mail Filtering Using Bayesian Network
description E-mail is important today. It is used in many areas, including education, business, and personal communication. Mailboxes often receive too many messages, most of them unwanted e-mail, called spam. Spam is a costly problem. At Prince of Songkhla University (PSU), there are around 5,000 e-mail users, and around 40,000 messages are received each day. About 10% of them are virus and spam messages. Consequently, the mail server has to spend memory and CPU time processing these virus and spam messages. This may cause the server to respond slowly, and when system resources are insufficient, the mail server may crash and become unavailable. Many filtering techniques have been proposed. The Bayesian network is one of the popular spam filtering methods. This project studies Bayesian networks using SpamBayes, an open-source software package. Spam e-mail is usually written in English, but Thai-language spam is increasingly found at PSU. Thai differs from English in that English words are separated by spaces, whereas Thai words are not. The project examines the accuracy of SpamBayes in classifying spam in mixed Thai and English e-mail messages. Thai and English messages are used together for training, and the test messages are also a mix of Thai and English. The results show that SpamBayes can classify spam in both Thai and English.
format Thesis
qualification_name masters
qualification_level Master's degree
author Kanakorn, Horsiritham
author_facet Kanakorn, Horsiritham
author_sort Kanakorn, Horsiritham
title E-Mail Filtering Using Bayesian Network
title_short E-Mail Filtering Using Bayesian Network
title_full E-Mail Filtering Using Bayesian Network
title_fullStr E-Mail Filtering Using Bayesian Network
title_full_unstemmed E-Mail Filtering Using Bayesian Network
title_sort e-mail filtering using bayesian network
granting_institution Universiti Utara Malaysia
granting_department Sekolah Siswazah
publishDate 2004
url https://etd.uum.edu.my/1242/1/KANAKORN_HORSIRITHAM.pdf
https://etd.uum.edu.my/1242/2/1.KANAKORN_HORSIRITHAM.pdf
_version_ 1747827103702712320
spelling my-uum-etd.1242 2013-07-24T12:11:04Z E-Mail Filtering Using Bayesian Network 2004 Kanakorn, Horsiritham Sekolah Siswazah TK5101-6720 Telecommunication 2004 Thesis https://etd.uum.edu.my/1242/ https://etd.uum.edu.my/1242/1/KANAKORN_HORSIRITHAM.pdf application/pdf eng validuser https://etd.uum.edu.my/1242/2/1.KANAKORN_HORSIRITHAM.pdf application/pdf eng public masters masters Universiti Utara Malaysia
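
The description field notes that English words are separated by spaces while Thai words are not, which matters for any token-based Bayesian filter. The following is a minimal sketch of that idea, not the thesis's method and not the SpamBayes implementation: it assumes a plain presence-based naive Bayes score with add-one smoothing, and it assumes that runs of Thai characters can be broken into overlapping character bigrams as a stand-in for word segmentation. The names `tokenize` and `NaiveBayesFilter`, the bigram size, and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter

# Thai Unicode block (U+0E00 to U+0E7F); used to find unspaced Thai runs.
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")


def tokenize(text, n=2):
    """Split on whitespace, then break Thai runs into overlapping n-grams.

    Assumption: character n-grams approximate Thai word boundaries.
    """
    tokens = []
    for chunk in text.lower().split():
        pos = 0
        for match in THAI_RUN.finditer(chunk):
            if match.start() > pos:
                # Non-Thai text before the Thai run is kept as one token.
                tokens.append(chunk[pos:match.start()])
            run = match.group()
            if len(run) <= n:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
            pos = match.end()
        if pos < len(chunk):
            tokens.append(chunk[pos:])
    return tokens


class NaiveBayesFilter:
    """Presence-based naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct token once per message.
        self.token_counts[label].update(set(tokenize(text)))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Smoothed prior log-odds of spam versus ham.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for token in set(tokenize(text)):
            p_spam = (self.token_counts["spam"][token] + 1) / (n_spam + 2)
            p_ham = (self.token_counts["ham"][token] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that math.exp cannot overflow on long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    spam_filter = NaiveBayesFilter()
    # Hypothetical mixed Thai and English training messages.
    spam_filter.train("ลดราคา free offer", "spam")
    spam_filter.train("free money ลดราคา", "spam")
    spam_filter.train("meeting agenda ประชุม", "ham")
    spam_filter.train("project report meeting", "ham")
    print(spam_filter.spam_probability("free ลดราคา"))
    print(spam_filter.spam_probability("meeting ประชุม"))
```

Because Thai and English tokens share one count table, training on both languages together, as the description says the project did, needs no per-language model in this sketch. A proper Thai word segmenter could replace the bigram step.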