Classifying spam with generalized additive neural networks

Labuschagne, Pieter

dc.contributor.advisor	Du Toit, J.V., Dr.
dc.contributor.author	Labuschagne, Pieter
dc.date.accessioned	2017-08-30T12:39:35Z
dc.date.available	2017-08-30T12:39:35Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/10394/25456
dc.description	MSc (Computer Science), North-West University, Potchefstroom Campus, 2017	en_US
dc.description.abstract	E-mail is an important and convenient communication tool used by many people on a daily basis. For individuals it is an inexpensive way to stay in contact with family and friends located around the world. An e-mail address serves as an online identity when signing up for different online services such as social media (Facebook) and social networking (LinkedIn). Companies use e-mail to facilitate communication between employees and to communicate with their clients by sending information such as newsletters, invoice statements and promotional content. E-mail is also used for core business marketing. Unfortunately, some of the benefits provided by the e-mail application, such as sending out mass e-mails with little effort and at minimal cost to the sender, are abused by e-mail users known as spammers. A spammer's incentive for sending unsolicited e-mails in large quantities to an indiscriminate set of recipients is mostly driven by revenue generation. Most spam messages contain content related to promotional products and services, which may be a scam or a phishing attempt to steal sensitive user information such as banking details and passwords. Currently, more than 55.00% of all e-mail network traffic consists of unsolicited spam e-mails, which clutter users' inboxes. Traditional spam-filtering approaches have thus far been unsuccessful in solving the spam problem. This is partly because spammers regularly generate new spam message content, which makes it difficult for spam filters to classify spam according to a fixed pattern. The main purpose of this study is to determine the feasibility of employing a generalized additive neural network (GANN), built with a specific automated construction algorithm, to filter spam e-mail messages. The GANN is a relatively new supervised machine learning technique capable of recognising complex patterns in data and of adapting to changes over time.
The use of GANN models is suggested for classification problems where it may be important to understand the relationship between the input attributes and the expected target value. In this study, the definition of spam, the consequences of unmanaged spam and current spam-filtering techniques are investigated. The current state of the spam problem is summarised, followed by a discussion of artificial neural networks and their pattern-recognition capabilities. Literature related to the GANN is reviewed, covering both the interactive and the automated construction methodologies for the GANN. The latter is considered as a possible spam filter to help mitigate the spam problem. A number of spam-filtering experiments are conducted on five publicly available spam corpora (Enron, GenSpam, PU1, SpamAssassin and TREC2005), each with different pre-processing techniques and evaluation measures. The Bagging and Boosting ensemble techniques, which may improve on the GANN's results, are also considered. The GANN and the ensembles are first compared to other spam-filtering techniques applied to the five corpora and then to each other. Results show that the GANN is a feasible spam filter that is able to mitigate spam e-mail. It compares well to other spam-filtering techniques found in the literature. In addition, both ensemble methods are able to improve on the GANN's results in most cases.	en_US
dc.language.iso	en	en_US
dc.publisher	North-West University (South Africa), Potchefstroom Campus	en_US
dc.subject	Artificial neural networks	en_US
dc.subject	AutoGANN	en_US
dc.subject	Bagging	en_US
dc.subject	Boosting	en_US
dc.subject	Classification	en_US
dc.subject	E-mail	en_US
dc.subject	Ensemble	en_US
dc.subject	GANN	en_US
dc.subject	Generalized additive neural networks	en_US
dc.subject	MLP	en_US
dc.subject	Multilayer perceptron	en_US
dc.subject	Spam	en_US
dc.title	Classifying spam with generalized additive neural networks	en_US
dc.type	Thesis	en_US
dc.description.thesistype	Masters	en_US

Files in this item

Name:: Labuschagne_ P_2017l.pdf
Size:: 2.860Mb
Format:: PDF

This item appears in the following Collection(s)

Engineering [1424]
