Source Code

Email Spam Classification

For my first workshop with AI@UCI, I decided to tackle email spam classification. The goal was to help students understand some basic principles of probability and AI classification, and to walk them through real-world applications and the development of the technology. I used a Naive Bayes classifier to determine whether an email was legitimate (ham) or spam. I first trained the classifier on a large dataset of prelabeled email contents. I used the “bag of words” model in training, so each occurrence of a word was counted and associated with ham or spam depending on the label of the email it appeared in. Once the classifier was trained, I could feed it emails it had not seen before and get a decision, based on the contents of each email, on whether it was spam or ham.
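To make the idea concrete, here is a minimal sketch of a bag-of-words Naive Bayes classifier in Python. It is not the workshop's actual code: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, the add-one (Laplace) smoothing, and the four toy training emails are all assumptions chosen for illustration. It also assumes at least one training email per label.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real filter would
    # likely also strip punctuation and handle headers, HTML, etc.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical illustration of bag-of-words Naive Bayes."""

    def __init__(self):
        # Per-label word counts (the "bag of words") and email counts.
        self.word_counts = {"ham": Counter(), "spam": Counter()}
        self.email_counts = Counter()

    def train(self, text, label):
        self.email_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        vocab = set(self.word_counts["ham"]) | set(self.word_counts["spam"])
        total_emails = sum(self.email_counts.values())
        scores = {}
        for label in ("ham", "spam"):
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior, log P(label).
            score = math.log(self.email_counts[label] / total_emails)
            for word in tokenize(text):
                # Add-one smoothing so an unseen word does not zero out
                # the whole product; logs avoid numeric underflow.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            scores[label] = score
        # Pick the label with the larger log-probability score.
        return max(scores, key=scores.get)


# Toy training data, made up for this example.
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

classifier = NaiveBayesSpamFilter()
for text, label in training:
    classifier.train(text, label)

print(classifier.classify("claim your free prize"))        # spam
print(classifier.classify("agenda for the team meeting"))  # ham
```

The "naive" part is the assumption that words occur independently given the label, which is what lets the score be a simple sum of per-word log probabilities.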