beelze-b/WhosInMyDMs

WhosInMyDMs

For our assignment, we analyzed SMS text messages to classify them as ‘spam’ or ‘ham’. As online communication has shifted from email to various forms of direct messaging, phishers have adjusted where and how they target individuals with spam. Users want to know that their accounts are secure, and they do not have time for spam notifications and messages. We set out to build a classification model using three algorithms: Logistic Regression, Random Forest, and an LSTM neural network. For each model, we performed feature extraction using the sparse vectorization techniques CountVectorizer and TfidfVectorizer, with bigrams and other n-grams, and compared these against a dense vector representation built from word embeddings. After performing an ablation study on the models and tuning each model's hyperparameters via cross-validation, we compared a set of performance metrics (AUC, precision, recall, and F1 score) to choose the best model. In our paper, we further discuss the business value of counteracting spam and malicious messages through classification, describe the data sources and preprocessing, and explain which model and feature extraction method performs best. Finally, we propose how to deploy our solution into production.
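To make two of the ideas above concrete, here is a minimal plain-Python sketch of bag-of-n-grams counting (the kind of sparse representation a count-based vectorizer with unigrams and bigrams produces) and of the precision, recall, and F1 metrics. It is an illustration only, not the project's code: the function names `ngram_counts` and `precision_recall_f1`, the whitespace tokenizer, and the toy messages and labels are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(text, n_min=1, n_max=2):
    """Count word n-grams for every n in [n_min, n_max].

    Assumption: lowercase + whitespace tokenization, which is simpler
    than what a real vectorizer does.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy message: unigram and bigram counts.
features = ngram_counts("free prize claim your free prize")

# Hypothetical toy labels and predictions (not results from the project).
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "ham", "spam"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all come out to 2/3. For spam filtering, precision on the spam class tracks how often legitimate messages are wrongly flagged, while recall tracks how much spam gets through, which is why comparing several metrics is more informative than accuracy alone.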

About

Group Project

Contributors 3
