WhosInMyDMs

For our assignment, we analyzed SMS text messages to classify them as ‘spam’ or ‘ham’. As online communication has shifted from email to various forms of direct messaging, phishers have adjusted where and how they target individuals with spam. Users want to know that their accounts are secure, and they do not have time to deal with spam notifications and messages. We set out to build a classification model using the following algorithms: Logistic Regression, Random Forest, and an LSTM neural network. For each of these models, we performed feature extraction using the sparse vectorization techniques CountVectorizer and TfidfVectorizer, with n-grams such as bigrams, and tested them against a dense vector representation built from word embeddings. After performing an ablation study on the models and tuning each model's hyperparameters via cross-validation, we compared a set of performance metrics (AUC, precision, recall, and F1 score) to choose the best model. In our paper, we further discuss the business value of counteracting spam and malicious messages through classification, describe the data sources and preprocessing, and explain which model and feature extraction method performs best. Finally, we propose how to deploy our solution into production.
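As a rough illustration of three of the metrics named above, the following minimal sketch computes precision, recall, and F1 score with ‘spam’ treated as the positive class. The function name `precision_recall_f1` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the project's code or results.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: spam messages correctly flagged as spam.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: ham messages wrongly flagged as spam.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: spam messages that slipped through as ham.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only (not results from this project).
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For spam filtering, precision reflects how often a flagged message really is spam, while recall reflects how much spam is caught, so reporting both alongside F1 makes the trade-off between the two visible when comparing models.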


beelze-b/WhosInMyDMs
