An Empirical Replication of CNM06: Supervised Model Comparisons.
Research Focus:
This is a smaller-scale replication of the CNM06 paper, done for my COGS 118A class. I compared three algorithms (SVM, Decision Trees, and Random Forests), analysing their performance across 4 datasets and 10 trials. For each dataset I also repeated the analysis with four train/test splits, using testing partitions of 0.2, 0.3, 0.4, and 0.5, to see what effect, if any, the split had on the results.
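The splitting scheme described above can be sketched roughly as follows. This is only an illustration, not the notebook's code: it assumes 10 trials are run for every testing partition, that each trial reshuffles the data with its own seed, and the names `split_indices` and `experiment_splits` are hypothetical.

```python
import random

TEST_FRACTIONS = [0.2, 0.3, 0.4, 0.5]  # testing partitions listed above
N_TRIALS = 10                          # assumed: 10 trials per partition


def split_indices(n_samples, test_fraction, seed):
    """Shuffle the row indices and split them into (train, test) lists."""
    rng = random.Random(seed)          # seeded so each trial is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def experiment_splits(n_samples):
    """Yield (test_fraction, trial, train_idx, test_idx) for every run."""
    for test_fraction in TEST_FRACTIONS:
        for trial in range(N_TRIALS):
            # Assumption: the trial number doubles as the shuffle seed.
            train_idx, test_idx = split_indices(n_samples, test_fraction, seed=trial)
            yield test_fraction, trial, train_idx, test_idx


# Toy dataset of 100 rows; each classifier would be fit on train_idx
# and scored on test_idx inside this loop.
runs = list(experiment_splits(n_samples=100))
print(len(runs), "splits per dataset")
```

Under these assumptions each classifier is trained and scored once per yielded split, and the whole loop is repeated for every dataset.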
The algorithms' performance was measured using 4 metrics: Accuracy, Precision, Recall, and F1, which I discuss further in my paper.
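For reference, the four metrics can be computed from the counts of true/false positives and negatives. The sketch below assumes a binary classification task with the positive class labelled `1`; the function name `binary_metrics` and the toy labels are made up for illustration, and the notebook may well compute these differently (for example with a library routine).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (a common convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, not project data: 3 TP, 1 FP, 1 FN, 3 TN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Accuracy counts all correct predictions, while precision and recall look only at the positive class, which is why they can diverge from accuracy on imbalanced data.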
The final report can be found under "COGS 118A Final Project Report.pdf".
The code for running all algorithms, datasets and trials can be found under "COGS 118A Final Project Code.ipynb".
The code for my heatmaps per dataset can be found under "COGS 118A Final Project - Heat Map.ipynb".
Raw data collected from my analysis can be found in the files titled "Raw Testing Data (Data_Name).xlsx".
Datasets are available from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/adult
https://archive.ics.uci.edu/ml/datasets/Letter+Recognition