To merge the files: mergeFiles.py python3 mergeFiles.py /path-to-folder/ mergedFile

Learn the training set:nblearn.py python3 nblearn.py mergedFile model

Classify the test data:nclassify.py python3 nbclassify.py model testfile

Precision of SPAM: 0.946808 Recall of SPAM: 0.980716 F-Score of SPAM: 0.96346402

Precision of HAM: 0.992908 Recall of HAM: 0.98000 F-Score of HAM: 0.98641177

Precision of POS: 0.881175 Recall of POS: 0.758056 F-Score of POS: 0.814992

Precision of NEG: 0.787716 Recall of NEG: 0.897778 F-Score of NEG:0.839154

SVM:

Precision of SPAM: 0.956667 Recall of SPAM: 0.790634 F-Score of SPAM: 0.865762

Precision of HAM: 0.9285 Recall of HAM: 0.987 F-Score of HAM: 0.956859

Precision of POS: 0.955357 Recall of POS: 0.856000 F-Score of POS: 0.902954

Precision of NEG: 0.643564 Recall of NEG: 0.86667 F-Score of NEG: 0.738636

MEGAM: I have used binary class instead of multiclass(mentioned in Piazza)

Precision of SPAM: 0.980080 Recall of SPAM: 0.984000 F-Score of SPAM: 0.982036

Precision of HAM: 0.955432 Recall of HAM: 0.944904 F-Score of HAM: 0.950139

Precision of POS: 0.936937 Recall of POS: 0.832000 F-Score of POS: 0.881356

Precision of NEG: 0.592233 Recall of NEG: 0.81333 F-Score of NEG: 0.685393

When training set is reduced to 10% of the original training set

Precision of SPAM: 0.7623 Recall of SPAM: 0.684 F-Score of SPAM: 0.7210

Precision of HAM: 0.725 Recall of HAM: 0.786 F-Score of HAM: 0.75426

Precision of POS: 0.56819 Recall of POS: 0.6135 F-Score of POS: 0.59002

Precision of NEG: 0.51264 Recall of NEG: 0.5986 F-Score of NEG: 0.55231

SVM:

Precision of SPAM: 0.7053 Recall of SPAM: 0.6754 F-Score of SPAM: 0.80638

Precision of HAM: 0.7234 Recall of HAM: 0.6912 F-Score of HAM: 0.87614

Precision of POS: 0.7125 Recall of POS: 0.6195 F-Score of POS: 0.7397

Precision of NEG: 0.593 Recall of NEG: 0.6123 F-Score of NEG: 0.657005

MEGAM:

Precision of SPAM: 0.7953 Recall of SPAM: 0.6951 F-Score of SPAM: 0.74183

Precision of HAM: 0.7865 Recall of HAM: 0.7312 F-Score of HAM: 0.75784

Precision of POS: 0.6951 Recall of POS: 0.6651 F-Score of POS: 0.71015

Precision of NEG: 0.5123 Recall of NEG: 0.6542 F-Score of NEG: 0.57461

The Precision,Recall and F-Score in Naive baye's classifier,SVM and Megam decreases as the training set is reduced because the number of unknown words increases in dev set as the training set is reduced and the classifier will not be able to decide whether the words belong to SPAM and HAM.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Files

README.md

Latest commit

History

README.md

File metadata and controls