binary_classification_tree

Simple implementation of a binary classification tree for data with numeric features, using the Gini index as the impurity measure. Bagging and random forests are also implemented.
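As a quick illustration of the impurity measure, a minimal sketch of the Gini index for a set of class labels (the function name `gini_impurity` is hypothetical, not taken from the repository):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a set of integer class labels: 1 - sum_k p_k^2."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels) / labels.size
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0.0; an even 50/50 binary node has the maximum, 0.5.
```

A split is chosen to maximize the reduction from the parent node's impurity to the weighted average impurity of the two children.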

Run experiment.py to train the models on the Eclipse 2.0 dataset and predict the presence of bugs in the Eclipse 3.0 dataset. The first experiment uses the vanilla binary decision tree, the second uses bagging, and the third runs the random forest algorithm. After the models have run, a short classification report is shown alongside statistical tests comparing the models' accuracy.

The main functions for growing and testing trees, with or without bagging, can be found in main.py. For random forests, set the "nfeat" parameter to the integer nearest the square root of the number of predictor variables.
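The recommended "nfeat" value can be computed as follows (a one-liner sketch; the helper name `random_forest_nfeat` is illustrative, not part of the repository):

```python
import math

def random_forest_nfeat(n_predictors):
    # The integer nearest the square root of the predictor count,
    # as suggested above for the "nfeat" parameter.
    return round(math.sqrt(n_predictors))
```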

tree_node.py contains the definition of the Tree and Node primitives used by the algorithms.

load_data.py handles the input/output and includes the specific predictor variables that are used.

fast_split.py contains a Numba-compiled algorithm for finding the optimal split, i.e. the one with the highest impurity reduction. Numba compilation speeds up this step, the most time-consuming part of the algorithm.
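To give an idea of what such a split search does, here is a plain-NumPy sketch of an exhaustive threshold search for one numeric feature against binary labels (function and variable names are hypothetical; in the repository the equivalent loop would carry a `@numba.njit` decorator to compile it):

```python
import numpy as np

def best_split(x, y):
    """Scan all candidate thresholds on feature x (binary labels y) and
    return (threshold, gain), where gain is the Gini impurity reduction."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = xs.size
    total_pos = ys.sum()
    # Gini impurity of the parent node.
    parent = 1.0 - (total_pos / n) ** 2 - ((n - total_pos) / n) ** 2
    best_gain, best_thr = 0.0, xs[0]
    left_pos = 0.0
    for i in range(n - 1):
        left_pos += ys[i]
        if xs[i] == xs[i + 1]:
            continue  # splits are only possible between distinct values
        nl, nr = i + 1, n - (i + 1)
        pl = left_pos / nl
        pr = (total_pos - left_pos) / nr
        gini_l = 1.0 - pl * pl - (1.0 - pl) * (1.0 - pl)
        gini_r = 1.0 - pr * pr - (1.0 - pr) * (1.0 - pr)
        gain = parent - (nl * gini_l + nr * gini_r) / n
        if gain > best_gain:
            best_gain = gain
            best_thr = (xs[i] + xs[i + 1]) / 2.0  # midpoint threshold
    return best_thr, best_gain
```

Because the loop is pure numeric code over arrays, it is exactly the kind of hot path Numba's nopython mode can compile to machine code with no source changes beyond the decorator.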

The requirements.txt file specifies the dependencies for the project.

Figure: pruned subtree showing the first three splits.

