binary_classification_tree

Simple implementation of a binary classification tree for data with numeric features, using the Gini index as the impurity measure. Bagging and random forests are also implemented.
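As a quick illustration of the impurity measure, a minimal sketch of the Gini index for a set of class labels (the function name `gini_impurity` is hypothetical, not taken from the repository):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a set of integer class labels: 1 - sum_k p_k^2."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels) / labels.size
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0.0; an even 50/50 binary node has the maximum, 0.5.
```

A split is chosen to maximize the reduction from the parent node's impurity to the weighted average impurity of the two children.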

Run experiment.py to train the models on the Eclipse 2.0 dataset and predict the presence of bugs in the Eclipse 3.0 dataset. The first experiment uses the vanilla binary decision tree, the second uses bagging, and the third runs the random forest algorithm. After the models have run, a short classification report is shown alongside statistical tests comparing the models' accuracy.

The main functions for growing and testing trees, with or without bagging, can be found in main.py. For random forests, set the "nfeat" parameter to the integer nearest the square root of the number of predictor variables.
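The recommended "nfeat" value can be computed as follows (a one-liner sketch; the helper name `random_forest_nfeat` is illustrative, not part of the repository):

```python
import math

def random_forest_nfeat(n_predictors):
    # The integer nearest the square root of the predictor count,
    # as suggested above for the "nfeat" parameter.
    return round(math.sqrt(n_predictors))
```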

tree_node.py contains the definition of the Tree and Node primitives used by the algorithms.

load_data.py handles the input/output and includes the specific predictor variables that are used.

fast_split.py contains a Numba-compiled algorithm for finding the optimal split, i.e. the one with the highest impurity reduction. Numba compilation speeds up this step, the most time-consuming part of the algorithm.
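To give an idea of what such a split search does, here is a plain-NumPy sketch of an exhaustive threshold search for one numeric feature against binary labels (function and variable names are hypothetical; in the repository the equivalent loop would carry a `@numba.njit` decorator to compile it):

```python
import numpy as np

def best_split(x, y):
    """Scan all candidate thresholds on feature x (binary labels y) and
    return (threshold, gain), where gain is the Gini impurity reduction."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = xs.size
    total_pos = ys.sum()
    # Gini impurity of the parent node.
    parent = 1.0 - (total_pos / n) ** 2 - ((n - total_pos) / n) ** 2
    best_gain, best_thr = 0.0, xs[0]
    left_pos = 0.0
    for i in range(n - 1):
        left_pos += ys[i]
        if xs[i] == xs[i + 1]:
            continue  # splits are only possible between distinct values
        nl, nr = i + 1, n - (i + 1)
        pl = left_pos / nl
        pr = (total_pos - left_pos) / nr
        gini_l = 1.0 - pl * pl - (1.0 - pl) * (1.0 - pl)
        gini_r = 1.0 - pr * pr - (1.0 - pr) * (1.0 - pr)
        gain = parent - (nl * gini_l + nr * gini_r) / n
        if gain > best_gain:
            best_gain = gain
            best_thr = (xs[i] + xs[i + 1]) / 2.0  # midpoint threshold
    return best_thr, best_gain
```

Because the loop is pure numeric code over arrays, it is exactly the kind of hot path Numba's nopython mode can compile to machine code with no source changes beyond the decorator.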

The requirements.txt file specifies the dependencies for the project.

Figure: pruned subtree showing the first three splits.

