This project provides implementations for calculating Gini Impurity. The scripts compute the Gini Impurity for each feature column, and for each value within that column, in a given relational dataset. This makes them useful for feature selection and data analysis workflows.
It includes two versions:
- (Classic): Calculates Gini Impurity using the last column as the label, supporting only 'yes' or 'no' labels.
- (Evolved): Accepts any feature column as the label column and any of its values as the label, measured against a user-specified target.
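As a rough illustration of the calculation described above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names `gini` and `gini_by_feature`, the toy rows, and the choice to weight each value's impurity by its share of the rows are all assumptions made for this example. The impurity of a set of labels is taken as 1 minus the sum of squared class proportions.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_by_feature(rows, feature, label):
    """Return (impurity per feature value, weighted impurity of the feature).

    `rows` is a list of dicts; `feature` and `label` are column names.
    Both names are hypothetical parameters chosen for this sketch.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])

    per_value = {value: gini(labels) for value, labels in groups.items()}
    # Weight each value's impurity by the fraction of rows that have that value.
    weighted = sum(
        len(labels) / len(rows) * per_value[value]
        for value, labels in groups.items()
    )
    return per_value, weighted


# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
]
per_value, weighted = gini_by_feature(rows, "outlook", "play")
print(per_value, weighted)
```

Passing a different column name as `label` is one simple way to mimic the idea of choosing the label column freely; how the evolved script handles the user-specified target is documented in the project itself.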
- Place the dataset file path into the `file_path` variable inside the script.
- In the classic version, the last column of the dataset must contain the labels, with values 'yes' or 'no'.
- Make sure the dataset is an Excel or CSV file (`.xlsx`, `.xls`, `.csv`) and properly formatted.
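The setup steps above could be checked with something like the following sketch. It assumes pandas is used for loading, which matches the listed requirements, but the helper names `load_dataset` and `has_yes_no_labels` are hypothetical and are not taken from the project's scripts.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a CSV or Excel file into a DataFrame, chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # pandas needs an Excel engine installed (openpyxl for .xlsx, xlrd for .xls).
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def has_yes_no_labels(df):
    """Check that the last column contains only 'yes' or 'no' (case-insensitive)."""
    labels = df.iloc[:, -1].astype(str).str.strip().str.lower()
    return bool(labels.isin(["yes", "no"]).all())
```

A typical use would be `df = load_dataset(file_path)` followed by `has_yes_no_labels(df)` before running the classic version, so that a wrongly formatted label column is caught early.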
- Python 3.x
- Libraries:
  - pandas
  - termcolor
All details, including the explanation of the code and its output, are available in the gini-impurity.pdf file provided with this project.
This project is part of a practice assignment set by a university professor, designed to help students learn basic data analysis concepts. It is a simple exercise and is not intended for production use.
