This repository contains Python scripts that calculate the Gini Impurity measure for each feature in a relational dataset. They are useful for feature selection, data preprocessing, decision tree construction, and binary classification tasks.

a-partovii/Gini-Impurity

Gini Impurity Calculation

This project provides implementations for calculating Gini Impurity. The scripts compute the Gini Impurity for each feature column, and for each value of that feature, in a given relational dataset. They are intended for feature selection and data analysis workflows.
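For orientation, here is a minimal sketch of the standard Gini Impurity calculation (one minus the sum of squared class proportions), applied per feature value and then weighted by group size. It is not the repository's actual code: the function names `gini_impurity` and `feature_gini`, the list-of-dicts input, and the sample rows are all assumptions made for illustration, and the scripts themselves work with pandas.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def feature_gini(rows, feature, label):
    """Return (impurity per feature value, size-weighted impurity of the feature).

    `rows` is a list of dicts; this layout is hypothetical, chosen to keep
    the sketch free of third-party dependencies.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[label])

    per_value = {value: gini_impurity(ls) for value, ls in groups.items()}
    total = len(rows)
    weighted = sum(len(ls) / total * per_value[value] for value, ls in groups.items())
    return per_value, weighted


# Made-up sample data, for illustration only.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
]
per_value, weighted = feature_gini(rows, "outlook", "play")
print(per_value)  # {'sunny': 0.5, 'rainy': 0.0}
print(weighted)   # 0.25
```

A lower weighted value means the feature separates the labels more cleanly, which is why the measure is commonly used to rank features or choose decision tree splits.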

The project includes two versions:

  • (Classic): Calculates Gini Impurity using the last column as the label column, and supports only 'yes' or 'no' labels.
  • (Evolved): Lets the user specify the target, so any feature can serve as the label column and any of its values as the label.
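The difference between the two versions can be sketched as a single labelling step. This is an illustration under stated assumptions, not the scripts' real interface: `binarize_labels`, its parameters `label_column` and `target`, and the sample rows are hypothetical, and exact string matching is an assumption.

```python
def binarize_labels(rows, label_column=None, target="yes"):
    """Map each row to True (matches the target label) or False.

    With the defaults this mimics the Classic behaviour described above:
    the last column is the label column and 'yes' is the positive label.
    Passing label_column and target mimics the Evolved behaviour.
    """
    if not rows:
        return []
    if label_column is None:
        # dicts keep insertion order, so the last key is the last column
        label_column = list(rows[0].keys())[-1]
    return [row[label_column] == target for row in rows]


# Made-up sample data, for illustration only.
rows = [
    {"color": "red", "size": "big", "buy": "yes"},
    {"color": "blue", "size": "big", "buy": "no"},
    {"color": "red", "size": "small", "buy": "no"},
]

classic = binarize_labels(rows)                                      # last column, 'yes'
evolved = binarize_labels(rows, label_column="color", target="red")  # user-specified target
print(classic)  # [True, False, False]
print(evolved)  # [True, False, True]
```

Reducing any chosen column to a match/no-match label keeps the problem binary, so the same impurity calculation can be reused for either version.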

Gini Preview

Notes

  1. Set the file_path variable inside the script to the path of your dataset file.
  2. In the classic version, the last column of the dataset must contain the labels, and the only accepted values are 'yes' and 'no'.
  3. Make sure the dataset is a properly formatted Excel or CSV file (.xlsx, .xls, .csv).
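The notes above could be checked before any calculation runs. The sketch below is one possible way to do that with pandas and is not taken from the scripts: `pick_reader`, `load_dataset`, and the `READERS` table are hypothetical names, and only the `file_path` idea comes from this README. Reading Excel files with pandas also requires an engine such as openpyxl (.xlsx) or xlrd (.xls).

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pandas reader function name.
READERS = {".csv": "read_csv", ".xlsx": "read_excel", ".xls": "read_excel"}


def pick_reader(file_path):
    """Return the name of the pandas reader that matches the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type {suffix!r}; expected .xlsx, .xls or .csv")
    return READERS[suffix]


def load_dataset(file_path):
    """Load the dataset into a DataFrame and run basic format checks."""
    import pandas as pd  # imported here so pick_reader works without pandas installed

    reader = getattr(pd, pick_reader(file_path))
    df = reader(file_path)
    if df.empty or df.shape[1] < 2:
        raise ValueError("Dataset needs at least one feature column and one label column")
    return df


def check_classic_labels(df):
    """Classic version only: the last column may contain just 'yes' or 'no'."""
    unexpected = set(df.iloc[:, -1].unique()) - {"yes", "no"}
    if unexpected:
        raise ValueError(f"Unexpected label values: {sorted(map(str, unexpected))}")
```

For example, `load_dataset(file_path)` followed by `check_classic_labels(df)` would fail early with a clear message instead of producing misleading impurity values.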

Requirements

  • Python 3.x
  • Libraries: pandas, termcolor

Documentation

All details, including the explanation of the code and its output, are available in the gini-impurity.pdf file provided with this project.

This project is part of a practice exercise assigned by a university professor to help students learn basic data analysis concepts. It is a simple exercise and is not intended for production use.
