Predictive Analysis of Cardiovascular Disease Risk Factors

Aayush Jha | Indian Statistical Institute (ISI)

Problem Statement

This project aims to leverage machine learning to predict the presence of heart disease in patients based on a set of medical and demographic features. The goal is to build a reliable classification model and identify the most significant risk factors to aid in preventative healthcare strategies.

Dataset

The analysis was performed on the well-known "Heart Disease UCI" dataset, sourced from Kaggle.

Link: https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data
Attributes: The dataset includes 14 attributes such as age, sex, chest pain type, resting blood pressure, cholesterol, and more.
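Target Variable: num (1 = Heart Disease, 0 = No Heart Disease). In the original UCI coding, num ranges from 0 to 4, and any nonzero value indicates heart disease.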
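Loading the data involves two details: treating '?' as a missing value and collapsing num into the binary target described above. The sketch below shows both using only the standard library. It assumes num uses the UCI coding of 0 to 4. The inline CSV rows are hypothetical toy values that use a few of the dataset's column names. They are not real records. The function name `load_rows` is illustrative and does not come from the project notebook.

```python
import csv
import io

# Hypothetical toy rows using a few of the dataset's column names;
# these are NOT real records from heart_disease_uci.csv.
RAW = """age,cp,chol,num
63,0,233,0
67,3,?,2
41,1,204,0
56,2,?,1
"""

def load_rows(text):
    """Parse CSV text, map '?' to None, and collapse num into a 0/1 target."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Missing values are represented by '?'.
        row = {k: (None if v == "?" else v) for k, v in record.items()}
        # Assumption: num follows the UCI coding 0-4, where any value > 0 means disease.
        row["target"] = 1 if int(row.pop("num")) > 0 else 0
        rows.append(row)
    return rows

rows = load_rows(RAW)
print([r["target"] for r in rows])           # [0, 1, 0, 1]
print(sum(r["chol"] is None for r in rows))  # 2
```

With pandas, `read_csv(..., na_values="?")` handles the missing-value step in one call.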

Methodology & Key Steps

Data Cleaning & EDA: Investigated feature distributions, handled missing values represented by '?', and visualized relationships with the target variable to uncover initial insights.
Data Preprocessing: Performed one-hot encoding on categorical variables and scaled all numerical features using StandardScaler to prepare the data for modeling.
Model Implementation: Developed and compared three distinct classification models:
- Logistic Regression (Baseline)
- Random Forest Classifier
- XGBoost Classifier
Evaluation: Assessed model performance on a held-out test set using Accuracy, Precision, Recall, and F1-Score, and identified the best-performing model.
Insight Generation: Extracted feature importances from the top-performing model to identify the key predictors of heart disease.
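The preprocessing step performs two transformations. One-hot encoding turns a categorical column into 0/1 indicator columns. StandardScaler turns each numeric column into z-scores, computed as (x - mean) / std with the population standard deviation. The stdlib sketch below shows what each transformation computes. It is not the project's code: the notebook presumably relies on library implementations such as scikit-learn's `OneHotEncoder`/`StandardScaler` or `pandas.get_dummies`. The toy columns and the helper names `one_hot` and `standardize` are hypothetical.

```python
import statistics

def one_hot(values, prefix):
    """Expand one categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}

def standardize(values):
    """Z-score a numeric column as StandardScaler does: (x - mean) / std,
    where std is the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # StandardScaler uses a scale of 1.0 for zero-variance columns; mirror that.
    scale = std if std > 0 else 1.0
    return [(v - mean) / scale for v in values]

# Hypothetical toy columns, not taken from the dataset.
cp = [0, 2, 2, 3]
age = [63, 67, 41, 56]
print(one_hot(cp, "cp"))  # {'cp_0': [1, 0, 0, 0], 'cp_2': [0, 1, 1, 0], 'cp_3': [0, 0, 0, 1]}
print([round(z, 3) for z in standardize(age)])
```

In a real pipeline, the mean and standard deviation are normally computed on the training split only and then reused to transform the test split, so no test-set information leaks into training.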

Key Findings & Visualizations

Finding 1: Chest pain type is a strong indicator of heart disease. Patients with non-anginal chest pain (cp = 2) showed a markedly higher rate of heart disease than patients with other chest pain types.
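A finding like this comes from comparing the share of heart-disease cases within each chest pain group. The sketch below computes that per-group rate. It assumes each row carries a binary `target` field. The toy rows are hypothetical, and the helper name `disease_rate_by` is illustrative. With pandas, the equivalent is `df.groupby("cp")["target"].mean()` on a DataFrame `df` that has such a column.

```python
from collections import defaultdict

def disease_rate_by(rows, feature):
    """Return {category: fraction of rows with target == 1} for one feature."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for row in rows:
        tally = counts[row[feature]]
        tally[0] += row["target"]
        tally[1] += 1
    return {cat: pos / total for cat, (pos, total) in sorted(counts.items())}

# Hypothetical toy rows, not taken from the dataset.
toy = [
    {"cp": 0, "target": 0}, {"cp": 0, "target": 1}, {"cp": 0, "target": 0},
    {"cp": 2, "target": 1}, {"cp": 2, "target": 1}, {"cp": 2, "target": 0},
]
rates = disease_rate_by(toy, "cp")
print({cat: round(rate, 2) for cat, rate in rates.items()})  # {0: 0.33, 2: 0.67}
```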

Finding 2: Several medical metrics are correlated with each other and with the target. The correlation heatmap revealed strong relationships among the features and between the features and the target variable. For instance, thalach (maximum heart rate achieved) and slope (the slope of the peak exercise ST segment) each showed a noticeable correlation with the presence of heart disease.
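Each cell of a correlation heatmap is usually a Pearson coefficient (the default method of pandas `DataFrame.corr`). When one of the two variables is the 0/1 target, Pearson's r reduces to the point-biserial correlation. The sketch below computes one such cell by hand. The thalach values and labels are hypothetical toy numbers and do not come from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.
    With a 0/1 target as y, this is the point-biserial correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical toy values, not taken from the dataset.
thalach = [150, 172, 108, 129]
target = [0, 0, 1, 1]
print(round(pearson(thalach, target), 3))
```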

Model Performance

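All three models were evaluated on the held-out test set. Random Forest delivered the best overall performance, with the highest accuracy (84%), recall (83%), and F1-score (86%). XGBoost had marginally higher precision (89% vs. 88%).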

| Model | Accuracy | Precision (class 1) | Recall (class 1) | F1-Score (class 1) |
|---|---|---|---|---|
| Logistic Regression | 78% | 86% | 76% | 81% |
| Random Forest | 84% | 88% | 83% | 86% |
| XGBoost Classifier | 83% | 89% | 82% | 85% |
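The table's class-1 metrics can all be derived from true-positive, false-positive, and false-negative counts. F1 is the harmonic mean of precision and recall. For Logistic Regression, for example, 2 × 0.86 × 0.76 / (0.86 + 0.76) ≈ 0.81, which matches the table. The sketch below computes the metrics from labels and predictions. For a binary target with `pos_label=1`, it gives the same values as scikit-learn's `precision_score`, `recall_score`, and `f1_score`. The labels and predictions are hypothetical and are not the project's test set.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels and predictions, not the project's test set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
print({k: round(v, 3) for k, v in classification_metrics(y_true, y_pred).items()})
# {'accuracy': 0.7, 'precision': 0.667, 'recall': 0.8, 'f1': 0.727}
```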

Feature Importance from Random Forest Model:

The model identified which medical factors were most influential in its predictions. This provides a clear, data-driven focus for clinical screening.

The top 3 predictors identified were [Feature 1], [Feature 2], and [Feature 3]. (Replace these with the top 3 features from your Random Forest model).
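A fitted scikit-learn `RandomForestClassifier` exposes impurity-based importances through its `feature_importances_` attribute, one value per input column. One-hot encoding splits a categorical feature such as cp across several dummy columns, so a common approximation sums the dummies' importances back into the source feature before ranking. The sketch below does that. It assumes dummy names take the form `<column>_<category>` (the `pandas.get_dummies` default separator) and that the original column names contain no underscore. The helper name `top_features`, the column names, and the importance values are all hypothetical.

```python
from collections import defaultdict

def top_features(names, importances, k=3, sep="_"):
    """Rank source features by importance, summing one-hot dummy columns
    such as 'cp_0' and 'cp_2' back into their source column 'cp'.
    Assumes original column names do not themselves contain `sep`."""
    totals = defaultdict(float)
    for name, score in zip(names, importances):
        totals[name.split(sep, 1)[0]] += float(score)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical column names and importance values for illustration only.
names = ["x1", "x2_a", "x2_b", "x3", "x4"]
importances = [0.20, 0.10, 0.25, 0.15, 0.30]
print([name for name, _ in top_features(names, importances)])  # ['x2', 'x4', 'x1']
```

With a real model, `importances` would be `model.feature_importances_` and `names` the column names after encoding.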

How to Run

Clone the repository:

```
git clone https://github.com/aayush-0131/Heart-Disease-Prediction-Project.git
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Open and run the Heart_Disease_Analysis.ipynb Jupyter Notebook.
