- The project aims to predict heart disease using machine learning models trained on patient medical data.
- The dataset includes patient details such as age, cholesterol levels, blood pressure, and ECG results.
- Data Cleaning: Checked for missing values and handled any inconsistencies.
- Feature Scaling: Standardized numerical data to improve model performance.
- Data Splitting: Divided the dataset into training and testing sets (80-20 split).
- SMOTE (Synthetic Minority Over-sampling Technique) was used to balance the classes by generating synthetic samples of the minority class.
- Feature Selection and Dimensionality Reduction (PCA, t-SNE) were applied to enhance model efficiency.
- Evaluated multiple machine learning models:
  - Logistic Regression
  - Random Forest
  - Support Vector Machine (SVM)
  - K-Nearest Neighbors (KNN)
  - Neural Networks
- Compared the models using performance metrics such as accuracy, precision, recall, and AUC-ROC.
- Random Forest with SMOTE performed best, achieving around 96.8% accuracy.
- Feature Engineering & PCA improved model performance.
- The project demonstrates how machine learning can assist in early detection of heart disease, potentially helping doctors make better decisions.
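
The preprocessing and evaluation steps listed above can be illustrated with a small standard-library sketch. It is not the project's actual code: the function names (`standardize`, `smote_like`, `classification_metrics`) are hypothetical, the oversampler is a simplified SMOTE-style interpolation rather than a full SMOTE implementation, and the sketch assumes that scaling statistics are computed on the training split only and that oversampling is applied only to training data, which the summary does not state.

```python
import math
import random


def standardize(train, test):
    """Scale columns to zero mean and unit variance using training statistics only."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (assumes at least two minority rows).

    Each synthetic row lies on the segment between a random minority row
    and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

In practice the same steps are usually delegated to established libraries; the sketch only makes the mechanics explicit. Keeping the test split out of both the scaling statistics and the oversampling is a common precaution against overly optimistic scores.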