This project investigates the application of machine learning models to two distinct prediction tasks:
- Forecasting health outcomes in horses
- Predicting passenger transportation in the hypothetical Spaceship Titanic scenario
The study implements and compares four machine learning algorithms on these two datasets, providing insight into model performance and optimization techniques.
- Implementation of four machine learning algorithms:
  - CatBoost
  - K-Nearest Neighbors (KNN)
  - Support Vector Machine (SVM)
  - Naive Bayes
- Comprehensive data preprocessing pipeline
- Hyperparameter tuning for model optimization
- Ensemble methods implementation (an illustrative soft-voting sketch appears after the methodology list below)
- Cross-validation techniques for robust evaluation
- Data cleaning and missing value handling (see the preprocessing sketch below):
  - Numeric columns: mean-value imputation
  - Non-numeric columns: mode-value imputation
- Label encoding of categorical variables
- Feature engineering and dimensionality reduction
- Dataset augmentation by generating random values within each attribute's observed range (sketched below)
- Splitting of each dataset into training and validation sets
- 10-fold cross-validation
- Stratified sampling for balanced class distributions
- Model training with hyperparameter optimization
- Performance evaluation using the F1-Score metric (a combined cross-validation and tuning sketch follows this list)
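The preprocessing steps above can be summarized in a short sketch. The snippet below assumes a pandas DataFrame `df` standing in for either dataset; the function name `preprocess` is illustrative and not taken from the project code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Mean/mode imputation followed by label encoding of categorical columns."""
    df = df.copy()
    numeric_cols = df.select_dtypes(include="number").columns
    categorical_cols = df.select_dtypes(exclude="number").columns

    # Numeric columns: fill missing values with the column mean.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

    for col in categorical_cols:
        # Non-numeric columns: fill missing values with the column mode.
        df[col] = df[col].fillna(df[col].mode().iloc[0])
        # Encode category labels as integer codes.
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df
```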
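Dataset augmentation by random value generation within attribute ranges could look roughly like the sketch below: synthetic rows draw numeric values uniformly between each column's observed minimum and maximum, and categorical values are resampled from the observed ones. The names `augment` and `n_new` are illustrative; in practice the target column would typically be excluded or handled separately so that synthetic rows carry valid labels.

```python
import numpy as np
import pandas as pd

def augment(df: pd.DataFrame, n_new: int, seed: int = 0) -> pd.DataFrame:
    """Append n_new synthetic rows sampled within each attribute's observed range."""
    rng = np.random.default_rng(seed)
    synthetic = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric attribute: draw uniformly between the observed min and max.
            synthetic[col] = rng.uniform(df[col].min(), df[col].max(), size=n_new)
        else:
            # Categorical attribute: resample from the observed values.
            synthetic[col] = rng.choice(df[col].dropna().to_numpy(), size=n_new)
    return pd.concat([df, pd.DataFrame(synthetic)], ignore_index=True)
```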
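A minimal sketch of the evaluation pipeline follows: a stratified train/validation split, stratified 10-fold cross-validation inside a grid search for hyperparameters, and a final F1-Score on the held-out validation data. The SVM parameter grid and the toy `make_classification` data are placeholders for the project's preprocessed features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Placeholder data; in the project, X and y come from the preprocessed datasets.
X, y = make_classification(n_samples=400, n_classes=3, n_informative=6, random_state=42)

# Stratified split keeps the class distribution balanced across train and validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameter search scored by macro-averaged F1 over stratified 10-fold CV.
search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
)
search.fit(X_train, y_train)

# Final evaluation on the held-out validation set.
val_f1 = f1_score(y_val, search.best_estimator_.predict(X_val), average="macro")
print(f"best params: {search.best_params_}, validation F1: {val_f1:.4f}")
```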
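The ensemble methods mentioned in the features list could be realized in several ways (voting, stacking, blending); the exact method used in the project is not detailed here. Below is an illustrative soft-voting sketch that combines the four base models; it requires the `catboost` package alongside scikit-learn.

```python
from catboost import CatBoostClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

ensemble = VotingClassifier(
    estimators=[
        ("catboost", CatBoostClassifier(verbose=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm", SVC(probability=True)),  # probability=True is needed for soft voting
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted class probabilities across models
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_val)
```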
F1-Scores (%) achieved by each model, with one set of results per dataset:
- First dataset:
  - CatBoost: 80.64
  - KNN: 79.23
  - SVM: 75.55
  - Naive Bayes: 76.6
- Second dataset:
  - CatBoost: 78.65
  - KNN: 67.07
  - SVM: 46.95
  - Naive Bayes: 39.63
CatBoost was the most robust model, achieving the highest F1-Score on both datasets.
Several limitations should be noted:
- Dataset Size
  - The original datasets were too small for reliable training
  - Artificial data augmentation was required
  - This may limit how well the models generalize
- Feature Independence
  - The Naive Bayes assumption of feature independence may not hold for these datasets
  - This could reduce accuracy in real-world scenarios
- Computational Resources
  - The SVM implementation can be computationally intensive
  - This could limit scalability to larger datasets
- Model Complexity
  - Advanced models such as CatBoost require more extensive tuning
  - This increases implementation complexity