11Pk/MachineLearning

Project Motivation & Learning Process

This project was built mainly to understand the fundamental classification and regression algorithms in machine learning and how they behave on a real-world dataset.

I started with a resume dataset that contained both numerical information (such as years of experience) and textual information (such as skills). My initial goal was to predict:

  • Whether a candidate would be Hired or Not Hired (classification)
  • An AI-based resume score (regression)

Why TF-IDF Was Used

While exploring the data, I realized that important information such as skills was stored as text, and machine learning models cannot work directly with raw text. Dropping this field discarded useful signal and led to poor model performance.

To address this, I used TF-IDF (Term Frequency–Inverse Document Frequency) to convert textual skill information into numerical features. This allowed the models to capture which skills were more informative across resumes, while reducing the importance of very common words.

Using TF-IDF helped combine text-based features with structured numeric data, creating a hybrid feature representation.
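As a rough illustration of the idea, here is a minimal standard-library sketch of TF-IDF weighting followed by the hybrid feature step. The toy `skills` and `years_experience` values, the function name `tfidf_vectors`, and the smoothed IDF formula are assumptions made for this example; they are not taken from the project's dataset or code.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears at least once.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        # Term frequency (share of the document) scaled by the token's IDF.
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical toy data, not the project's dataset.
skills = ["python sql", "python excel", "python tensorflow sql"]
years_experience = [2.0, 5.0, 7.0]

vocabulary, text_features = tfidf_vectors(skills)

# Hybrid representation: the numeric column followed by the TF-IDF columns.
hybrid_features = [
    [years] + vector for years, vector in zip(years_experience, text_features)
]
```

In the second resume, "excel" (which appears in only one resume) receives a higher weight than "python" (which appears in all of them). This is the down-weighting of very common words described above.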


Model Selection and Experiments

I began with Linear Regression as a baseline model to understand how well a simple linear relationship could predict the resume score. While this provided an initial benchmark, it struggled to capture more complex patterns in the data.

To improve performance, I then used a Random Forest Regressor, which can model non-linear relationships and interactions between features. This significantly improved the predictive fit and resulted in a much higher explained variance.
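To make the linear versus non-linear contrast concrete, the sketch below fits a straight line and a single-split regression tree (a "stump") to a step-shaped target and compares their R² values. A stump is only the simplest building block of the trees that a random forest averages, so this is a simplified stand-in and not the Random Forest Regressor used in the project. The data and all function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns a predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return lambda value: slope * value + intercept


def fit_stump(x, y):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    unique_x = sorted(set(x))
    for low, high in zip(unique_x, unique_x[1:]):
        threshold = (low + high) / 2
        left = [b for a, b in zip(x, y) if a <= threshold]
        right = [b for a, b in zip(x, y) if a > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((b - left_mean) ** 2 for b in left) + sum(
            (b - right_mean) ** 2 for b in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda value: left_mean if value <= threshold else right_mean


# Hypothetical data: the score jumps once experience passes a threshold.
years = [0, 1, 2, 3, 4, 5, 6, 7]
score = [40, 42, 41, 43, 80, 82, 81, 83]

line = fit_line(years, score)
stump = fit_stump(years, score)

linear_r2 = r_squared(score, [line(v) for v in years])
stump_r2 = r_squared(score, [stump(v) for v in years])
```

On this toy data the line can only approximate the jump with a slope, while the stump follows it almost exactly, so `stump_r2` comes out higher than `linear_r2`. Both scores are computed on the training data only; a real comparison would use a held-out test set.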

For the classification task (Hired vs Not Hired), I used Logistic Regression to understand how probabilistic classification works and to keep the model interpretable.
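The probabilistic view can be sketched in a few lines: logistic regression passes a weighted sum of the features through a sigmoid to get a probability, and the weights are learned by gradient descent on the log loss. The code below is a minimal from-scratch sketch under assumed toy data (one centered "years of experience" feature); the function names, learning rate, and epoch count are illustrative choices, not the project's settings.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Probability of the positive class (Hired) for one feature row."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the mean log loss."""
    n_samples = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(features, labels):
            error = predict_proba(weights, bias, row) - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical toy data: years of experience and a Hired (1) / Not Hired (0) label.
years = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
hired = [0, 0, 0, 1, 1, 1]

# Center the feature so gradient descent behaves well.
mean_years = sum(years) / len(years)
features = [[y - mean_years] for y in years]

weights, bias = train_logistic_regression(features, hired)

p_junior = predict_proba(weights, bias, [1.0 - mean_years])
p_senior = predict_proba(weights, bias, [8.0 - mean_years])
```

The learned weight is directly interpretable: a positive weight means each additional year of experience raises the log-odds of being hired. A probability above 0.5 is typically mapped to the Hired class.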


Key Takeaways

Through this project, I gained a clear understanding of:

  • The difference between regression and classification
  • Why feature representation (especially for text data) is critical
  • How model complexity affects performance
  • Why ensemble models can outperform simple linear models on real-world data
