GitHub - 11Pk/MachineLearning

Project Motivation & Learning Process

This project was built mainly to understand the fundamental classification and regression algorithms in machine learning and how they behave on a real-world dataset.

I started with a resume dataset that contained both numerical information (such as years of experience) and textual information (such as skills). My initial goal was to predict:

Whether a candidate would be Hired or Not Hired (classification)
An AI-based resume score (regression)

Why TF-IDF Was Used

While exploring the data, I realized that important information like skills was stored as text, and most machine learning models cannot work directly with raw text. Ignoring this field discarded important signal and led to poor model performance.

To address this, I used TF-IDF (Term Frequency–Inverse Document Frequency) to convert textual skill information into numerical features. This allowed the models to capture which skills were more informative across resumes, while down-weighting terms that appear in most resumes.

Using TF-IDF helped combine text-based features with structured numeric data, creating a hybrid feature representation.
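One way to build such a hybrid representation is scikit-learn's `ColumnTransformer`. The sketch below is an assumption about how the pieces could fit together, not the notebook's code; the column names `skills` and `experience_years` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical rows; the column names are assumptions, not the dataset's real schema.
df = pd.DataFrame({
    "skills": ["python sql", "java spring", "python excel sql"],
    "experience_years": [2, 7, 4],
})

features = ColumnTransformer([
    # The text column is named as a plain string so the vectorizer receives a 1-D series.
    ("skills_tfidf", TfidfVectorizer(), "skills"),
    # Numeric columns are named in a list so the scaler receives a 2-D frame.
    ("numeric", StandardScaler(), ["experience_years"]),
])

X = features.fit_transform(df)

# One row per resume: the TF-IDF columns followed by the scaled numeric column.
print(X.shape)
```

Scaling the numeric column is a design choice made here so that its range is comparable to the TF-IDF values; tree-based models do not require it.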

Model Selection and Experiments

I began with Linear Regression as a baseline model to understand how well a simple linear relationship could predict the resume score. While this provided an initial benchmark, it struggled to capture more complex patterns in the data.

To improve performance, I then used a Random Forest Regressor, which can model non-linear relationships and interactions between features. This fit the data considerably better than the linear baseline and resulted in a much higher explained variance.
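The comparison can be sketched on synthetic data. The example below does not use the resume dataset and does not reproduce the project's results; it assumes a deliberately non-linear target to show why a linear baseline can fall short where a random forest does not.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a deliberately non-linear target; not the resume dataset.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Here R^2 is the fraction of variance in the held-out target that each model explains, so the two scores are directly comparable.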

For the classification task (Hired vs Not Hired), I used Logistic Regression to understand how probabilistic classification works and to keep the model interpretable.
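A minimal sketch of both properties, assuming scikit-learn's `LogisticRegression` and a made-up single feature (years of experience) rather than the project's real feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature (years of experience) and a hiring label.
X = np.array([[0.5], [1.0], [2.0], [3.0], [5.0], [6.0], [8.0], [10.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = Hired, 0 = Not Hired

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(np.array([[1.0], [9.0]]))
hired_proba = proba[:, 1]
print(hired_proba)

# The sign and size of the coefficient show how the feature shifts
# the log-odds of the "Hired" class.
print(clf.coef_, clf.intercept_)
```

The probabilities are what make the classifier probabilistic, and the coefficient is what makes it interpretable: a positive value means more experience raises the predicted chance of being hired in this toy setup.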

Key Takeaways

Through this project, I gained a clear understanding of:

The difference between regression and classification
Why feature representation (especially for text data) is critical
How model complexity affects performance
Why ensemble models can outperform simple linear models on real-world data

Repository Files

README.md
Resume_Analyser.ipynb
