
NathanielH-snek/DSSalaryAnalysis


AI Jobs Salary Analysis

This project looks into how remote work affects wages for data science and AI jobs.


🔍 Overview

Data scientists are often the people performing salary analyses, so they have a particular interest in what data science roles pay. Various attempts have been made to predict salaries with regression and machine learning in order to identify the best-paying locations, job titles, and so on. One complication is the wide range of job titles that employers assign, which is to some extent inherent to a field with such varied job duties. Assessing which job titles pay the most may therefore make job searches more efficient.

The dataset for this analysis also covers a wide range of countries of employee residence and job location. Insight into salary disparities across countries may give a better understanding of what to consider when applying for jobs, and average cost of living may be a variable worth adding to this dataset in the future. The most interesting part of the data, however, concerns remote employees. Remote work has proliferated since 2020 due to COVID-19. As a future data science professional, I want to understand how working remotely may affect wages, and so whether the potential convenience is worth it.


🛠️ Tech Stack

  • R
  • Quarto
  • ggplot2

🚀 Key Steps Taken

  • Filtered the dataset to focus only on 2024 data (ensures relevance and eliminates time-based confounders).
  • Refactored and explored categorical variables (e.g., remote_ratio, experience_level, etc.).
  • Visualized salary distributions and examined skewness, especially across experience levels.
  • Created a DAG to logically structure relationships and determine adjustment sets for regression.
  • Built a linear model to test the hypothesis, controlling for:
    • us_not (employee_residence),
    • employment_type,
    • experience_level.
  • Assessed model diagnostics including residuals, Cook’s distance, and leverage.
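
The steps above can be sketched in base R roughly as follows. This is a minimal illustration, not the code from the repository's qmd file: the data frame is synthetic, the column names `work_year` and `salary_in_usd` are assumptions (only `remote_ratio`, `experience_level`, `employment_type`, `employee_residence`, and `us_not` are named in this README), and the `4 / n` Cook's distance cutoff is a common rule of thumb rather than the author's stated threshold.

```r
# Synthetic stand-in for the salary data; all values are made up for illustration.
set.seed(42)
n <- 500
salaries <- data.frame(
  work_year          = sample(c(2023, 2024), n, replace = TRUE),  # assumed column name
  remote_ratio       = sample(c(0, 50, 100), n, replace = TRUE, prob = c(0.60, 0.05, 0.35)),
  experience_level   = sample(c("EN", "MI", "SE", "EX"), n, replace = TRUE),
  employment_type    = sample(c("FT", "PT", "CT"), n, replace = TRUE, prob = c(0.90, 0.05, 0.05)),
  employee_residence = sample(c("US", "GB", "CA", "DE"), n, replace = TRUE,
                              prob = c(0.7, 0.1, 0.1, 0.1))
)
salaries$salary_in_usd <- round(exp(rnorm(n, mean = 11.7, sd = 0.4)))  # assumed column name

# Step 1: keep only 2024 rows.
df <- subset(salaries, work_year == 2024)

# Step 2: recode categorical variables as factors with explicit reference levels.
df$remote_ratio <- factor(df$remote_ratio, levels = c(0, 50, 100),
                          labels = c("on_site", "some_remote", "full_remote"))
df$experience_level <- factor(df$experience_level, levels = c("EN", "MI", "SE", "EX"))
df$employment_type  <- factor(df$employment_type)
# us_not collapses employee_residence into US vs. everything else.
df$us_not <- factor(ifelse(df$employee_residence == "US", "US", "non_US"),
                    levels = c("non_US", "US"))

# Step 3: linear model of salary on remote status plus the adjustment set.
fit <- lm(salary_in_usd ~ remote_ratio + us_not + employment_type + experience_level,
          data = df)
print(summary(fit)$coefficients)
print(confint(fit))  # 95% confidence intervals for each coefficient

# Step 4: diagnostics. Cook's distance and leverage for every observation.
cooks   <- cooks.distance(fit)
lev     <- hatvalues(fit)
flagged <- which(cooks > 4 / nrow(df))  # rule-of-thumb cutoff (assumption)
```

Calling `plot(fit)` afterwards draws the standard residual, Q-Q, scale-location, and leverage plots for the same model.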

📈 Results

  • Contrary to the hypothesis, both full and partial remote work are associated with statistically significantly lower salaries:
    • Full remote: ~$14k–$17k less
    • Some remote: ~$4k–$44k less (wide CI due to small n)
  • Experience level is strongly correlated with salary, as expected.
  • US-based employees earn significantly more on average.
  • Model explains ~14.6% of the variance in salaries, or ~22.2% with a logged model (low R², expected given many unmeasured factors like company type, skills, industry, etc.).
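
To make the comparison between the two models more concrete, here is a small sketch on synthetic data, assuming "logged model" means the response is log salary (the README does not spell this out). The variable names and effect sizes are invented for illustration and do not reproduce the results above. Note that the two R² values describe variance in different responses (dollars vs. log dollars), so they are not strictly comparable.

```r
# Synthetic data: salaries generated on the log scale, so the log model fits by construction.
set.seed(1)
n <- 400
remote <- factor(sample(c("on_site", "full_remote"), n, replace = TRUE),
                 levels = c("on_site", "full_remote"))
senior <- factor(sample(c("junior", "senior"), n, replace = TRUE),
                 levels = c("junior", "senior"))
log_salary <- 11.5 + 0.35 * (senior == "senior") - 0.10 * (remote == "full_remote") +
  rnorm(n, sd = 0.45)
toy <- data.frame(salary_in_usd = exp(log_salary), remote = remote, senior = senior)

# Same predictors, two response scales.
fit_level <- lm(salary_in_usd ~ remote + senior, data = toy)
fit_log   <- lm(log(salary_in_usd) ~ remote + senior, data = toy)

r2_level <- summary(fit_level)$r.squared
r2_log   <- summary(fit_log)$r.squared

# In the level model the remote coefficient is a dollar difference.
# In the log model it is a log-point difference; convert it to a percent change.
b_remote   <- unname(coef(fit_log)["remotefull_remote"])
pct_change <- 100 * (exp(b_remote) - 1)

cat(sprintf("R2 (level): %.3f  R2 (log): %.3f\n", r2_level, r2_log))
cat(sprintf("Full remote vs. on-site: %.1f%% change in expected salary\n", pct_change))
```

The `exp(b) - 1` conversion matters because a log-scale coefficient is only approximately a percentage for small effects.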

🧠 What I Learned

  • Learned how to assess OLS models
  • Gained experience handling and visualizing data in R

📦 Installation & Usage

Clone the repo

git clone https://github.com/NathanielH-snek/DSSalaryAnalysis.git

  • Load the qmd file in RStudio.
  • Install the required packages in RStudio.
  • Run all cells to replicate the analysis with the included data.
