Inferential and Predictive Analytics with Formula 1 Data

About Formula 1

Formula 1, or F1, "is the highest class of single-seater auto racing sanctioned by the Fédération Internationale de l'Automobile (FIA)" (Source: Wikipedia). F1 is a global sport, with the annual championship involving races that have been held on all continents (except Antarctica).

In F1, each constructor builds two race cars and engages drivers to race them. Drivers first compete in a qualifying session for their starting position, with the fastest driver starting at the front. In the race itself, drivers compete for a higher finishing position by overtaking other cars. Each race runs a fixed number of laps that depends on the circuit, typically between about 45 and 80.

About the project

This project is the final project submission for GR5069: Applied Data Science, offered by the Quantitative Methods in Social Science department at Columbia University. The project uses an F1 dataset to solve one inferential and one predictive problem. The full dataset was provided by the course instructors through an Amazon S3 bucket maintained for the course, although the data appear similar to a publicly available Formula 1 dataset on Kaggle.

Project Objectives

Build and maintain a well-structured, documented record of work on a data science project to facilitate transferability and reproduction of work.
- Maintain a well-structured GitHub repo with an informative landing page.
- Comment code using established best practices.
- Make commits using established best practices.
- Track models built for the project.
Exercise proper data management on Amazon S3.
Understand and practice the different philosophies behind inferential and predictive data modelling.
Use data visualisations effectively.
Deal with missing data appropriately for a given task.

Inferential Task

The Inferential task seeks to answer the question "what factors explain why a driver arrives in second place in F1 races between 1950 and 2010?"

The question is approached using an informal theory of F1 strategy and performance, operationalised as a statistical model to understand which factors affect a driver's chance of arriving in second place, and how.
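As a minimal sketch of how the outcome in this question could be operationalised, the snippet below builds a binary "finished second" target from race results. The `Result` fields and the sample rows are hypothetical placeholders, not the project's actual schema or data, and treating a non-finisher as a 0 is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """One driver's result in one race (hypothetical field names)."""
    race_id: int
    year: int
    driver: str
    grid: int                 # starting position from qualifying
    position: Optional[int]   # finishing position; None if not classified


def finished_second(result: Result) -> int:
    """Return 1 if the driver was classified second, otherwise 0.

    Assumption: a missing finishing position (e.g. a retirement)
    counts as "did not finish second".
    """
    return int(result.position == 2)


# Made-up rows for illustration only.
results = [
    Result(race_id=1, year=2009, driver="driver_a", grid=1, position=1),
    Result(race_id=1, year=2009, driver="driver_b", grid=4, position=2),
    Result(race_id=1, year=2009, driver="driver_c", grid=2, position=None),
]

targets = [finished_second(r) for r in results]
print(targets)
```

Because only one driver per race finishes second, a target built this way is heavily imbalanced, which matters for both model choice and evaluation.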

Prediction Task

The prediction task builds a model, trained on data from 1950 to 2010, to predict which driver finishes in second place in races between 2011 and 2017.
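The year ranges above imply a split by time rather than a random split. A minimal sketch, assuming each record carries a `year` key (a hypothetical name, as are the sample rows):

```python
def split_by_year(rows, train_years=(1950, 2010), test_years=(2011, 2017)):
    """Split records into train and test sets using inclusive year ranges."""
    train = [r for r in rows if train_years[0] <= r["year"] <= train_years[1]]
    test = [r for r in rows if test_years[0] <= r["year"] <= test_years[1]]
    return train, test


# Made-up rows for illustration only.
rows = [
    {"year": 1950, "driver": "driver_a"},
    {"year": 2010, "driver": "driver_b"},
    {"year": 2011, "driver": "driver_c"},
    {"year": 2017, "driver": "driver_d"},
    {"year": 2018, "driver": "driver_e"},  # outside both ranges, dropped
]

train_rows, test_rows = split_by_year(rows)
print(len(train_rows), len(test_rows))
```

Splitting on the season keeps every test race strictly later than every training race, so the model is never evaluated on races that precede its training data.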

Project Progress

To Do

Done

General

Create basic repo structure
Populate README
Draw up to do list
Populate and maintain references

Prediction Task

1. Choose and justify a model evaluation metric
2. Split data into training and test data
3. Run and track several models with different hyperparameters on the test set
4. Experiment with features that could help predict the target
- Reuse features from inferential task
- Add more features that could help
  - Wrangle data to provide these other features
- Use feature extraction and feature selection techniques to generate other features to try
5. Iterate over steps 3 and 4 to improve model performance
6. Share model results
7. Explain and discuss best performing models
- Provide statistics of model performance
- Provide measures of feature importance
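The metric and tracking steps above can be sketched in plain Python. This README does not state which metric or tracking tool the project used, so the choice of precision, recall, and F1 for the positive class is an assumption (motivated by how rare a second-place finish is among all results), and the `track_run` helper, the hyperparameter names, and the labels are all hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


runs = []  # in-memory run log; a real project might persist this instead


def track_run(params, y_true, y_pred):
    """Record one model run with its hyperparameters and scores."""
    precision, recall, f1 = precision_recall_f1(y_true, y_pred)
    run = {"params": params, "precision": precision, "recall": recall, "f1": f1}
    runs.append(run)
    return run


# Made-up labels and predictions for two hypothetical runs.
y_true = [0, 1, 0, 0, 1, 0]
track_run({"max_depth": 3}, y_true, [0, 1, 0, 1, 0, 0])
track_run({"max_depth": 5}, y_true, [0, 1, 0, 0, 1, 0])

best_run = max(runs, key=lambda run: run["f1"])
print(best_run["params"], round(best_run["f1"], 3))
```

Keeping the hyperparameters next to the scores in one record makes it straightforward to compare runs and to report the best-performing configuration later.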

Final

Comparison of predictive model with inferential model

Inferential Task

1. Develop and explain an informal theory of F1 and testable hypotheses in Inferential.md
2. Develop and explain the statistical approach, and the operationalisation of variables, for the inferential task in Inferential.md
3. Wrangle data to provide the variables of interest identified in step 2
4. Run analysis and present results
- Overall model fit
- Variable importance
- Marginal effects of variables
5. Discuss statistical results in relation to the proposed theory
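To illustrate what "marginal effects of variables" can mean for a binary outcome, here is a minimal sketch that assumes a logistic regression. This README does not say which statistical model was used, and the coefficients, intercept, and feature rows below are invented for illustration, not project results. For a logistic model, the marginal effect of feature j at a point x is the coefficient for j times p(x) times (1 - p(x)); averaging this over observations gives the average marginal effect.

```python
import math


def predict_proba(coefs, intercept, x):
    """Probability of the positive class under a logistic model."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(coefs, intercept, X, j):
    """Average over rows of d p / d x_j = coefs[j] * p * (1 - p)."""
    effects = []
    for x in X:
        p = predict_proba(coefs, intercept, x)
        effects.append(coefs[j] * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical model: feature 0 is grid position, feature 1 is a
# placeholder indicator. None of these numbers come from the project.
coefs = [-0.2, 0.5]
intercept = -1.0
X = [[1.0, 0.0], [2.0, 1.0], [10.0, 0.0], [5.0, 1.0]]

ame_grid = average_marginal_effect(coefs, intercept, X, 0)
print(round(ame_grid, 4))
```

Unlike a raw logit coefficient, the average marginal effect is expressed on the probability scale, which makes it easier to relate to a statement such as "starting one place further back changes the chance of finishing second by this much".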

Repo File Structure

project\
|
| -- src
|     |-- data            <- Code to read/munge raw data.
|     |-- features        <- Code to transform/append data.
|     |-- models          <- Code to analyze the data.
|     |-- visualizations  <- Code to generate visualizations.
|
| -- reports
|     |-- documents       <- Documents synthesizing the analysis.
|     |-- figures         <- Images generated by the code.
|
| -- References.md        <- Data dictionaries, explanatory materials.
|
| -- README.md            <- Project description.
