data-science-workbooks

Repository for data science code.

Usage

This repo is configured for the Dev Containers VS Code extension. To run it locally, clone the project and open it in a container. All dependencies are installed as part of this process.

Workbooks

spark-machine-learning

Jupyter notebook demonstrating the process of producing a machine learning model in Spark.

Purpose

Use superhero data to build a model that predicts a combat score from other attributes.

Source

Marvel Superheroes Kaggle Dataset

Methods

The process involved analyzing the data to choose a suitable model and guide feature selection, visualizing the data to better understand it, cleansing the dataset to prepare it for model training, training the model with Spark's random forest regression, and evaluating the model with Spark's ML evaluator functions.

Dependencies

Managed in .devcontainer/dependencies.sh.

Python
PySpark
Kaggle CLI
NumPy
Seaborn

flowchart LR

A[Hard] -->|Text| B(Round)
B --> C{Decision}
C -->|One| D[Result 1]
C -->|Two| E[Result 2]

cjlludwig/data-science-workbooks
