Database Building with SQL

Ironhack's week 3 project.

Index

Introduction 🎞
Contents📁
Problem Instructions 📝
Data Exploration 🔍
Data Cleaning 🧹
Building a Database👷
Filling with dummy values 🤖
Queries ☝🤓

Introduction 🎞

A man named Deli Ushion, in 2023 A.D., has decided to re-open Blockbuster as a self-service automatic movie rental store without staff. This is not a great idea, but he's paying us, so we have to obey.

Deli says that he has recovered some Blockbuster data from back in the day, and he wants us to clean it and export it into a database. He's no programmer, so he's trusting our judgment as long as it is a SQL database (his brother-in-law, Manuel, told him about it and he believes it is the next big thing).

Contents📁

data: all the cleaned .csv files.
img: folder with the images used in the readme.
notebooks: all the notebooks used in the project.
sql-csvs: csvs with dummy data for the customer and rental tables.
sql-scripts: all the .sql scripts.

This readme only contains the conclusions; the process is explained in the notebooks.

Problem Instructions 📝

The problem is divided into 4 parts and a bonus:

1. Explore the data and write down what you have found
   - you can use: df.describe(), df["column"], etc.
2. Clean the data (you can get rid of columns that don't give information)
3. Build your database
4. Write at least 10 queries including: join, groupby, orderby, where, subqueries... that you think will be useful to get interesting insights from the data. (SELECT * FROM TABLE_NAME doesn't count...)

Bonus: Get creative!!! Create totally new tables or enrich the csv files with new data (found on the internet or even made up) that makes your database more valuable.

Let's get into it!

Data Exploration 🔍

The tables that are worth keeping for the SQL database are: actor, film, inventory and rental. old_HDD, when transformed, will help us relate film, category and actor.
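As a rough illustration of the exploration step, the sketch below summarizes a table column by column before deciding whether it is worth keeping. The `explore` helper and the three-row `actor` frame are made up for this example; the real tables live in the notebooks and may have different columns.

```python
import pandas as pd


def explore(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, null count and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a value
    })


# Made-up stand-in for one of the tables (not the project's data).
actor = pd.DataFrame({
    "actor_id": [1, 2, 3],
    "first_name": ["ANA", "BOB", None],
    "last_name": ["LOPEZ", "SMITH", "JONES"],
})

report = explore(actor)
print(report)
print(actor.describe(include="all"))
```

A column with many nulls or a single distinct value is a candidate for dropping in the cleaning step.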

On the other hand, language would have been useful if the language_id column in film did not hold repeated values. I don't believe anything can be done about it with just data cleaning and transformation, so I'm going to drop it.
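One way to check a column like this is sketched below. It assumes that "repeated values" means every film carries the same language_id, which is a reading of the text rather than something the readme states. The `constant_columns` helper and the sample `film` frame are hypothetical.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """List the columns that hold at most one distinct value (NaN counted as a value)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


# Made-up sample: every film points at the same language.
film = pd.DataFrame({
    "film_id": [1, 2, 3],
    "title": ["ALPHA", "BETA", "GAMMA"],
    "language_id": [1, 1, 1],
})

useless = constant_columns(film)
print(useless)

# A constant foreign key cannot distinguish rows, so the column can go.
film = film.drop(columns=useless)
```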

Data Cleaning 🧹

I did a general cleaning of all the tables mentioned above and then transformed old_HDD into the actor_film table, which serves as the many-to-many link between actor and film. I also used the category information in old_HDD to add a category to each film in the film table.
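The transformation described above can be sketched roughly as follows. The column names (`first_name`, `last_name`, `title`, `category_id`) and all the rows are assumptions made for the example; the actual layout of old_HDD is shown in the notebooks.

```python
import pandas as pd

# Made-up miniature versions of the cleaned tables.
actor = pd.DataFrame({
    "actor_id": [1, 2],
    "first_name": ["ANA", "BOB"],
    "last_name": ["LOPEZ", "SMITH"],
})
film = pd.DataFrame({"film_id": [10, 20], "title": ["ALPHA", "BETA"]})

# Assumed shape of old_HDD: one row per (actor, film), identified by names.
old_hdd = pd.DataFrame({
    "first_name": ["ANA", "ANA", "BOB"],
    "last_name": ["LOPEZ", "LOPEZ", "SMITH"],
    "title": ["ALPHA", "BETA", "BETA"],
    "category_id": [3, 5, 5],
})

# Replace names and titles with the ids used by the actor and film tables.
linked = (
    old_hdd
    .merge(actor, on=["first_name", "last_name"])
    .merge(film, on="title")
)

# Many-to-many link table: one row per actor/film pair.
actor_film = linked[["actor_id", "film_id"]].drop_duplicates().reset_index(drop=True)

# Carry the category over to film, one value per film.
film_category = linked[["film_id", "category_id"]].drop_duplicates()
film = film.merge(film_category, on="film_id", how="left")

print(actor_film)
print(film)
```

Matching on names only works if they are unique per actor and per film, so it is worth checking for duplicates before merging.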

Building a Database 👷‍♂️

The database created was the following:

[Figure: diagram of the database]
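For readers without the diagram, here is a minimal sketch of how the tables named in this readme could relate to each other. It uses Python's built-in sqlite3 module as a stand-in for whatever SQL engine the project uses, and every column name is an assumption; the authoritative definitions are in the sql-scripts folder.

```python
import sqlite3

# Assumed columns; only the table names come from the readme.
SCHEMA = """
CREATE TABLE actor (
    actor_id   INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE film (
    film_id  INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    category TEXT
);
CREATE TABLE actor_film (
    actor_id INTEGER NOT NULL REFERENCES actor(actor_id),
    film_id  INTEGER NOT NULL REFERENCES film(film_id),
    PRIMARY KEY (actor_id, film_id)
);
CREATE TABLE inventory (
    inventory_id INTEGER PRIMARY KEY,
    film_id      INTEGER NOT NULL REFERENCES film(film_id)
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL
);
CREATE TABLE rental (
    rental_id    INTEGER PRIMARY KEY,
    rental_date  TEXT NOT NULL,
    inventory_id INTEGER NOT NULL REFERENCES inventory(inventory_id),
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless asked
conn.executescript(SCHEMA)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)
```

The composite primary key on actor_film prevents the same actor from being linked to the same film twice.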

Filling with dummy values 🤖

To make the queries more interesting, I filled customer with fake data using the Faker library. I also modified the original rental.csv so that its dates fall in 2023 and its ids match both the tables created in the previous section and the newly created customer table.
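One possible way to adjust the rental rows is sketched below using only the standard library. The helper names (`shift_to_year`, `remap_ids`), the timestamp format and the sample rows are all invented for illustration; the notebooks show how it was actually done.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def shift_to_year(timestamp: str, year: int = 2023) -> str:
    """Move a timestamp to the given year, keeping month, day and time."""
    moment = datetime.strptime(timestamp, FMT)
    try:
        moved = moment.replace(year=year)
    except ValueError:
        # 29 February does not exist in a non-leap target year.
        moved = moment.replace(year=year, day=28)
    return moved.strftime(FMT)


def remap_ids(rows: list, column: str, valid_ids: list) -> list:
    """Replace each distinct old id with one of valid_ids, cycling if needed."""
    mapping = {}
    for row in rows:
        old = row[column]
        if old not in mapping:
            mapping[old] = valid_ids[len(mapping) % len(valid_ids)]
        row[column] = mapping[old]
    return rows


# Made-up rental rows, not taken from rental.csv.
rentals = [
    {"rental_id": 1, "rental_date": "2005-05-24 22:53:30", "customer_id": 130},
    {"rental_id": 2, "rental_date": "2004-02-29 08:00:00", "customer_id": 459},
    {"rental_id": 3, "rental_date": "2005-06-01 12:15:00", "customer_id": 130},
]

for row in rentals:
    row["rental_date"] = shift_to_year(row["rental_date"])
remap_ids(rentals, "customer_id", valid_ids=[1, 2])

print(rentals)
```

Keeping a single mapping per column means a customer who rented twice in the original data still rents twice after the remap.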

Queries ☝🤓

Deli asked some questions about the store one month after the grand opening and, surprisingly, it's not going badly.

Using the dummy values and the cleaned data, I challenged myself to write queries that I did not yet know how to write.
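The project's queries are in the sql-scripts folder. As a hedged illustration of the kind of question Deli might ask, the sketch below runs two queries against a tiny in-memory SQLite database: one combining JOIN, GROUP BY and ORDER BY, and one using a subquery. The tables are cut down to the columns the queries need, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);

-- Made-up rows for illustration only.
INSERT INTO film VALUES (1, 'ALPHA'), (2, 'BETA'), (3, 'GAMMA');
INSERT INTO inventory VALUES (1, 1), (2, 1), (3, 2), (4, 3);
INSERT INTO rental VALUES (1, 1, 1), (2, 2, 2), (3, 1, 2), (4, 3, 1);
""")

# Which films are rented most often? (JOIN + GROUP BY + ORDER BY)
top_films = conn.execute("""
    SELECT f.title, COUNT(*) AS rentals
    FROM rental r
    JOIN inventory i ON i.inventory_id = r.inventory_id
    JOIN film f ON f.film_id = i.film_id
    GROUP BY f.film_id, f.title
    ORDER BY rentals DESC, f.title
""").fetchall()

# Which films have never been rented? (WHERE + subquery)
never_rented = conn.execute("""
    SELECT title
    FROM film
    WHERE film_id NOT IN (
        SELECT i.film_id
        FROM inventory i
        JOIN rental r ON r.inventory_id = i.inventory_id
    )
    ORDER BY title
""").fetchall()

print(top_films)
print(never_rented)
```

Rentals point at inventory copies rather than films, so every film-level question has to go through the inventory table.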

Kohkitos/sql-data-base-building
