Netflix Data Analysis with PostgreSQL and Python

This repository contains a comprehensive Exploratory Data Analysis (EDA) of the Netflix Movies and TV Shows dataset using PostgreSQL and Python.
The project is structured around 15 business-driven analytical questions and demonstrates a range of SQL techniques, from basic filtering and aggregation to window functions and full-text search.

Additionally, it includes a Python notebook for further data transformation, visualization, and integration with the IMDb dataset, allowing for a deeper exploration of patterns and rating correlations between the two platforms.

📊 Project Overview

The main goal of this project is to analyze Netflix’s content library to extract meaningful insights.
The analysis is performed in two complementary parts:

SQL Analysis (PostgreSQL): A single SQL script defines the database schema, loads the dataset, and answers 15 analytical questions.
Python Analysis (Jupyter Notebook): A notebook that performs data cleaning, visualization, and integration with IMDb data to analyze trends in movie ratings and categories.

✨ Key SQL Concepts Demonstrated

This project showcases practical applications of the following SQL techniques:

Schema Definition: Creating relational tables with appropriate data types (CREATE TABLE).
Data Transformation & Cleaning: Applying UNNEST, string_to_array, and split_part, plus type conversion with ::INT casts and to_date.
Advanced Aggregation: Using GROUP BY GROUPING SETS to compute totals and subtotals.
Window Functions: Employing RANK() to find the top-N items per category.
Common Table Expressions (CTEs): Organizing complex logic with WITH clauses for readability.
Date/Time Functions: Filtering and aggregating data by time intervals (CURRENT_DATE, INTERVAL).
Full-Text Search: Implementing search capabilities using to_tsvector and plainto_tsquery.
Pattern Matching: Using LIKE and ILIKE for flexible string filtering.
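
To illustrate two of the techniques listed above, a CTE combined with RANK() for a top-N-per-category query, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module so that it needs no server; SQLite stands in for PostgreSQL here, and window functions require SQLite 3.25 or later. The table name country_counts, the function name, and the sample rows are all hypothetical, and the query is not taken from the project's SQL script.

```python
import sqlite3

# Hypothetical sample rows: (type, country, number of titles).
# These are made-up values, not figures from the Netflix dataset.
SAMPLE_ROWS = [
    ("Movie", "United States", 5),
    ("Movie", "India", 3),
    ("Movie", "France", 1),
    ("TV Show", "United States", 4),
    ("TV Show", "Japan", 2),
    ("TV Show", "South Korea", 2),
]

# The CTE ranks countries within each content type; the outer query keeps the top N.
TOP_N_QUERY = """
WITH ranked AS (
    SELECT type,
           country,
           n_titles,
           RANK() OVER (PARTITION BY type ORDER BY n_titles DESC) AS rnk
    FROM country_counts
)
SELECT type, country, n_titles, rnk
FROM ranked
WHERE rnk <= ?
ORDER BY type, rnk, country;
"""


def top_countries_per_type(rows, n):
    """Return the rows whose rank within their content type is at most n."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE country_counts (type TEXT, country TEXT, n_titles INTEGER)"
        )
        conn.executemany("INSERT INTO country_counts VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_N_QUERY, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_countries_per_type(SAMPLE_ROWS, 2):
        print(row)
```

Because RANK() gives tied rows the same rank, a top-2 query can return more than two rows per type (Japan and South Korea both rank second in the sample). ROW_NUMBER() would cap the result at exactly N rows, at the cost of breaking ties arbitrarily.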

🐍 Key Python Concepts Demonstrated

Pandas & NumPy: Cleaning, transforming, and restructuring raw data for consistency.
Exploratory Data Analysis (EDA): Generating descriptive statistics and identifying key trends.
Data Merging: Integrating Netflix data with IMDb datasets to uncover rating and genre correlations.
Visualization: Using Matplotlib and Seaborn to visualize rating distributions and patterns.
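
The cleaning step can be sketched with the standard library alone. The notebook itself uses pandas; this version only shows the kind of parsing involved. The helper names are hypothetical, and the input formats ("September 25, 2021", "90 min", "2 Seasons", comma-separated genres) are assumptions about how the raw date_added, duration, and listed_in columns look.

```python
from collections import Counter
from datetime import date, datetime


def parse_date_added(raw):
    """Parse text such as 'September 25, 2021' into a date; return None if blank."""
    text = (raw or "").strip()
    if not text:
        return None
    return datetime.strptime(text, "%B %d, %Y").date()


def parse_duration(raw):
    """Split text such as '90 min' or '2 Seasons' into (number, lowercase unit)."""
    value, unit = raw.strip().split(" ", 1)
    return int(value), unit.lower()


def split_genres(raw):
    """Split a comma-separated listed_in value into a list of trimmed genre names."""
    return [genre.strip() for genre in raw.split(",") if genre.strip()]


def genre_counts(rows):
    """Count how often each genre appears across rows (dicts with a 'listed_in' key)."""
    counts = Counter()
    for row in rows:
        counts.update(split_genres(row["listed_in"]))
    return counts


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [{"listed_in": "Dramas, Comedies"}, {"listed_in": "Dramas"}]
    print(parse_date_added("September 25, 2021"))
    print(parse_duration("2 Seasons"))
    print(genre_counts(rows).most_common())
```

The genre split plays the same role as UNNEST(string_to_array(...)) on the SQL side: one row with several genres becomes several countable values.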

💾 Datasets

Dataset	Source	Description
Netflix Movies and TV Shows	Kaggle	Main dataset containing Netflix titles and metadata.
IMDb Datasets	IMDb Data Interface	External data source providing movie ratings and title information.

⚙️ Prerequisites

Before running the analysis, make sure you have the following installed:

PostgreSQL (installed and running)
Python 3.x
Jupyter Notebook

Required Python libraries:

pip install pandas numpy matplotlib seaborn psycopg2-binary

1. Table Creation

First, run the DROP TABLE IF EXISTS and CREATE TABLE statements at the top of the provided SQL script (netflix_script.sql) to create the netflix table in your database.

DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
    show_id      VARCHAR(5),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   VARCHAR(55),
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(15),
    listed_in    VARCHAR(250),
    description  VARCHAR(550)
);
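
Every text column above has a fixed VARCHAR width, and PostgreSQL rejects a value that exceeds it, so a quick pre-check can flag over-long values before loading. The sketch below is one way to do that, not part of the project's script: the limits are copied from the CREATE TABLE statement, while the function name and the sample rows are hypothetical. It assumes each row is a dict keyed by the table's column names.

```python
# Widths copied from the VARCHAR(n) columns in the CREATE TABLE statement.
VARCHAR_LIMITS = {
    "show_id": 5,
    "type": 10,
    "title": 250,
    "director": 550,
    "casts": 1050,
    "country": 550,
    "date_added": 55,
    "rating": 15,
    "duration": 15,
    "listed_in": 250,
    "description": 550,
}


def find_overlong_values(rows, limits=VARCHAR_LIMITS):
    """Return (row_index, column, length) for every value longer than its limit."""
    problems = []
    for index, row in enumerate(rows):
        for column, limit in limits.items():
            value = row.get(column) or ""  # treat missing or None as empty
            if len(value) > limit:
                problems.append((index, column, len(value)))
    return problems


if __name__ == "__main__":
    # Made-up rows: the second one breaks the show_id and title limits.
    sample = [
        {"show_id": "s1", "title": "Example"},
        {"show_id": "s12345", "title": "x" * 251},
    ]
    for problem in find_overlong_values(sample):
        print(problem)
```

An empty result means every checked value fits its column. Otherwise, either widen the affected column or clean the offending rows before loading.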
