This project performs an Exploratory Data Analysis (EDA) of a Netflix dataset in Python, using visualizations to examine the content available on the platform.
- Clean and preprocess the Netflix dataset
- Analyze:
  - Distribution of Movies vs TV Shows
  - Proportion of content ratings
  - Movie duration patterns
  - Release trends over the years
  - Top contributing countries
  - Yearly comparison of Movies vs TV Shows
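
As a rough illustration of the analysis goals above, the summary tables could be computed with pandas along these lines. This is a minimal sketch, not the notebook's actual code: the `summarize` helper and the sample rows are hypothetical, and only the column names come from the dataset.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset (illustrative values only).
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "rating": ["PG", "TV-MA", "TV-MA", "PG"],
    "country": ["India", "United States", "United States", "India"],
    "release_year": [2019, 2020, 2020, 2021],
})


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper: one summary table per analysis goal."""
    return {
        # Count of Movies vs TV Shows
        "type_counts": df["type"].value_counts(),
        # Share of each content rating (fractions summing to 1)
        "rating_share": df["rating"].value_counts(normalize=True),
        # Number of titles per release year, in chronological order
        "titles_per_year": df["release_year"].value_counts().sort_index(),
        # Ten countries with the most titles
        "top_countries": df["country"].value_counts().head(10),
        # Rows are years, columns are content types
        "type_by_year": pd.crosstab(df["release_year"], df["type"]),
    }


summary = summarize(sample)
print(summary["type_by_year"])
```

Movie duration patterns are left out here because they depend on how the `duration` column is stored.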
The dataset (netflix_dataset.csv) includes the following relevant columns:
- type: Indicates whether the title is a Movie or TV Show
- title: Name of the content
- country: Country of origin
- release_year: Year the title was released
- rating: Content rating (e.g., TV-MA, PG, etc.)
- duration: Duration in minutes (for Movies) or seasons (for TV Shows)

- File: netflix_dataset.csv
- Source: Kaggle - Netflix Shows Dataset
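
A cleaning step for these columns might look like the sketch below. It assumes `duration` is stored as text such as "90 min" or "2 Seasons", which may not match your copy of the file. The `clean_netflix` helper, the `duration_value` column, and the sample rows are all hypothetical names introduced for illustration.

```python
import pandas as pd


def clean_netflix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper; adjust to the actual file contents."""
    # Drop exact duplicate rows and work on a copy
    df = df.drop_duplicates().copy()

    # Trim stray whitespace in the text columns
    for col in ["type", "title", "country", "rating", "duration"]:
        df[col] = df[col].str.strip()

    # Label missing countries instead of dropping those rows
    df["country"] = df["country"].fillna("Unknown")

    # Assumption: duration looks like "90 min" or "2 Seasons";
    # keep the leading number as a numeric column
    number = df["duration"].str.extract(r"(\d+)", expand=False)
    df["duration_value"] = pd.to_numeric(number, errors="coerce")

    # Rows without a type or release year cannot be used in the analysis
    return df.dropna(subset=["type", "release_year"]).reset_index(drop=True)


# Made-up rows; the third is an exact duplicate of the first
raw = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["Example Film ", "Example Series", "Example Film "],
    "country": ["India", None, "India"],
    "release_year": [2019, 2020, 2019],
    "rating": ["PG", "TV-MA", "PG"],
    "duration": ["90 min", "2 Seasons", "90 min"],
})
clean = clean_netflix(raw)

# With the real file: clean = clean_netflix(pd.read_csv("netflix_dataset.csv"))
```

Keeping the parsed number in a separate column leaves the original `duration` text intact, so movie minutes and TV show seasons can still be told apart through `type`.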
- Python
- pandas
- matplotlib
The notebook generates the following plots:
- Content Type Distribution – Bar chart showing the count of Movies vs TV Shows.
- Rating Distribution – Pie chart illustrating the proportion of different content ratings.
- Movie Duration Distribution – Histogram showing the distribution of movie lengths.
- Titles Released Per Year – Scatter plot of the number of titles released each year.
- Top 10 Countries – Horizontal bar chart showing countries with the most titles.
- Movies vs TV Shows Over Time – Subplot comparing yearly trends in Movies and TV Shows.
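
The plots listed above could be produced with matplotlib roughly as follows. This sketch draws all six on one grid for brevity, which is a choice made here and not necessarily the notebook's layout. The `plot_overview` function, the numeric `duration_minutes` column, and the sample rows are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame):
    """Hypothetical helper: draw the six plots on a 2x3 grid."""
    fig, axes = plt.subplots(2, 3, figsize=(18, 10))

    type_counts = df["type"].value_counts()
    axes[0, 0].bar(type_counts.index.tolist(), type_counts.values)
    axes[0, 0].set_title("Content Type Distribution")

    rating_counts = df["rating"].value_counts()
    axes[0, 1].pie(rating_counts.values, labels=rating_counts.index.tolist(),
                   autopct="%1.1f%%")
    axes[0, 1].set_title("Rating Distribution")

    # Assumption: movie lengths are already numeric in `duration_minutes`
    movies = df[df["type"] == "Movie"]
    axes[0, 2].hist(movies["duration_minutes"].dropna(), bins=20)
    axes[0, 2].set_title("Movie Duration Distribution")

    per_year = df["release_year"].value_counts().sort_index()
    axes[1, 0].scatter(per_year.index, per_year.values)
    axes[1, 0].set_title("Titles Released Per Year")

    # Sort ascending so the largest bar ends up at the top
    top = df["country"].value_counts().head(10).sort_values()
    axes[1, 1].barh(top.index.tolist(), top.values)
    axes[1, 1].set_title("Top 10 Countries")

    by_year = pd.crosstab(df["release_year"], df["type"])
    for content_type in by_year.columns:
        axes[1, 2].plot(by_year.index, by_year[content_type],
                        marker="o", label=content_type)
    axes[1, 2].set_title("Movies vs TV Shows Over Time")
    axes[1, 2].legend()

    fig.tight_layout()
    return fig


# Made-up rows for illustration only
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "rating": ["PG", "TV-MA", "TV-MA", "PG"],
    "country": ["India", "United States", "United States", "India"],
    "release_year": [2019, 2020, 2020, 2021],
    "duration_minutes": [90.0, None, 120.0, 105.0],
})
fig = plot_overview(sample)
```

Call `fig.savefig(...)` to write the figure to disk, or `plt.show()` in an interactive session.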
- Clone this repository:
git clone https://github.com/your-username/netflix-eda.git
cd netflix-eda