Netflix has become a major player in the entertainment industry, offering a diverse array of TV Shows and Movies to its global audience.
This dataset provides a snapshot of Netflix's content library including various attributes such as titles, genres, release years, ratings and durations.
The dataset used for this analysis is sourced from Kaggle and includes information on Netflix TV Shows and Movies.
Link to the Dataset : Netflix TV Shows and Movies
- The goal is to gain insights into the content available on Netflix and to understand the patterns that show how the platform evolves its offerings.
- This Exploratory Data Analysis (EDA) aims to address the following key questions:
- Content Distribution: What are the distribution patterns of TV Shows and Movies across different genres and countries? How does the content vary in terms of release year and rating?
- Trend Analysis: Are there observable trends in the release of TV Shows and Movies over time? How has the number of new additions to Netflix's library evolved?
- Genre Popularity: What are the most and least popular genres in Netflix's library? How does genre popularity differ by region or over time?
- Content Characteristics: What are the typical characteristics (e.g., duration, rating) of TV Shows vs Movies? How do these characteristics vary by genre or release year?
- Regional Diversity: How diverse is Netflix's content offering in terms of geographic origin? Are there particular regions that contribute more significantly to Netflix's library?
- This analysis will provide a deeper understanding of Netflix's content library, revealing trends and patterns that can inform future content strategy and development.
- Setting up the Environment
- Libraries required for the Project
- Getting started with Repository
- Steps involved in the Project
- Conclusion
This project requires Jupyter Notebook, which you can install and launch from the terminal.
- Install the Notebook
pip install notebook
- Run the Notebook
jupyter notebook
NumPy
- Go to the terminal and run this code
pip install numpy
Pandas
- Go to the terminal and run this code
pip install pandas
Matplotlib
- Go to the terminal and run this code
pip install matplotlib
Seaborn
- Go to the terminal and run this code
pip install seaborn
- Clone this repository to your local machine by using the following command :
git clone https://github.com/TheMrityunjayPathak/Netflix-Data-Analysis.git
Importing Libraries
- Importing necessary libraries like numpy, pandas, matplotlib and seaborn.
Reading CSV File
- Reading CSV file by using pd.read_csv() method.
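As a minimal sketch of this step: the README does not give the CSV file name, so the example below reads from an in-memory stand-in. The sample rows and the `type` and `title` columns are assumptions for illustration; `date_added` and `listed_in` are mentioned later in this README.

```python
import io

import pandas as pd

# Stand-in for the Kaggle CSV. The real file name is not stated in this README,
# and these two rows are made-up illustrative data.
sample_csv = io.StringIO(
    "type,title,date_added,listed_in\n"
    'Movie,Example Film,"September 25, 2021","Dramas, Comedies"\n'
    'TV Show,Example Series,"January 1, 2020",Crime TV Shows\n'
)

# With the downloaded dataset, pass the path to the CSV file instead.
df = pd.read_csv(sample_csv)
print(df.head())
```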
Overview of the Dataset
- Information about the shape and size of the dataset.
- Types of columns present in the dataset (numerical, categorical, text).
- Detailed info about the dataset using the df.info() method.
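The overview checks above could look roughly like this. The small DataFrame is a hypothetical stand-in for the real dataset, and its column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix dataset.
df = pd.DataFrame({
    "title": ["Example Film", "Example Series", "Another Film"],
    "type": ["Movie", "TV Show", "Movie"],
    "release_year": [2019, 2020, 2021],
})

print(df.shape)  # (number of rows, number of columns)
print(df.size)   # total number of cells (rows * columns)

# Separate numerical columns from the remaining (categorical/text) ones.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, other_cols)

# Column names, non-null counts and dtypes in one summary.
df.info()
```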
Handling Null values in the Dataset
- Filling the null values with most frequent category in categorical columns.
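One way to fill nulls with the most frequent category is to take each column's mode. This is a sketch on made-up data; the `rating` and `country` column names are assumptions, and the notebook may handle a different set of columns.

```python
import pandas as pd

# Hypothetical sample with missing values in two categorical columns.
df = pd.DataFrame({
    "rating": ["TV-MA", None, "TV-MA", "PG"],
    "country": ["India", "India", None, "United States"],
})

for col in ["rating", "country"]:
    # mode() ignores missing values; take the first entry if there is a tie.
    most_frequent = df[col].mode().iloc[0]
    df[col] = df[col].fillna(most_frequent)

print(df)
```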
Changing DataType of Columns
- Modifying the datatype of date_added column to pandas datetime format.
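A minimal sketch of the conversion, assuming `date_added` holds strings such as "September 25, 2021" (the sample values are made up). Stripping whitespace first and using `errors="coerce"` are defensive choices for this example, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical sample; the second value has a stray leading space.
df = pd.DataFrame({"date_added": ["September 25, 2021", " January 1, 2020"]})

# Strip whitespace, then parse; unparseable values become NaT instead of raising.
df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), errors="coerce")

print(df.dtypes)
```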
Utilizing existing information to create new Columns
- Extracting the year, month and day from the date_added column.
- Splitting the listed_in column on commas (,) and selecting the first value as the genre.
- Splitting the cast column on commas (,) and selecting the first value as the lead actor.
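The derived columns described above can be sketched as follows. The new column names (`year_added`, `month_added`, `day_added`, `genre`, `lead_actor`) are hypothetical, and the rows are made-up sample data.

```python
import pandas as pd

# Hypothetical sample; date_added is already in datetime format.
df = pd.DataFrame({
    "date_added": pd.to_datetime(["2021-09-25", "2020-01-01"]),
    "listed_in": ["Dramas, International Movies", "Kids' TV"],
    "cast": ["Actor A, Actor B", "Actor C"],
})

# Date parts via the .dt accessor.
df["year_added"] = df["date_added"].dt.year
df["month_added"] = df["date_added"].dt.month
df["day_added"] = df["date_added"].dt.day

# Split on commas and keep the first entry, trimming surrounding whitespace.
df["genre"] = df["listed_in"].str.split(",").str[0].str.strip()
df["lead_actor"] = df["cast"].str.split(",").str[0].str.strip()

print(df[["year_added", "month_added", "day_added", "genre", "lead_actor"]])
```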
Splitting the Dataset
- Splitting the dataset based on type of content (like TV Shows and Movies).
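A boolean mask on the content type is one straightforward way to do this split. The `type` column name and its "Movie" / "TV Show" values are assumptions about the dataset, and the rows are made up.

```python
import pandas as pd

# Hypothetical sample with a content-type column.
df = pd.DataFrame({
    "title": ["Film A", "Series B", "Film C"],
    "type": ["Movie", "TV Show", "Movie"],
})

# .copy() gives independent frames, so later edits do not touch df.
movies = df[df["type"] == "Movie"].copy()
tv_shows = df[df["type"] == "TV Show"].copy()

print(len(movies), len(tv_shows))
```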
Statistical Analysis
- No. of TV Shows and Movies available on Netflix.
- No. of shows in each rating category.
- No. of shows released each year.
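Counts like these are typically obtained with `value_counts()`. The sketch below uses made-up rows and assumed column names (`type`, `rating`, `release_year`); passing `normalize=True` gives shares instead of counts, which is one way percentages can be derived.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "rating": ["TV-MA", "TV-MA", "PG", "TV-14"],
    "release_year": [2019, 2020, 2020, 2020],
})

type_counts = df["type"].value_counts()        # titles per content type
rating_counts = df["rating"].value_counts()    # titles per rating category
year_counts = df["release_year"].value_counts().sort_index()  # titles per year, in year order

# Share of each content type as a percentage.
type_share = df["type"].value_counts(normalize=True) * 100

print(type_counts, rating_counts, year_counts, type_share, sep="\n\n")
```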
Data Visualization
- No. of TV Shows and Movies available on Netflix.
- No. of shows in each Rating Category.
- No. of shows uploaded on Netflix each year.
- No. of shows uploaded on Netflix each month.
- No. of shows uploaded on Netflix each day.
- No. of shows available on Netflix in each country.
- No. of Movies released on Netflix in each genre.
- No. of TV Shows released on Netflix in each genre.
- No. of Movies for a lead actor on Netflix.
- No. of TV Shows for a lead actor on Netflix.
- Avg. length of Movies in each genre.
- Avg. length of TV Shows in each genre.
- Distribution of length of Movies on Netflix.
- Distribution of seasons of TV Shows on Netflix.
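Two of the plots listed above, sketched with plain Matplotlib on made-up data. The `duration` format ("90 min", "2 Seasons") and the column names are assumptions; the notebook may well use Seaborn functions such as `countplot` or `histplot` instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "duration": ["90 min", "2 Seasons", "120 min", "95 min"],
})

type_counts = df["type"].value_counts()

# Pull the leading number out of strings like "90 min" for movies only.
movie_minutes = (
    df.loc[df["type"] == "Movie", "duration"]
    .str.extract(r"(\d+)", expand=False)
    .astype(int)
)

fig, (ax_counts, ax_length) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: number of titles per content type.
ax_counts.bar(type_counts.index, type_counts.values)
ax_counts.set_title("Titles by type")
ax_counts.set_ylabel("Number of titles")

# Histogram: distribution of movie length in minutes.
ax_length.hist(movie_minutes, bins=5)
ax_length.set_title("Distribution of movie length")
ax_length.set_xlabel("Minutes")

fig.tight_layout()
```

In a notebook, calling `plt.show()` after this displays the figure.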
Here are some key findings from the analysis:
- Cleaned and analyzed a dataset of 8000+ Netflix Movies and TV Shows.
- More than 60% of the content on Netflix is rated for Mature Audiences Only.
- More than 20% of the Movies and TV Shows are uploaded on the 1st day of the month.
- More than 30% of the content is exclusive to the United States.