Movie Recommendation System Using Netflix Dataset

This project implements a movie recommendation system using the Netflix Prize Dataset. The focus is on applying Collaborative Filtering methods to provide personalized movie recommendations. The system includes both User-User Collaborative Filtering and Movie-Movie Collaborative Filtering.

Key Steps:

1. Data Preprocessing:

Processed a dataset of 100 million records, focusing on 100 movies and 2,000 users for analysis.
Performed Exploratory Data Analysis (EDA) on a 10 million record sample and on the full dataset.
Stored the preprocessed data in Parquet format on Amazon S3.
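
The README does not say how the users and movies in the subset were chosen. A minimal sketch, assuming the most active users and the most frequently rated movies are kept (the function name `shrink` and the toy records are hypothetical, and plain Python stands in for the PySpark job):

```python
from collections import Counter

def shrink(ratings, n_users, n_movies):
    """Keep ratings by the n_users most active users on the n_movies most rated movies."""
    user_counts = Counter(u for u, _, _ in ratings)
    movie_counts = Counter(m for _, m, _ in ratings)
    top_users = {u for u, _ in user_counts.most_common(n_users)}
    top_movies = {m for m, _ in movie_counts.most_common(n_movies)}
    return [(u, m, r) for u, m, r in ratings if u in top_users and m in top_movies]

# Toy (user_id, movie_id, rating) records standing in for the full dataset.
ratings = [
    (1, 10, 4), (1, 11, 5), (1, 12, 3),
    (2, 10, 2), (2, 11, 4),
    (3, 10, 5),
    (4, 13, 1),
]

subset = shrink(ratings, n_users=2, n_movies=2)
print(subset)
```

Selecting by activity keeps the subset dense, which matters for neighborhood methods that rely on users sharing rated movies.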

2. Data Transformation:

Converted the dataset into dictionaries so that ratings and neighbors can be looked up in constant time:
- user2movie: Maps users to the movies they rated.
- movie2user: Maps movies to the users who rated them.
- usermovie2rating: Maps user-movie pairs to their ratings.
- usermovie2rating_test: Maps user-movie pairs to their ratings in the held-out test set.
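
The four dictionaries above could be built along these lines. This is a sketch under assumptions: the 80/20 split, the `build_dicts` helper, and the toy records are illustrative and not taken from the notebooks.

```python
import random

def build_dicts(ratings, test_fraction=0.2, seed=0):
    """Split (user, movie, rating) records and build the lookup dictionaries."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    test, train = shuffled[:n_test], shuffled[n_test:]

    user2movie, movie2user, usermovie2rating = {}, {}, {}
    for u, m, r in train:
        user2movie.setdefault(u, []).append(m)   # user -> movies they rated
        movie2user.setdefault(m, []).append(u)   # movie -> users who rated it
        usermovie2rating[(u, m)] = r             # (user, movie) -> rating

    usermovie2rating_test = {(u, m): r for u, m, r in test}
    return user2movie, movie2user, usermovie2rating, usermovie2rating_test

# Toy records: 5 users x 2 movies, ratings in 1..5.
records = [(u, m, 1 + (u + m) % 5) for u in range(5) for m in range(2)]
user2movie, movie2user, usermovie2rating, usermovie2rating_test = build_dicts(records)
print(len(usermovie2rating), "train ratings,", len(usermovie2rating_test), "test ratings")
```

Keying the rating dictionaries by `(user, movie)` tuples avoids scanning a user's history every time a single rating is needed.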

3. Collaborative Filtering:

Implemented User-User Collaborative Filtering using Matrix Factorization on a subset of 10 million records (1,000 users, 200 movies).
Movie-Movie Collaborative Filtering was also applied using the same subset.
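
To illustrate how the dictionaries feed a prediction, here is a sketch of the classic neighborhood form of user-user collaborative filtering: a user's mean rating plus a similarity-weighted average of neighbors' deviations from their own means. It is an assumption that the notebooks follow this form; the helper names, the `k` and `min_common` parameters, and the toy data are hypothetical.

```python
from math import sqrt

def user_mean(u, user2movie, usermovie2rating):
    movies = user2movie[u]
    return sum(usermovie2rating[(u, m)] for m in movies) / len(movies)

def similarity(u, v, user2movie, usermovie2rating, min_common=2):
    """Pearson-style correlation over co-rated movies, using each user's overall mean."""
    common = set(user2movie[u]) & set(user2movie[v])
    if len(common) < min_common:
        return 0.0
    mu_u = user_mean(u, user2movie, usermovie2rating)
    mu_v = user_mean(v, user2movie, usermovie2rating)
    du = [usermovie2rating[(u, m)] - mu_u for m in common]
    dv = [usermovie2rating[(v, m)] - mu_v for m in common]
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0

def predict(u, m, user2movie, movie2user, usermovie2rating, k=25):
    """Predict user u's rating of movie m from the k most similar users who rated m."""
    mu_u = user_mean(u, user2movie, usermovie2rating)
    neighbors = []
    for v in movie2user.get(m, []):
        if v == u:
            continue
        w = similarity(u, v, user2movie, usermovie2rating)
        if w != 0.0:
            neighbors.append((w, v))
    neighbors.sort(key=lambda pair: abs(pair[0]), reverse=True)

    num = den = 0.0
    for w, v in neighbors[:k]:
        deviation = usermovie2rating[(v, m)] - user_mean(v, user2movie, usermovie2rating)
        num += w * deviation
        den += abs(w)
    pred = mu_u if den == 0 else mu_u + num / den
    return min(5.0, max(1.0, pred))  # clip to the 1-5 rating scale

# Toy data: user "A" has not rated movie "m4".
usermovie2rating = {
    ("A", "m1"): 5, ("A", "m2"): 3, ("A", "m3"): 4,
    ("B", "m1"): 4, ("B", "m2"): 2, ("B", "m3"): 3, ("B", "m4"): 5,
    ("C", "m1"): 1, ("C", "m2"): 5, ("C", "m4"): 2,
}
user2movie, movie2user = {}, {}
for (u, m) in usermovie2rating:
    user2movie.setdefault(u, []).append(m)
    movie2user.setdefault(m, []).append(u)

pred = predict("A", "m4", user2movie, movie2user, usermovie2rating)
print(pred)
```

The movie-movie variant is symmetric: swap the roles of `user2movie` and `movie2user` and compare movies by the users who rated both.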

4. Advanced Models:

Implemented a Restricted Boltzmann Machine (RBM) to improve recommendation accuracy.

Performance Metrics:

User-User Collaborative Filtering:

Mean Squared Error (MSE):
- Train MSE: 0.5854
- Test MSE: 0.8420
Root Mean Squared Error (RMSE):
- Train RMSE: ~0.765
- Test RMSE: ~0.918
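
RMSE is the square root of MSE, so the two pairs of figures above carry the same information. A short sketch of both metrics (the `mse` helper is illustrative; only the two MSE values come from this README):

```python
from math import sqrt

def mse(predictions, targets):
    """Mean squared error between predicted and true ratings."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def rmse(predictions, targets):
    """Root mean squared error, in the same units as the ratings."""
    return sqrt(mse(predictions, targets))

# Converting the reported MSE values to RMSE.
train_rmse = sqrt(0.5854)
test_rmse = sqrt(0.8420)
print(f"train RMSE = {train_rmse:.4f}, test RMSE = {test_rmse:.4f}")
```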

Comparison with Netflix Prize Benchmark:

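For reference, Netflix's own Cinematch system scored an RMSE of about 0.95 on the competition data, and the Grand Prize required a 10% improvement on that score, an RMSE of roughly 0.857.
The test RMSE reported here falls between those two figures. It was measured on a small subset of users and movies rather than on the competition's held-out set, so the comparison is indicative rather than direct.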

Tools and Technologies:

Big Data Processing: PySpark, Databricks
Storage: Amazon S3
Language: Python
Libraries Used:
- pyspark, numpy, pandas
Modeling Techniques:
- User-User Collaborative Filtering
- Movie-Movie Collaborative Filtering
- Matrix Factorization
- Restricted Boltzmann Machine (RBM)

Conclusion:

This project demonstrated that User-User and Movie-Movie Collaborative Filtering can produce useful recommendations on a subset of the Netflix Prize data, with RMSE values close to the Netflix Prize benchmark. The other models (Bayesian Matrix Factorization and RBM) were built as a starting point for future refinement; the primary focus was on getting strong performance from the collaborative filtering techniques. Further experimentation and tuning could improve the models, and scaling the pipeline to a larger share of the dataset is a natural next step.

Repository files:

- Dictionaries.ipynb
- EDA_of_Netflix_mini_dataset.ipynb
- Matrix_Factorization.ipynb
- Movie_Movie_Collaberative_filtering.ipynb
- RBM_tf.ipynb
- README.md
- User_User_Colaberative_Filtering.ipynb
- User_data_Preprcoessing.ipynb
