# Movie-Recommender
Built clustering models to design a movie recommender based on ratings and tags by other users.

# Problem Statement
## Blockbuster or art film?
1. Set up a data science project structure in a new git repository in your GitHub account
2. Download one of the MovieLens datasets from https://grouplens.org/datasets/movielens/
3. Load the dataset into pandas DataFrames
4. Using exploratory data analysis, formulate one or two ideas on how combining users' ratings and tags adds value to the dataset
5. Build one or more clustering models that find similar movies to recommend, using other users' ratings and tags as features
6. Document your process and results
7. Commit your notebook, source code, visualizations and other supporting files to the git repository on GitHub

## Data Description
This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from http://movielens.org, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. Users were selected at random for inclusion, and all selected users had rated at least 20 movies. Each user is represented by an id.
The data are contained in the files `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`.

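Loading the four files into pandas DataFrames (step 3 of the problem statement) can be sketched as follows. This is a minimal sketch, not the project's notebook code: `load_movielens` and `summarize` are hypothetical helper names, and the sketch assumes the four CSV files sit together in one directory with the standard MovieLens columns (`userId`, `movieId`, `rating`, `tag`, ...).

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# File names from the ml-latest-small description; each becomes one DataFrame.
MOVIELENS_FILES = ("links", "movies", "ratings", "tags")


def load_movielens(data_dir: str | Path) -> dict[str, pd.DataFrame]:
    """Read links.csv, movies.csv, ratings.csv and tags.csv from data_dir."""
    base = Path(data_dir)
    return {name: pd.read_csv(base / f"{name}.csv") for name in MOVIELENS_FILES}


def summarize(frames: dict[str, pd.DataFrame]) -> dict[str, int]:
    """Counts that can be checked against the dataset description."""
    ratings = frames["ratings"]
    return {
        "ratings": len(ratings),
        "tag_applications": len(frames["tags"]),
        "movies": int(frames["movies"]["movieId"].nunique()),
        "users": int(ratings["userId"].nunique()),
    }
```

For example, `summarize(load_movielens("ml-latest-small"))` (the directory path is an assumption) should reproduce the rating, tag, movie and user counts quoted above if the files are unmodified.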
## What we want to do
- Use a few clustering models to recommend movies based on ratings and tags by other users.

## Analysis of Data

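One way to combine ratings and tags during exploratory analysis is to place per-movie rating statistics next to tag counts and compare tagged with untagged movies. The sketch below illustrates that idea under assumptions: the README does not state which comparisons were actually made, and `ratings_vs_tags` and `compare_tagged` are hypothetical names.

```python
import pandas as pd


def ratings_vs_tags(ratings: pd.DataFrame, tags: pd.DataFrame) -> pd.DataFrame:
    """Per-movie rating statistics joined with how often each movie was tagged."""
    stats = ratings.groupby("movieId")["rating"].agg(mean_rating="mean", n_ratings="count")
    n_tags = tags.groupby("movieId").size().rename("n_tags")
    # Left join keeps every rated movie; movies that were never tagged get 0.
    out = stats.join(n_tags, how="left").fillna({"n_tags": 0})
    out["n_tags"] = out["n_tags"].astype(int)
    out["tagged"] = out["n_tags"] > 0
    return out


def compare_tagged(per_movie: pd.DataFrame) -> pd.DataFrame:
    """Average rating and popularity of tagged vs. untagged movies."""
    return per_movie.groupby("tagged")[["mean_rating", "n_ratings"]].mean()
```

If tagged movies turn out to be rated more often or more highly, that would suggest tags carry signal worth feeding into the clustering features.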
### Using feature engineering and exploratory analysis to select the best features for training the clustering models and generating recommendations

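The README does not list the engineered features. As a hypothetical example, a movie-by-feature matrix could combine each movie's mean rating, the logarithm of its rating count, and counts of its most frequent (lower-cased) tags, standardized so that no single feature dominates the Euclidean distances used by KMeans and Ward-linkage clustering. `build_movie_features` and `top_n_tags` are illustrative names, not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_movie_features(ratings: pd.DataFrame, tags: pd.DataFrame, top_n_tags: int = 50) -> pd.DataFrame:
    """Hypothetical movie x feature matrix mixing rating statistics and tag counts."""
    # Rating side: average score and (log-damped) popularity per movie.
    feats = ratings.groupby("movieId")["rating"].agg(mean_rating="mean", n_ratings="count")
    feats["log_n_ratings"] = np.log1p(feats.pop("n_ratings"))

    # Tag side: normalise free text so "Sci-Fi" and "sci-fi " count as one tag.
    clean = tags.assign(tag=tags["tag"].astype(str).str.strip().str.lower())
    top_tags = clean["tag"].value_counts().head(top_n_tags).index
    tag_counts = pd.crosstab(clean["movieId"], clean["tag"])[top_tags].add_prefix("tag_")

    # Keep every rated movie; movies without tags get zero tag counts.
    feats = feats.join(tag_counts, how="left").fillna(0.0)

    # Standardise each column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(feats)
    return pd.DataFrame(scaled, index=feats.index, columns=feats.columns)
```

Movies that were tagged but never rated are dropped by the left join; whether that matches the original project is unknown.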
- First, we used a KMeans clustering model. After finding that the optimal value of k was 20, we formed these clusters:

*[Figure: KMeans clusters (k = 20)]*

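The README states that the optimal k was 20 but not how it was chosen. A common approach, assumed here, is to fit KMeans for a range of k values and keep the one with the highest silhouette score (the elbow method on inertia is another option). This sketch uses scikit-learn's `KMeans` and `silhouette_score`; `choose_k` and `fit_kmeans` are hypothetical helpers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def fit_kmeans(X: np.ndarray, k: int, random_state: int = 42) -> np.ndarray:
    """Cluster the rows of X into k groups; n_init restarts guard against poor seeds."""
    return KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)


def choose_k(X: np.ndarray, k_values, random_state: int = 42):
    """Return (best_k, scores), picking the k with the highest silhouette score."""
    scores = {k: silhouette_score(X, fit_kmeans(X, k, random_state)) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Fixing `random_state` keeps the cluster assignments, and therefore the recommendations, reproducible between runs.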
- Next, we used an Agglomerative Clustering model, again with 20 clusters, and formed these clusters:

*[Figure: Agglomerative Clustering clusters (20 clusters)]*

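Agglomerative clustering with 20 clusters can be sketched with scikit-learn's `AgglomerativeClustering`. The `ward` linkage is scikit-learn's default and is an assumption here, since the README does not name the linkage. The helpers also report cluster sizes and the adjusted Rand index between two labelings, one way to check how closely the KMeans and agglomerative clusters agree; all function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score


def fit_agglomerative(X: np.ndarray, n_clusters: int = 20, linkage: str = "ward") -> np.ndarray:
    """Bottom-up clustering; 'ward' merges the pair of clusters that least increases within-cluster variance."""
    return AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage).fit_predict(X)


def cluster_sizes(labels) -> np.ndarray:
    """Number of movies per cluster, useful for spotting one oversized cluster."""
    return np.bincount(np.asarray(labels))


def agreement(labels_a, labels_b) -> float:
    """Adjusted Rand index: 1.0 means identical partitions, about 0.0 means chance level."""
    return adjusted_rand_score(labels_a, labels_b)
```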
- We then built two recommenders, one based on each of the clustering models above.

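A cluster-based recommender can be sketched as: look up the query movie's cluster, then return the cluster-mates closest to it in feature space. This is a minimal sketch under assumptions: ranking by Euclidean distance is not described in the README, `recommend` is a hypothetical name, and `labels` can come from either clustering model. Unlike the KMeans result listed under Results, which includes Titanic (1997) itself, this sketch skips the query movie.

```python
import numpy as np
import pandas as pd


def recommend(title: str, movies: pd.DataFrame, features: pd.DataFrame, labels, n: int = 5) -> list:
    """Recommend up to n movies from the same cluster as `title`.

    movies:   DataFrame with movieId and title columns (as in movies.csv).
    features: DataFrame indexed by movieId, one row per clustered movie.
    labels:   cluster label for each row of `features`, in the same order.
    """
    clusters = pd.Series(labels, index=features.index)
    matches = movies.loc[movies["title"] == title, "movieId"]
    if matches.empty or matches.iloc[0] not in clusters.index:
        raise KeyError(f"{title!r} not found among clustered movies")
    movie_id = matches.iloc[0]  # first match if a title appears more than once

    # Other movies sharing the query's cluster label.
    mask = (clusters.to_numpy() == clusters.loc[movie_id]) & (clusters.index.to_numpy() != movie_id)
    same = clusters.index[mask]

    # Rank cluster-mates by Euclidean distance to the query in feature space.
    diffs = features.loc[same].to_numpy() - features.loc[movie_id].to_numpy()
    order = np.argsort(np.linalg.norm(diffs, axis=1), kind="stable")
    nearest = same[order[:n]]
    titles = movies.set_index("movieId")["title"]
    return titles.loc[nearest].tolist()
```

For example, `recommend("Titanic (1997)", movies, X, kmeans_labels)` would return up to five titles, where `movies`, `X` and `kmeans_labels` stand for the `movies.csv` DataFrame, a feature matrix indexed by `movieId`, and the matching cluster labels.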
## Results
### Five recommendations for Titanic (1997)
#### KMeans-based recommender suggested:
- American Beauty (1999)
- Good Will Hunting (1997)
- Titanic (1997)
- Eternal Sunshine of the Spotless Mind (2004)
- Beautiful Mind, A (2001)
#### Agglomerative Clustering-based recommender suggested:
- Beautiful Mind, A (2001)
- Rob Roy (1995)
- Pearl Harbor (2001)
- Bodyguard, The (1992)
- About Time (2013)