🎬 Movie Recommendation Engine

📌 Overview

This project is a Movie Recommendation Engine that provides personalized suggestions using six different machine learning algorithms. Each model uses structured movie metadata to learn what the user likes and predict similar content.

👉 Live Demo

Workflow

1️⃣ Data Preprocessing – Movie Vector Creation

We start by converting raw movie metadata into a fixed 800-dimensional vector representation.

Steps:

- Extract features from genres, cast, overview, keywords, and director.
- Apply TF-IDF vectorization to the text-based columns.
- Reduce dimensionality (e.g., via PCA or Truncated SVD) to form 800-D vectors.

Formula:

TF-IDF Score:
TF-IDF(t, d) = TF(t, d) × log(N / DF(t))
where t = term, d = document, TF(t, d) = frequency of t in d, N = total number of documents, DF(t) = number of documents containing t

➡️ Each movie becomes a numeric vector in ℝ⁸⁰⁰
➡️ These vectors are the basis for all models.
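The TF-IDF formula above can be sketched in plain Python. This is a minimal illustration, not the project's preprocessing code: the whitespace tokenizer, the use of relative frequency for TF, and the toy overviews are all assumptions, and the dimensionality-reduction step is not shown. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF by default, so their scores will differ from this unsmoothed version.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document, using TF(t, d) * log(N / DF(t))."""
    # Assumption: lowercase whitespace tokenization is enough for this sketch.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # DF(t): number of documents that contain term t at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Assumption: TF is the term count divided by the document length.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy overviews, not taken from the project's dataset.
docs = [
    "space crew fights alien",
    "space romance drama",
    "alien invasion drama",
]
weights = tf_idf(docs)
```

In this toy corpus, "crew" appears in only one overview, so it scores higher in the first document than "space", which appears in two.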

2️⃣ K-Nearest Neighbors (KNN)

Finds the movies whose vectors lie closest to the vectors of movies the user has liked.

Used for:

Recommending movies that lie close to the user's liked movies in feature space.

Metrics:

Cosine Similarity:
sim(A, B) = (A · B) / (||A|| × ||B||)
Euclidean Distance:
dist(A, B) = √(Σᵢ (Aᵢ − Bᵢ)²)

3️⃣ Clustering (K-Means / Hierarchical)

Groups movies into clusters of similar content.

Used for:

Recommending movies from the same cluster as liked ones.

Core Concept:

K-Means Objective:
Minimize total within-cluster variance:
Σᵢ Σ_{x ∈ Cᵢ} ||x − μᵢ||²
where μᵢ is the centroid of cluster Cᵢ
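The K-Means objective can be made concrete with a small pure-Python sketch of the usual assign-then-update loop. The 2-D points and the fixed starting centroids are assumptions chosen to keep the run deterministic; a real run would typically use random or k-means++ initialization on the 800-D movie vectors.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centroids.append(old)  # Keep the centroid of an empty cluster.
            continue
        new_centroids.append([sum(col) / len(members) for col in zip(*members)])
    return new_centroids

def objective(points, labels, centroids):
    """Total within-cluster sum of squared distances (the quantity minimized)."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))

def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return assign(points, centroids), centroids

# Hypothetical toy data with two obvious groups.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels, centroids = k_means(points, centroids=[[0.0, 0.0], [5.0, 5.0]])
total = objective(points, labels, centroids)
```

Each point ends up 0.5 away from its centroid, so the objective is 4 × 0.25 = 1.0.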
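Linkages in Hierarchical Clustering (how the distance between two clusters is measured):
- Single-link: minimum distance between any pair of points, one from each cluster
- Complete-link: maximum distance between any pair of points, one from each cluster
- Average-link: average distance over all such pairs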
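The three linkage criteria differ only in how they aggregate the pairwise distances between two clusters. A short sketch, assuming Euclidean distance and two hypothetical 2-D clusters:

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair (one point from each cluster)."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_link(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_link(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))

def average_link(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical toy clusters.
cluster_a = [[0.0, 0.0], [0.0, 1.0]]
cluster_b = [[3.0, 0.0], [4.0, 0.0]]

single = single_link(cluster_a, cluster_b)
complete = complete_link(cluster_a, cluster_b)
average = average_link(cluster_a, cluster_b)
```

For any pair of clusters the single-link distance is at most the average-link distance, which is at most the complete-link distance.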

4️⃣ Perceptron (Neural Network)

A shallow neural network (a multilayer perceptron with one hidden layer) that predicts whether the user will like a movie.

Used for:

Binary classification (like/dislike) from past data.

Model:

Input → Dense Layer → ReLU → Dense Layer → Sigmoid Output

Core Formula:

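Output:
y = sigmoid(Wx + b)
where sigmoid(z) = 1 / (1 + e⁻ᶻ); in the two-layer model above, x is the ReLU output of the first Dense layer, and W and b are the weights and bias of the final layer
Loss Function: Binary Cross-Entropy
L = -[t·log(y) + (1-t)·log(1-y)]
where t ∈ {0, 1} is the true label (1 = like) and y is the predicted probability of a like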
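A forward pass through the Dense → ReLU → Dense → Sigmoid stack, followed by the binary cross-entropy loss, can be sketched without any framework. The weights below are hand-picked hypothetical values and the input has only two features; training (gradient descent) is not shown, and the project's real network presumably takes the 800-D movie vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases):
    """One fully connected layer: each row of weights produces one output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def predict(x, w1, b1, w2, b2):
    """Input -> Dense -> ReLU -> Dense -> Sigmoid; returns P(like)."""
    hidden = [relu(h) for h in dense(x, w1, b1)]
    return sigmoid(dense(hidden, w2, b2)[0])

def binary_cross_entropy(label, prob, eps=1e-12):
    """Loss for one example; label is 0 or 1, prob is the predicted P(like)."""
    prob = min(max(prob, eps), 1.0 - eps)  # Clamp to avoid log(0).
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

# Hypothetical hand-picked parameters: 2 inputs -> 2 hidden units -> 1 output.
w1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [[1.0, -1.0]]
b2 = [0.0]

prob = predict([2.0, 1.0], w1, b1, w2, b2)
loss = binary_cross_entropy(1, prob)
```

With these weights the hidden activations are 1.0 and 1.5, the output pre-activation is -0.5, and the predicted probability is sigmoid(-0.5), a little under 0.38.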

5️⃣ Bayesian Recommendation (Naïve Bayes)

Classifies based on conditional probabilities assuming feature independence.

Used for:

Probabilistic prediction of movie preference.

Core Formula:

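Bayes Theorem:
P(A | B) = (P(B | A) × P(A)) / P(B)
With A = the class (like/dislike) and B = the movie's features x₁, …, xₙ, the independence assumption gives:
P(class | x₁, …, xₙ) ∝ P(class) × Πⱼ P(xⱼ | class)
We predict the class (like/dislike) with the highest posterior.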
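The posterior comparison can be sketched with a tiny Bernoulli Naïve Bayes classifier. Using binary genre flags as features, Laplace smoothing (`alpha`), and the four training examples are all assumptions made for illustration; the project's model works on the movie vectors and may use a different variant (for example Gaussian Naïve Bayes).

```python
import math

def train(samples, labels, alpha=1.0):
    """Estimate log P(class) and P(feature_j = 1 | class) with Laplace smoothing."""
    model = {}
    n_features = len(samples[0])
    for cls in set(labels):
        rows = [s for s, label in zip(samples, labels) if label == cls]
        log_prior = math.log(len(rows) / len(samples))
        feature_probs = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (log_prior, feature_probs)
    return model

def predict(model, x):
    """Return the class with the highest (log) posterior for feature vector x."""
    def log_posterior(cls):
        log_prior, feature_probs = model[cls]
        # Independence assumption: per-feature log-likelihoods simply add up.
        return log_prior + sum(
            math.log(p if value else 1.0 - p)
            for value, p in zip(x, feature_probs)
        )
    return max(model, key=log_posterior)

# Hypothetical features: [is_action, is_comedy, is_romance].
samples = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
labels = ["like", "like", "dislike", "dislike"]

model = train(samples, labels)
prediction = predict(model, [1, 0, 0])
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied together.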

6️⃣ Content-Based Filtering

Recommends movies that are textually or semantically similar to the user's liked movies.

Used for:

Matching metadata (genres, plot, actors) using vector space models.

Similarity:

Cosine similarity, the same measure used for KNN:
sim(A, B) = (A · B) / (||A|| × ||B||)
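One common way to apply this similarity is to average the vectors of the liked movies into a single user profile and rank the remaining catalog against it. The profile-averaging step, the `recommend` helper, and the toy 3-D catalog are assumptions for illustration, not a description of the project's implementation.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Assumption: treat similarity to a zero vector as 0.
    return dot / (norm_a * norm_b)

def user_profile(liked_vectors):
    """Component-wise mean of the liked movies' vectors."""
    return [sum(col) / len(liked_vectors) for col in zip(*liked_vectors)]

def recommend(liked_titles, catalog, top_n=2):
    """Rank movies the user has not liked yet by similarity to the profile."""
    profile = user_profile([catalog[title] for title in liked_titles])
    candidates = [
        (title, cosine_similarity(profile, vector))
        for title, vector in catalog.items()
        if title not in liked_titles
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in candidates[:top_n]]

# Hypothetical toy catalog of metadata vectors.
catalog = {
    "Movie A": [1.0, 0.0, 0.0],
    "Movie B": [0.8, 0.2, 0.0],
    "Movie C": [0.9, 0.0, 0.1],
    "Movie D": [0.0, 1.0, 0.0],
    "Movie E": [0.0, 0.0, 1.0],
}
suggestions = recommend(["Movie A", "Movie B"], catalog, top_n=2)
```

Already-liked titles are excluded from the candidates so the user is not recommended something they have seen.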

🧠 Summary of Models

Model	Type	Formula / Key Concept
TF-IDF + Vectors	Preprocessing	TF-IDF(t, d) = TF(t, d) × log(N / DF(t))
KNN	Similarity-based	Cosine similarity / Euclidean distance
K-Means Clustering	Unsupervised	Minimize Σᵢ Σ_{x ∈ Cᵢ} ||x − μᵢ||²
Perceptron	Neural Network	y = sigmoid(Wx + b)
Naïve Bayes	Probabilistic	P(A | B) = (P(B | A) × P(A)) / P(B)
Content-Based	Metadata Matching	Cosine similarity on text-based features

📊 Dataset

TMDB or similar movie dataset.
Features include: Title, Genres, Cast, Overview, Keywords, Ratings, Popularity, etc.

Levi477/Recommendation-Engine
