🎮 Steam Review Sentiment Analysis

Analyze what players really think about a game — automatically, at scale — using NLP and machine learning on real Steam reviews.

Overview

This project scrapes thousands of user reviews from the Steam Web API, preprocesses the text, labels each review as positive or negative using a lexicon-based approach, and trains classification models to predict sentiment.

The analysis uses 10,000 English reviews of No Man's Sky (App ID: 275850).

🔍 What This Project Does

| Step | Description |
| --- | --- |
| Data Collection | Scrapes up to 10,000 reviews via the Steam API using cursor-based pagination |
| Preprocessing | Cleans text (removes mentions, URLs, numbers, punctuation), normalizes slang words, tokenizes, removes stopwords, and applies stemming |
| Labeling | Assigns a polarity score using an English sentiment lexicon (positive/negative word lists), then classifies each review |
| Visualization | Generates pie charts, class distribution plots, text length histograms, and word clouds for overall, positive, and negative reviews |
| Feature Extraction | Transforms text using TF-IDF (top 200 features) and Word2Vec (100-dimensional embeddings) |
| Classification | Trains and evaluates Naive Bayes and Random Forest classifiers across multiple schemes |
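
The labeling step can be illustrated with a minimal sketch. It assumes each lexicon is a plain set of words and that the polarity score is the number of positive tokens minus the number of negative tokens. The notebook's actual scoring rule, its handling of a zero score, and the layout of the lexicon CSV files are not shown in this README, so `polarity_score`, `label_review`, and the tiny word sets below are hypothetical.

```python
# Hypothetical mini-lexicons; the project loads its word lists from
# english-positive.csv and english-negative.csv instead.
POSITIVE_WORDS = {"good", "great", "fun", "love", "amazing"}
NEGATIVE_WORDS = {"bad", "boring", "bug", "crash", "refund"}


def polarity_score(tokens):
    """Return (#positive tokens) - (#negative tokens) for one review."""
    score = 0
    for token in tokens:
        if token in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_WORDS:
            score -= 1
    return score


def label_review(tokens):
    """Map a polarity score to a class label.

    Assumption: a score of zero counts as positive here; the notebook
    may break ties differently.
    """
    return "positive" if polarity_score(tokens) >= 0 else "negative"


print(label_review(["great", "game", "love", "it"]))  # positive
print(label_review(["boring", "crash", "good"]))      # negative
```

Because the labels come from word counts rather than from human judgments, any classifier trained on them learns to reproduce the lexicon's decisions.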

📊 Results

| Scheme | Model | Features | Train Accuracy | Test Accuracy |
| --- | --- | --- | --- | --- |
| #1 | Naive Bayes | TF-IDF | 87.1% | 86.7% |
| #2 | Random Forest | TF-IDF | 99.2% | 98.6% |
| #3 | Random Forest | Word2Vec | 100% | 98.6% |

Random Forest outperforms Naive Bayes by about 12 percentage points in test accuracy (98.6% versus 86.7%), and it reaches the same 98.6% with either TF-IDF or Word2Vec features.
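
A comparison of this kind can be set up roughly as follows for the two TF-IDF schemes. This is a sketch under stated assumptions, not the notebook's code: the toy reviews are made up, and the split ratio, `random_state`, and `n_estimators` are illustrative choices. Only `max_features=200` is taken from the feature extraction step described above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up toy reviews standing in for the preprocessed Steam data.
texts = [
    "great game love explor", "amaz updat fun planet", "love ship fun",
    "good game great updat", "fun explor amaz world", "great fun love",
    "bore grind crash", "bad bug crash refund", "bore empti world",
    "bad grind bore", "crash bug refund", "empti bore bad game",
]
labels = ["positive"] * 6 + ["negative"] * 6

X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Keep at most the 200 most frequent terms ("top 200 features").
vectorizer = TfidfVectorizer(max_features=200)
X_train = vectorizer.fit_transform(X_train_txt)  # fit on training data only
X_test = vectorizer.transform(X_test_txt)

models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }
    print(f"{name}: train={results[name]['train']:.3f} "
          f"test={results[name]['test']:.3f}")
```

Fitting the vectorizer on the training split only keeps vocabulary and IDF statistics from leaking out of the test reviews.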

Sentiment Distribution in the Dataset

✅ Positive: 9,835 reviews (98.35%)
❌ Negative: 165 reviews (1.65%)
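
With a split this skewed, it helps to know what a trivial model would score before reading the accuracies in the Results table. The short calculation below uses only the class counts listed above and evaluates a hypothetical classifier that labels every review as positive. It is computed over the full dataset, so the figure for the notebook's held-out test split may differ slightly.

```python
positive, negative = 9_835, 165  # class counts reported above
total = positive + negative

# A trivial model that labels every review "positive".
tp, fn = positive, 0   # every positive review is predicted correctly
tn, fp = 0, negative   # every negative review is predicted incorrectly

accuracy = (tp + tn) / total
recall_positive = tp / (tp + fn)
recall_negative = tn / (tn + fp)
balanced_accuracy = (recall_positive + recall_negative) / 2

print(f"majority-class accuracy: {accuracy:.4f}")           # 0.9835
print(f"balanced accuracy:       {balanced_accuracy:.4f}")  # 0.5000
```

Since always predicting "positive" already yields about 98.4% accuracy, per-class recall or balanced accuracy gives a clearer picture of how well a model detects the 165 negative reviews.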

🛠️ Tech Stack

Python — pandas, NumPy, scikit-learn, NLTK, Gensim, Matplotlib, Seaborn, WordCloud
Steam Web API — for live review scraping
Jupyter Notebook — for interactive exploration and visualization

🚀 Getting Started

Prerequisites

pip install requests pandas nltk scikit-learn gensim matplotlib seaborn wordcloud

Run the Notebook

1. Clone the repository
2. Open Steam_Review_Sentiment_Analysis_Aditya_Nur_Huda.ipynb in Jupyter
3. Set the game_id variable to any Steam App ID
4. Run all cells

game_id = 275850  # Change this to analyze any game on Steam
df = get_steam_reviews(game_id, num_reviews=10000)
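
The body of `get_steam_reviews` is not reproduced in this README. The sketch below shows one way cursor-based pagination could work against Steam's public `appreviews` store endpoint. The function names `fetch_steam_page` and `collect_reviews`, the chosen query parameters, and the fake pages used for the offline demonstration are all assumptions for illustration, not the notebook's implementation.

```python
import json
import urllib.parse
import urllib.request


def fetch_steam_page(app_id, cursor, per_page=100):
    """Fetch one page of English reviews (performs a live HTTP request)."""
    params = urllib.parse.urlencode({
        "json": 1,
        "language": "english",
        "filter": "recent",
        "num_per_page": per_page,
        "cursor": cursor,  # urlencode escapes the cursor value
    })
    url = f"https://store.steampowered.com/appreviews/{app_id}?{params}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_reviews(fetch_page, num_reviews):
    """Follow the cursor from page to page until enough reviews are collected.

    `fetch_page` is any callable mapping a cursor string to a response
    dict with "reviews" and "cursor" keys.
    """
    reviews, cursor, seen_cursors = [], "*", set()
    while len(reviews) < num_reviews and cursor not in seen_cursors:
        seen_cursors.add(cursor)
        page = fetch_page(cursor)
        batch = page.get("reviews", [])
        if not batch:  # no more reviews available
            break
        reviews.extend(batch)
        cursor = page.get("cursor", cursor)
    return reviews[:num_reviews]


# Offline demonstration with made-up pages instead of real API responses.
FAKE_PAGES = {
    "*": {"reviews": [{"review": "great"}, {"review": "fun"}], "cursor": "p2"},
    "p2": {"reviews": [{"review": "buggy"}], "cursor": "p3"},
    "p3": {"reviews": [], "cursor": "p3"},
}
sample = collect_reviews(FAKE_PAGES.__getitem__, num_reviews=10)
print(len(sample))  # 3

# Live usage (hypothetical):
# data = collect_reviews(lambda c: fetch_steam_page(275850, c), num_reviews=200)
```

Passing the page fetcher in as a callable keeps the pagination loop testable without network access. Tracking seen cursors guards against looping forever if the API returns the same cursor twice.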

📁 Project Structure

.
├── Steam_Review_Sentiment_Analysis_Aditya_Nur_Huda.ipynb  # Main notebook
├── english-positive.csv                                    # Positive sentiment lexicon
├── english-negative.csv                                    # Negative sentiment lexicon
├── requirements.txt                                        # Python dependencies
├── LICENSE
└── README.md

💡 Key Insights

The overwhelming majority of No Man's Sky reviews are positive, reflecting its successful comeback after a rocky launch.
Random Forest + TF-IDF provides the best balance of performance and interpretability.
Slang normalization is a critical preprocessing step for gaming communities, where informal language is ubiquitous.
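
To make the slang normalization point concrete, here is a dependency-free sketch of a cleaning pipeline in the spirit of the preprocessing step. The slang map, the stopword list, and the function name `preprocess` are hypothetical. The stemming stage is left out to avoid extra dependencies, and the exact order of operations in the notebook may differ.

```python
import re

# Hypothetical slang map and stopword list; the notebook's own resources
# are larger and are not reproduced here.
SLANG = {"gg": "good game", "imo": "in my opinion", "u": "you", "gr8": "great"}
STOPWORDS = {"a", "an", "the", "is", "in", "my", "it", "this", "to", "and"}


def preprocess(text):
    """Clean a raw review and return a list of normalized tokens."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)                   # mentions
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation
    # Expand slang before dropping digits so entries such as "gr8" still match.
    text = " ".join(SLANG.get(tok, tok) for tok in text.split())
    text = re.sub(r"\d+", " ", text)                    # numbers
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(preprocess("GG @dev, this update is gr8!!! 10/10 imo https://example.com"))
# ['good', 'game', 'update', 'great', 'opinion']
```

Without the slang map, "gg" and "gr8" would pass through as opaque tokens that a sentiment lexicon of standard English words cannot score.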

👤 Author

Aditya Nur Huda
