This project analyzes customer sentiment in textual data such as product reviews, feedback, and social media posts. The goal is to classify customer feedback as positive, negative, or neutral using machine learning techniques and to visualize the results with graphs and charts.
- Python
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn
- WordCloud
- Jupyter Notebook (or any other Python IDE)
- NLTK (for text preprocessing)
The dataset consists of customer feedback, which includes the following columns:
- Review: The textual feedback given by customers.
- Sentiment: The sentiment label (positive, negative, neutral).
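A minimal sketch of loading data with these two columns and applying basic cleaning. The inline CSV rows, the `clean_review` column, and the `clean_text` helper are hypothetical and only illustrate the layout; the regex cleaning is a simple stand-in for the NLTK preprocessing, not the project's actual pipeline.

```python
import io
import re

import pandas as pd

# Hypothetical sample rows; the real dataset would be read from a file instead.
csv_text = """Review,Sentiment
"Great product, works as described!",positive
"Stopped working after two days.",negative
"It arrived on Tuesday.",neutral
"""

df = pd.read_csv(io.StringIO(csv_text))

# Normalize labels and check they fall into the three expected categories.
allowed = {"positive", "negative", "neutral"}
df["Sentiment"] = df["Sentiment"].str.strip().str.lower()
if not df["Sentiment"].isin(allowed).all():
    raise ValueError("Unexpected sentiment label found")


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


# Hypothetical helper column holding the cleaned review text.
df["clean_review"] = df["Review"].apply(clean_text)
print(df[["clean_review", "Sentiment"]])
```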
The objective is to analyze consumer complaint data and predict whether a complaint will be disputed, based on the issue reported.
- Loaded and cleaned a large consumer complaints dataset from the Consumer Financial Protection Bureau (CFPB).
- Processed and transformed textual data into features using techniques like label encoding and CountVectorizer.
- Explored key trends in consumer complaints, including the most common issues, products, and companies involved in disputes.
- Created visualizations and word clouds to reveal insights into common complaint issues and responses from companies.
- Applied and evaluated multiple machine learning algorithms:
  - Naive Bayes
  - Decision Tree
  - K-Nearest Neighbors (KNN)
- Achieved the following accuracies on test data:
  - 78.66% with Naive Bayes
  - 79.83% with Decision Tree
  - 78.94% with KNN
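The workflow above (CountVectorizer features, label encoding, and three classifiers compared on a held-out test set) could be sketched roughly as follows. The toy `issues` and `disputed` lists are invented for illustration, and the split ratio and hyperparameters are assumptions; the actual CFPB column names and settings are not given here, so the accuracies this prints have no relation to the figures reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: issue text and whether the complaint was disputed.
issues = [
    "incorrect information on credit report",
    "loan modification collection foreclosure",
    "billing dispute on credit card",
    "account opening closing or management",
    "incorrect information on credit report",
    "deposits and withdrawals",
    "loan servicing payments escrow account",
    "billing dispute on credit card",
    "communication tactics debt collection",
    "account opening closing or management",
    "loan modification collection foreclosure",
    "deposits and withdrawals",
]
disputed = ["Yes", "Yes", "No", "No", "Yes", "No",
            "Yes", "No", "Yes", "No", "Yes", "No"]

# Split first so the vectorizer only learns its vocabulary from training text.
text_train, text_test, label_train, label_test = train_test_split(
    issues, disputed, test_size=0.25, random_state=42, stratify=disputed
)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(text_train)
X_test = vectorizer.transform(text_test)

encoder = LabelEncoder()
y_train = encoder.fit_transform(label_train)
y_test = encoder.transform(label_test)

# Assumed hyperparameters; tune these on real data.
models = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2%}")
```

Fitting the vectorizer on the training split only keeps test vocabulary from leaking into the features, which matters when the accuracies are used to compare models.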
- Sentiment Distribution: The overall sentiment of customer feedback is categorized into positive, negative, and neutral, with a visual representation of the distribution.
- Word Clouds: Display the most frequent words used in positive, negative, and neutral reviews.
- Confusion Matrix: Visualizes the performance of the sentiment classification model.
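As a rough illustration of the numbers behind the sentiment distribution and the confusion matrix, the sketch below computes both from hypothetical true and predicted labels. The `y_true` and `y_pred` lists are invented for the example; in practice they would come from the test set and the trained classifier.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels; replace with real test labels and model predictions.
y_true = ["positive", "negative", "neutral", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "positive"]

# Sentiment distribution: how many reviews fall into each category.
distribution = pd.Series(y_true).value_counts()
print(distribution)

# Confusion matrix: rows are true labels, columns are predicted labels,
# both ordered as given in `labels`.
labels = ["positive", "negative", "neutral"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
cm_table = pd.DataFrame(
    cm,
    index=[f"true_{label}" for label in labels],
    columns=[f"pred_{label}" for label in labels],
)
print(cm_table)
```

The resulting table can be passed to a plotting function such as Seaborn's `heatmap` to produce the visual form of the matrix.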