This script performs sentiment analysis on IMDB movie reviews using the TextBlob library. The main goal is to process text data to determine sentiment polarity and subjectivity.
- Python Libraries:
pandas for data manipulation.
textblob for performing sentiment analysis.
nltk and spacy for natural language processing tasks.
- Installation:
The script includes commands to install necessary libraries:
pip install nltk
pip install -U spacy
- Dataset: The script reads data from a CSV file named 'Train.csv'.
- Reading Data:
train = pd.read_csv('Train.csv')
- Handling Missing Values:
train.dropna(inplace=True)
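The loading and missing-value steps can be sketched on a small in-memory sample. This is only an illustration: the sample rows and the 'label' column are assumptions, and restricting the check to the 'text' column with subset= is a variation on the script's plain dropna call, which drops a row if any column is missing.

```python
import io

import pandas as pd

# Hypothetical stand-in for Train.csv; only the 'text' column is known from the script.
sample_csv = io.StringIO(
    "text,label\n"
    "A wonderful film,1\n"
    ",0\n"
    "Dull and overlong,0\n"
)

train = pd.read_csv(sample_csv)
rows_before = len(train)

# Drop rows whose review text is missing, then renumber the index.
train = train.dropna(subset=["text"]).reset_index(drop=True)

rows_dropped = rows_before - len(train)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

Reporting how many rows were dropped makes it easier to notice when missing values remove more data than expected.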
- Text Cleaning Functions:
The script defines functions to remove punctuation, special characters, URLs, numbers, and stopwords, and to lemmatize words to their base form.
def remove_punctuations(text): ...
def custom_remove_stopwords(text): ...
def remove_special_characters(text): ...
def lemmatize_text(text): ...
def remove_URL(text): ...
def remove_numbers(text): ...
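The function bodies are elided above, so the following is a minimal sketch of how such helpers might look using only the standard library, not the script's actual implementation. The stopword set, the regular expressions, the clean_text wrapper, and the order of the steps are all assumptions. Lemmatization is left out because it needs an nltk or spacy model.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set; the script's own list is not shown.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to", "at"}


def remove_URL(text):
    """Remove http(s):// and www. style URLs."""
    return re.sub(r"https?://\S+|www\.\S+", "", text)


def remove_numbers(text):
    """Remove runs of digits."""
    return re.sub(r"\d+", "", text)


def remove_punctuations(text):
    """Remove ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_special_characters(text):
    """Keep only ASCII letters, digits, and whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def custom_remove_stopwords(text):
    """Drop stopwords (case-insensitive) and collapse extra whitespace."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


def clean_text(text):
    """Hypothetical wrapper: URLs go first, before punctuation removal breaks them apart."""
    steps = (
        remove_URL,
        remove_numbers,
        remove_punctuations,
        remove_special_characters,
        custom_remove_stopwords,
    )
    for step in steps:
        text = step(text)
    return text


cleaned = clean_text("This movie is a 10/10! See https://example.com for the review.")
print(cleaned)  # movie See for review
```

One design point to keep in mind: common stopword lists often include negations such as "not", so removing stopwords before scoring may change the polarity of a review.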
- Computing Sentiment:
Each review is passed to TextBlob, whose sentiment property returns a (polarity, subjectivity) pair. Polarity ranges from -1.0 (most negative) to 1.0 (most positive), and subjectivity ranges from 0.0 (objective) to 1.0 (subjective).
train['sentiment'] = train['text'].apply(lambda tweet: TextBlob(tweet).sentiment)
- Output: 9,166 reviews are classified as negative. Neutral reviews are far fewer, at 262, and positive reviews are the least common, at only 25.
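How polarity scores were turned into the three labels is not shown. A common approach, given here only as a sketch, is to cut at zero. The function name, the threshold parameter, and the sample polarity values are hypothetical and are not results from the dataset.

```python
from collections import Counter


def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity in [-1, 1] to a label; the zero cutoff is an assumption."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Made-up polarity values for illustration only.
sample_polarities = [0.6, -0.3, 0.0, -0.8, 0.1]
labels = [label_from_polarity(p) for p in sample_polarities]
counts = Counter(labels)
print(counts)
```

Raising the threshold widens the neutral band, which is one way to check how sensitive the class counts are to the cutoff.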
- Discussion: Negative sentiment dominates the results. If the dataset represented feedback on a business or service, a skew this strong would be a concern worth addressing. The results can guide further analysis to identify the underlying causes and inform strategies for improving satisfaction or perception.