Twitter Sentiment Analysis: Project Overview

Analyzed tweets from a Kaggle competition dataset to determine whether their sentiment is positive, negative, or neutral.

Resources Used:

Python Version: 3.6

Packages: pandas, numpy, plotly, SpaCy, nltk

Exploratory Data Analysis

Distribution of sentiments in training data

To evaluate my models I used the Jaccard index, which measures the similarity of two sentences as the number of words they share divided by the number of distinct words across both.
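As a rough illustration of the metric, here is a minimal sketch of a word-level Jaccard score. The function name `jaccard`, the lowercasing, the whitespace tokenisation, and the handling of two empty strings are assumptions made for this example, not necessarily what the project does.

```python
def jaccard(str1: str, str2: str) -> float:
    """Word-level Jaccard index: |A ∩ B| / |A ∪ B|."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    # Assumption: two empty inputs count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # 2 shared words out of 4 distinct words
    print(jaccard("I love this phone", "love this"))
```

A score of 1.0 means both sentences use exactly the same set of words, and 0.0 means they share none.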

Here are the distributions of Jaccard scores between the training tweets and the selected parts of those tweets.

List of the most common words (after removal of stopwords)
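One way such a list could be produced is sketched below using only the standard library. The tiny `STOPWORDS` set, the sample tweets, and the helper `most_common_words` are hypothetical stand-ins for this example; the project lists nltk among its packages, so its stopword list presumably comes from there.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"the", "a", "is", "to", "and", "i", "it", "of"}


def most_common_words(texts, n=10, stopwords=STOPWORDS):
    """Count lowercased, whitespace-separated words, skipping stopwords."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    tweets = ["I love the beach", "love is the answer", "the beach is far"]
    for word, count in most_common_words(tweets, n=2):
        print(word, count)
```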

Model Building

I used SpaCy to train a named entity recogniser.

My steps:

  1. Load the model
  2. Shuffle and loop over selected training examples
  3. Save the model
  4. Test the model
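
The steps above can be sketched roughly as follows. This is an illustration under several assumptions: it uses the spaCy v3 training API (a project on Python 3.6 may have used an older release with a different API), it starts from a blank English pipeline rather than whatever model the project loaded, and the toy `TRAIN_DATA`, the `SELECTED` label, and the helper names are hypothetical.

```python
import random
import tempfile

import spacy
from spacy.training import Example

# Hypothetical toy data: (text, {"entities": [(start_char, end_char, label)]})
TRAIN_DATA = [
    ("I love this phone", {"entities": [(2, 6, "SELECTED")]}),
    ("I hate the traffic", {"entities": [(2, 6, "SELECTED")]}),
]


def train_ner(train_data, n_iter=10, seed=0):
    random.seed(seed)
    # 1. Load the model (assumption: a blank English pipeline)
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")
    for _, annotations in train_data:
        for _, _, label in annotations["entities"]:
            ner.add_label(label)
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)
    # 2. Shuffle and loop over the training examples
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for batch in spacy.util.minibatch(examples, size=2):
            nlp.update(batch, sgd=optimizer, drop=0.3, losses=losses)
    return nlp


def save_and_reload(nlp, path):
    # 3. Save the model, then load it back from disk
    nlp.to_disk(path)
    return spacy.load(path)


if __name__ == "__main__":
    trained = train_ner(TRAIN_DATA)
    with tempfile.TemporaryDirectory() as model_dir:
        reloaded = save_and_reload(trained, model_dir)
        # 4. Test the model on new text
        doc = reloaded("I love the weather")
        print([(ent.text, ent.label_) for ent in doc.ents])
```

With so little training data the predicted entities are not meaningful; the sketch only shows the shape of the load, train, save, and test loop.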