Benchmarking topic classification performance of an LDA model as well as a discriminative Neural Network using data collected on seven Twitter topics (hashtags).

dai-anna/Duke-NLP-FinalProject

Duke NLP Final Project

We collect tweets on seven topics (hashtags) from Twitter and use them to benchmark an LDA model against a discriminative neural network on topic classification.

Read our final report here.

Hashtags

  1. crypto
  2. tesla
  3. championsleague
  4. formula1
  5. thanksgiving
  6. holidays
  7. covid19
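As a rough illustration of how the two models could be compared, the sketch below treats the seven hashtags as class labels and scores predictions with accuracy and per-class recall. It assumes each tweet is labeled with exactly one hashtag; the helper names (`accuracy`, `per_class_recall`) and the toy predictions are hypothetical and are not taken from the project's code or results.

```python
from collections import Counter

# The seven hashtags listed above, used here as class labels (an assumption).
HASHTAGS = [
    "crypto", "tesla", "championsleague", "formula1",
    "thanksgiving", "holidays", "covid19",
]
LABEL_TO_ID = {tag: i for i, tag in enumerate(HASHTAGS)}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each hashtag that occurs at least once in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {tag: hits[tag] / totals[tag] for tag in HASHTAGS if totals[tag]}


# Hypothetical toy predictions, invented purely for illustration.
y_true = ["crypto", "tesla", "formula1", "covid19"]
lda_pred = ["crypto", "crypto", "formula1", "holidays"]
nn_pred = ["crypto", "tesla", "formula1", "holidays"]

print("LDA accuracy:", accuracy(y_true, lda_pred))
print("NN accuracy:", accuracy(y_true, nn_pred))
print("NN recall per class:", per_class_recall(y_true, nn_pred))
```

Scoring both models with the same functions on the same held-out tweets keeps the comparison like-for-like; per-class recall helps show whether one model struggles with overlapping topics such as thanksgiving and holidays.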

Steps to Reproduce

1) Create & activate virtual environment

python3 -m venv venv
source venv/bin/activate

2) Install dependencies

make install

3) Collect data (this takes a long time; you can skip this step and use the data already in data/)

make data-collect

4) Train LDA model

cd src
python3 lda_modeling.py

5) Train Neural Network (requires multiple hours on a GPU-enabled device)

cd src
# tune hyperparameters
python3 tf_hyperparameter_tuning.py

Visually inspect the tuning results in visualize_study.ipynb

# run neural network with chosen params
cd src
python3 tf_train_model_with_best_params.py
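The tune-then-train workflow in step 5 can be pictured with a generic random search. This is a minimal sketch of the idea only, not the contents of tf_hyperparameter_tuning.py: the search space, the `random_search` helper, and `toy_objective` (a stand-in for a real validation score) are all hypothetical.

```python
import random


def toy_objective(params):
    """Hypothetical stand-in for a validation score.

    Peaks at 0 when learning_rate is 0.01 and units is 64.
    """
    lr_penalty = abs(params["learning_rate"] - 0.01) * 10
    units_penalty = abs(params["units"] - 64) / 64
    return -(lr_penalty + units_penalty)


def random_search(objective, n_trials, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "units": rng.choice([16, 32, 64, 128]),
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_objective, n_trials=50)
print("best params:", best_params)
print("best score:", best_score)
```

In a real run the objective would train the network and return a validation metric, and the best configuration found would then be passed on to the final training script.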

Contributors

Name Reference
Anna Dai GitHub Profile
Satvik Kishore GitHub Profile
Moritz Wilksch GitHub Profile
