Week4_TODO.txt

TODO:
- 1. Use lime to look at what my model likes...AND ADD SLIDE ABOUT THIS
x 2. add slides to website
/ 2. Try to combine the good answer finding system and the highlighting systems
/ 3. Maybe look into training my own method for finding sentence breaks...
x 4. Make sure the linear plots are actually from my linear regression and not from random forest
x 5. combine roc plots in inkscape (if using the plots from the linear model)
x 6. Make neural network graph slide
x 7. make technical error backup slides
x 8. edit axes on slides (answer histograms and viewcount x answerscore)
x 9. change slide link to the most recent version of slides
x 10. make the app say something if nothing to highlight
x 11. look into how much I can shrink the size of my aws instance.
x 12. add icon to tabs!
x 13. make new classification vs regression slide (show what percent of answers are above a ratings of 2...)
x 14. remake all plots with seaborn style... (just turn off the center part for confusion matrices. there's an easy style command for this)
x 15. get 2016 data and look at how often my model likes the first answer vs the second ect.
x 16. put stackoverflow password on my t2 aws instance (bash_profile or bashrc)
x 17. build model with residuals after removing variance explained by view count.
x 18. replace rnn ipynb notebook...
  19. retrain net but with binary crossentropy an single output

Presentation notes:
 * make the summary more summarary
x* Lots of questions about how decided to label answers with a score of 2 as "helpful." basically, how much can we trust this. how could i improve groundtruth? regressing out number of views...
 * unclear why bad recall is ok
x* what answers tend to be highlighted? how often not the first answer? (how often not first and why does this happen)
 * description of RNN
x* would the model improve with more data?
x* what could be a business application

FOR LIME PLOT - MAYBE THIS SHOULD BE NUMBER OF TIMES A WORD APPEARED IN THIS TOP LIST... weighted by the frequency of the word??

x XX. update blog post and images to match presentation!
  XX. work on linkedin
x XX. clean github
  XX. reread bengio chapter on recurrent neural nets

TAGLINES:
  Highlight helpful parts of Stack Overflow answers

x = done
/ = on ice
- = in progress

Threshold. And color items according to how helpful (shades of yellow)
Maybe look at only the code and learn good code vs bad code

LINKS TO USE:
  http://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-then-filling-it
  Find some other sites to use for this.
  http://stackoverflow.com/questions/9236926/concatenating-two-one-dimensional-numpy-arrays - this is where the site works less well
  http://stackoverflow.com/questions/21887754/numpy-concatenate-two-arrays-vertically - similar topic but likes all the answers
  http://stackoverflow.com/questions/306400/how-do-i-randomly-select-an-item-from-a-list-using-python
  http://stackoverflow.com/questions/3501382/checking-whether-a-variable-is-an-integer-or-not


To run the server as a background app run (from ~/application & insight environment):
  gunicorn Web_App:app -D
To turn it off run:
  pkill gunicorn

this is from my p2
$ scp ubuntu@ec2-34-195-103-84.compute-1.amazonaws.com:/home/ubuntu/nbs/stackex_gru.h5 /home/dan-laptop/Insight/stackex_gru.h5
scp /home/dan-laptop/Insight/stackex_sum/Modeling/answer_db.csv ubuntu@ec2-34-195-103-84.compute-1.amazonaws.com:/home/ubuntu/nbs/answer_db.csv
this is from my t2
$ scp ubuntu@ec2-34-198-227-154.compute-1.amazonaws.com:/home/ubuntu/application/Web_App/views.py /home/dan-laptop/Insight/stackex_sum/Web_App/views.py
$ scp /home/dan-laptop/Insight/stackex_sum/Web_App/StackExchange_Query.py ubuntu@ec2-34-198-227-154.compute-1.amazonaws.com:/home/ubuntu/application/Web_App/StackExchange_Query.py


http://www.appscare.com/2015/03/embed-google-slides-webpage/

Presentation:
be quicker in the demo. take more time on the model validation slide (make sure to explain). put the results up one at time - roc curve then confusion matrix.
don't say that I would like to talk about this more later so often.