Week4_TODO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
TODO:
- 1. Use lime to look at what my model likes...AND ADD SLIDE ABOUT THIS
x 2. add slides to website
/ 2. Try to combine the good answer finding system and the highlighting systems
/ 3. Maybe look into training my own method for finding sentence breaks...
x 4. Make sure the linear plots are actually from my linear regression and not from random forest
x 5. combine roc plots in inkscape (if using the plots from the linear model)
x 6. Make neural network graph slide
x 7. make technical error backup slides
x 8. edit axes on slides (answer histograms and viewcount x answerscore)
x 9. change slide link to the most recent version of slides
x 10. make the app say something if nothing to highlight
x 11. look into how much I can shrink the size of my aws instance.
x 12. add icon to tabs!
x 13. make new classification vs regression slide (show what percent of answers are above a rating of 2...)
x 14. remake all plots with seaborn style... (just turn off the center part for confusion matrices. there's an easy style command for this)
x 15. get 2016 data and look at how often my model likes the first answer vs the second etc.
x 16. put stackoverflow password on my t2 aws instance (bash_profile or bashrc)
x 17. build model with residuals after removing variance explained by view count.
x 18. replace rnn ipynb notebook...
19. retrain net but with binary crossentropy and a single output
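Sketch for item 19, to pin down what "binary crossentropy and a single output" means: one sigmoid output per answer, scored against a 0/1 "helpful" label. This is plain Python rather than the real network code. The names label_helpful, sigmoid and binary_crossentropy are made up for illustration, and the cutoff of 2 (treated here as score >= 2) is an assumption based on the labeling note further down.

```python
import math


def label_helpful(score, threshold=2):
    """Turn an answer score into a 0/1 label (threshold is an assumption)."""
    return 1 if score >= threshold else 0


def sigmoid(z):
    """Squash a single raw network output into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up scores and raw outputs, only to show the pieces fitting together.
scores = [5, 0, 2, 1]
raw_outputs = [2.0, -1.5, 0.3, -0.2]
labels = [label_helpful(s) for s in scores]
probs = [sigmoid(z) for z in raw_outputs]
loss = binary_crossentropy(labels, probs)
```

With a single output the prediction is just sigmoid(z) > some cutoff, so the threshold idea in the notes below can reuse the same number.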
Presentation notes:
* make the summary more summary-like
x* Lots of questions about how I decided to label answers with a score of 2 as "helpful." basically, how much can we trust this? how could I improve ground truth? regressing out number of views...
* unclear why bad recall is ok
x* what answers tend to be highlighted? how often not the first answer? (how often not first and why does this happen)
* description of RNN
x* would the model improve with more data?
x* what could be a business application
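Sketch for the ground-truth question above (and TODO item 17): regress answer score on view count, then keep the residuals as the "views removed" score. This is a minimal one-predictor least-squares fit in plain Python. Using log10 of the view count is an assumption, and the numbers are invented, not from the real data.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(x, y):
    """What is left of y after removing the part explained linearly by x."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented example: four answers with their question view counts and scores.
view_counts = [100, 1000, 10000, 100000]
answer_scores = [1, 2, 9, 12]
log_views = [math.log10(v) for v in view_counts]
adjusted = residuals(log_views, answer_scores)
```

A positive residual means the answer scored higher than its view count alone would predict, which may be a fairer "helpful" signal than the raw score.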
FOR LIME PLOT - MAYBE THIS SHOULD BE NUMBER OF TIMES A WORD APPEARED IN THIS TOP LIST... weighted by the frequency of the word??
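Sketch of the LIME plot idea above: count how many answers have a word in their top-words list, then weight by how common the word is overall. It does not call LIME itself. It assumes the per-answer top-word lists were already pulled out of the explanations. "Weighted by frequency" is read here as dividing by the corpus count (one possible reading), and top_word_scores plus the data are made up for illustration.

```python
from collections import Counter


def top_word_scores(top_lists, corpus_counts):
    """Number of top lists a word appears in, divided by its corpus count."""
    appearances = Counter()
    for words in top_lists:
        appearances.update(set(words))  # count each word once per answer
    return {
        word: n / corpus_counts[word]
        for word, n in appearances.items()
        if corpus_counts.get(word, 0) > 0  # skip words with no known count
    }


# Invented example: top words for three answers, plus overall word counts.
top_lists = [["pandas", "use"], ["use", "loop"], ["use"]]
corpus_counts = {"pandas": 2, "use": 30, "loop": 3}
word_scores = top_word_scores(top_lists, corpus_counts)
ranked = sorted(word_scores, key=word_scores.get, reverse=True)
```

Dividing by the corpus count keeps very common words from dominating the plot just because they show up everywhere.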
x XX. update blog post and images to match presentation!
XX. work on linkedin
x XX. clean github
XX. reread bengio chapter on recurrent neural nets
TAGLINES:
Highlight helpful parts of Stack Overflow answers
x = done
/ = on ice
- = in progress
Threshold. And color items according to how helpful (shades of yellow)
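Sketch of the threshold + shades-of-yellow idea: anything under the threshold gets no highlight, and above it the yellow gets stronger as the model's probability goes up. The function name highlight_color, the 0.5 default threshold, and the pale-to-strong range (#ffffcc to #ffff00) are all assumptions, not what the app does now.

```python
def highlight_color(prob, threshold=0.5):
    """Return a hex yellow for a sentence, or None if it is below threshold.

    Assumes 0 <= threshold < 1. Probabilities above 1 are clamped to 1.
    """
    if prob < threshold:
        return None
    prob = min(prob, 1.0)
    strength = (prob - threshold) / (1.0 - threshold)  # 0 at threshold, 1 at prob=1
    blue = int(round(204 * (1.0 - strength)))  # less blue = stronger yellow
    return "#ffff{:02x}".format(blue)


# Invented probabilities for four sentences in one answer.
sentence_probs = [0.2, 0.5, 0.75, 1.0]
colors = [highlight_color(p) for p in sentence_probs]
```

The returned string could go straight into a background-color style on the highlighted sentence.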
Maybe look at only the code and learn good code vs bad code
LINKS TO USE:
http://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-then-filling-it
Find some other sites to use for this.
http://stackoverflow.com/questions/9236926/concatenating-two-one-dimensional-numpy-arrays - this is where the site works less well
http://stackoverflow.com/questions/21887754/numpy-concatenate-two-arrays-vertically - similar topic but likes all the answers
http://stackoverflow.com/questions/306400/how-do-i-randomly-select-an-item-from-a-list-using-python
http://stackoverflow.com/questions/3501382/checking-whether-a-variable-is-an-integer-or-not
To run the server as a background app run (from ~/application & insight environment):
gunicorn Web_App:app -D
To turn it off run:
pkill gunicorn
this is from my p2
$ scp [email protected]:/home/ubuntu/nbs/stackex_gru.h5 /home/dan-laptop/Insight/stackex_gru.h5
scp /home/dan-laptop/Insight/stackex_sum/Modeling/answer_db.csv [email protected]:/home/ubuntu/nbs/answer_db.csv
this is from my t2
$ scp [email protected]:/home/ubuntu/application/Web_App/views.py /home/dan-laptop/Insight/stackex_sum/Web_App/views.py
$ scp /home/dan-laptop/Insight/stackex_sum/Web_App/StackExchange_Query.py [email protected]:/home/ubuntu/application/Web_App/StackExchange_Query.py
http://www.appscare.com/2015/03/embed-google-slides-webpage/
Presentation:
be quicker in the demo. take more time on the model validation slide (make sure to explain). put the results up one at a time - roc curve then confusion matrix.
don't say "I would like to talk about this more later" so often.