Self-trained fasttext embeddings #113

Closed

5 tasks done

michelole opened this issue Jun 5, 2019 · 2 comments · Fixed by #121

Member

michelole commented Jun 5, 2019 (edited)

Train our own fasttext embeddings so that results can be better compared with those from pre-trained embeddings.

Draft idea:

Dump the n2c2 training data into a format readable by fasttext
Harmonize pre-processing (DRY Cleaning #105 and DRY NN Tokenizers #86)
Train fasttext embeddings using command line
Extract subword embeddings for train/test data
Retrain LSTM using such embeddings
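
A rough sketch of the dump and extraction steps, for illustration only. fasttext's unsupervised training reads a plain-text file of whitespace-separated tokens, so each document is written as one normalized line. The names `to_fasttext_line`, `dump_corpus` and `embed_tokens` are hypothetical, and the lowercasing and whitespace collapsing are placeholders for whatever the harmonized pre-processing from #105 and #86 ends up doing.

```python
import re
from typing import Iterable, List


def to_fasttext_line(text: str) -> str:
    """Collapse all whitespace and lowercase, so one document fits on one line.

    Placeholder normalization (an assumption), not the project's cleaning code.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def dump_corpus(documents: Iterable[str], path: str) -> int:
    """Write one normalized document per line; return the number of lines written."""
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        for document in documents:
            line = to_fasttext_line(document)
            if line:  # skip documents that are empty after normalization
                handle.write(line + "\n")
                written += 1
    return written


def embed_tokens(model, tokens: Iterable[str]) -> List[List[float]]:
    """Look up one vector per token.

    `model` is any object exposing get_word_vector(word), which is the
    method name used by the official fasttext Python bindings.
    """
    return [list(model.get_word_vector(token)) for token in tokens]
```

With the official Python bindings, a model loaded via `fasttext.load_model("model.bin")` could be passed as `model`. Its `get_word_vector` builds vectors from character n-grams, so tokens that only occur in the test data still get an embedding.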

michelole added the P0 label

michelole self-assigned this

Member Author

michelole commented Jun 11, 2019

fasttext training follows https://github.com/ncbi-nlp/BioSentVec/blob/master/src/train_biowordvec.sh
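
For reference, a minimal sketch of how the command-line call could be assembled from Python. The helper `build_fasttext_command`, the file names and the hyperparameter values are all hypothetical placeholders and are not taken from `train_biowordvec.sh`. Only the flag names (`-input`, `-output`, `-dim`, `-minn`, `-maxn`, `-epoch`) are real fasttext CLI options. Whether `skipgram` or `cbow` was used is not stated in this thread, so the mode is left configurable.

```python
import shlex
from typing import Dict, List, Optional


def build_fasttext_command(
    corpus_path: str,
    output_prefix: str,
    mode: str = "skipgram",
    options: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Build the argument list for an unsupervised fasttext training run."""
    if mode not in ("skipgram", "cbow"):
        raise ValueError(f"unsupported training mode: {mode}")
    command = ["fasttext", mode, "-input", corpus_path, "-output", output_prefix]
    # Each option becomes a "-flag value" pair, in insertion order.
    for flag, value in (options or {}).items():
        command += [f"-{flag}", str(value)]
    return command


# Placeholder values (assumptions), NOT the settings used for this issue.
example = build_fasttext_command(
    "n2c2_train.txt",
    "n2c2_fasttext",
    options={"dim": 100, "minn": 3, "maxn": 6, "epoch": 5},
)
print(shlex.join(example))
```

The resulting list can be handed to `subprocess.run(example, check=True)` on a machine where the `fasttext` binary is on the `PATH`. fasttext then writes `n2c2_fasttext.bin` and `n2c2_fasttext.vec`.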

michelole mentioned this issue

Merged

michelole closed this as completed in #121

Member Author

michelole commented Jun 12, 2019

Reopening until the models finish training.

michelole reopened this

michelole closed this as completed
