Train models for each major entity class #6

Open · 9 tasks
JohnGiorgi opened this issue Aug 2, 2018 · 3 comments
Labels: enhancement (New feature or request), production
@JohnGiorgi (Contributor) commented Aug 2, 2018

Need to train models for each major entity class: PRGE, LIVB, DISO, CHED. The first three are fairly straightforward. As for the last, the entity annotations have multiple levels of granularity; for now, we might just cheat and collapse everything under the CHED tag.
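
To make the collapsing step concrete, something like the sketch below might work. The fine-grained label names are hypothetical placeholders, not the actual tag set of any particular corpus:

```python
# Minimal sketch of collapsing fine-grained chemical annotations under a
# single CHED tag. The fine-grained labels below are hypothetical examples.
FINE_TO_COARSE = {
    "CHEMICAL": "CHED",
    "SIMPLE_CHEMICAL": "CHED",
    "DRUG": "CHED",
}

def collapse_tag(bio_tag: str) -> str:
    """Map a BIO tag like 'B-SIMPLE_CHEMICAL' to its coarse form 'B-CHED'."""
    if bio_tag == "O":
        return bio_tag
    prefix, _, label = bio_tag.partition("-")
    return f"{prefix}-{FINE_TO_COARSE.get(label, label)}"
```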

For relations, we are at the mercy of what datasets are available. Right now, we could train a model for adverse drug events using the ADE corpus.

There should be a base and large version for each model. In the case of BERT, this would correspond to whether the BERT base or large model was used. Any model not implemented should raise a NotImplementedError (see #155).

Finally, the model names should follow a convention. Maybe [model-name]-[entity or relation]-[base or large], e.g. bert-for-ner-prge, bert-for-ner-prge-lg. See PyTorch Transformers or spaCy for inspiration.
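
As a rough sketch of how the naming convention and the NotImplementedError behaviour could fit together (the registry contents and the load_model helper below are hypothetical, not existing Saber code):

```python
# Hypothetical registry of pretrained model names following the
# [model-name]-[entity or relation]-[base or large] convention.
AVAILABLE_MODELS = {
    "bert-for-ner-prge",
    "bert-for-ner-prge-lg",
}

def load_model(name: str):
    """Return the pretrained model registered under `name`."""
    if name not in AVAILABLE_MODELS:
        raise NotImplementedError(
            f"'{name}' is not an available pretrained model. "
            f"Choose one of: {sorted(AVAILABLE_MODELS)}"
        )
    # ... download / load the weights and return the model here
```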

BERT

Entities

  • Train PRGE-base
  • Train PRGE-large
  • Train LIVB-base
  • Train LIVB-large
  • Train DISO-base
  • Train DISO-large
  • Train CHED-base
  • Train CHED-large

Relations

  • Train ADE
JohnGiorgi added the enhancement label Aug 2, 2018
JohnGiorgi self-assigned this Aug 2, 2018
JohnGiorgi pinned this issue Jan 22, 2019
@nleguillarme commented:

Hi @JohnGiorgi.

I am currently working on a review of taxon mention recognition tools for ecological information extraction, and I have just discovered Saber, which I'd like to include as an example of a state-of-the-art deep-learning-based approach.

Unfortunately, it seems that the LIVB pre-trained model does not exist at the moment. Any idea when it might be available? Or should I consider training my own model?

Thank you for your help.

@JohnGiorgi (Contributor, Author) commented:

Hi @nleguillarme,

Thanks for your interest. Unfortunately, we are no longer maintaining the project. I would suggest checking out AllenNLP, Transformers, or scispaCy for state-of-the-art NER. scispaCy has pretrained models that will detect organism names (see the model trained on BIONLP13CG specifically).
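
For example, something along these lines should work with scispaCy, assuming the en_ner_bionlp13cg_md model is installed; ORGANISM is, as far as I know, the label that tag set uses for taxa:

```python
# Minimal sketch: organism mention detection with a scispaCy model trained
# on BIONLP13CG. Assumes the en_ner_bionlp13cg_md package has been installed
# (pip install scispacy plus the model wheel from the scispaCy releases).
import spacy

nlp = spacy.load("en_ner_bionlp13cg_md")
doc = nlp("Arabidopsis thaliana is a model organism for plant biology.")

for ent in doc.ents:
    # Keep only taxon mentions (ORGANISM label in the BIONLP13CG tag set).
    if ent.label_ == "ORGANISM":
        print(ent.text, ent.label_)
```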

@nleguillarme commented:

Too bad the project is dead; it seemed like a great tool.
Thanks for the pointers.
