
saber.load_dataset() should be able to pull from pubannotation. #146

Open
JohnGiorgi opened this issue Jun 15, 2019 · 0 comments
Labels: enhancement (New feature or request), feature

JohnGiorgi commented Jun 15, 2019

saber.load_dataset() should be able to pull a dataset from pubannotation.org given a project's URL.

E.g.

saber.load_dataset('http://pubannotation.org/projects/AGAC_training/annotations.tgz')

should download the dataset to ~/saber/datasets, convert it to the CoNLL 2003 format, and load it into a Dataset object. If the same URL is supplied again, load_dataset() should reuse the cached copy in ~/saber/datasets instead of downloading it again.
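A minimal sketch of that download, cache, and convert pipeline. It makes several assumptions that are not part of Saber: the helper names (`fetch_cached`, `read_pubannotation_docs`, `to_bio_lines`, `load_pubannotation`, `DEFAULT_CACHE_DIR`) are hypothetical; the archive is assumed to hold one PubAnnotation JSON document per `.json` file, each with a `text` field and `denotations` entries carrying `span.begin`, `span.end`, and `obj`; and the output is a simplified two-column token/tag file with BIO tags rather than the full four-column CoNLL 2003 layout. Wrapping the result in a Saber Dataset object is left out.

```python
import json
import re
import shutil
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical default cache location, mirroring ~/saber/datasets from the issue.
DEFAULT_CACHE_DIR = Path.home() / "saber" / "datasets"

# Naive tokenizer (assumption): runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def fetch_cached(url, cache_dir=DEFAULT_CACHE_DIR):
    """Download `url` into `cache_dir` unless a cached copy already exists."""
    parts = urllib.parse.urlparse(url).path.strip("/").split("/")
    # For .../projects/<name>/annotations.tgz, key the cache on <name>.
    name = parts[-2] if len(parts) >= 2 else "dataset"
    target = Path(cache_dir) / name / parts[-1]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename only after a complete download, so a failed one is never cached.
        partial.replace(target)
    return target


def read_pubannotation_docs(archive_path):
    """Yield each JSON document in a .tgz archive without extracting to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                yield json.load(tar.extractfile(member))


def to_bio_lines(doc):
    """Convert one document into 'token<TAB>tag' lines using BIO tags."""
    text = doc["text"]
    spans = [(d["span"]["begin"], d["span"]["end"], d["obj"])
             for d in doc.get("denotations", [])]
    lines = []
    prev = None  # entity span the previous token belonged to, if any
    for match in TOKEN_RE.finditer(text):
        start, end = match.span()
        # A token is tagged only if it lies entirely inside a denotation span.
        entity = next((s for s in spans if s[0] <= start and end <= s[1]), None)
        if entity is None:
            tag = "O"
        elif entity == prev:
            tag = "I-" + entity[2]
        else:
            tag = "B-" + entity[2]
        lines.append(f"{match.group()}\t{tag}")
        prev = entity
    return lines


def load_pubannotation(url, cache_dir=DEFAULT_CACHE_DIR):
    """Fetch (or reuse) a PubAnnotation archive and return token/tag lines."""
    archive = fetch_cached(url, cache_dir)
    lines = []
    for doc in read_pubannotation_docs(archive):
        lines.extend(to_bio_lines(doc))
        lines.append("")  # blank line between documents, as in CoNLL files
    return lines
```

Keying the cache on the project name means a second call with the same URL never touches the network. Tokens that only partially overlap a denotation are tagged `O` in this sketch; a real converter would need a tokenizer aligned with the annotation offsets and proper sentence splitting.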

Because pubannotation.org hosts most of the popular BioNLP datasets, this would nearly eliminate the need to maintain datasets locally.

@JohnGiorgi JohnGiorgi added enhancement New feature or request feature labels Jun 15, 2019
@JohnGiorgi JohnGiorgi self-assigned this Jun 15, 2019