Disease synonyms/hyponyms via synonym list #107

Closed
michelole opened this issue Jul 17, 2017 · 7 comments

@michelole
Member

michelole commented Jul 17, 2017

Take a look at mesh2solrsyn (https://github.com/Shugyousha/mesh2solrsyn).

@michelole michelole self-assigned this Jul 27, 2017
@michelole michelole changed the title Take a look at https://github.com/Shugyousha/mesh2solrsyn Take a look at mesh2solrsyn Jul 27, 2017
@michelole
Member Author

I got a valid synonym file with the 2015 MeSH version:

go run mesh2solrsyn.go d2015.bin < mtrees2015.bin > synonyms.txt

@michelole michelole changed the title Take a look at mesh2solrsyn Disease synonyms Jul 27, 2017
@michelole michelole added the P0 label Jul 27, 2017
@michelole
Member Author

@steschu63 has provided a Python script that creates a Solr synonym file from MeSH 2017. Use it.
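
For readers without access to that script, the conversion can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script @steschu63 provided. It assumes the MeSH ASCII descriptor format (`*NEWRECORD` blocks with `MH`, `ENTRY`, `PRINT ENTRY`, and `MN` fields), keeps only descriptors with a tree number in the `C` (Diseases) branch, and writes one comma-separated line of equivalent terms per descriptor, as in the Solr synonym format. The function names `parse_mesh_records` and `to_solr_synonyms` are hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, List[str]]


def parse_mesh_records(lines: Iterable[str]) -> List[Record]:
    """Group 'KEY = value' lines into one dict per *NEWRECORD block."""
    records: List[Record] = []
    current: Record = {}
    for raw in lines:
        line = raw.strip()
        if line == "*NEWRECORD":
            if current:
                records.append(current)
            current = {}
        elif " = " in line:
            key, value = line.split(" = ", 1)
            current.setdefault(key, []).append(value)
    if current:
        records.append(current)
    return records


def to_solr_synonyms(records: Iterable[Record], tree_prefix: str = "C") -> List[str]:
    """Return one Solr synonym line per descriptor under the given tree branch."""
    out: List[str] = []
    for rec in records:
        # Keep only descriptors with at least one tree number in the branch.
        if not any(mn.startswith(tree_prefix) for mn in rec.get("MN", [])):
            continue
        terms = list(rec.get("MH", []))
        for key in ("PRINT ENTRY", "ENTRY"):
            # Entry terms may carry '|'-separated metadata; keep the term only.
            terms += [value.split("|", 1)[0] for value in rec.get(key, [])]
        # Lowercase and deduplicate while preserving order.
        unique = list(dict.fromkeys(t.strip().lower() for t in terms if t.strip()))
        if len(unique) < 2:
            continue  # a single term has no synonyms to list
        # Assumption: the synonym parser accepts backslash-escaped commas,
        # which inverted MeSH terms such as "Neoplasms, Breast" require.
        out.append(", ".join(t.replace(",", "\\,") for t in unique))
    return out
```

Passing the lines of a descriptor file such as `d2015.bin` through `parse_mesh_records` and then `to_solr_synonyms` yields the lines to write to `synonyms.txt`. Hyponyms would additionally require walking the tree numbers, which this sketch does not attempt.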

@michelole michelole changed the title Disease synonyms Disease synonyms via synonym list Jul 28, 2017
@michelole
Member Author

This relates to #85.

@michelole michelole changed the title Disease synonyms via synonym list Disease synonyms/hyponyms via synonym list Jul 28, 2017
michelole pushed a commit to michelole/trec2017 that referenced this issue Jul 29, 2017
@michelole
Member Author

The synonym list worsened results from 0.7693 to 0.4347.

It could still benefit, however, from most-fields (#97).

@steschu63

steschu63 commented Jul 31, 2017 via email

@steschu63

steschu63 commented Jul 31, 2017 via email

@michelole
Member Author

Hi @steschu63!

We're using it only for diseases (comorbidities were never explored).

I manually reviewed the results for the worst-performing topics (see the gold standard rows where the source column is set to "Synonym list"), and they are still not promising. The top matches are mostly rare diseases or papers about viruses (?), which rank higher because of tf-idf.
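
To illustrate the tf-idf effect, here is a minimal sketch assuming a classic Lucene-style inverse document frequency, `1 + ln(docCount / (docFreq + 1))`. The document counts are hypothetical and not taken from our index. A synonym that occurs in very few documents gets a much larger idf than a common disease name, so a document matching only the rare synonym can outrank more relevant ones.

```python
import math


def classic_idf(doc_freq: int, doc_count: int) -> float:
    """Classic Lucene-style idf: rarer terms receive larger weights."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical collection size and document frequencies.
DOC_COUNT = 1_000_000
common_term_idf = classic_idf(50_000, DOC_COUNT)  # frequent disease name
rare_term_idf = classic_idf(20, DOC_COUNT)        # rare synonym added by expansion

if __name__ == "__main__":
    print(f"common term idf: {common_term_idf:.2f}")
    print(f"rare term idf:   {rare_term_idf:.2f}")
```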

The metrics already refer to the gold standard that includes these samples.

P.S.: All your comments are public on GitHub when you reply by email. ;)
