Normalize German variants #87
Labels: P2 (high-priority issues, a COULD)
Comments
Maybe normalize all to k also during training to make the models denser?
If normalizing before training, the same cleaning routines should be applied to the gold standard (GS) at runtime to avoid false negatives.
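A minimal sketch of that runtime comparison (the `normalize` and `matches_gold` names are illustrative, not part of the acres API, and the mapping direction z -> c -> k is an assumption based on the "normalize all to k" suggestion above):

```python
def normalize(token: str) -> str:
    """Collapse spelling variants onto one canonical form (assumed z -> c -> k)."""
    t = token.lower()
    t = t.replace("z", "c")
    return t.replace("c", "k")

def matches_gold(candidate: str, gold: str) -> bool:
    # Apply the SAME cleaning to both sides, so a model output produced
    # from normalized training data is not scored as a false negative
    # against an unnormalized gold-standard entry.
    return normalize(candidate) == normalize(gold)
```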
This may actually require annotating the new expansions, since some of them could be considered typos, e.g. "becannt"/"druccausgleich", "karotis"/"kava".
michelole added a commit to michelole/acres that referenced this issue on Aug 21, 2020:
Spelling variants are better handled with a normalization step than with an exponential increase of expansion candidates, which led to very slow processing and several bugs. This refs bst-mug#87 and closes bst-mug#98. Also removed `get_acro_def_pair_score`, which was originally intended for web-based inputs (i.e., text with acronym-definition pairs).
Normalize spelling variants, e.g. c <-> k and z <-> c, before applying the filtering rules.
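One way to sketch such a normalization step (the mapping direction, collapsing everything onto k via z -> c -> k, is an assumption drawn from the comments; real rules may need more context than character-level replacement):

```python
def normalize(token: str) -> str:
    # Collapse German spelling variants onto a single canonical form
    # before the filtering rules run. Chaining z -> c and then c -> k
    # makes pairs like "cava"/"kava" and "becannt"/"bekannt" compare equal.
    t = token.lower()
    t = t.replace("z", "c")
    return t.replace("c", "k")
```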