German Stemmer #123

michelole · 2019-12-03T17:53:43Z

Since plural forms and grammatical-case variants are all considered perfect matches in our annotation guidelines, we could apply a stemmer to the data to make our models denser.

However, we might need to annotate the new expansions, because some pairs might rank lower after stemming when the stemmed form is itself considered an abbreviation (e.g. "Vorbefund" -> "Vorbefu", "Vesikuläratmen" -> "Vesikuläratm", "Operation" -> "Operatio").
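A minimal sketch of how such pairs could be flagged for annotation. It assumes that "looks like an abbreviation" simply means the stem is a strict prefix of the original word, which may not match how the ranking actually decides this. The helper `truncated_stems` is hypothetical and not part of the project.

```python
from typing import Iterable, List, Tuple


def truncated_stems(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (word, stem) pairs where the stem is a strict prefix of the word.

    Assumption: a strict, case-insensitive prefix is a good enough proxy
    for "could be mistaken for an abbreviation".
    """
    flagged = []
    for word, stem in pairs:
        w, s = word.lower(), stem.lower()
        if s != w and w.startswith(s):
            flagged.append((word, stem))
    return flagged


# Word/stem pairs quoted in this issue, plus one unchanged word for contrast.
examples = [
    ("Vorbefund", "Vorbefu"),
    ("Vesikuläratmen", "Vesikuläratm"),
    ("Operation", "Operatio"),
    ("Befund", "Befund"),
]
flagged = truncated_stems(examples)
```

The flagged pairs would be the candidates to review manually before trusting the stemmed data.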

The CISTEM stemmer seems to improve results over the Porter stemmer and has a Python implementation in NLTK.
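A rough sketch of how the stemmer could be used to collapse inflected expansions onto one key. `Cistem` and its `stem` method come from `nltk.stem.cistem`; the helper names `cistem_stem_fn` and `group_by_stem` are illustrative only, and the stemmer is passed in as a plain callable so another one can be swapped in for comparison.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def cistem_stem_fn() -> Callable[[str], str]:
    """Return the stem function of NLTK's CISTEM stemmer (requires nltk)."""
    # Imported lazily so that group_by_stem can be used without nltk installed.
    from nltk.stem.cistem import Cistem
    return Cistem().stem


def group_by_stem(
    expansions: Iterable[str], stem: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each stem to the distinct surface forms that collapse onto it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in expansions:
        key = stem(word)
        if word not in groups[key]:  # keep first-seen order, skip duplicates
            groups[key].append(word)
    return dict(groups)
```

For instance, `group_by_stem(["Befund", "Befunde", "Befundes"], cistem_stem_fn())` would show which forms end up sharing a stem. The exact stems depend on the CISTEM rules, so they are not listed here.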

Relates to #87.

michelole mentioned this issue Dec 3, 2019

Soundex #124

Open

michelole added the P2 High priority issues, a COULD label Dec 3, 2019

michelole mentioned this issue Dec 3, 2019

Normalize German umlaut #125

Open
