Since plural forms and grammatical case variants are all considered perfect matches in our annotation guidelines, we could apply a stemmer to the data to make our models denser.
However, we might need to annotate the new expansions, because some pairs might drop in ranking after stemming when the stemmed form looks like an abbreviation (e.g. "Vorbefund" -> "Vorbefu", "Vesikuläratmen" -> "Vesikuläratm", "Operation" -> "Operatio").
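A minimal sketch of what "denser" could look like, assuming the annotations are stored as (abbreviation, expansion) pairs: pairs whose expansions share a stem collapse into one entry, and stemmed expansions that match none of their original surface forms are listed as candidates for re-annotation. `group_by_stem` and `needs_review` are hypothetical helpers, and the stemmer is passed in as a plain function so any implementation can be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]  # (abbreviation, stemmed expansion)


def group_by_stem(
    pairs: Iterable[Tuple[str, str]],
    stem: Callable[[str], str],
) -> Dict[Key, Set[str]]:
    """Collapse (abbreviation, expansion) pairs whose expansions share a stem.

    Returns a mapping from (abbreviation, stemmed expansion) to the set of
    original surface forms that were merged into that entry.
    """
    groups: Dict[Key, Set[str]] = defaultdict(set)
    for abbreviation, expansion in pairs:
        groups[(abbreviation, stem(expansion))].add(expansion)
    return dict(groups)


def needs_review(groups: Dict[Key, Set[str]]) -> List[Key]:
    """List entries whose stem equals none of its (lowercased) surface forms.

    Such stems are new strings that annotators have never seen and that may
    read like abbreviations, so they are queued for annotation.
    """
    return sorted(
        key
        for key, forms in groups.items()
        if key[1] not in {form.lower() for form in forms}
    )
```

With NLTK installed, `Cistem().stem` from `nltk.stem.cistem` could be passed as `stem`. Because CISTEM returns lowercased stems, `needs_review` compares against lowercased surface forms rather than the originals.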
The CISTEM stemmer seems to give better results than the Porter stemmer and has a Python implementation in NLTK.
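To see why abbreviation-like stems appear, here is a compact stdlib transcription of the CISTEM suffix-stripping rules, written from our reading of NLTK's `nltk.stem.cistem.Cistem.stem` with its default `case_insensitive=False`. It is meant for inspecting the rules only; the pipeline itself should call the NLTK class. `cistem_stem` is a hypothetical name.

```python
import re

_STRIP_GE = re.compile(r"^ge(.{4,})")  # participle prefix "ge-" on longer words
_REPL_XX = re.compile(r"(.)\1")        # doubled letter -> letter + "*"
_REPL_XX_BACK = re.compile(r"(.)\*")   # undo the doubled-letter marker
_STRIP_EMR = re.compile(r"e[mr]$")
_STRIP_ND = re.compile(r"nd$")
_STRIP_T = re.compile(r"t$")
_STRIP_ESN = re.compile(r"[esn]$")


def cistem_stem(word: str, case_insensitive: bool = False) -> str:
    """Strip German inflectional suffixes following the CISTEM rules."""
    if not word:
        return word
    upper = word[0].isupper()  # capitalised words (nouns) keep a final "t"
    word = word.lower()
    for src, dst in (("ü", "u"), ("ö", "o"), ("ä", "a"), ("ß", "ss")):
        word = word.replace(src, dst)
    word = _STRIP_GE.sub(r"\1", word)

    # Encode "sch", "ei", "ie" and doubled letters as single symbols so
    # they count as one character during stripping.
    word = word.replace("sch", "$").replace("ei", "%").replace("ie", "&")
    word = _REPL_XX.sub(r"\1*", word)

    # Repeatedly strip one suffix at a time while the word is long enough.
    while len(word) > 3:
        if len(word) > 5:
            word, n = _STRIP_EMR.subn("", word)
            if n:
                continue
            word, n = _STRIP_ND.subn("", word)
            if n:
                continue
        if not upper or case_insensitive:
            word, n = _STRIP_T.subn("", word)
            if n:
                continue
        word, n = _STRIP_ESN.subn("", word)
        if not n:
            break

    # Decode the temporary symbols again.
    word = _REPL_XX_BACK.sub(r"\1\1", word)
    return word.replace("%", "ei").replace("&", "ie").replace("$", "sch")
```

For example, `cistem_stem("Vorbefund")` returns `"vorbefu"` because the final "nd" is stripped, and `"Befund"`, `"Befunde"`, `"Befunden"` all collapse to `"befu"`. The sketch also lowercases and folds umlauts, so "Vesikuläratmen" comes out as `"vesikularatm"`.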
Relates to #87.