Fix raw norm #156

Open
tomo-oga wants to merge 3 commits into gyorilab:master from tomo-oga:fix_raw_norm

@tomo-oga

When annotating, gilda checks candidate strings against the ner_stoplist. However, this comparison uses the normalized strings from the text against the non-normalized stopwords in the file, so stopwords are missed. This pull request fixes that by checking the unnormalized word against the stoplist.
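To illustrate the mismatch, here is a minimal sketch. It does not use gilda's actual code: `normalize`, the stoplist entries, and both helper functions are hypothetical stand-ins, and I'm assuming normalization lowercases the text.

```python
def normalize(text: str) -> str:
    """Hypothetical stand-in for normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


# Hypothetical stoplist entries, stored exactly as written in the file (not normalized).
stoplist = {"IV", "Can", "ER"}


def is_stopword_normalized(raw: str) -> bool:
    # Behavior described as the problem: normalized text vs. raw stoplist entries.
    return normalize(raw) in stoplist


def is_stopword_raw(raw: str) -> bool:
    # Behavior described as the fix: raw text vs. raw stoplist entries.
    return raw in stoplist


# Map each word to (matched by normalized check, matched by raw check).
results = {
    word: (is_stopword_normalized(word), is_stopword_raw(word))
    for word in ["IV", "Can", "iv"]
}
print(results)
```

With these assumed entries, "IV" and "Can" are only caught by the raw check, since their lowercased forms are not in the stoplist. "iv" is caught by neither, because the raw check is case-sensitive.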

@bgyori
Member

bgyori commented May 22, 2025

Thanks, this looks good. In fact, I changed this to remove the original normalized check, which doesn't make sense. However, given this change, we should review the stopword list and, in many cases, add both capitalized and lowercase versions where that makes sense and one is missing.
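A rough sketch of how that review could be started. This is not part of the PR: `missing_case_variants` and the example entries are hypothetical, and the output is only a list of candidates to check by hand.

```python
def missing_case_variants(stoplist: set[str]) -> dict[str, list[str]]:
    """For each entry, list the lowercase/capitalized variants absent from the stoplist."""
    missing = {}
    for word in sorted(stoplist):
        variants = {word.lower(), word.capitalize()}
        absent = sorted(v for v in variants if v not in stoplist)
        if absent:
            missing[word] = absent
    return missing


# Hypothetical entries, for illustration only.
example = {"can", "Can", "was", "IV"}
report = missing_case_variants(example)
print(report)
```

Some suggestions will not make sense (for example, "Iv" for the entry "IV"), so each candidate would still need a manual decision before being added.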

2 participants