Skip to content

Add more careful norm_text check#133

Open
cthoyt wants to merge 1 commit into
masterfrom
more-term-checks
Open

Add more careful norm_text check#133
cthoyt wants to merge 1 commit into
masterfrom
more-term-checks

Conversation

@cthoyt
Copy link
Copy Markdown
Contributor

@cthoyt cthoyt commented Feb 21, 2024

This makes sure it's not none before checking if it can be stripped. Response to this error:

...
  File "/Users/cthoyt/dev/gilda/gilda/grounder.py", line 685, in load_entries_from_terms_file
    yield Term(*row_nones)
          ^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/dev/gilda/gilda/term.py", line 55, in __init__
    if not norm_text.strip():
           ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'

@cthoyt cthoyt requested a review from bgyori February 21, 2024 14:43
@bgyori
Copy link
Copy Markdown
Member

bgyori commented Feb 21, 2024

This probably doesn't hurt, but assuming that norm_text is produced by the normalize function, under what conditions could it be None?

@cthoyt
Copy link
Copy Markdown
Contributor Author

cthoyt commented Feb 21, 2024

It's happening in a place where I'm using load_entries_from_terms_file() - something in that process is leaving a None in the norm_text column. I'm trying to track that down now too

@cthoyt
Copy link
Copy Markdown
Contributor Author

cthoyt commented Feb 22, 2024

Tracked down the culprit by hacking some logging in to the load_entries_from_terms_file() function:

WARNING: [2024-02-22 01:40:56] gilda.grounder - [None, '-', 'umls', 'C5187432', '-', 'synonym', 'umls', None, None, None] Normalized text for Term cannot be None nor empty

https://uts.nlm.nih.gov/uts/umls/concept/C5187432

nice name -

@bgyori
Copy link
Copy Markdown
Member

bgyori commented Feb 22, 2024

The odd thing is that normalize('-') returns '', as expected, so the None is likely coming from some other code. Is it clear where that happens?

@cthoyt
Copy link
Copy Markdown
Contributor Author

cthoyt commented Feb 22, 2024

I am thinking this might be a weird syncing issue - I don't have UMLS update on the normal cycle since it's so big and hard to work with. I think what happened is that the UMLS index was dumped before we put the check inside the Term object, and so it's indexed with an empty string at the moment, which the load function turns into none since it does a truthy check. I'll try rebuilding UMLS and see what happens

@bgyori bgyori force-pushed the more-term-checks branch from e7637ca to f600a9a Compare July 19, 2024 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants