Suggest the correct name when no key matches in the dataset #9943
@@ -1611,6 +1611,11 @@ def __getitem__(
        return self._construct_dataarray(key)
    except KeyError as e:
        message = f"No variable named {key!r}. Variables on the dataset include {shorten_list_repr(list(self.variables.keys()), max_items=10)}"

        best_guess = utils.did_you_mean(key, self.variables.keys())
Amazing idea. I would print the best guess first, and then any others, so that it's easy to see.
Maybe we should just remove the "Variables on the dataset include ..." part? I think the two try to do the same thing.
Yeah, you could sort the whole list by similarity and then print that (truncated as above).
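A rough sketch of the "sort by similarity, then truncate" idea, using only the standard library. The helper name `sort_by_similarity` and the `max_items` parameter are hypothetical and not part of this PR; the similarity score here is `difflib.SequenceMatcher.ratio()`, which may differ from whatever metric the PR ends up using.

```python
from difflib import SequenceMatcher


def sort_by_similarity(key, names, max_items=10):
    """Return up to max_items names, most similar to key first.

    Hypothetical helper for illustration only; not xarray's implementation.
    """
    target = str(key)

    def score(name):
        # ratio() is a float in [0, 1]; 1.0 means the strings are identical.
        return SequenceMatcher(None, target, str(name)).ratio()

    # sorted() is stable, so equally similar names keep their original order.
    return sorted(names, key=score, reverse=True)[:max_items]


names = ["pressure", "humidity", "temperature", "time"]
ranked = sort_by_similarity("temperatur", names)
```

With this ordering, the closest name lands at the front of the list, so truncating the output no longer hides it.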
Now it prioritizes best_guess. If best_guess is empty you could be working in the wrong dataset, so it's still nice to get some kind of clue which dataset you're using.
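One possible reading of that behaviour, as a standalone sketch: put the suggestion first when there is one, and otherwise fall back to a truncated list of variable names. The function `build_missing_key_message` is hypothetical, and the simple slicing below only stands in for `shorten_list_repr`, whose exact output format is not shown in this thread.

```python
def build_missing_key_message(key, names, best_guess, max_items=10):
    """Compose a KeyError message that leads with the suggestion.

    Hypothetical helper; best_guess is assumed to be a string that is
    empty when no close match was found.
    """
    message = f"No variable named {key!r}."
    if best_guess:
        # A suggestion exists, so show it before anything else.
        return f"{message} {best_guess}"
    # No suggestion: still hint at what this dataset contains.
    names = list(names)
    shown = ", ".join(repr(n) for n in names[:max_items])
    if len(names) > max_items:
        shown += ", ..."
    return f"{message} Variables on the dataset include [{shown}]"


with_guess = build_missing_key_message(
    "temperatur", ["temperature", "time"], "Did you mean 'temperature'?"
)
without_guess = build_missing_key_message("zzz", ["temperature", "time"], "")
```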
Big +1 on this; I'd also enjoy this as a user. Is there any concern that some processes might be running [...]
We could add an LRU cache over [...]
Though I'm thinking that someone could query whether different keys exist; i.e. [...] Overall I say let's go ahead and we can reassess if we hear reports of slowdowns. Folks can use [...]
I thought about this case as well; my initial idea was to just use [...]
I found the error raised when I make a typo in a dataset key not very helpful. The truncated list of variables hides the ones I wanted to see. Instead, this PR adds a fuzzy-matching function that gives the typical "Did you mean X" suggestion.
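For readers who want a feel for the approach, here is a minimal sketch of such a fuzzy-matching function built on `difflib.get_close_matches` from the standard library. The diff above only shows the call `utils.did_you_mean(key, self.variables.keys())`; the parameters `n` and `cutoff`, the returned wording, and the empty-string fallback below are assumptions, not the implementation in this PR.

```python
from difflib import get_close_matches


def did_you_mean(word, possibilities, n=3, cutoff=0.6):
    """Return a "Did you mean ..." hint, or "" if nothing is close enough.

    Illustrative sketch only; the signature beyond the first two
    arguments and the message wording are assumptions.
    """
    # Dataset keys can be any hashable, so compare their string forms.
    candidates = [str(p) for p in possibilities]
    matches = get_close_matches(str(word), candidates, n=n, cutoff=cutoff)
    if not matches:
        return ""
    return "Did you mean one of (" + ", ".join(repr(m) for m in matches) + ")?"


variables = {"temperature": 1, "pressure": 2, "time": 3}
hint = did_you_mean("temperatur", variables.keys())
no_hint = did_you_mean("zzz", variables.keys())
```

Raising `cutoff` makes suggestions stricter, while lowering it returns more (and noisier) candidates.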
whats-new.rst
Further reading:
python/cpython#16850
matplotlib/matplotlib#28115
https://en.wikipedia.org/wiki/Levenshtein_distance