SNOW-1897441: Fix missing row position sort in DataFrame.describe #2950
Merged
sfc-gh-joshi added labels on Jan 28, 2025:
NO-CHANGELOG-UPDATES (This pull request does not need to update CHANGELOG.md)
NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs)
sfc-gh-lmukhopadhyay approved these changes on Jan 28, 2025
LGTM, thanks Jonathan!
Three resolved, outdated review threads on src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py
sfc-gh-jjiao approved these changes on Jan 28, 2025
sfc-gh-helmeleegy approved these changes on Jan 28, 2025
Looks good, thanks Johnathan.
Labels: NO-CHANGELOG-UPDATES, NO-PANDAS-CHANGEDOC-UPDATES, snowpark-pandas
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1897441
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
When calling `describe` on a DataFrame with object columns, pandas reports a `top` entry identifying the value that appears most frequently. If the top two values share the same frequency, the pandas documentation indicates that it does not provide any stability guarantees about which one is chosen.

Tests involving this behavior are currently failing on QA6, where it appears that the order of results returned by a GROUP BY/COUNT query has changed. This PR adds an additional sort on the row position column to ensure that the object value that appears first is always chosen first. This may not always agree with pandas (though pandas does this in all of our current tests), but it at least keeps results the same between prod and QA6.
I ran `tests/integ/modin/frame/test_describe.py` with a QA6 account to verify that everything passes.
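
For reviewers who want to see the tie-breaking rule in isolation, here is a minimal pure-Python sketch of the idea. It is not the code in `snowflake_query_compiler.py`, and the function name `top_with_row_position_tiebreak` is hypothetical. It only illustrates ordering by frequency first and then by the earliest row position, so that the result no longer depends on the order in which grouped counts happen to be returned.

```python
from collections import defaultdict


def top_with_row_position_tiebreak(values):
    """Return (top, freq) for a sequence of object values.

    Illustrative only: ties on frequency are broken by the position at
    which a value first appears, mimicking an extra sort on row position.
    """
    counts = defaultdict(int)
    first_pos = {}
    for pos, value in enumerate(values):
        if value is None:
            # Skip missing values so they are never reported as the top value.
            continue
        counts[value] += 1
        first_pos.setdefault(value, pos)
    if not counts:
        return None, 0
    # Highest count wins; among equal counts, the smallest first position wins.
    top = min(counts, key=lambda v: (-counts[v], first_pos[v]))
    return top, counts[top]


# "b" and "a" both appear twice, but "b" appears first, so it is chosen.
result = top_with_row_position_tiebreak(["b", "a", "a", "b", "c"])
```

Without the secondary key, the winner among tied values would depend on iteration order, which is the kind of instability the failing tests exposed.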