Fix warnings in dask-cudf test suite #19993
base: branch-25.10
dask/backends.py:140: UserWarning: Warning gzip compression does not support breaking apart files
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
UserWarning: You did not provide metadata, so Dask is running ...
UserWarning: Using CPU via PyArrow to read ORC dataset.
cudf/core/dataframe.py:7708: RuntimeWarning: Degrees of freedom <= 0 for slice
cupy/_statistics/correlation.py:210: RuntimeWarning: divide by zero encountered in scalar divide
RuntimeWarning: invalid value encountered in cast
/ok to test da7cb07
{"a": list(range(15)) + [None] * 5, "b": list(reversed(range(20)))},
],
)
# This warning comes from dask-expr, and is probably a consequence of
This one is probably a bug in dask-cudf / dask.dataframe. It occurs as part of normal dask-cudf operations:
>>> import cudf, dask.dataframe as dd, cupy as cp
>>> data = {
... "a": [None] * 100 + list(range(100, 150)),
... "b": list(range(50)) + [None] * 50 + list(range(50, 100)),
... }
>>> df = cudf.DataFrame(data)
>>> ddf = dd.from_pandas(df, npartitions=5)
>>> ddf.sort_values(by="a", na_position="first")
/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/pandas/core/arrays/numpy_.py:130: RuntimeWarning: invalid value encountered in cast
result = np.asarray(scalars, dtype=dtype) # type: ignore[arg-type]
The equivalent operation on a pandas.DataFrame doesn't emit the warning. I think it comes from cudf not handling NA / NaN the same way pandas does for a column that would otherwise be integer dtype.
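For context, here is a minimal sketch of one way this kind of warning can arise, using NumPy alone rather than cudf or dask. The helper name `cast_warnings` is hypothetical, and whether the NaN case warns depends on the installed NumPy version, so no output is shown. It is not a claim about where the warning originates in dask-cudf.

```python
import warnings

import numpy as np


def cast_warnings(values, dtype="int64"):
    """Cast a float array to `dtype` and return the text of any warnings emitted."""
    arr = np.asarray(values, dtype="float64")
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of letting the default filters hide repeats.
        warnings.simplefilter("always")
        arr.astype(dtype)
    return [f"{w.category.__name__}: {w.message}" for w in caught]


# Finite values cast cleanly, so nothing is recorded.
print(cast_warnings([1.0, 2.0]))

# NaN has no integer representation; recent NumPy releases report this cast
# with a RuntimeWarning (assumption: behaviour varies by NumPy version).
print(cast_warnings([float("nan"), 100.0]))
```

If the second call reports a warning while the first does not, that is consistent with the theory above: the null-containing column reaches an integer cast while it still holds NaN.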
/ok to test 3f0633c
Description
This PR fixes some warnings in the dask-cudf test suite and elevates any unhandled warnings to errors.
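As a rough illustration of what "elevating unhandled warnings to errors" means, here is a sketch using only the standard-library `warnings` module. The helpers `noisy`, `run_strict`, and `run_expecting` are hypothetical and do not come from this PR, which does not show its mechanism here. A pytest-based suite would more likely configure this through pytest's `filterwarnings` option.

```python
import warnings


def noisy():
    """Stand-in for library code that emits a known, expected warning."""
    warnings.warn("Using CPU via PyArrow to read ORC dataset.", UserWarning)
    return 42


def run_strict(func):
    """Call func with every warning turned into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func()


def run_expecting(func, category):
    """Call func, record warnings of `category`, and escalate all others."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("error")
        # Filters added later take precedence, so this exempts `category`.
        warnings.simplefilter("always", category)
        result = func()
    return result, [str(w.message) for w in caught]


try:
    run_strict(noisy)
except UserWarning as exc:
    print(f"unhandled warning became an error: {exc}")

value, messages = run_expecting(noisy, UserWarning)
print(value, messages)
```

The second helper mirrors the usual pattern in a strict test suite: a test that legitimately triggers a warning declares it explicitly, and anything undeclared fails the run.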