TomAugspurger
Contributor

Description

This PR fixes some warnings in the dask-cudf test suite and elevates any unhandled warnings to errors.

- dask/backends.py:140: UserWarning: Warning gzip compression does not support breaking apart files
- ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
- UserWarning: You did not provide metadata, so Dask is running ...
- UserWarning: Using CPU via PyArrow to read ORC dataset.
- cudf/core/dataframe.py:7708: RuntimeWarning: Degrees of freedom <= 0 for slice
- cupy/_statistics/correlation.py:210: RuntimeWarning: divide by zero encountered in scalar divide
- RuntimeWarning: invalid value encountered in cast
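
The PR's exact test configuration isn't shown in this thread. In pytest, warnings are commonly escalated with a `filterwarnings = error` setting, and known ones are allowed per test with `@pytest.mark.filterwarnings(...)`. The sketch below shows the same filter semantics using only the standard-library `warnings` module; `emit_known_warning`, `emit_unknown_warning`, and `run_strict` are hypothetical names for illustration.

```python
import warnings


def emit_known_warning():
    # Hypothetical stand-in for a call that emits a known, understood warning.
    warnings.warn("invalid value encountered in cast", RuntimeWarning)
    return "ok"


def emit_unknown_warning():
    # Hypothetical stand-in for a call that emits an unexpected warning.
    warnings.warn("something new", UserWarning)
    return "ok"


def run_strict(func):
    """Call ``func`` with all warnings escalated to errors, except one
    explicitly allowed RuntimeWarning."""
    with warnings.catch_warnings():
        # Blanket rule: every warning becomes an exception.
        warnings.simplefilter("error")
        # Targeted exception: filters added later are checked first,
        # so this "ignore" takes precedence over the blanket "error".
        warnings.filterwarnings(
            "ignore",
            message="invalid value encountered in cast",
            category=RuntimeWarning,
        )
        return func()


print(run_strict(emit_known_warning))  # the allowed warning is ignored

try:
    run_strict(emit_unknown_warning)
except UserWarning as exc:
    # The unhandled warning surfaces as an error instead of being printed.
    print("escalated to error:", exc)
```

The `ignore` filter is keyed on both message and category, so an unrelated RuntimeWarning would still be raised as an error.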

copy-pr-bot bot commented Sep 17, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@TomAugspurger
Contributor Author

/ok to test da7cb07

@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 17, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Sep 17, 2025
```python
{"a": list(range(15)) + [None] * 5, "b": list(reversed(range(20)))},
],
)
# This warning comes from dask-expr, and is probably a consequence of
```
Contributor Author

This one is probably a bug in dask-cudf / dask.dataframe. It occurs as part of normal dask-cudf operations:

```python
>>> import cudf, dask.dataframe as dd, cupy as cp
>>> data = {
...     "a": [None] * 100 + list(range(100, 150)),
...     "b": list(range(50)) + [None] * 50 + list(range(50, 100)),
... }
>>> df = cudf.DataFrame(data)
>>> ddf = dd.from_pandas(df, npartitions=5)
>>> ddf.sort_values(by="a", na_position="first")
/raid/toaugspurger/envs/gh/rapidsai/cudf/lib/python3.13/site-packages/pandas/core/arrays/numpy_.py:130: RuntimeWarning: invalid value encountered in cast
  result = np.asarray(scalars, dtype=dtype)  # type: ignore[arg-type]
```

The equivalent operation on a pandas.DataFrame doesn't emit the warning. I think this comes from cudf not handling NA / NaN the same way pandas does for a column that would otherwise be integer dtype.

@TomAugspurger
Contributor Author

/ok to test 3f0633c

@TomAugspurger TomAugspurger added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 18, 2025