
Conversation

mroeschke
Contributor

@mroeschke commented Sep 16, 2025

Description

Now that cudf-polars uses managed memory by default, the prior comment here should no longer apply, and we should be able to run these tests with more than one process, hopefully improving runtime.

Probably depends on #20042 so each xdist process doesn't set the initial_pool_size of the memory resource to 80% of the available device memory.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke self-assigned this Sep 16, 2025
@mroeschke requested a review from a team as a code owner September 16, 2025 16:46
@mroeschke added the improvement Improvement / enhancement to an existing function label Sep 16, 2025
@mroeschke added the non-breaking Non-breaking change label Sep 16, 2025
@mroeschke requested a review from a team as a code owner September 16, 2025 21:50
@mroeschke requested review from wence- and vyasr September 16, 2025 21:50
@github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Sep 16, 2025
@GPUtester moved this to In Progress in cuDF Python Sep 16, 2025
@mroeschke marked this pull request as draft September 19, 2025 17:14

copy-pr-bot bot commented Sep 19, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@mroeschke
Contributor Author

/ok to test 59d144d

@TomAugspurger
Contributor

TomAugspurger commented Sep 24, 2025

As mentioned in #19895 (comment), we'll want to update

def test_cudf_polars_enable_disable_managed_memory(monkeypatch, enable_managed_memory):
to clear the cudf-polars memory pool cache by calling

default_memory_resource.cache_clear()

somewhere near the end of that test. Otherwise, there will be a memory pool with 50% of GPU memory sitting around doing nothing. That should perhaps be a fixture so that we know it runs, even if the test fails somewhere after creating that memory resource.

There might still be some risk of two tests running that at the same time, so that test should perhaps be rerun a few times on failure.

I can push those changes here if you'd like.
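A minimal sketch of that fixture idea, using `functools.lru_cache` as a stand-in for the real cudf-polars cache; `default_memory_resource` and `clear_memory_resource_cache` here are illustrative names, not the actual cudf-polars API:

```python
import functools


# Stand-in for cudf-polars' cached memory-resource factory (the real one
# builds an RMM pool sized as a fraction of device memory); this sketch
# just returns a fresh sentinel object per cache entry.
@functools.lru_cache(maxsize=None)
def default_memory_resource(device: int) -> object:
    return object()


def clear_memory_resource_cache(test_body) -> None:
    """Run a test body, then always drop the cached memory resource.

    In pytest this would be a ``yield`` fixture, so the teardown (the
    ``cache_clear()`` call) runs even if the test fails after creating
    the memory resource.
    """
    try:
        test_body()
    finally:
        default_memory_resource.cache_clear()
```

As a pytest fixture, the `try/finally` above becomes a `yield` followed by the `cache_clear()` call, which pytest runs as teardown regardless of whether the test body raised.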

@vyasr changed the base branch from branch-25.10 to branch-25.12 September 24, 2025 17:45
@mroeschke
Contributor Author

/ok to test 29b3b41

@TomAugspurger
Contributor

That one passed. @mroeschke do you remember how consistently you saw the test suite OOM previously?

@mroeschke
Contributor Author

> do you remember how consistently you saw the test suite OOM previously?

Each run was OOMing even with just the in-memory executor (before limiting the initial pool size to 1GB). I'm going to rerun these tests in an "ideal" setup (8 processes, no limiting of the initial pool size) to see if test_cudf_polars_enable_disable_managed_memory was the only blocker to running these tests in multiple processes.
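As a back-of-the-envelope check on why per-worker pool sizing OOMs (the numbers below are hypothetical; only the 80% fraction comes from the description above):

```python
GIB = 1024**3


def initial_pool_size(free_bytes: int, fraction: float = 0.80) -> int:
    # Each xdist worker sizes its own pool from the free device memory it
    # sees at startup, unaware of its sibling workers.
    return int(free_bytes * fraction)


free = 40 * GIB  # hypothetical GPU with 40 GiB free
workers = 8
requested = workers * initial_pool_size(free)
print(requested // GIB)  # 256: the workers collectively request 256 GiB
print(requested > free)  # True: far more than the device has, hence OOM
```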

@mroeschke
Contributor Author

/ok to test 4c51a08

@mroeschke marked this pull request as ready for review September 25, 2025 23:48
import pytest


@pytest.fixture(autouse=True)
Contributor Author

I put this fixture in the experimental directory since currently these are the only tests (with the CI script) that purposefully test with the distributed executor. Is that OK @TomAugspurger given your thoughts on reorganizing the cudf_polars test suite in the future?

Contributor

Yep, that sounds good to me.

@mroeschke
Contributor Author

Just documenting some runtime observations for a follow-up running the cudf-polars wheel tests with multiple processes:

  • With 6 processes, the runtimes of the 4 cudf-polars test variants were essentially equivalent, except for the distributed variant, where the runtime increased by 1-3 minutes
  • With 8 processes, the runtimes for the in-memory, streaming, and streaming + small block size variants decreased by 40 seconds to 1.5 minutes, while the streaming + distributed variant increased by ~40 seconds (an overall net decrease)

So in the follow-up, I'll probably use 8 processes for these unit tests with conda (and wheels) and maybe disable running the distributed variant with multiple processes.

@mroeschke
Contributor Author

/merge

@rapids-bot bot merged commit c82f0fc into rapidsai:branch-25.12 Sep 26, 2025
135 checks passed
@github-project-automation bot moved this from In Progress to Done in cuDF Python Sep 26, 2025
@mroeschke deleted the ci/cudf_polars/xdist_conda branch September 26, 2025 18:51
TomAugspurger pushed a commit to TomAugspurger/pygdf that referenced this pull request Sep 26, 2025
…9980)

Now that cudf-polars uses managed memory by default, the prior comment here should no longer be applicable and we should be able to run these tests with more than 1 process for a hopeful improvement in runtime.

Probably depends on rapidsai#20042 so each xdist process doesn't set the `initial_pool_size` of the memory resource to 80% of the available device memory.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Bradley Dice (https://github.com/bdice)
  - Tom Augspurger (https://github.com/TomAugspurger)

URL: rapidsai#19980