[Data] Lower constant at which high memory issue detector emits warnings#64124

Open
bveeramani wants to merge 1 commit into
masterfrom
data-refactor-safe-default-logical-memory

Conversation

@bveeramani

@bveeramani bveeramani commented Jun 16, 2026

Member

Description

A long time ago, we added an "issue detector" that emits warnings when UDFs use more than 4 GiB USS per logical CPU. The rationale is that nodes typically have 4 GiB of physical memory per core, and if you use more than that you'll likely encounter OOMs.
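The original check can be sketched roughly as follows. This is a minimal illustration, assuming the detector divides a UDF's USS (unique set size) by its logical CPU count and compares the result against a fixed per-CPU budget. The names `exceeds_memory_budget` and `OLD_BYTES_PER_CPU` are hypothetical and are not Ray Data's actual code.

```python
GiB = 1024**3

# Hypothetical constant: the original 4 GiB per-logical-CPU budget described above.
OLD_BYTES_PER_CPU = 4 * GiB


def exceeds_memory_budget(uss_bytes: int, num_cpus: float, bytes_per_cpu: int) -> bool:
    """Return True if USS per logical CPU is above the per-CPU budget.

    Hypothetical helper for illustration; assumes num_cpus > 0.
    """
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return uss_bytes / num_cpus > bytes_per_cpu


# 10 GiB USS on 2 logical CPUs is 5 GiB per CPU, so the task is flagged.
print(exceeds_memory_budget(10 * GiB, 2, OLD_BYTES_PER_CPU))  # True
# 6 GiB USS on 2 logical CPUs is 3 GiB per CPU, under the 4 GiB budget.
print(exceeds_memory_budget(6 * GiB, 2, OLD_BYTES_PER_CPU))  # False
```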

Recently, I added the default_map_logical_memory_enabled flag in #63814. When enabled, it causes tasks to default to a different constant of ~2.6 GiB USS per logical CPU.

In this PR, I'm unifying the issue detector and the default map memory so that both use the 2.6 GiB USS constant. My motivation is to avoid duplicate, divergent code paths for similar logic. Also, the 4 GiB threshold was too permissive because it doesn't account for memory used by the object store and the system.
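To illustrate the unification, here is a sketch in which a single constant drives both the default memory request and the detector's warning threshold. `SAFE_BYTES_PER_CPU`, `default_logical_memory`, and `should_warn` are hypothetical names, and the 2.6 GiB value is the approximate figure quoted in the description, not necessarily the exact constant used in the PR.

```python
GiB = 1024**3

# Hypothetical single source of truth (~2.6 GiB per logical CPU, approximate).
SAFE_BYTES_PER_CPU = int(2.6 * GiB)


def default_logical_memory(num_cpus: float) -> int:
    """Hypothetical default memory request (bytes) for a task with num_cpus CPUs."""
    return int(num_cpus * SAFE_BYTES_PER_CPU)


def should_warn(uss_bytes: int, num_cpus: float) -> bool:
    """Hypothetical detector check: warn when USS exceeds the default request."""
    return uss_bytes > default_logical_memory(num_cpus)


# 3.5 GiB USS on 1 logical CPU passed the old 4 GiB rule silently,
# but it exceeds the ~2.6 GiB budget, so it now triggers a warning.
print(should_warn(int(3.5 * GiB), 1))  # True
# 2 GiB USS on 1 logical CPU stays under the budget.
print(should_warn(2 * GiB, 1))  # False
```

Because the warning threshold is derived from the same function that sets the default request, the two values cannot drift apart, which is the motivation stated above.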

Related issues

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner June 16, 2026 00:40

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request refactors the logical memory calculation by extracting the default logical memory logic from MapOperator into a reusable module-level helper function, get_safe_default_logical_memory. This helper is then used by both MapOperator and HighMemoryIssueDetector, reducing code duplication. The review feedback identifies a potential issue where a non-numeric num_cpus argument (such as a list or string) could trigger sequence repetition and crash the process by exhausting memory, and suggests adding defensive type validation to prevent this.
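The hazard behind this suggestion is specific to Python: multiplying a list or string by an `int` repeats the sequence, so if `num_cpus` were accidentally a sequence and the per-CPU constant an `int`, the product would try to build a multi-GiB sequence instead of raising. Below is a minimal sketch of the suggested guard. `safe_default_logical_memory` and `SAFE_BYTES_PER_CPU` are hypothetical stand-ins and do not reflect the actual signature of get_safe_default_logical_memory in the PR.

```python
import numbers

GiB = 1024**3

# Hypothetical per-CPU budget in whole bytes (~2.6 GiB, approximate).
SAFE_BYTES_PER_CPU = int(2.6 * GiB)


def safe_default_logical_memory(num_cpus) -> int:
    """Return a default memory request in bytes for num_cpus logical CPUs.

    Rejects non-numeric input up front. Without this check, a list or str
    num_cpus multiplied by an int constant would build a huge sequence.
    """
    # bool is a subclass of int; True/False almost certainly indicates a bug.
    if isinstance(num_cpus, bool) or not isinstance(num_cpus, numbers.Real):
        raise TypeError(
            f"num_cpus must be a real number, got {type(num_cpus).__name__}"
        )
    if num_cpus < 0:
        raise ValueError("num_cpus must be non-negative")
    return int(num_cpus * SAFE_BYTES_PER_CPU)


print(safe_default_logical_memory(2) == 2 * SAFE_BYTES_PER_CPU)  # True
```

Checking against `numbers.Real` accepts `int`, `float`, and NumPy floating-point scalars (which subclass `float`), while rejecting strings, lists, and other sequences before any multiplication happens.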


Comment thread python/ray/data/_internal/execution/operators/map_operator.py
@@ -87,6 +87,50 @@

logger = logging.getLogger(__name__)

Member Author


No semantic changes in this file. Just moving code around and exposing

@bveeramani bveeramani changed the title [Data] Extract safe default logical memory into a shared utility [Data] Lower constant at which high memory issue detector emits warnings Jun 16, 2026
@ray-gardener ray-gardener Bot added the data Ray Data-related issues label Jun 16, 2026

@ayushk7102 ayushk7102 left a comment

Contributor


LGTM. Can we check whether the test_high_memory_detection test in test_issue_detection.py needs its params updated, given the new lower threshold?
