[Data] Lower constant at which high memory issue detector emits warnings#64124
[Data] Lower constant at which high memory issue detector emits warnings#64124bveeramani wants to merge 1 commit into
Conversation
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
There was a problem hiding this comment.
Code Review
This pull request refactors the logical memory calculation by extracting the default logical memory logic from MapOperator into a reusable module-level helper function get_safe_default_logical_memory. This helper is then integrated into both MapOperator and HighMemoryIssueDetector, reducing code duplication. The review feedback identifies a potential issue where a non-numeric num_cpus argument could trigger sequence repetition and cause a memory crash, and suggests adding defensive type validation to prevent this.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| @@ -87,6 +87,50 @@ | |||
|
|
|||
| logger = logging.getLogger(__name__) | |||
|
|
|||
There was a problem hiding this comment.
No semantic changes in this file. Just moving code around and exposing
ayushk7102
left a comment
There was a problem hiding this comment.
LGTM, can we check if the test_high_memory_detection test in test_issue_detection.py requires any modification in the params given the new low mark?
Description
A long time ago, we added an "issue detector" that emits warnings when UDFs use more than 4 GiB USS per logical CPU. The rationale is that nodes typically have 4 GiB of physical memory per core, and if you use more than that you'll likely encounter OOMs.
Recently, I added
default_map_logical_memory_enabledflag in #63814. If enabled, it causes tasks to default to a slightly different constant of ~2.6 GiB USS per logical CPU.In this PR, I'm unifying the issue detector and default map memory to use the 2.6 GiB USS constant. My motivation is to avoid duplicate and divergent code paths for similar logic. Also, the 4 GiB was too permissive, because it doesn't account for memory from object store and system.
Related issues