
feat(podgrouper): support externally created pod groups #1552

Queued
enoodle wants to merge 6 commits into kai-scheduler:main from enoodle:erez/bring-your-own-pod-group-implementation

Conversation

@enoodle
Collaborator

@enoodle enoodle commented May 5, 2026

Description

Implements #1493: support for externally created pod groups.

Related Issues

Fixes #1420

Checklist

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

None

Additional Notes

Adds the kai.scheduler/skip-podgrouper annotation, which tells the pod grouper to skip a pod or its owning resource. This is what makes externally created PodGroups possible: the pod grouper no longer overwrites the PodGroup annotation that the user set on the pod.
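A minimal sketch of how such an opt-out check could look, assuming the annotation value is parsed as a boolean so that a missing, unparseable, or "false" value leaves the pod grouper active (the test list below mentions cases for false annotation values). The helper name echoes shouldSkipPodGrouper, but this map-based signature is hypothetical, not the PR's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// skipPodGrouperAnnotation is the opt-out annotation key described in this PR.
const skipPodGrouperAnnotation = "kai.scheduler/skip-podgrouper"

// shouldSkipPodGrouper reports whether an object's annotations opt it out of
// pod grouper reconciliation. Hypothetical sketch: a missing key, an
// unparseable value, or "false" all mean the pod grouper proceeds as usual.
func shouldSkipPodGrouper(annotations map[string]string) bool {
	value, ok := annotations[skipPodGrouperAnnotation] // reading a nil map is safe
	if !ok {
		return false
	}
	skip, err := strconv.ParseBool(value)
	return err == nil && skip
}

func main() {
	fmt.Println(shouldSkipPodGrouper(map[string]string{skipPodGrouperAnnotation: "true"}))  // true
	fmt.Println(shouldSkipPodGrouper(map[string]string{skipPodGrouperAnnotation: "false"})) // false
	fmt.Println(shouldSkipPodGrouper(nil))                                                  // false
}
```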

Summary by CodeRabbit

  • New Features

    • Added support for externally-created PodGroups to coordinate multiple workloads
    • Introduced opt-out annotation to skip automatic pod grouping
    • Enhanced pod condition reporting for invalid subgroup references
  • Documentation

    • Added guides and examples for using external PodGroups with batch scheduling
  • Tests

    • Expanded test coverage for external PodGroup scenarios and invalid subgroup handling

@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb162e33-77a7-4c80-b58b-9c6100e23ffa

Walkthrough

The PR adds support for externally-created PodGroups, enabling pods from multiple independent workloads to be grouped and scheduled atomically together. Users can opt out of podgrouper reconciliation via the kai.scheduler/skip-podgrouper annotation, and the scheduler tracks invalid subgroup references and emits pod conditions.

Changes

External PodGroup Support

Layer / File(s) / Summary

  • Constants & Type Definitions
    pkg/common/constants/constants.go, pkg/scheduler/api/podgroup_info/job_info.go
    Added SkipPodGrouperAnnotation constant for opt-out annotation. Added InvalidSubGroupTasks field and accessor methods (GetInvalidSubGroupTasks, IsInvalidSubGroupTask) to PodGroupInfo for tracking pods with missing subgroups.
  • Pod Controller Skip Logic
    pkg/podgrouper/pod_controller.go
    Added shouldSkipPodGrouper and shouldSkipAnyOwner helpers to detect skip annotations. Integrated early-return checks in Reconcile to bypass processing for pods or owners marked to skip. Added nil-check for metadata retrieval. Updated isOrphanPodWithPodGroup to check owner reference length.
  • Scheduler Invalid Subgroup Handling
    pkg/scheduler/api/podgroup_info/job_info.go
    Modified AddTaskInfo to route missing-subgroup tasks to new addInvalidSubGroupTask helper instead of failing. Updated initialization paths (NewPodGroupInfoWithVectorMap, CloneWithTasks) to initialize InvalidSubGroupTasks.
  • Status Event Recording
    pkg/scheduler/cache/status_updater/default_status_updater.go
    Added recordInvalidSubGroupPodsEvents to mark invalid-subgroup pods as unschedulable with fit errors. Integrated into RecordJobStatusEvent flow. Updated recordUnschedulablePodsEvents to skip invalid-subgroup pods.
  • Tests
    pkg/podgrouper/pod_controller_test.go, pkg/scheduler/api/podgroup_info/job_info_test.go, pkg/scheduler/cache/cluster_info/cluster_info_test.go, pkg/scheduler/cache/record_job_status_event_test.go
    Extended pod controller tests with skip annotation scenarios (annotated pod, annotated owner, intermediate owner, nil metadata, false annotation values). Added test for invalid subgroup task tracking. Enhanced snapshot and event-recording tests to verify invalid subgroup pod handling.
  • E2E Integration & Documentation
    test/e2e/suites/integrations/k8s_native/k8s_native_specs.go, examples/batch/external-podgroup-job.yaml, examples/batch/README.md, docs/batch/README.md, docs/developer/pod-grouper.md, CHANGELOG.md
    New E2E test validates external PodGroup creation and pod scheduling. Added YAML example showing external PodGroup with subgroups and Job opt-out. Documented external PodGroup usage, required annotations/labels, and scheduler behavior.
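The scheduler-side routing summarized above (AddTaskInfo handing missing-subgroup tasks to addInvalidSubGroupTask instead of failing) can be sketched with simplified types. This is an illustration under assumptions: task and podGroup are hypothetical stand-ins for PodInfo and PodGroupInfo, and every pod is assumed to name a subgroup.

```go
package main

import "fmt"

// task is a simplified, hypothetical stand-in for pod_info.PodInfo; it assumes
// every pod names a subgroup.
type task struct {
	UID          string
	SubGroupName string
}

// podGroup is a simplified, hypothetical stand-in for PodGroupInfo.
type podGroup struct {
	PodSets              map[string]map[string]*task // subgroup name -> pod UID -> task
	InvalidSubGroupTasks map[string]*task            // pod UID -> task whose subgroup is missing
}

func newPodGroup(subGroups ...string) *podGroup {
	pg := &podGroup{
		PodSets:              map[string]map[string]*task{},
		InvalidSubGroupTasks: map[string]*task{},
	}
	for _, name := range subGroups {
		pg.PodSets[name] = map[string]*task{}
	}
	return pg
}

// addTaskInfo places a task in its subgroup's pod set or, if that subgroup
// does not exist, records it as an invalid-subgroup task instead of failing.
func (pg *podGroup) addTaskInfo(t *task) {
	podSet, ok := pg.PodSets[t.SubGroupName]
	if !ok {
		pg.InvalidSubGroupTasks[t.UID] = t
		return
	}
	podSet[t.UID] = t
}

// isInvalidSubGroupTask plays the role of the IsInvalidSubGroupTask accessor.
func (pg *podGroup) isInvalidSubGroupTask(uid string) bool {
	_, ok := pg.InvalidSubGroupTasks[uid]
	return ok
}

func main() {
	pg := newPodGroup("workers")
	pg.addTaskInfo(&task{UID: "pod-a", SubGroupName: "workers"})
	pg.addTaskInfo(&task{UID: "pod-b", SubGroupName: "wrkers"}) // typo: no such subgroup
	fmt.Println(pg.isInvalidSubGroupTask("pod-a"), pg.isInvalidSubGroupTask("pod-b")) // false true
}
```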

Sequence Diagram

sequenceDiagram
    participant User
    participant PodController as Pod Controller
    participant K8s as Kubernetes
    participant Scheduler as Scheduler
    participant StatusUpdater as Status Updater
    
    User->>K8s: Create external PodGroup (with subgroups)
    User->>K8s: Create Workload (Job/RayJob) with pod-group-name annotation
    K8s->>PodController: Reconcile pod
    
    alt Pod has skip-podgrouper annotation
        PodController->>PodController: Check shouldSkipPodGrouper
        PodController-->>K8s: Skip reconciliation
    else Owner has skip-podgrouper annotation
        PodController->>K8s: Fetch pod owners
        PodController->>PodController: Check shouldSkipAnyOwner
        PodController-->>K8s: Skip reconciliation
    else Normal flow
        PodController-->>K8s: Allow normal processing
        K8s->>Scheduler: Discover pod via pod-group-name annotation
        Scheduler->>Scheduler: AddTaskInfo: lookup subgroup-name
        
        alt Subgroup exists
            Scheduler->>Scheduler: Add pod to subgroup
        else Subgroup missing
            Scheduler->>Scheduler: Track as InvalidSubGroupTask
            Scheduler->>Scheduler: Attach fit error
        end
        
        Scheduler->>StatusUpdater: Report pod status
        StatusUpdater->>StatusUpdater: recordInvalidSubGroupPodsEvents
        StatusUpdater->>K8s: Set pod Unschedulable condition with error
    end
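The pod controller branches of the diagram can be sketched as follows. The object type, hasSkipAnnotation, and decide are hypothetical simplifications: the real controller fetches owners from the API server, whereas here the owner chain (direct and intermediate owners) is passed in as a slice, and the boolean parsing of the annotation value is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

const skipPodGrouperAnnotation = "kai.scheduler/skip-podgrouper"

// object is a hypothetical stand-in for a Kubernetes object's metadata.
type object struct {
	Name        string
	Annotations map[string]string
}

// hasSkipAnnotation treats nil metadata as "not annotated" and parses the
// annotation value as a boolean (an assumption of this sketch).
func hasSkipAnnotation(obj *object) bool {
	if obj == nil {
		return false
	}
	skip, err := strconv.ParseBool(obj.Annotations[skipPodGrouperAnnotation])
	return err == nil && skip
}

// shouldSkipAnyOwner reports whether any object in the owner chain carries
// the opt-out annotation, including intermediate owners.
func shouldSkipAnyOwner(owners []*object) bool {
	for _, owner := range owners {
		if hasSkipAnnotation(owner) {
			return true
		}
	}
	return false
}

// decide mirrors the three branches of the sequence diagram.
func decide(pod *object, owners []*object) string {
	switch {
	case hasSkipAnnotation(pod):
		return "skip: pod annotated"
	case shouldSkipAnyOwner(owners):
		return "skip: owner annotated"
	default:
		return "reconcile"
	}
}

func main() {
	pod := &object{Name: "job-pod"}
	job := &object{Name: "job", Annotations: map[string]string{skipPodGrouperAnnotation: "true"}}
	fmt.Println(decide(pod, []*object{job})) // skip: owner annotated
	fmt.Println(decide(pod, nil))            // reconcile
}
```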

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • kai-scheduler/KAI-Scheduler#1450: Related — both PRs modify skip-owner/skip-podgrouper behavior; the main PR adds skip-podgrouper checks in the pod controller while the retrieved PR fixes propagation of priority/preemptibility defaults for skipped owners.
  • kai-scheduler/KAI-Scheduler#970: Both PRs modify pkg/podgrouper/pod_controller.go and its tests, affecting pod reconciliation and group/subgroup assignment logic in the same reconcile flow.

Suggested labels

backport v0.12, backport v0.13, backport v0.14

Suggested reviewers

  • SiorMeir

Poem

🐰 A PodGroup bounces free,
No podgrouper's decree,
External and wise, it spans multiple sighs,
Cross-workloads atomically! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main objective: adding support for externally created PodGroups in the podgrouper component.
  • Description check: ✅ Passed. The description references linked issues and follows the template structure with all key sections completed: description, related issues, checklist marked done, breaking changes stated as none, and additional notes provided.
  • Linked Issues check: ✅ Passed. The PR implements all primary objectives from issue #1420: adding skip-podgrouper annotation to prevent podgrouper interference [#1420], enabling external PodGroup creation and usage, documenting usage patterns, adding scheduler validation for invalid subgroups, and providing e2e test coverage for the feature.
  • Out of Scope Changes check: ✅ Passed. All changes directly support the stated objectives: skip-annotation constant and logic, invalid-subgroup tracking in scheduler, documentation updates, examples, and comprehensive tests. No extraneous modifications detected.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms



@github-actions

github-actions Bot commented May 5, 2026

📊 Performance Benchmark Results

Comparing PR (erez/bring-your-own-pod-group-implementation) vs main branch
goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
                                    │ main-bench.txt │            pr-bench.txt            │
                                    │     sec/op     │    sec/op     vs base              │
AllocateAction_SmallCluster-4            108.0m ± 0%   108.1m ±  6%       ~ (p=0.310 n=6)
AllocateAction_MediumCluster-4           135.6m ± 0%   136.2m ±  1%       ~ (p=0.180 n=6)
AllocateAction_LargeCluster-4            212.6m ± 6%   227.4m ± 10%  +6.97% (p=0.009 n=6)
ReclaimAction_SmallCluster-4             102.8m ± 0%   102.9m ±  0%       ~ (p=0.132 n=6)
ReclaimAction_MediumCluster-4            105.9m ± 1%   105.8m ±  0%       ~ (p=0.589 n=6)
PreemptAction_SmallCluster-4             103.7m ± 0%   103.6m ±  0%       ~ (p=0.589 n=6)
PreemptAction_MediumCluster-4            111.4m ± 1%   111.6m ±  1%       ~ (p=0.937 n=6)
ConsolidationAction_SmallCluster-4       123.9m ± 0%   124.0m ±  0%       ~ (p=0.589 n=6)
ConsolidationAction_MediumCluster-4      294.5m ± 1%   293.8m ±  1%       ~ (p=1.000 n=6)
FullSchedulingCycle_SmallCluster-4       105.1m ± 0%   105.3m ±  0%  +0.20% (p=0.015 n=6)
FullSchedulingCycle_MediumCluster-4      119.8m ± 2%   120.9m ±  1%       ~ (p=0.485 n=6)
FullSchedulingCycle_LargeCluster-4       161.6m ± 1%   165.4m ±  2%  +2.40% (p=0.002 n=6)
ManyQueues_MediumCluster-4               137.9m ± 1%   138.8m ±  2%       ~ (p=0.093 n=6)
GangScheduling_MediumCluster-4           157.3m ± 1%   158.2m ±  1%  +0.59% (p=0.041 n=6)
geomean                                  134.4m        135.6m        +0.86%

                                    │ main-bench.txt │            pr-bench.txt            │
                                    │      B/op      │     B/op      vs base              │
AllocateAction_SmallCluster-4           2.225Mi ± 1%   2.225Mi ± 1%       ~ (p=0.818 n=6)
AllocateAction_MediumCluster-4          12.09Mi ± 0%   12.09Mi ± 0%       ~ (p=0.394 n=6)
AllocateAction_LargeCluster-4           41.70Mi ± 0%   41.70Mi ± 0%       ~ (p=1.000 n=6)
ReclaimAction_SmallCluster-4            916.4Ki ± 1%   912.8Ki ± 1%       ~ (p=0.394 n=6)
ReclaimAction_MediumCluster-4           3.000Mi ± 0%   2.998Mi ± 0%       ~ (p=0.180 n=6)
PreemptAction_SmallCluster-4            1.133Mi ± 1%   1.133Mi ± 0%       ~ (p=0.485 n=6)
PreemptAction_MediumCluster-4           4.599Mi ± 0%   4.603Mi ± 0%  +0.09% (p=0.009 n=6)
ConsolidationAction_SmallCluster-4      9.776Mi ± 0%   9.782Mi ± 0%  +0.06% (p=0.015 n=6)
ConsolidationAction_MediumCluster-4     88.72Mi ± 0%   88.73Mi ± 0%       ~ (p=0.180 n=6)
FullSchedulingCycle_SmallCluster-4      1.419Mi ± 0%   1.420Mi ± 0%       ~ (p=1.000 n=6)
FullSchedulingCycle_MediumCluster-4     7.042Mi ± 0%   7.043Mi ± 0%       ~ (p=0.310 n=6)
FullSchedulingCycle_LargeCluster-4      23.11Mi ± 0%   23.11Mi ± 0%       ~ (p=0.589 n=6)
ManyQueues_MediumCluster-4              16.54Mi ± 0%   16.55Mi ± 0%       ~ (p=0.699 n=6)
GangScheduling_MediumCluster-4          17.49Mi ± 0%   17.49Mi ± 0%       ~ (p=0.589 n=6)
geomean                                 7.148Mi        7.147Mi       -0.02%

                                    │ main-bench.txt │           pr-bench.txt            │
                                    │   allocs/op    │  allocs/op   vs base              │
AllocateAction_SmallCluster-4            35.02k ± 0%   35.02k ± 0%       ~ (p=0.667 n=6)
AllocateAction_MediumCluster-4           312.8k ± 0%   312.8k ± 0%       ~ (p=0.905 n=6)
AllocateAction_LargeCluster-4            1.338M ± 0%   1.338M ± 0%       ~ (p=0.738 n=6)
ReclaimAction_SmallCluster-4             8.205k ± 0%   8.205k ± 0%       ~ (p=0.866 n=6)
ReclaimAction_MediumCluster-4            26.15k ± 0%   26.15k ± 0%       ~ (p=0.106 n=6)
PreemptAction_SmallCluster-4             11.90k ± 0%   11.92k ± 0%  +0.21% (p=0.002 n=6)
PreemptAction_MediumCluster-4            41.94k ± 0%   42.04k ± 0%  +0.24% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4       127.7k ± 0%   127.8k ± 0%  +0.08% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4      1.298M ± 0%   1.298M ± 0%  +0.02% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4       20.68k ± 0%   20.68k ± 0%       ~ (p=1.000 n=6)
FullSchedulingCycle_MediumCluster-4      168.3k ± 0%   168.3k ± 0%       ~ (p=0.621 n=6)
FullSchedulingCycle_LargeCluster-4       698.7k ± 0%   698.7k ± 0%       ~ (p=1.000 n=6)
ManyQueues_MediumCluster-4               350.7k ± 0%   350.7k ± 0%       ~ (p=0.699 n=6)
GangScheduling_MediumCluster-4           571.8k ± 0%   571.8k ± 0%       ~ (p=0.236 n=6)
geomean                                  119.9k        119.9k       +0.04%

pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/integration_tests/reclaim
                            │ main-bench.txt │           pr-bench.txt            │
                            │     sec/op     │   sec/op     vs base              │
ReclaimLargeJobs_10Node-4        105.3m ± 0%   105.3m ± 0%       ~ (p=1.000 n=6)
ReclaimLargeJobs_50Node-4        229.8m ± 0%   231.2m ± 1%  +0.59% (p=0.015 n=6)
ReclaimLargeJobs_100Node-4       382.5m ± 3%   382.2m ± 8%       ~ (p=0.699 n=6)
ReclaimLargeJobs_200Node-4       749.2m ± 2%   749.8m ± 8%       ~ (p=0.818 n=6)
ReclaimLargeJobs_500Node-4        2.372 ± 3%    2.347 ± 1%  -1.06% (p=0.026 n=6)
ReclaimLargeJobs_1000Node-4       6.844 ± 3%    6.753 ± 1%       ~ (p=0.180 n=6)
geomean                          694.9m        692.8m       -0.30%

                            │ main-bench.txt │            pr-bench.txt            │
                            │      B/op      │     B/op      vs base              │
ReclaimLargeJobs_10Node-4       1.989Mi ± 3%   1.990Mi ± 3%       ~ (p=0.937 n=6)
ReclaimLargeJobs_50Node-4       59.62Mi ± 0%   59.70Mi ± 0%  +0.14% (p=0.026 n=6)
ReclaimLargeJobs_100Node-4      119.3Mi ± 0%   119.5Mi ± 0%  +0.13% (p=0.002 n=6)
ReclaimLargeJobs_200Node-4      241.0Mi ± 0%   241.3Mi ± 0%  +0.13% (p=0.002 n=6)
ReclaimLargeJobs_500Node-4      618.1Mi ± 0%   619.0Mi ± 0%  +0.15% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4     1.262Gi ± 0%   1.264Gi ± 0%  +0.14% (p=0.002 n=6)
geomean                         118.2Mi        118.3Mi       +0.13%

                            │ main-bench.txt │           pr-bench.txt            │
                            │   allocs/op    │  allocs/op   vs base              │
ReclaimLargeJobs_10Node-4        21.97k ± 2%   22.05k ± 2%  +0.36% (p=0.037 n=6)
ReclaimLargeJobs_50Node-4        801.9k ± 0%   803.8k ± 0%  +0.23% (p=0.002 n=6)
ReclaimLargeJobs_100Node-4       1.596M ± 0%   1.599M ± 0%  +0.23% (p=0.002 n=6)
ReclaimLargeJobs_200Node-4       3.182M ± 0%   3.189M ± 0%  +0.23% (p=0.002 n=6)
ReclaimLargeJobs_500Node-4       7.963M ± 0%   7.982M ± 0%  +0.23% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4      16.01M ± 0%   16.05M ± 0%  +0.23% (p=0.002 n=6)
geomean                          1.500M        1.504M       +0.25%

pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/reclaim
                            │ main-bench.txt │         pr-bench.txt         │
                            │     sec/op     │   sec/op     vs base         │
ReclaimWithMissingPVCJobs-4     2.447m ± 65%   2.371m ± 5%  ~ (p=0.310 n=6)

                            │ main-bench.txt │            pr-bench.txt            │
                            │      B/op      │     B/op      vs base              │
ReclaimWithMissingPVCJobs-4     8.023Ki ± 2%   8.195Ki ± 2%  +2.14% (p=0.045 n=6)

                            │ main-bench.txt │        pr-bench.txt         │
                            │   allocs/op    │ allocs/op   vs base         │
ReclaimWithMissingPVCJobs-4       153.0 ± 1%   154.0 ± 1%  ~ (p=0.177 n=6)

Legend

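  • 📉 Negative delta = improvement (less time, memory, or allocations per operation)
  • 📈 Positive delta = regression (more time, memory, or allocations per operation)
  • A p-value < 0.05 indicates a statistically significant change; "~" marks a difference that is not statistically significant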
Raw benchmark data

PR branch:

goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
BenchmarkAllocateAction_SmallCluster-4         	       9	 114423787 ns/op	 2358018 B/op	   35039 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108314965 ns/op	 2333545 B/op	   35029 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108110226 ns/op	 2332220 B/op	   35020 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107809983 ns/op	 2332744 B/op	   35025 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108111961 ns/op	 2331391 B/op	   35020 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107963813 ns/op	 2333903 B/op	   35022 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136748624 ns/op	12674170 B/op	  312811 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136736340 ns/op	12674080 B/op	  312809 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135606491 ns/op	12676087 B/op	  312805 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135618331 ns/op	12673189 B/op	  312804 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135712914 ns/op	12676719 B/op	  312797 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 137272276 ns/op	12672359 B/op	  312799 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 220393337 ns/op	43723467 B/op	 1338191 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 215071294 ns/op	43727558 B/op	 1338188 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 230158282 ns/op	43722121 B/op	 1338189 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 227083205 ns/op	43737342 B/op	 1338187 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 227632505 ns/op	43750483 B/op	 1338176 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 249804650 ns/op	43722819 B/op	 1338190 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102880130 ns/op	  930039 B/op	    8179 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102903929 ns/op	  933687 B/op	    8189 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102949504 ns/op	  937992 B/op	    8205 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102894093 ns/op	  938639 B/op	    8207 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102869151 ns/op	  934722 B/op	    8205 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102852412 ns/op	  934719 B/op	    8205 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106074249 ns/op	 3142118 B/op	   26146 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106027512 ns/op	 3142040 B/op	   26147 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105725817 ns/op	 3150828 B/op	   26148 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105695886 ns/op	 3145949 B/op	   26148 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105941332 ns/op	 3146199 B/op	   26150 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105694889 ns/op	 3138084 B/op	   26146 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103643554 ns/op	 1185661 B/op	   11922 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103556212 ns/op	 1189375 B/op	   11923 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103501710 ns/op	 1181936 B/op	   11921 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103688354 ns/op	 1185524 B/op	   11922 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103565358 ns/op	 1189602 B/op	   11924 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103668753 ns/op	 1189540 B/op	   11923 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 111427452 ns/op	 4823500 B/op	   42035 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 110798115 ns/op	 4827476 B/op	   42036 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 110567481 ns/op	 4826558 B/op	   42035 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 112247896 ns/op	 4827430 B/op	   42037 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 111723501 ns/op	 4823996 B/op	   42037 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 111702453 ns/op	 4828113 B/op	   42039 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 123946858 ns/op	10258087 B/op	  127848 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 124283826 ns/op	10256451 B/op	  127851 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 124600288 ns/op	10256568 B/op	  127852 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 124107571 ns/op	10250496 B/op	  127802 allocs/op

Main branch:

goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
BenchmarkAllocateAction_SmallCluster-4         	      10	 108111463 ns/op	 2362683 B/op	   35036 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107940448 ns/op	 2332321 B/op	   35023 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107842705 ns/op	 2332983 B/op	   35024 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108092700 ns/op	 2330239 B/op	   35015 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108172568 ns/op	 2337722 B/op	   35027 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107761182 ns/op	 2331207 B/op	   35018 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135267608 ns/op	12675956 B/op	  312819 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135335881 ns/op	12672784 B/op	  312800 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135521836 ns/op	12675041 B/op	  312801 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136121022 ns/op	12673139 B/op	  312798 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135760515 ns/op	12673933 B/op	  312807 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 136053707 ns/op	12672079 B/op	  312800 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 212427153 ns/op	43745140 B/op	 1338186 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 212674599 ns/op	43726768 B/op	 1338176 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 212923283 ns/op	43724179 B/op	 1338199 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 211295614 ns/op	43722955 B/op	 1338194 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 225375112 ns/op	43745456 B/op	 1338192 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 211169359 ns/op	43721252 B/op	 1338179 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102925118 ns/op	  929922 B/op	    8178 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102778326 ns/op	  934287 B/op	    8193 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102806119 ns/op	  941674 B/op	    8205 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102789108 ns/op	  938373 B/op	    8205 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102888225 ns/op	  938868 B/op	    8207 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102797366 ns/op	  938455 B/op	    8206 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106531764 ns/op	 3142476 B/op	   26149 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 106193669 ns/op	 3146379 B/op	   26150 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105985344 ns/op	 3151028 B/op	   26149 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105735556 ns/op	 3146257 B/op	   26149 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105617437 ns/op	 3145984 B/op	   26148 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105762184 ns/op	 3146037 B/op	   26148 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103611249 ns/op	 1188400 B/op	   11898 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103642015 ns/op	 1188386 B/op	   11898 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103675799 ns/op	 1180637 B/op	   11896 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103751708 ns/op	 1188471 B/op	   11899 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103500223 ns/op	 1184485 B/op	   11897 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103686111 ns/op	 1188292 B/op	   11898 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 111907147 ns/op	 4822875 B/op	   41938 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 112119761 ns/op	 4818947 B/op	   41935 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 110927870 ns/op	 4825753 B/op	   41938 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 110681490 ns/op	 4822369 B/op	   41935 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 110663490 ns/op	 4822808 B/op	   41937 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	      10	 111901657 ns/op	 4819048 B/op	   41937 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 123341046 ns/op	10253395 B/op	  127756 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 124345407 ns/op	10249306 B/op	  127739 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 124380668 ns/op	10247491 B/op	  127710 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 124128081 ns/op	10252620 B/op	  127745 allocs/op

@github-actions

github-actions Bot commented May 5, 2026

Merging this branch changes the coverage (1 decrease, 2 increases).

Impacted packages (Coverage Δ):

  • github.com/kai-scheduler/KAI-scheduler/pkg/common/constants: 0.00% (ø)
  • github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper: 70.19% (+6.46%) 👍
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info: 67.44% (+6.70%) 👍
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache: 34.40% (ø)
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info: 83.18% (ø)
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/status_updater: 77.23% (-2.59%) 👎

Coverage by file

Changed files (no unit tests)

Changed file: Coverage Δ; Total, Covered, and Missed statements:

  • github.com/kai-scheduler/KAI-scheduler/pkg/common/constants/constants.go: 0.00% (ø); total 2, covered 0, missed 2
  • github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper/pod_controller.go: 70.19% (+6.46%); total 104 (+13), covered 73 (+15), missed 31 (-2) 👍
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info/job_info.go: 54.59% (+12.05%); total 229 (+8), covered 125 (+31), missed 104 (-23) 🎉
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/status_updater/default_status_updater.go: 69.16% (-2.70%); total 214 (+15), covered 148 (+5), missed 66 (+10) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The values in brackets show the change relative to the old version of each file.
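The percentages can be reproduced from the statement counts: coverage is covered / total, and the old counts follow from subtracting the bracketed changes. A small sketch for pod_controller.go (the helper names are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// coveragePct returns statement coverage as a percentage.
func coveragePct(covered, total int) float64 {
	return float64(covered) / float64(total) * 100
}

// round2 rounds to two decimal places, matching the report's precision.
func round2(x float64) float64 { return math.Round(x*100) / 100 }

func main() {
	// pod_controller.go after the PR: 73 of 104 statements covered.
	newPct := coveragePct(73, 104)
	// Before the PR: 104-13 statements in total, 73-15 of them covered.
	oldPct := coveragePct(73-15, 104-13)
	fmt.Printf("%.2f%% (%+.2f%%)\n", newPct, newPct-oldPct)
}
```

This prints 70.19% (+6.46%), matching the report; the same arithmetic gives 54.59% for job_info.go (125/229) and 69.16% for default_status_updater.go (148/214).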

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper/pod_controller_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info/job_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info/cluster_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/record_job_status_event_test.go

@enoodle
Collaborator Author

enoodle commented May 6, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/scheduler/api/podgroup_info/job_info.go (1)

482-531: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clone() silently drops InvalidSubGroupTasks.

addInvalidSubGroupTask (line 602) only writes to pgi.InvalidSubGroupTasks — invalid tasks are never added to any PodSet or to PodStatusIndex. Because GetAllPodsMap iterates pgi.PodSets only, Clone()'s call CloneWithTasks(maps.Values(pgi.GetAllPodsMap())) will not pass invalid-subgroup tasks to the clone, and CloneWithTasks initializes the new InvalidSubGroupTasks empty. The result is that any code path that consumes a cloned PodGroupInfo (scenario simulations, fork/restore in actions) will see those tasks as completely absent rather than as invalid, which can desync the status updater path that relies on GetInvalidSubGroupTasks/IsInvalidSubGroupTask.

Either include them in the task list, or carry the map across explicitly:

🔧 Suggested fix
 func (pgi *PodGroupInfo) CloneWithTasks(tasks []*pod_info.PodInfo) *PodGroupInfo {
 	info := &PodGroupInfo{
 		...
 		PodStatusIndex:       map[pod_status.PodStatus]pod_info.PodsMap{},
 		InvalidSubGroupTasks: pod_info.PodsMap{},
 		activeAllocatedCount: ptr.To(0),
 	}

 	pgi.CreationTimestamp.DeepCopyInto(&info.CreationTimestamp)

 	info.RootSubGroupSet = pgi.RootSubGroupSet.Clone()
 	info.PodSets = info.RootSubGroupSet.GetDescendantPodSets()

 	for _, task := range tasks {
 		info.AddTaskInfo(task.Clone())
 	}
+
+	// Carry over invalid-subgroup tasks so callers of Clone()/CloneWithTasks
+	// see the same view as the source. AddTaskInfo will route them back to
+	// InvalidSubGroupTasks because the subgroup is still missing on the clone.
+	for _, task := range pgi.InvalidSubGroupTasks {
+		info.AddTaskInfo(task.Clone())
+	}

 	return info
 }
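The failure mode and the fix can be illustrated with a toy model. The types below are hypothetical simplifications of PodGroupInfo and PodInfo; allPods plays the role of GetAllPodsMap (it walks PodSets only), and the carryInvalid flag toggles the suggested loop over InvalidSubGroupTasks.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for PodInfo and PodGroupInfo.
type task struct{ UID, SubGroupName string }

type podGroup struct {
	SubGroups            []string
	PodSets              map[string]map[string]*task
	InvalidSubGroupTasks map[string]*task
}

func newPodGroup(subGroups []string) *podGroup {
	pg := &podGroup{SubGroups: subGroups, PodSets: map[string]map[string]*task{}, InvalidSubGroupTasks: map[string]*task{}}
	for _, name := range subGroups {
		pg.PodSets[name] = map[string]*task{}
	}
	return pg
}

// addTaskInfo routes tasks with an unknown subgroup to InvalidSubGroupTasks.
func (pg *podGroup) addTaskInfo(t *task) {
	if podSet, ok := pg.PodSets[t.SubGroupName]; ok {
		podSet[t.UID] = t
		return
	}
	pg.InvalidSubGroupTasks[t.UID] = t
}

// allPods walks PodSets only, so invalid-subgroup tasks are not included.
func (pg *podGroup) allPods() []*task {
	var out []*task
	for _, podSet := range pg.PodSets {
		for _, t := range podSet {
			out = append(out, t)
		}
	}
	return out
}

// cloneWithTasks rebuilds a podGroup from copies of the given tasks. With
// carryInvalid set, it also re-adds the source's invalid-subgroup tasks;
// addTaskInfo routes them back to InvalidSubGroupTasks because the clone has
// the same subgroups as the source.
func (pg *podGroup) cloneWithTasks(tasks []*task, carryInvalid bool) *podGroup {
	clone := newPodGroup(append([]string(nil), pg.SubGroups...))
	for _, t := range tasks {
		copied := *t
		clone.addTaskInfo(&copied)
	}
	if carryInvalid {
		for _, t := range pg.InvalidSubGroupTasks {
			copied := *t
			clone.addTaskInfo(&copied)
		}
	}
	return clone
}

func main() {
	src := newPodGroup([]string{"workers"})
	src.addTaskInfo(&task{UID: "pod-a", SubGroupName: "workers"})
	src.addTaskInfo(&task{UID: "pod-b", SubGroupName: "missing"})

	buggy := src.cloneWithTasks(src.allPods(), false)
	fixed := src.cloneWithTasks(src.allPods(), true)
	fmt.Println(len(buggy.InvalidSubGroupTasks), len(fixed.InvalidSubGroupTasks)) // 0 1
}
```

The plain clone loses the invalid-subgroup task, while the fixed clone keeps it as invalid.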
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/scheduler/api/podgroup_info/job_info.go` around lines 482 - 531, Clone()
currently omits pgi.InvalidSubGroupTasks because it builds the task list from
GetAllPodsMap; fix by carrying InvalidSubGroupTasks into the clone instead of
dropping it: in Clone() either include maps.Values(pgi.InvalidSubGroupTasks)
when calling CloneWithTasks or, better, after creating info in CloneWithTasks
copy/clone pgi.InvalidSubGroupTasks into info.InvalidSubGroupTasks (preserving
the same pod IDs and pod_info.PodInfo pointers) so that methods like
GetInvalidSubGroupTasks/IsInvalidSubGroupTask continue to work; update the Clone
and/or CloneWithTasks implementations to perform this explicit map copy
(referencing PodGroupInfo.InvalidSubGroupTasks, Clone(), CloneWithTasks(), and
addInvalidSubGroupTask).
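The prompt above talks about preserving the same pod_info.PodInfo pointers, while the suggested diff calls task.Clone() on each entry. These are different designs. The sketch below uses a hypothetical PodInfo type and the standard-library maps.Clone (Go 1.21+) to show why the distinction matters: a shallow map copy aliases the source's pods, so a simulation that mutates the clone also mutates the source.

```go
package main

import (
	"fmt"
	"maps"
)

// PodInfo is a hypothetical stand-in with one mutable field.
type PodInfo struct {
	UID    string
	Status string
}

// copyShared copies the map but keeps the same *PodInfo pointers.
func copyShared(src map[string]*PodInfo) map[string]*PodInfo {
	return maps.Clone(src)
}

// copyDeep copies every value, like calling task.Clone() per entry.
func copyDeep(src map[string]*PodInfo) map[string]*PodInfo {
	dst := make(map[string]*PodInfo, len(src))
	for uid, p := range src {
		c := *p
		dst[uid] = &c
	}
	return dst
}

func main() {
	src := map[string]*PodInfo{"b": {UID: "b", Status: "Pending"}}
	shared := copyShared(src)
	deep := copyDeep(src)
	shared["b"].Status = "Mutated" // also changes src["b"], since the pointer is shared
	fmt.Println(src["b"].Status, deep["b"].Status) // prints Mutated Pending
}
```

Deep-cloning matches how CloneWithTasks already treats the regular tasks, which is presumably why the diff uses task.Clone().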
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/batch/external-podgroup-job.yaml`:
- Around line 8-18: Remove the misleading top-level minMember from the PodGroup
example. Because the example declares subGroups (name: workers, minMember: 2),
setSubGroups ignores the top-level minMember. Instead, document that the
per-subGroup minMember values drive sizing. Also add a one-line comment in the
YAML next to the PodGroup metadata (or directly above the resource) stating that
this external PodGroup has no ownerReferences and must be deleted manually or by
the external controller after the Job completes (to warn about the GC footgun).

In `@pkg/scheduler/cache/status_updater/default_status_updater.go`:
- Around line 384-404: In defaultStatusUpdater.recordInvalidSubGroupPodsEvents,
make the MarkUnschedulable behavior consistent with
recordUnschedulablePodsEvents by using the PodGroup's MarkUnschedulable flag
instead of hardcoding true when calling markTaskUnschedulable (use
job.PodGroup.MarkUnschedulable or equivalent), and add the missing debug branch
that logs the full detailed fit error (log.InfraLogger.V(6).Infof("Full fit
error: %s", fitError.DetailedError())) when detailedFitErrors is false/when
fitError exists, mirroring the logging in recordUnschedulablePodsEvents; keep
addNodePoolPrefixIfNeeded and existing error aggregation unchanged.
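A rough sketch of the consistency fix requested for recordInvalidSubGroupPodsEvents follows. The real function emits events and calls markTaskUnschedulable; here a hypothetical pure helper returns a Decision instead, so the two requested behaviors are easy to see: the mark-unschedulable flag comes from the PodGroup rather than a hardcoded true, and a V(6)-style "Full fit error" line is produced when the user-facing message is the short form. FitError, PodGroup, Task and Decision are stand-ins, and reading "detailedFitErrors is false" as "log the detailed error at debug level" is an assumption about the intended branch.

```go
package main

import "fmt"

// FitError is a hypothetical stand-in for the scheduler's fit-error type.
type FitError struct {
	Short    string
	Detailed string
}

func (e *FitError) Error() string         { return e.Short }
func (e *FitError) DetailedError() string { return e.Detailed }

// PodGroup carries only the flag this sketch needs.
type PodGroup struct {
	MarkUnschedulable bool
}

// Task is a hypothetical pod whose subgroup reference is invalid.
type Task struct {
	Name     string
	FitError *FitError
}

// Decision captures what the real code would do via API calls and logging.
type Decision struct {
	MarkUnschedulable bool   // value that would be passed to markTaskUnschedulable
	Message           string // user-facing event message
	DebugLog          string // V(6) line; empty when nothing would be logged
}

func invalidSubGroupDecision(pg PodGroup, t Task, detailedFitErrors bool) Decision {
	// Follow the PodGroup's flag instead of hardcoding true.
	d := Decision{MarkUnschedulable: pg.MarkUnschedulable}
	if t.FitError == nil {
		return d
	}
	if detailedFitErrors {
		d.Message = t.FitError.DetailedError()
		return d
	}
	d.Message = t.FitError.Error()
	// Assumed counterpart of the V(6) "Full fit error" log in recordUnschedulablePodsEvents.
	d.DebugLog = fmt.Sprintf("Full fit error: %s", t.FitError.DetailedError())
	return d
}

func main() {
	fe := &FitError{Short: "invalid subgroup", Detailed: "pod p-0 references an undeclared subgroup"}
	d := invalidSubGroupDecision(PodGroup{MarkUnschedulable: false}, Task{Name: "p-0", FitError: fe}, false)
	fmt.Printf("%+v\n", d)
}
```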

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 43eebb4f-1b5a-48ea-87eb-84b4df202c64

📥 Commits

Reviewing files that changed from the base of the PR and between 5a7cd8e and e98b4cc.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • docs/batch/README.md
  • docs/developer/pod-grouper.md
  • examples/batch/README.md
  • examples/batch/external-podgroup-job.yaml
  • pkg/common/constants/constants.go
  • pkg/podgrouper/pod_controller.go
  • pkg/podgrouper/pod_controller_test.go
  • pkg/scheduler/api/podgroup_info/job_info.go
  • pkg/scheduler/api/podgroup_info/job_info_test.go
  • pkg/scheduler/cache/cluster_info/cluster_info_test.go
  • pkg/scheduler/cache/record_job_status_event_test.go
  • pkg/scheduler/cache/status_updater/default_status_updater.go
  • test/e2e/suites/integrations/k8s_native/k8s_native_specs.go

Comment thread examples/batch/external-podgroup-job.yaml
Comment thread pkg/scheduler/cache/status_updater/default_status_updater.go
@github-actions

github-actions Bot commented May 6, 2026

Merging this branch changes the coverage (1 decrease, 2 increase)

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/kai-scheduler/KAI-scheduler/pkg/common/constants | 0.00% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper | 70.19% (+6.46%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info | 67.44% (+6.70%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache | 34.40% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info | 83.18% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/status_updater | 77.23% (-2.59%) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/test/e2e/suites/integrations/k8s_native | 0.00% (ø) | |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
| --- | --- | --- | --- | --- | --- |
| github.com/kai-scheduler/KAI-scheduler/pkg/common/constants/constants.go | 0.00% (ø) | 2 | 0 | 2 | |
| github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper/pod_controller.go | 70.19% (+6.46%) | 104 (+13) | 73 (+15) | 31 (-2) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info/job_info.go | 54.59% (+12.05%) | 229 (+8) | 125 (+31) | 104 (-23) | 🎉 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/status_updater/default_status_updater.go | 69.16% (-2.70%) | 214 (+15) | 148 (+5) | 66 (+10) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/test/e2e/suites/integrations/k8s_native/k8s_native_specs.go | 0.00% (ø) | 0 | 0 | 0 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper/pod_controller_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info/job_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info/cluster_info_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/record_job_status_event_test.go

Comment thread pkg/scheduler/cache/status_updater/default_status_updater.go
@github-actions

Merging this branch changes the coverage (1 decrease, 2 increase)

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/kai-scheduler/KAI-scheduler/pkg/common/constants | 0.00% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/podgrouper | 70.19% (+6.46%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/podgroup_info | 67.44% (+6.70%) | 👍 |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache | 34.40% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/cluster_info | 83.18% (ø) | |
| github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/status_updater | 77.23% (-1.68%) | 👎 |
| github.com/kai-scheduler/KAI-scheduler/test/e2e/suites/integrations/k8s_native | 0.00% (ø) | |

@enoodle enoodle enabled auto-merge May 12, 2026 22:17
@enoodle enoodle added this pull request to the merge queue May 12, 2026

Development

Successfully merging this pull request may close these issues.

Support externally-created PodGroups for cross-workload atomic scheduling

2 participants