fix(scheduler): account for native sidecar requests in pod resources#1556

Open

joeltg wants to merge 3 commits into kai-scheduler:main from joeltg:fix/native-sidecar-accounting

Conversation


@joeltg joeltg commented May 6, 2026

Summary

getPodResourceRequest (in pkg/scheduler/api/pod_info/pod_info.go) treated every initContainer as sequential — max(running_sum, init.Requests) — which under-counts pods that use native sidecars (initContainers with restartPolicy: Always, KEP-753).

Per kubelet's admission accounting in k8s.io/component-helpers/resource.PodRequests (AggregateContainerRequests), native sidecars run concurrently with regular containers, so:

  • Steady-state running sum = main containers + all native sidecars
  • Init-phase peak = for each non-restartable initContainer, init.Requests + sum(native sidecars declared before it in spec order) — those sidecars are already started and running concurrently with the init
  • Result = max(steady-state, init-phase peak) plus pod overhead
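The accounting above can be sketched on plain millicore integers (a simplified model for illustration only; kubelet's real implementation is `AggregateContainerRequests` in `k8s.io/component-helpers/resource`, operating on full `ResourceList`s):

```go
package main

import "fmt"

// container is a simplified stand-in for a pod spec entry: a CPU request
// in millicores plus the native-sidecar flag (an initContainer with
// restartPolicy: Always, per KEP-753).
type container struct {
	milliCPU int64
	sidecar  bool // only meaningful for init containers
}

// podCPURequest mirrors the concurrency-aware accounting described above.
func podCPURequest(inits, mains []container, overhead int64) int64 {
	var steady, sidecarPrefix, initPeak int64
	for _, c := range inits {
		if c.sidecar {
			// Native sidecars keep running for the rest of the pod's lifetime.
			sidecarPrefix += c.milliCPU
			continue
		}
		// A regular init runs concurrently with the sidecars declared before it.
		if p := c.milliCPU + sidecarPrefix; p > initPeak {
			initPeak = p
		}
	}
	for _, c := range mains {
		steady += c.milliCPU
	}
	steady += sidecarPrefix // steady state includes all native sidecars
	if initPeak > steady {
		steady = initPeak
	}
	return steady + overhead
}

func main() {
	// The scenario from this PR: a 250m native sidecar beside a 4-CPU main.
	fmt.Println(podCPURequest(
		[]container{{250, true}},
		[]container{{4000, false}},
		0,
	)) // 4250 — not max(4000, 250) = 4000 as the old sequential math gave
}
```

The second return path matters too: with an 8-CPU regular init declared after the sidecar, the init-phase peak is 8000 + 250 = 8250 and dominates the steady state.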

Diverging from this lets the scheduler bind pods that kubelet then rejects with OutOfCpu/OutOfGpu/etc. at admission, leaving them in phase=Failed.

Production symptom

We hit this on a GKE cluster running pods with a 250m-CPU gke-gcsfuse-sidecar (restartPolicy: Always) alongside a 4-CPU main container. KAI under-counted each pod by 250m. Across many pods on the same n4-standard-80 node, the cumulative phantom headroom exceeded the node's slack and KAI bound pods that kubelet then rejected with OutOfCpu. Once a pod transitioned to Failed, IsActiveUsedStatus excluded it from nodeInfo.Requested on the next cycle, freeing the same phantom slot, and KAI bound another pod into it — a livelock that left 200+ Failed pods on a single node in our worst case.

After deploying this fix to our fork, Failed-pod accumulation on a comparable node dropped to single digits, and admission rejections went from constant to occasional.

Implementation

Single pass over Spec.InitContainers in spec order:

  • Native sidecar (restartPolicy: Always) → accumulate into a sidecarPrefix running-sum
  • Regular initContainer → initPhasePeak = max(initPhasePeak, init.Requests + sidecarPrefix)

After the loop:

  • result += sidecarPrefix (steady-state includes all sidecars)
  • result = max(result, initPhasePeak) (peak across init phases)

This required a new Add method:

  • GpuResourceRequirement.Add — sums count, portion (requiring that any two non-zero portions match, erroring otherwise, mirroring SetMaxResource), draGpuCounts, and migResources
  • ResourceRequirements.Add — wrapper that sums both BaseResource and GpuResourceRequirement. Without this, BaseResource.Add was reachable via Go method promotion but silently dropped GPU/MIG resources from sidecars.
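The method-promotion hazard can be reproduced with minimal stand-in types (hypothetical shapes; the real types live in `pkg/scheduler/api/resource_info` and carry much more state):

```go
package main

import "fmt"

// BaseResource carries only CPU here; the outer type embeds it and adds
// a GPU count. Before the wrapper existed, a call like
// result.Add(&sidecarReq.BaseResource) compiled via Go method promotion
// and summed only the embedded half, silently dropping GPUs.
type BaseResource struct{ milliCPU int64 }

func (b *BaseResource) Add(o *BaseResource) { b.milliCPU += o.milliCPU }

type ResourceRequirements struct {
	BaseResource
	gpus int64
}

// Add is the wrapper: it sums both the embedded BaseResource and the
// GPU half, so callers cannot accidentally lose GPU requests.
func (r *ResourceRequirements) Add(o *ResourceRequirements) {
	r.BaseResource.Add(&o.BaseResource)
	r.gpus += o.gpus
}

func main() {
	sidecar := ResourceRequirements{BaseResource{250}, 1}

	promoted := ResourceRequirements{}
	promoted.BaseResource.Add(&sidecar.BaseResource) // GPU silently dropped

	wrapped := ResourceRequirements{}
	wrapped.Add(&sidecar) // both halves summed

	fmt.Println(promoted.gpus, wrapped.gpus) // 0 1
}
```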

Test plan

  • New unit cases in TestGetPodResourceRequest for:
    • Pod with one native sidecar (CPU+memory only) — verifies sidecar is added to running sum, not max'd
    • Regular init dominates and includes preceding native sidecar — verifies init + sidecarPrefix math
    • Native sidecar with GPU — verifies GPU/MIG half is summed (not silently dropped via method promotion)
  • All existing pkg/scheduler/... tests pass
  • Verified in production on Reflection AI fork (see "Production symptom" above)

Contributor

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



Comment thread pkg/scheduler/api/resource_info/resource_requirment.go Outdated
Comment thread pkg/scheduler/api/pod_info/pod_info.go Outdated
Collaborator

enoodle commented May 7, 2026

Looks good, I have a few minor comments.

Can you also update the changelog file? (We are working on something to replace the need to update it, but it is not there yet.)

@joeltg joeltg marked this pull request as ready for review May 7, 2026 14:07
enoodle previously approved these changes May 7, 2026
Collaborator

enoodle commented May 7, 2026

Thanks for the fix, @joeltg!
We need you to "sign off" on the commits.


github-actions Bot commented May 7, 2026

📊 Performance Benchmark Results

Comparing PR (fix/native-sidecar-accounting) vs main branch
goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
                                    │ main-bench.txt │           pr-bench.txt            │
                                    │     sec/op     │   sec/op     vs base              │
AllocateAction_SmallCluster-4            108.3m ± 1%   108.3m ± 0%       ~ (p=0.937 n=6)
AllocateAction_MediumCluster-4           136.8m ± 0%   137.2m ± 1%       ~ (p=0.180 n=6)
AllocateAction_LargeCluster-4            213.3m ± 8%   214.7m ± 3%       ~ (p=0.699 n=6)
ReclaimAction_SmallCluster-4             103.0m ± 0%   103.0m ± 0%       ~ (p=0.093 n=6)
ReclaimAction_MediumCluster-4            106.0m ± 0%   106.0m ± 1%       ~ (p=0.818 n=6)
PreemptAction_SmallCluster-4             103.8m ± 0%   103.7m ± 0%  -0.07% (p=0.002 n=6)
PreemptAction_MediumCluster-4            112.4m ± 1%   111.0m ± 1%  -1.29% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4       124.8m ± 0%   124.9m ± 1%       ~ (p=0.485 n=6)
ConsolidationAction_MediumCluster-4      289.2m ± 2%   288.4m ± 2%       ~ (p=0.699 n=6)
FullSchedulingCycle_SmallCluster-4       105.6m ± 0%   105.6m ± 0%       ~ (p=0.937 n=6)
FullSchedulingCycle_MediumCluster-4      121.5m ± 1%   120.9m ± 2%       ~ (p=0.240 n=6)
FullSchedulingCycle_LargeCluster-4       162.9m ± 2%   162.2m ± 1%       ~ (p=0.589 n=6)
ManyQueues_MediumCluster-4               138.8m ± 1%   139.9m ± 1%       ~ (p=0.093 n=6)
GangScheduling_MediumCluster-4           161.5m ± 2%   161.0m ± 2%       ~ (p=0.699 n=6)
geomean                                  135.2m        135.1m       -0.07%

                                    │ main-bench.txt │            pr-bench.txt            │
                                    │      B/op      │     B/op      vs base              │
AllocateAction_SmallCluster-4           2.224Mi ± 0%   2.233Mi ± 0%  +0.40% (p=0.041 n=6)
AllocateAction_MediumCluster-4          12.09Mi ± 0%   12.12Mi ± 0%  +0.30% (p=0.002 n=6)
AllocateAction_LargeCluster-4           41.70Mi ± 0%   41.84Mi ± 0%  +0.35% (p=0.002 n=6)
ReclaimAction_SmallCluster-4            911.7Ki ± 1%   930.5Ki ± 1%  +2.06% (p=0.002 n=6)
ReclaimAction_MediumCluster-4           2.998Mi ± 0%   3.076Mi ± 0%  +2.59% (p=0.002 n=6)
PreemptAction_SmallCluster-4            1.131Mi ± 0%   1.061Mi ± 0%  -6.26% (p=0.002 n=6)
PreemptAction_MediumCluster-4           4.598Mi ± 0%   4.323Mi ± 0%  -6.00% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4      9.775Mi ± 0%   9.998Mi ± 0%  +2.28% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4     88.71Mi ± 0%   89.59Mi ± 0%  +0.99% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4      1.424Mi ± 0%   1.434Mi ± 0%  +0.72% (p=0.002 n=6)
FullSchedulingCycle_MediumCluster-4     7.042Mi ± 0%   7.100Mi ± 0%  +0.81% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4      23.11Mi ± 0%   23.29Mi ± 0%  +0.75% (p=0.002 n=6)
ManyQueues_MediumCluster-4              16.55Mi ± 0%   16.58Mi ± 0%  +0.19% (p=0.002 n=6)
GangScheduling_MediumCluster-4          17.49Mi ± 0%   17.56Mi ± 0%  +0.39% (p=0.002 n=6)
geomean                                 7.146Mi        7.141Mi       -0.07%

                                    │ main-bench.txt │           pr-bench.txt            │
                                    │   allocs/op    │  allocs/op   vs base              │
AllocateAction_SmallCluster-4            35.02k ± 0%   35.81k ± 0%  +2.24% (p=0.002 n=6)
AllocateAction_MediumCluster-4           312.8k ± 0%   316.0k ± 0%  +1.02% (p=0.002 n=6)
AllocateAction_LargeCluster-4            1.338M ± 0%   1.346M ± 0%  +0.60% (p=0.002 n=6)
ReclaimAction_SmallCluster-4             8.204k ± 0%   8.601k ± 0%  +4.85% (p=0.002 n=6)
ReclaimAction_MediumCluster-4            26.15k ± 0%   27.74k ± 0%  +6.11% (p=0.002 n=6)
PreemptAction_SmallCluster-4             11.90k ± 0%   11.44k ± 0%  -3.82% (p=0.002 n=6)
PreemptAction_MediumCluster-4            41.94k ± 0%   40.18k ± 0%  -4.20% (p=0.002 n=6)
ConsolidationAction_SmallCluster-4       127.7k ± 0%   131.0k ± 0%  +2.56% (p=0.002 n=6)
ConsolidationAction_MediumCluster-4      1.298M ± 0%   1.311M ± 0%  +1.00% (p=0.002 n=6)
FullSchedulingCycle_SmallCluster-4       20.68k ± 0%   21.27k ± 0%  +2.83% (p=0.002 n=6)
FullSchedulingCycle_MediumCluster-4      168.3k ± 0%   170.7k ± 0%  +1.42% (p=0.002 n=6)
FullSchedulingCycle_LargeCluster-4       698.7k ± 0%   704.6k ± 0%  +0.86% (p=0.002 n=6)
ManyQueues_MediumCluster-4               350.7k ± 0%   353.8k ± 0%  +0.90% (p=0.002 n=6)
GangScheduling_MediumCluster-4           571.8k ± 0%   578.2k ± 0%  +1.11% (p=0.002 n=6)
geomean                                  119.9k        121.3k       +1.21%

pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/integration_tests/reclaim
                            │ main-bench.txt │           pr-bench.txt            │
                            │     sec/op     │   sec/op     vs base              │
ReclaimLargeJobs_10Node-4        105.4m ± 1%   105.6m ± 1%  +0.22% (p=0.026 n=6)
ReclaimLargeJobs_50Node-4        230.3m ± 1%   234.6m ± 0%  +1.87% (p=0.002 n=6)
ReclaimLargeJobs_100Node-4       385.3m ± 7%   389.1m ± 1%       ~ (p=0.065 n=6)
ReclaimLargeJobs_200Node-4       782.3m ± 2%   779.9m ± 6%       ~ (p=0.937 n=6)
ReclaimLargeJobs_500Node-4        2.463 ± 2%    2.503 ± 2%  +1.63% (p=0.041 n=6)
ReclaimLargeJobs_1000Node-4       7.194 ± 1%    7.574 ± 1%  +5.27% (p=0.002 n=6)
geomean                          711.4m        722.7m       +1.60%

                            │ main-bench.txt │            pr-bench.txt            │
                            │      B/op      │     B/op      vs base              │
ReclaimLargeJobs_10Node-4       1.990Mi ± 3%   2.047Mi ± 3%  +2.90% (p=0.026 n=6)
ReclaimLargeJobs_50Node-4       59.63Mi ± 0%   61.04Mi ± 0%  +2.36% (p=0.002 n=6)
ReclaimLargeJobs_100Node-4      119.3Mi ± 0%   122.1Mi ± 0%  +2.29% (p=0.002 n=6)
ReclaimLargeJobs_200Node-4      241.0Mi ± 0%   246.4Mi ± 0%  +2.23% (p=0.002 n=6)
ReclaimLargeJobs_500Node-4      618.5Mi ± 0%   631.5Mi ± 0%  +2.11% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4     1.262Gi ± 0%   1.288Gi ± 0%  +2.05% (p=0.002 n=6)
geomean                         118.2Mi        120.9Mi       +2.32%

                            │ main-bench.txt │           pr-bench.txt            │
                            │   allocs/op    │  allocs/op   vs base              │
ReclaimLargeJobs_10Node-4        21.97k ± 2%   23.00k ± 2%  +4.71% (p=0.002 n=6)
ReclaimLargeJobs_50Node-4        801.9k ± 0%   821.3k ± 0%  +2.42% (p=0.002 n=6)
ReclaimLargeJobs_100Node-4       1.596M ± 0%   1.633M ± 0%  +2.37% (p=0.002 n=6)
ReclaimLargeJobs_200Node-4       3.182M ± 0%   3.256M ± 0%  +2.34% (p=0.002 n=6)
ReclaimLargeJobs_500Node-4       7.963M ± 0%   8.148M ± 0%  +2.32% (p=0.002 n=6)
ReclaimLargeJobs_1000Node-4      16.01M ± 0%   16.38M ± 0%  +2.30% (p=0.002 n=6)
geomean                          1.500M        1.541M       +2.74%

pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/reclaim
                            │ main-bench.txt │                pr-bench.txt                │
                            │     sec/op     │     sec/op       vs base                   │
ReclaimWithMissingPVCJobs-4     2.380m ± 16%   13327.731m ± 2%  +559985.29% (p=0.002 n=6)

                            │ main-bench.txt │                  pr-bench.txt                   │
                            │      B/op      │        B/op         vs base                     │
ReclaimWithMissingPVCJobs-4     8.109Ki ± 1%   4324527.414Ki ± 0%  +53327405.68% (p=0.002 n=6)

                            │ main-bench.txt │                 pr-bench.txt                 │
                            │   allocs/op    │    allocs/op     vs base                     │
ReclaimWithMissingPVCJobs-4       154.0 ± 1%   21149156.5 ± 0%  +13733118.51% (p=0.002 n=6)

Legend

  • 📉 Negative delta = Performance improvement (faster)
  • 📈 Positive delta = Performance regression (slower)
  • p-value < 0.05 indicates statistically significant change


github-actions Bot commented May 7, 2026

Merging this branch changes the coverage (1 decrease, 1 increase)

Impacted Packages Coverage Δ 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info 69.14% (+1.46%) 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info 48.55% (-1.75%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info/pod_info.go 66.29% (+2.26%) 175 (+11) 116 (+11) 59 👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info/gpu_resource_requirment.go 52.13% (-9.12%) 94 (+14) 49 45 (+14) 👎
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/resource_info/resource_requirment.go 50.00% (-2.94%) 72 (+4) 36 36 (+4) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/pod_info/pod_info_test.go

joeltg added 3 commits May 8, 2026 13:27
`getPodResourceRequest` previously treated every initContainer as
sequential (max'd against the regular-container sum), which under-counts
pods that use native sidecars (initContainers with `restartPolicy:
Always`, KEP-753). Native sidecars run concurrently with regular
containers, so their requests must be added to the running sum — this
is how kubelet computes admission via
`k8s.io/component-helpers/resource.PodRequests`.

Symptom in production: pods with a 250m gcs-fuse native sidecar were
under-counted by 250m each. Once the cumulative undercount on a node
exceeded the available headroom, KAI bound a pod that kubelet rejected
with `OutOfCpu`. The rejected pod transitioned to `phase=Failed`,
`IsActiveUsedStatus` excluded it from `nodeInfo.Requested` on the next
cycle, and KAI re-bound another pod into the same phantom slot — a
livelock that produced 200+ Failed pods on a single node.

Signed-off-by: joeltg <joel@reflection.ai>
Two correctness gaps in the prior commit, both flagged in PR review:

1. `result.Add(&sidecarReq.BaseResource)` invoked the embedded
   `BaseResource.Add` via Go method promotion, which only sums
   CPU/memory/scalar resources. Native sidecars that request
   `nvidia.com/gpu`, `amd.com/gpu`, or MIG resources had their GPU
   contribution silently dropped — strictly worse than pre-fix, where
   `SetMaxResource` on the full `*ResourceRequirements` at least max'd
   the GPU half.

   Fix: introduce `GpuResourceRequirement.Add` and a
   `ResourceRequirements.Add` wrapper that sums both halves; use the
   wrapper in `getPodResourceRequest`.

2. The regular-init loop did `max(result, init.Requests)`, but a regular
   initContainer's peak demand is `init.Requests + sum(sidecars
   declared before it)` — those sidecars are already running
   concurrently per KEP-753. Diverged from upstream
   `AggregateContainerRequests` and under-counted pods whose regular
   init dominates main+sidecars (e.g., 8-CPU model-download init beside
   a 250m gcsfuse sidecar and a 4-CPU main).

   Fix: single pass over InitContainers tracking a `sidecarPrefix`
   accumulator, with regular inits max'd as `init.Requests +
   sidecarPrefix` into a separate `initPhasePeak`. Final result is
   max(steady-state, init-phase peak).

Adds two test cases covering the GPU-on-sidecar path and the
regular-init-after-sidecar prefix path.

Signed-off-by: joeltg <joel@reflection.ai>
- Extract init-container resource math into `initContainerEffects` helper
  with a short docstring linking to KEP-753.
- Drop wrapper-justification framing on `ResourceRequirements.Add` doc.
- Add CHANGELOG entry under Unreleased / Fixed.

Signed-off-by: joeltg <joel@reflection.ai>
@joeltg joeltg force-pushed the fix/native-sidecar-accounting branch from 1e41f7d to 312e1b5 on May 8, 2026 13:27
Author

joeltg commented May 8, 2026

Thanks for the quick review!
