Fix _aggregate_resources to read from clusters, not requirements #223
Merged
homatthew merged 1 commit into Netflix-Skunkworks:main on Feb 9, 2026
_aggregate_resources was reading mem/disk/network from CapacityRequirement (the demand) instead of candidate_clusters (the actual provisioned instances). This meant comparisons were based on what was requested, not what was actually deployed, producing incorrect results when requirement values differed from cluster instance values.

Changes:
- Use instance.ram_gib, instance.net_mbps, and get_disk_size_gib() instead of req.mem_gib.mid, req.disk_gib.mid, req.network_mbps.mid
- Simplify compare_plans with dict comprehension over resource types
- Fix test helper _create_plan to set resource values on the instance
- Add decoy-requirement tests that assert baseline_value through the public API, catching any regression to reading from requirements
- Remove redundant tests and parametrize where appropriate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
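To make the first change concrete, here is a minimal sketch of aggregating provisioned resources in a single loop over clusters. It is not the project's code: the Drive, Instance, and Cluster classes and the disk_size_gib helper are hypothetical stand-ins, and only the field names ram_gib and net_mbps come from the commit message. The real get_disk_size_gib may resolve local versus attached drives differently, and the IPC/GHz normalization applied to CPU is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical stand-ins for the project's models. Only ram_gib and net_mbps
# are names taken from the commit message; everything else is assumed.
@dataclass
class Drive:
    size_gib: float


@dataclass
class Instance:
    cpu: int
    ram_gib: float
    net_mbps: float
    drive: Optional[Drive] = None  # local instance storage, if any


@dataclass
class Cluster:
    instance: Instance
    count: int
    attached_drives: List[Drive] = field(default_factory=list)


def disk_size_gib(cluster: Cluster) -> float:
    """Illustrative stand-in for get_disk_size_gib (assumed behavior):
    per-instance disk from attached drives if present, else the local drive."""
    if cluster.attached_drives:
        return sum(d.size_gib for d in cluster.attached_drives)
    if cluster.instance.drive is not None:
        return cluster.instance.drive.size_gib
    return 0.0


def aggregate_resources(clusters: List[Cluster]) -> Dict[str, float]:
    """Sum provisioned resources across clusters, scaled by instance count.

    Nothing here reads from a requirement object; every value comes from
    the cluster's instance shape and its count.
    """
    totals = {"cpu": 0.0, "mem_gib": 0.0, "disk_gib": 0.0, "network_mbps": 0.0}
    for cluster in clusters:
        inst, n = cluster.instance, cluster.count
        totals["cpu"] += inst.cpu * n  # no IPC/GHz normalization in this sketch
        totals["mem_gib"] += inst.ram_gib * n
        totals["disk_gib"] += disk_size_gib(cluster) * n
        totals["network_mbps"] += inst.net_mbps * n
    return totals
```

Keeping all four resource types in one loop means a single source of truth (the cluster) for every comparison value, which is the property the decoy tests are meant to protect.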
What am I trying to do?
_aggregate_resources was reading memory, disk, and network from CapacityRequirement (the demand) instead of candidate_clusters (the actual provisioned instances). This produced incorrect comparison results whenever requirement values differed from cluster instance values, which is the normal case for baseline plans extracted from real deployments.

Why did I do it this way?
Read all resources from clusters, not requirements
CPU was already correctly sourced from candidate_clusters (to enable IPC/GHz normalization), but memory, disk, and network were still read from requirements. Now all four resource types are aggregated in one loop over clusters.

Disk uses get_disk_size_gib (from common.py), which handles both local instance drives and attached EBS drives, matching the pattern used in capacity_planner.py.

Decoy requirements in test helper to catch regressions
Instead of a separate test class, _create_plan now sets all requirement values to 9999 (decoy). Existing tests assert baseline_value / comparison_value on the PlanComparisonResult. If someone accidentally reverts to reading from requirements, every value assertion fails with 9999 instead of the expected value.

Parametrize by count to test aggregation scaling
Key tests are parametrized with count=[1, 3] to verify that instance.ram_gib * count (etc.) produces the expected aggregated values, not just the single-instance case.

Are there any tests?
Yes: 37 tests (up from 35 before the change). Requirements use decoy values (9999), so any regression to reading from requirements produces unmissable failures in the existing baseline_value / comparison_value assertions.

How would I use the new code?
No API changes: compare_plans() has the same signature and return type. The only difference is that the returned ResourceComparison objects now contain correct baseline_value / comparison_value sourced from clusters.

🤖 Generated with Claude Code
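For readers unfamiliar with the decoy pattern described in the test sections, the following is a rough, self-contained sketch of the idea. It does not reproduce the repository's tests: Requirement, Cluster, Plan, create_plan, and aggregate_mem are hypothetical stand-ins, and a plain loop takes the place of the count=[1, 3] parametrization so the example runs without a test framework.

```python
from dataclasses import dataclass

DECOY = 9999.0  # deliberately wrong value placed on the requirement


@dataclass
class Requirement:  # hypothetical stand-in for CapacityRequirement
    mem_gib: float


@dataclass
class Cluster:  # hypothetical stand-in for a candidate cluster
    ram_gib: float  # per-instance memory
    count: int


@dataclass
class Plan:
    requirement: Requirement
    cluster: Cluster


def create_plan(ram_gib: float, count: int) -> Plan:
    """Test helper: real values go on the cluster, decoys on the requirement."""
    return Plan(Requirement(mem_gib=DECOY), Cluster(ram_gib=ram_gib, count=count))


def aggregate_mem(plan: Plan) -> float:
    """Code under test: must read from the cluster, never the requirement."""
    return plan.cluster.ram_gib * plan.cluster.count


def check_memory_aggregation(count: int) -> None:
    """Fails loudly if aggregate_mem ever falls back to the decoy value."""
    plan = create_plan(ram_gib=64.0, count=count)
    result = aggregate_mem(plan)
    expected = 64.0 * count
    assert result == expected, f"expected {expected}, got {result}"


# Stands in for the count=[1, 3] parametrization: the single-instance case
# and a multi-instance case that exercises the scaling by count.
for count in (1, 3):
    check_memory_aggregation(count)
```

Because the decoy is far from any plausible real value, a regression that reads plan.requirement.mem_gib would surface as 9999 in the assertion message rather than as a subtle off-by-some-amount difference.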