Fix _aggregate_resources to read from clusters, not requirements by homatthew · Pull Request #223 · Netflix-Skunkworks/service-capacity-modeling

homatthew · 2026-02-07T05:44:46Z

What am I trying to do?

_aggregate_resources was reading memory, disk, and network from CapacityRequirement (the demand) instead of candidate_clusters (the actual provisioned instances). This produced incorrect comparison results when requirement values differed from cluster instance values — which is the normal case for baseline plans extracted from real deployments.

Why did I do it this way?

Read all resources from clusters, not requirements

The bug was that CPU was already correctly sourced from candidate_clusters (to enable IPC/GHz normalization), but memory, disk, and network were still read from requirements. Now all four resource types are aggregated in one loop over clusters:

for cluster in chain(plan.candidate_clusters.zonal, plan.candidate_clusters.regional):
    totals[ResourceType.cpu] += to_reference_cores(...)
    totals[ResourceType.mem_gib] += cluster.instance.ram_gib * cluster.count
    totals[ResourceType.network_mbps] += cluster.instance.net_mbps * cluster.count
    totals[ResourceType.disk_gib] += get_disk_size_gib(cluster_drive, cluster.instance) * cluster.count

Disk uses get_disk_size_gib (from common.py) which handles both local instance drives and attached EBS drives — matching the same pattern used in capacity_planner.py.

Decoy requirements in test helper to catch regressions

Instead of a separate test class, _create_plan now sets all requirement values to 9999 (decoy). Existing tests assert baseline_value / comparison_value on the PlanComparisonResult. If someone accidentally reverts to reading from requirements, every value assertion fails with 9999 instead of the expected value.

Parametrize by count to test aggregation scaling

Key tests are parametrized with count=[1, 3] to verify that instance.ram_gib * count (etc.) produces the expected aggregated values, not just the single-instance case.

Are there any tests?

Yes — 37 tests (up from 35 before the change). Requirements use decoy values (9999) so any regression to reading from requirements produces unmissable failures in the existing baseline_value / comparison_value assertions.

How would I use the new code?

No API changes — compare_plans() has the same signature and return type. The only difference is that the returned ResourceComparison objects now contain correct baseline_value / comparison_value sourced from clusters.

result = compare_plans(baseline_plan, comparison_plan)
result.memory.baseline_value   # now: sum(instance.ram_gib * count), was: sum(req.mem_gib.mid)
result.disk.baseline_value     # now: sum(get_disk_size_gib(...) * count), was: sum(req.disk_gib.mid)
result.network.baseline_value  # now: sum(instance.net_mbps * count), was: sum(req.network_mbps.mid)

🤖 Generated with Claude Code

_aggregate_resources was reading mem/disk/network from CapacityRequirement (the demand) instead of candidate_clusters (the actual provisioned instances). This meant comparisons were based on what was requested, not what was actually deployed — producing incorrect results when requirement values differed from cluster instance values. Changes: - Use instance.ram_gib, instance.net_mbps, and get_disk_size_gib() instead of req.mem_gib.mid, req.disk_gib.mid, req.network_mbps.mid - Simplify compare_plans with dict comprehension over resource types - Fix test helper _create_plan to set resource values on the instance - Add decoy-requirement tests that assert baseline_value through the public API, catching any regression to reading from requirements - Remove redundant tests and parametrize where appropriate Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

homatthew force-pushed the mho/compare-req-bug-fix branch 2 times, most recently from f852b62 to e462b66 Compare February 7, 2026 17:09

homatthew force-pushed the mho/compare-req-bug-fix branch from e462b66 to ec748ea Compare February 7, 2026 17:12

homatthew marked this pull request as ready for review February 7, 2026 17:13

homatthew merged commit 3691c1d into Netflix-Skunkworks:main Feb 9, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix _aggregate_resources to read from clusters, not requirements#223

Fix _aggregate_resources to read from clusters, not requirements#223
homatthew merged 1 commit intoNetflix-Skunkworks:mainfrom
homatthew:mho/compare-req-bug-fix

homatthew commented Feb 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

homatthew commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What am I trying to do?

Why did I do it this way?

Read all resources from clusters, not requirements

Decoy requirements in test helper to catch regressions

Parametrize by count to test aggregation scaling

Are there any tests?

How would I use the new code?

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

homatthew commented Feb 7, 2026 •

edited

Loading