Skip to content

Fix _aggregate_resources to read from clusters, not requirements#223

Merged
homatthew merged 1 commit intoNetflix-Skunkworks:mainfrom
homatthew:mho/compare-req-bug-fix
Feb 9, 2026
Merged

Fix _aggregate_resources to read from clusters, not requirements#223
homatthew merged 1 commit intoNetflix-Skunkworks:mainfrom
homatthew:mho/compare-req-bug-fix

Conversation

@homatthew
Copy link
Contributor

@homatthew homatthew commented Feb 7, 2026

What am I trying to do?

_aggregate_resources was reading memory, disk, and network from CapacityRequirement (the demand) instead of candidate_clusters (the actual provisioned instances). This produced incorrect comparison results when requirement values differed from cluster instance values — which is the normal case for baseline plans extracted from real deployments.

Why did I do it this way?

Read all resources from clusters, not requirements

The bug was that CPU was already correctly sourced from candidate_clusters (to enable IPC/GHz normalization), but memory, disk, and network were still read from requirements. Now all four resource types are aggregated in one loop over clusters:

for cluster in chain(plan.candidate_clusters.zonal, plan.candidate_clusters.regional):
    totals[ResourceType.cpu] += to_reference_cores(...)
    totals[ResourceType.mem_gib] += cluster.instance.ram_gib * cluster.count
    totals[ResourceType.network_mbps] += cluster.instance.net_mbps * cluster.count
    totals[ResourceType.disk_gib] += get_disk_size_gib(cluster_drive, cluster.instance) * cluster.count

Disk uses get_disk_size_gib (from common.py) which handles both local instance drives and attached EBS drives — matching the same pattern used in capacity_planner.py.

Decoy requirements in test helper to catch regressions

Instead of a separate test class, _create_plan now sets all requirement values to 9999 (decoy). Existing tests assert baseline_value / comparison_value on the PlanComparisonResult. If someone accidentally reverts to reading from requirements, every value assertion fails with 9999 instead of the expected value.

Parametrize by count to test aggregation scaling

Key tests are parametrized with count=[1, 3] to verify that instance.ram_gib * count (etc.) produces the expected aggregated values, not just the single-instance case.

Are there any tests?

Yes — 37 tests (up from 35 before the change). Requirements use decoy values (9999) so any regression to reading from requirements produces unmissable failures in the existing baseline_value / comparison_value assertions.

How would I use the new code?

No API changes — compare_plans() has the same signature and return type. The only difference is that the returned ResourceComparison objects now contain correct baseline_value / comparison_value sourced from clusters.

result = compare_plans(baseline_plan, comparison_plan)
result.memory.baseline_value   # now: sum(instance.ram_gib * count), was: sum(req.mem_gib.mid)
result.disk.baseline_value     # now: sum(get_disk_size_gib(...) * count), was: sum(req.disk_gib.mid)
result.network.baseline_value  # now: sum(instance.net_mbps * count), was: sum(req.network_mbps.mid)

🤖 Generated with Claude Code

@homatthew homatthew force-pushed the mho/compare-req-bug-fix branch 2 times, most recently from f852b62 to e462b66 Compare February 7, 2026 17:09
_aggregate_resources was reading mem/disk/network from CapacityRequirement
(the demand) instead of candidate_clusters (the actual provisioned
instances). This meant comparisons were based on what was requested, not
what was actually deployed — producing incorrect results when requirement
values differed from cluster instance values.

Changes:
- Use instance.ram_gib, instance.net_mbps, and get_disk_size_gib() instead
  of req.mem_gib.mid, req.disk_gib.mid, req.network_mbps.mid
- Simplify compare_plans with dict comprehension over resource types
- Fix test helper _create_plan to set resource values on the instance
- Add decoy-requirement tests that assert baseline_value through the
  public API, catching any regression to reading from requirements
- Remove redundant tests and parametrize where appropriate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@homatthew homatthew force-pushed the mho/compare-req-bug-fix branch from e462b66 to ec748ea Compare February 7, 2026 17:12
@homatthew homatthew marked this pull request as ready for review February 7, 2026 17:13
@homatthew homatthew merged commit 3691c1d into Netflix-Skunkworks:main Feb 9, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant