Closed-form RF calculation for read-only-kv by homatthew · Pull Request #216 · Netflix-Skunkworks/service-capacity-modeling

homatthew · 2026-01-27T01:11:18Z

What am I trying to do?

Improve the partition-aware capacity planning algorithm in `read_only_kv.py`:

- Replace the iterative `while` loop with a closed-form RF calculation
- Extract the algorithm into a standalone `partition_capacity.py` module for clarity
- Add a comprehensive test suite showing where the algorithm variants are equivalent and where they differ

Why did I do it this way?

Standalone `partition_capacity.py` module (new!)

The algorithm logic was buried in private methods inside `read_only_kv.py`, which made it hard to:

- Understand what the algorithm does
- See where the different variants are equivalent and where they differ
- Test the algorithm in isolation

Solution: extract it into `partition_capacity.py` with three clearly separated algorithm variants:

```python
from service_capacity_modeling.models.org.netflix.partition_capacity import (
    CapacityProblem,
    search_algorithm,
)

result = search_algorithm(CapacityProblem(
    n_partitions=100,
    partition_size_gib=50.0,
    disk_per_node_gib=1000.0,
    cpu_needed=50,
    cpu_per_node=2,
    min_rf=2,
    max_nodes=1000,
))
# Returns: CapacityResult(node_count=25, rf=5, partitions_per_node=20, base_nodes=5)
```
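As a quick arithmetic check of that result, assuming PPn is capped at the number of whole partitions that fit on one node's disk and that `base_nodes` is the node count needed to hold one copy of every partition (`ceil(n_partitions / PPn)`):

```math
\begin{aligned}
\text{max\_ppn} &= \left\lfloor \frac{1000}{50} \right\rfloor = 20 \\
\text{base} &= \left\lceil \frac{100}{20} \right\rceil = 5 \\
\text{rf} &= \max\!\left(2,\ \left\lceil \frac{50}{5 \cdot 2} \right\rceil\right) = 5 \\
\text{node\_count} &= \text{base} \cdot \text{rf} = 25 \le 1000
\end{aligned}
```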

Three algorithm variants with clear relationships

| Algorithm | Description | Complexity |
|---|---|---|
| `original_algorithm` | While-loop from legacy code (greedy, max PPn only) | O(rf) |
| `closed_form_algorithm` | Closed-form equivalent of the original | O(1) |
| `search_algorithm` | Searches PPn from max down to 1; finds solutions the original misses | O(max_ppn) |

Key findings documented in tests:

- CLOSED_FORM == ORIGINAL (always, by mathematical proof)
- SEARCH ⊇ ORIGINAL (search finds every solution the original finds)
- SEARCH ≠ ORIGINAL when `max_nodes` is tight (search finds solutions the original misses)
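A minimal Python sketch of how the three variants could relate. It assumes PPn is capped at the number of whole partitions that fit on a node's disk (and at `n_partitions`), that `base = ceil(n_partitions / PPn)`, and that `search_algorithm` returns the first PPn that fits. The dataclasses are stand-ins mirroring the field names used in this description, not the module's actual code:

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapacityProblem:  # stand-in mirroring the fields used in this PR
    n_partitions: int
    partition_size_gib: float
    disk_per_node_gib: float
    cpu_needed: int
    cpu_per_node: int
    min_rf: int
    max_nodes: int


@dataclass(frozen=True)
class CapacityResult:  # stand-in mirroring the fields used in this PR
    node_count: int
    rf: int
    partitions_per_node: int
    base_nodes: int


def max_partitions_per_node(p: CapacityProblem) -> int:
    # Assumption: whole partitions that fit on one disk, capped at n_partitions.
    return min(p.n_partitions, int(p.disk_per_node_gib // p.partition_size_gib))


def original_algorithm(p: CapacityProblem) -> Optional[CapacityResult]:
    """Greedy: use max PPn only, then bump RF one step at a time."""
    ppn = max_partitions_per_node(p)
    if ppn < 1:
        return None
    base = math.ceil(p.n_partitions / ppn)
    rf = p.min_rf
    while True:
        if base * rf > p.max_nodes:
            return None  # cluster would exceed the node budget
        if base * rf * p.cpu_per_node >= p.cpu_needed:
            return CapacityResult(base * rf, rf, ppn, base)
        rf += 1


def _closed_form_at(p: CapacityProblem, ppn: int) -> Optional[CapacityResult]:
    base = math.ceil(p.n_partitions / ppn)
    # Smallest RF >= min_rf whose total cores cover cpu_needed.
    rf = max(p.min_rf, math.ceil(p.cpu_needed / (base * p.cpu_per_node)))
    if base * rf > p.max_nodes:
        return None
    return CapacityResult(base * rf, rf, ppn, base)


def closed_form_algorithm(p: CapacityProblem) -> Optional[CapacityResult]:
    """Same answer as original_algorithm, without the loop."""
    ppn = max_partitions_per_node(p)
    return _closed_form_at(p, ppn) if ppn >= 1 else None


def search_algorithm(p: CapacityProblem) -> Optional[CapacityResult]:
    """Try PPn from max down to 1 and return the first configuration that fits."""
    for ppn in range(max_partitions_per_node(p), 0, -1):
        result = _closed_form_at(p, ppn)
        if result is not None:
            return result
    return None
```

Because this search tries the greedy PPn first, any problem the original solves is solved identically by the search, which is the SEARCH ⊇ ORIGINAL relationship; it only diverges when the greedy PPn overshoots `max_nodes`.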

Closed-form RF calculation (commit 1)

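The original `while True` loop incremented RF until the CPU requirement was satisfied, which obscures the underlying math.

The insight: for a given partitions-per-node (PPn) value, the base node count (`base = ceil(n_partitions / PPn)`) is fixed, so the minimum RF can be computed directly:

```python
import math

# Instead of incrementing rf in a loop...
rf = max(min_rf, math.ceil(cpu_needed / (base * cpu_per_node)))
```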
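A sketch of why this matches the loop, assuming `cpu_per_node > 0` and `base >= 1`: the loop starts at `min_rf` and stops at the first RF whose total cores cover the demand, and because RF is an integer that threshold can be solved for directly:

```math
\begin{aligned}
\text{base} \cdot \text{rf} \cdot \text{cpu\_per\_node} \ge \text{cpu\_needed}
&\iff \text{rf} \ge \frac{\text{cpu\_needed}}{\text{base} \cdot \text{cpu\_per\_node}} \\
&\iff \text{rf} \ge \left\lceil \frac{\text{cpu\_needed}}{\text{base} \cdot \text{cpu\_per\_node}} \right\rceil \quad (\text{rf} \in \mathbb{Z}) \\
\text{rf}_{\min} &= \max\!\left(\text{min\_rf},\ \left\lceil \frac{\text{cpu\_needed}}{\text{base} \cdot \text{cpu\_per\_node}} \right\rceil\right)
\end{aligned}
```

Since the loop increments RF by one starting from `min_rf`, the first value it accepts is exactly $\text{rf}_{\min}$, which is why the closed form and the loop always agree.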

Extract `_PartitionSearchInputs` dataclass (commit 2)

This draws a clear boundary between the "model domain" (`Instance`, `CapacityRequirement`) and the "algorithm domain" (pure numbers).
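A hedged sketch of the kind of boundary object this describes. `PartitionSearchInputs` and its fields are hypothetical: they mirror `CapacityProblem`'s parameters rather than the PR's actual `_PartitionSearchInputs` definition. The idea shown is that only frozen, validated numbers cross from the model layer into the algorithm:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionSearchInputs:
    """Hypothetical stand-in for _PartitionSearchInputs: plain numbers only,
    no Instance or CapacityRequirement objects."""

    n_partitions: int
    partition_size_gib: float
    disk_per_node_gib: float
    cpu_needed: int
    cpu_per_node: int
    min_rf: int
    max_nodes: int

    def __post_init__(self) -> None:
        # Validate once at the boundary so the algorithm can assume sane inputs.
        if self.n_partitions < 1 or self.min_rf < 1 or self.cpu_per_node < 1:
            raise ValueError("n_partitions, min_rf and cpu_per_node must be >= 1")
        if self.partition_size_gib <= 0 or self.disk_per_node_gib <= 0:
            raise ValueError("partition and disk sizes must be positive")

    @property
    def max_partitions_per_node(self) -> int:
        # Whole partitions that fit on one node's disk, capped at n_partitions.
        return min(self.n_partitions, int(self.disk_per_node_gib // self.partition_size_gib))
```

Freezing the dataclass means the algorithm cannot mutate its inputs, and tests can construct it directly from literals without building any model objects.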

Are there any tests?

Yes, 79 tests in total, including:

- Equivalence tests: verify `closed_form == original` on 500 random problems (Hypothesis)
- Subsumption tests: verify that search finds everything the original finds, plus more
- Difference tests: concrete examples where search succeeds but the original fails
- Constraint tests: all algorithms satisfy the CPU, disk, RF, and size constraints
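The PR's equivalence tests use Hypothesis; the stdlib-only sketch below swaps in `random` with a fixed seed to illustrate the same property at the RF level, together with the CPU and RF-floor constraint checks. The function names are illustrative, not the test suite's:

```python
import math
import random


def original_rf(base: int, cpu_needed: int, cpu_per_node: int, min_rf: int) -> int:
    # Legacy-style loop: bump RF until total cores cover the demand.
    rf = min_rf
    while base * rf * cpu_per_node < cpu_needed:
        rf += 1
    return rf


def closed_form_rf(base: int, cpu_needed: int, cpu_per_node: int, min_rf: int) -> int:
    return max(min_rf, math.ceil(cpu_needed / (base * cpu_per_node)))


def check_equivalence(trials: int = 500, seed: int = 0) -> int:
    """Compare both RF rules on random inputs; return the number of trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        base = rng.randint(1, 200)
        cpu_needed = rng.randint(0, 10_000)
        cpu_per_node = rng.randint(1, 64)
        min_rf = rng.randint(1, 5)
        rf = closed_form_rf(base, cpu_needed, cpu_per_node, min_rf)
        assert rf == original_rf(base, cpu_needed, cpu_per_node, min_rf)
        # Constraints: the RF floor holds and total cores cover the demand...
        assert rf >= min_rf
        assert base * rf * cpu_per_node >= cpu_needed
        # ...and RF is minimal: one fewer replica would break a constraint.
        assert rf == min_rf or base * (rf - 1) * cpu_per_node < cpu_needed
    return trials
```

A property-based library adds input shrinking on failure, which is the main reason to prefer Hypothesis over a fixed random sample in the real suite.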

How would I use the new code?

The public interface (`nflx_read_only_kv_capacity_model.capacity_plan()`) is unchanged.

For direct algorithm testing:

```python
from service_capacity_modeling.models.org.netflix.partition_capacity import (
    CapacityProblem,
    original_algorithm,
    closed_form_algorithm,
    search_algorithm,
)

problem = CapacityProblem(
    n_partitions=21,
    partition_size_gib=100.0,
    disk_per_node_gib=1000.0,  # max_ppn = 10
    cpu_needed=10,
    cpu_per_node=1,
    min_rf=2,
    max_nodes=10,
)

# Original/closed_form: None (greedy max PPn gives base=3, rf=4 -> 12 nodes > max_nodes)
print(original_algorithm(problem))
print(closed_form_algorithm(problem))
# Search: CapacityResult(node_count=10, rf=2, partitions_per_node=5, base_nodes=5)
print(search_algorithm(problem))
```
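To see why the greedy choice fails here, the hypothetical helper below lists the node count for every PPn, assuming `max_ppn = floor(disk_per_node_gib / partition_size_gib)`, `base = ceil(n_partitions / PPn)`, and the closed-form RF:

```python
import math


def nodes_by_ppn(n_partitions, partition_size_gib, disk_per_node_gib,
                 cpu_needed, cpu_per_node, min_rf):
    """Return (ppn, base, rf, node_count) for every PPn from max down to 1."""
    max_ppn = min(n_partitions, int(disk_per_node_gib // partition_size_gib))
    rows = []
    for ppn in range(max_ppn, 0, -1):
        base = math.ceil(n_partitions / ppn)
        rf = max(min_rf, math.ceil(cpu_needed / (base * cpu_per_node)))
        rows.append((ppn, base, rf, base * rf))
    return rows


for ppn, base, rf, nodes in nodes_by_ppn(21, 100.0, 1000.0, 10, 1, 2):
    status = "fits" if nodes <= 10 else "exceeds max_nodes"
    print(f"ppn={ppn:2d} base={base:2d} rf={rf} nodes={nodes:2d} {status}")
```

With these inputs every PPn except 5 needs at least 12 nodes, so an algorithm that only tries the maximum PPn (10) cannot fit within `max_nodes=10`.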

Architecture

```mermaid
graph TB
    subgraph partition_capacity.py
        A[CapacityProblem] --> B[original_algorithm]
        A --> C[closed_form_algorithm]
        A --> D[search_algorithm]
        B --> E[CapacityResult]
        C --> E
        D --> E
    end

    subgraph read_only_kv.py
        F[Model Objects] --> G[_extract_planning_inputs]
        G --> H[_PartitionSearchInputs]
        H --> I[CapacityProblem]
        I --> D
        E --> J[RegionClusterCapacity]
    end
```

🤖 Generated with Claude Code

- Same algorithm, same results, cleaner code
- O(1) instead of O(rf) iterations
- All 29 existing tests pass (including Hypothesis property tests)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add _PartitionSearchInputs dataclass to represent pure numeric inputs
- Extract _extract_planning_inputs() to handle model→algorithm transformation
- Simplify _compute_read_only_kv_regional_cluster() to 3 clear steps:
  1. Extract inputs
  2. Run algorithm
  3. Build result
- Make _find_valid_cluster_config use keyword-only args (pylint fix)
- Improves testability and documents data flow explicitly

Co-Authored-By: Claude Opus 4.5 <[email protected]>

…ations

- Create partition_capacity.py with three algorithm variants:
  - original_algorithm: while-loop (greedy, max PPn only)
  - closed_form_algorithm: O(1) mathematical equivalent
  - search_algorithm: searches PPn from max to 1, finds solutions original misses
- Add test_partition_capacity.py demonstrating:
  - CLOSED_FORM == ORIGINAL (always, by mathematical proof)
  - SEARCH ⊇ ORIGINAL (search finds everything original finds)
  - SEARCH ≠ ORIGINAL when max_nodes is tight (search finds solutions original misses)
- Update read_only_kv.py to use search_algorithm from new module
- Update test_read_only_kv.py to import from partition_capacity module

The extracted module makes it easy to understand the algorithm without following private methods, and the parametrized tests clearly show where the algorithms are equivalent vs. different.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

homatthew changed the base branch from main to chengw/oodm January 27, 2026 01:11

homatthew force-pushed the mho/chengw-oodm-closed-form branch from 2196dcc to d355c43 January 27, 2026 01:24

Replace while loop with closed-form RF calculation (29797a3)

homatthew force-pushed the mho/chengw-oodm-closed-form branch from d355c43 to 29797a3 January 27, 2026 01:36

homatthew and others added 2 commits January 27, 2026 17:49

homatthew closed this Jan 29, 2026

homatthew deleted the mho/chengw-oodm-closed-form branch February 18, 2026 17:38

homatthew wants to merge 3 commits into Netflix-Skunkworks:chengw/oodm from homatthew:mho/chengw-oodm-closed-form
