Closed-form RF calculation for read-only-kv #216

Closed
homatthew wants to merge 3 commits into Netflix-Skunkworks:chengw/oodm from homatthew:mho/chengw-oodm-closed-form

homatthew (Contributor) commented Jan 27, 2026

What am I trying to do?

Improve the partition-aware capacity planning algorithm in read_only_kv.py:

  1. Replace iterative while loop with closed-form RF calculation
  2. Extract algorithm to standalone partition_capacity.py module for clarity
  3. Add comprehensive test suite showing algorithm equivalence and differences

Why did I do it this way?

Standalone partition_capacity.py module (new!)

The algorithm logic was buried in private methods inside read_only_kv.py. This made it hard to:

  • Understand what the algorithm does
  • See where different variants are equivalent vs. different
  • Test the algorithm in isolation

Solution: Extract to partition_capacity.py with three clear algorithm variants:

from service_capacity_modeling.models.org.netflix.partition_capacity import (
    CapacityProblem,
    search_algorithm,
)

result = search_algorithm(CapacityProblem(
    n_partitions=100,
    partition_size_gib=50.0,
    disk_per_node_gib=1000.0,
    cpu_needed=50,
    cpu_per_node=2,
    min_rf=2,
    max_nodes=1000,
))
# Returns: CapacityResult(node_count=25, rf=5, partitions_per_node=20, base_nodes=5)

Three algorithm variants with clear relationships

| Algorithm | Description | Complexity |
|---|---|---|
| original_algorithm | While-loop from legacy code (greedy; tries only the maximum partitions per node, PPn) | O(rf) |
| closed_form_algorithm | Mathematical equivalent of original_algorithm | O(1) |
| search_algorithm | Searches PPn from max down to 1; finds solutions the original misses | O(ppn) |
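
For readers without the branch checked out, the three variants in the table above can be approximated as follows. This is a minimal sketch reconstructed from the field names and example values in this description, not the code in partition_capacity.py. The helper names `_max_ppn`, `_finish`, and `_closed_form_at` are hypothetical, and it assumes `n_partitions >= 1`, `cpu_per_node > 0`, and that cluster CPU equals `base_nodes * rf * cpu_per_node`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapacityProblem:
    n_partitions: int          # assumed >= 1
    partition_size_gib: float
    disk_per_node_gib: float
    cpu_needed: int
    cpu_per_node: int          # assumed > 0
    min_rf: int
    max_nodes: int


@dataclass(frozen=True)
class CapacityResult:
    node_count: int
    rf: int
    partitions_per_node: int
    base_nodes: int


def _max_ppn(p: CapacityProblem) -> int:
    # Most partitions that fit on one node's disk (hypothetical helper).
    return int(p.disk_per_node_gib // p.partition_size_gib)


def _finish(p: CapacityProblem, ppn: int, base: int, rf: int) -> Optional[CapacityResult]:
    # Reject configurations that exceed the cluster size limit.
    nodes = base * rf
    if nodes > p.max_nodes:
        return None
    return CapacityResult(nodes, rf, ppn, base)


def original_algorithm(p: CapacityProblem) -> Optional[CapacityResult]:
    # Greedy: use the maximum PPn, then bump RF until CPU is satisfied.
    ppn = _max_ppn(p)
    if ppn < 1:
        return None
    base = math.ceil(p.n_partitions / ppn)
    rf = p.min_rf
    while base * rf * p.cpu_per_node < p.cpu_needed:
        rf += 1
    return _finish(p, ppn, base, rf)


def _closed_form_at(p: CapacityProblem, ppn: int) -> Optional[CapacityResult]:
    # Same answer as the loop, computed directly for a given PPn.
    base = math.ceil(p.n_partitions / ppn)
    rf = max(p.min_rf, math.ceil(p.cpu_needed / (base * p.cpu_per_node)))
    return _finish(p, ppn, base, rf)


def closed_form_algorithm(p: CapacityProblem) -> Optional[CapacityResult]:
    ppn = _max_ppn(p)
    return _closed_form_at(p, ppn) if ppn >= 1 else None


def search_algorithm(p: CapacityProblem) -> Optional[CapacityResult]:
    # Try PPn from the maximum down to 1; return the first config that fits.
    for ppn in range(_max_ppn(p), 0, -1):
        result = _closed_form_at(p, ppn)
        if result is not None:
            return result
    return None
```

Because the search starts at the maximum PPn, its first candidate is the greedy one, so in this sketch it returns the same result as the original whenever the original succeeds.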

Key findings documented in tests:

  • CLOSED_FORM == ORIGINAL (always, by mathematical proof)
  • SEARCH ⊇ ORIGINAL (search finds everything original finds)
  • SEARCH ≠ ORIGINAL when max_nodes is tight (search finds solutions original misses)

Closed-form RF calculation (commit 1)

The original while True loop incremented RF until CPU was satisfied. This obscures the underlying math.

The insight: For a given PPn, we can directly compute the minimum RF:

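# Instead of incrementing rf in a loop, compute it directly.
# base is the number of nodes needed to hold one full copy of the data at this PPn.
rf = max(min_rf, math.ceil(cpu_needed / (base * cpu_per_node)))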
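
To see why the two forms agree: the loop stops at the smallest `rf >= min_rf` with `base * rf * cpu_per_node >= cpu_needed`, which is exactly the ceiling expression clamped at `min_rf`. The sketch below compares both over a small grid. The function names `rf_by_loop` and `rf_closed_form` are illustrative only, and the loop condition is an assumption about the legacy code based on the description above.

```python
import math


def rf_by_loop(base: int, cpu_per_node: int, cpu_needed: int, min_rf: int) -> int:
    # Assumed shape of the legacy loop: raise RF until the cluster has enough CPU.
    rf = min_rf
    while base * rf * cpu_per_node < cpu_needed:
        rf += 1
    return rf


def rf_closed_form(base: int, cpu_per_node: int, cpu_needed: int, min_rf: int) -> int:
    # Smallest rf with base * rf * cpu_per_node >= cpu_needed, never below min_rf.
    return max(min_rf, math.ceil(cpu_needed / (base * cpu_per_node)))


# Exhaustive comparison over a small grid of positive inputs.
for base in range(1, 8):
    for cpu_per_node in (1, 2, 4):
        for cpu_needed in range(0, 60):
            for min_rf in (1, 2, 3):
                args = (base, cpu_per_node, cpu_needed, min_rf)
                assert rf_by_loop(*args) == rf_closed_form(*args)
```

With the numbers from the first example (`base=5`, `cpu_per_node=2`, `cpu_needed=50`, `min_rf=2`), both give `rf = 5`.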

Extract _PartitionSearchInputs dataclass (commit 2)

Clear boundary between "model domain" (Instance, CapacityRequirement) and "algorithm domain" (pure numbers).
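
As a rough illustration of that boundary, the sketch below converts stand-in model objects into a frozen dataclass of plain numbers. Everything here is hypothetical: `ToyInstance` and `ToyRequirement` are simplified placeholders, not the real Instance and CapacityRequirement models, and the field list of `PartitionSearchInputs` is guessed from the CapacityProblem arguments shown in this description rather than taken from the actual `_PartitionSearchInputs`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyInstance:
    # Hypothetical stand-in for the model-domain Instance.
    name: str
    cpu: int
    disk_gib: float


@dataclass(frozen=True)
class ToyRequirement:
    # Hypothetical stand-in for the model-domain CapacityRequirement.
    cpu_cores: int
    n_partitions: int
    partition_size_gib: float


@dataclass(frozen=True)
class PartitionSearchInputs:
    # Algorithm domain: pure numbers, no model objects.
    n_partitions: int
    partition_size_gib: float
    disk_per_node_gib: float
    cpu_needed: int
    cpu_per_node: int
    min_rf: int
    max_nodes: int


def extract_planning_inputs(
    instance: ToyInstance,
    requirement: ToyRequirement,
    *,
    min_rf: int,
    max_nodes: int,
) -> PartitionSearchInputs:
    # The only place that reads model objects; the algorithm sees numbers only.
    return PartitionSearchInputs(
        n_partitions=requirement.n_partitions,
        partition_size_gib=requirement.partition_size_gib,
        disk_per_node_gib=instance.disk_gib,
        cpu_needed=requirement.cpu_cores,
        cpu_per_node=instance.cpu,
        min_rf=min_rf,
        max_nodes=max_nodes,
    )
```

Keeping the numeric inputs in one immutable value means the algorithm can be tested with literals, without constructing any model objects.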

Are there any tests?

Yes - 79 tests total including:

  • Equivalence tests: Verify closed_form == original for 500 random problems (Hypothesis)
  • Subsumption tests: Verify search finds everything original finds, plus more
  • Difference tests: Concrete examples where search succeeds but original fails
  • Constraint tests: All algorithms satisfy CPU, disk, RF, and size constraints
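
The constraint checks in the last bullet could look roughly like the validator below. This is a sketch under assumptions, not the test code from this PR: the function name `violated_constraints` is hypothetical, "size" is interpreted here as `node_count == base_nodes * rf` and `node_count <= max_nodes`, and an extra "coverage" check (enough slots for every partition) is included for completeness.

```python
from typing import List


def violated_constraints(
    *,
    n_partitions: int,
    partition_size_gib: float,
    disk_per_node_gib: float,
    cpu_needed: int,
    cpu_per_node: int,
    min_rf: int,
    max_nodes: int,
    node_count: int,
    rf: int,
    partitions_per_node: int,
    base_nodes: int,
) -> List[str]:
    """Return the names of the constraints a candidate result breaks."""
    broken = []
    # Disk: the partitions placed on a node must fit on its disk.
    if partitions_per_node * partition_size_gib > disk_per_node_gib:
        broken.append("disk")
    # Coverage: one replica group must have room for every partition.
    if base_nodes * partitions_per_node < n_partitions:
        broken.append("coverage")
    # CPU: the whole cluster must supply the required cores.
    if node_count * cpu_per_node < cpu_needed:
        broken.append("cpu")
    # RF: never go below the minimum replication factor.
    if rf < min_rf:
        broken.append("rf")
    # Size: the node count is base_nodes * rf and stays within max_nodes.
    if node_count != base_nodes * rf or node_count > max_nodes:
        broken.append("size")
    return broken
```

Applied to the 21-partition example in this description, the greedy candidate (12 nodes, rf=4, PPn=10, 3 base nodes) fails only the "size" check, while the search result (10 nodes, rf=2, PPn=5, 5 base nodes) passes all of them.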

How would I use the new code?

The public interface (nflx_read_only_kv_capacity_model.capacity_plan()) is unchanged.

For direct algorithm testing:

from service_capacity_modeling.models.org.netflix.partition_capacity import (
    CapacityProblem,
    original_algorithm,
    closed_form_algorithm,
    search_algorithm,
)

problem = CapacityProblem(
    n_partitions=21,
    partition_size_gib=100.0,
    disk_per_node_gib=1000.0,  # max_ppn = 10
    cpu_needed=10,
    cpu_per_node=1,
    min_rf=2,
    max_nodes=10,
)

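# Greedy (PPn = 10): base_nodes = 3, rf = 4, so 12 nodes exceed max_nodes.
original_algorithm(problem)     # returns None
closed_form_algorithm(problem)  # returns None
# Search backs off to PPn = 5: base_nodes = 5, rf = 2, so 10 nodes fit.
search_algorithm(problem)       # returns CapacityResult(node_count=10, rf=2, partitions_per_node=5, base_nodes=5)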
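
To make the difference concrete, the trace below walks PPn downward for this problem and records what each candidate would cost. It is a standalone sketch that does not import the module. The function name `trace_ppn` is hypothetical, and the per-PPn formulas (`base = ceil(n_partitions / ppn)`, closed-form RF) are assumptions consistent with the results quoted above.

```python
import math
from typing import List, Tuple


def trace_ppn(
    n_partitions: int,
    max_ppn: int,
    cpu_needed: int,
    cpu_per_node: int,
    min_rf: int,
    max_nodes: int,
) -> List[Tuple[int, int, int, int, bool]]:
    """Return (ppn, base_nodes, rf, node_count, fits) rows until one fits."""
    rows = []
    for ppn in range(max_ppn, 0, -1):
        base = math.ceil(n_partitions / ppn)
        rf = max(min_rf, math.ceil(cpu_needed / (base * cpu_per_node)))
        nodes = base * rf
        fits = nodes <= max_nodes
        rows.append((ppn, base, rf, nodes, fits))
        if fits:
            break  # the search stops at the first feasible PPn
    return rows


for row in trace_ppn(21, 10, 10, 1, 2, 10):
    print(row)
# (10, 3, 4, 12, False)
# (9, 3, 4, 12, False)
# (8, 3, 4, 12, False)
# (7, 3, 4, 12, False)
# (6, 4, 3, 12, False)
# (5, 5, 2, 10, True)
```

Lowering PPn raises the base node count, which lowers the RF needed to meet CPU. At PPn = 5 the product drops to 10 nodes, which is why the search succeeds where the greedy choice of PPn = 10 does not.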

Architecture

graph TB
    subgraph partition_capacity.py
        A[CapacityProblem] --> B[original_algorithm]
        A --> C[closed_form_algorithm]
        A --> D[search_algorithm]
        B --> E[CapacityResult]
        C --> E
        D --> E
    end

    subgraph read_only_kv.py
        F[Model Objects] --> G[_extract_planning_inputs]
        G --> H[_PartitionSearchInputs]
        H --> I[CapacityProblem]
        I --> D
        E --> J[RegionClusterCapacity]
    end

🤖 Generated with Claude Code

@homatthew homatthew changed the base branch from main to chengw/oodm January 27, 2026 01:11
@homatthew homatthew force-pushed the mho/chengw-oodm-closed-form branch from 2196dcc to d355c43 Compare January 27, 2026 01:24
- Same algorithm, same results, cleaner code
- O(1) instead of O(rf) iterations
- All 29 existing tests pass (including Hypothesis property tests)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@homatthew homatthew force-pushed the mho/chengw-oodm-closed-form branch from d355c43 to 29797a3 Compare January 27, 2026 01:36
homatthew and others added 2 commits January 27, 2026 17:49
- Add _PartitionSearchInputs dataclass to represent pure numeric inputs
- Extract _extract_planning_inputs() to handle model→algorithm transformation
- Simplify _compute_read_only_kv_regional_cluster() to 3 clear steps:
  1. Extract inputs, 2. Run algorithm, 3. Build result
- Make _find_valid_cluster_config use keyword-only args (pylint fix)
- Improves testability and documents data flow explicitly

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ations

- Create partition_capacity.py with three algorithm variants:
  - original_algorithm: while-loop (greedy, max PPn only)
  - closed_form_algorithm: O(1) mathematical equivalent
  - search_algorithm: searches PPn from max to 1, finds solutions original misses

- Add test_partition_capacity.py demonstrating:
  - CLOSED_FORM == ORIGINAL (always, by mathematical proof)
  - SEARCH ⊇ ORIGINAL (search finds everything original finds)
  - SEARCH ≠ ORIGINAL when max_nodes is tight (search finds solutions original misses)

- Update read_only_kv.py to use search_algorithm from new module
- Update test_read_only_kv.py to import from partition_capacity module

The extracted module makes it easy to understand the algorithm without
following private methods, and the parametrized tests clearly show
where the algorithms are equivalent vs. different.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@homatthew homatthew closed this Jan 29, 2026
@homatthew homatthew deleted the mho/chengw-oodm-closed-form branch February 18, 2026 17:38