
feat: integrate Bet-Optimal Drafting (BOD) for dynamic block-size optimization #27

Open

0xClandestine wants to merge 1 commit into bstnxbt:main from 0xClandestine:feat/bet-optimal-drafting

Conversation

@0xClandestine
Contributor

Summary

Integrates Bet-Optimal Drafting (BOD) — a unified bet-size optimizer for chain (vanilla DFlash) and tree (DDTree) speculative decoding — throughout the codebase. The draft model is the gambler; the target model is the house. BOD finds the optimal bet size (γ for chain / B for tree) to maximize throughput using a unified mathematical framework.

Core Algorithm

Both modes reduce to the same optimization problem:

T(x) = (E[tokens | x] + 1) / (c_fixed + c_scale · x)

where x is the bet size, E[tokens | x] is a concave, increasing function of x, and the denominator is linear in x. The optimal bet is the x that maximizes this ratio.
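As an illustration, the ratio can be maximized by a plain integer sweep. The geometric acceptance model and the cost constants below are assumptions for the sketch, not the PR's actual estimator:

```python
def expected_tokens(x, alpha=0.7):
    # Illustrative concave acceptance model (assumption): with per-token
    # acceptance rate alpha, a chain bet of x drafted tokens yields
    # E[tokens | x] = (alpha - alpha**(x + 1)) / (1 - alpha),
    # which is increasing and concave in x.
    return (alpha - alpha ** (x + 1)) / (1 - alpha)

def throughput(x, c_fixed=1.0, c_scale=0.2):
    # T(x) = (E[tokens | x] + 1) / (c_fixed + c_scale * x)
    return (expected_tokens(x) + 1) / (c_fixed + c_scale * x)

# Sweep integer bet sizes and keep the maximizer.
best = max(range(1, 65), key=throughput)
print(best)  # 3 under these illustrative constants
```

Because the numerator saturates while the denominator keeps growing, the sweep finds an interior optimum rather than always betting the maximum; the tiered solvers below exist to find that optimum more cheaply than a full sweep.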

Chain mode (γ optimization) has three tiers:

  1. Verify-dominated — max γ immediately (zero math)
  2. ρ = 0 — closed-form Lambert W (one log + one Lambert W call)
  3. ρ > 0 — fused Metal kernel sweep (GPU dispatch)

Tree mode (B optimization) has three tiers:

  1. Draft-dominated — max B immediately
  2. Enough observations — Lambert W on log-acceptance model
  3. Cold start — Metal kernel sweep
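Tier 2 in both modes calls for a Lambert W evaluation. A minimal Newton-iteration sketch of the principal branch for positive arguments (the PR's actual solver is not shown here, and may use a library routine instead):

```python
import math

def lambert_w(z, tol=1e-12):
    # Principal branch of the Lambert W function for z > 0:
    # the w satisfying w * exp(w) == z, found by Newton's method.
    # log(1 + z) is an adequate initial guess on the positive axis.
    w = math.log(1.0 + z)
    for _ in range(50):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

print(lambert_w(1.0))  # ~0.567143 (the omega constant)
```

This is why tier 2 is cheap: one logarithm plus a handful of scalar Newton steps, versus a full GPU sweep in tier 3.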

Integration Points

| File | Change |
| --- | --- |
| `dflash_mlx/bet_optimal_drafting.py` | New file — `BODConfig`, `BODController`, `BODObservation`, `bod_optimal_bet()` convenience API, Metal kernels, analytical solvers |
| `runtime_profiles.py` | 6 new `bod_*` fields on `RuntimeProfile` / `EffectiveRuntimeConfig` |
| `runtime_context.py` | `runtime_config_from_profile()` and `build_offline_runtime_context()` thread BOD params |
| `runtime.py` | `stream_dflash_generate()` auto-creates `BODController` when enabled |
| `engine/spec_epoch.py` | Accepts `bod_controller`; queries per-cycle for block size; records observations after each cycle |
| `server/config.py` | 6 CLI flags: `--bod-enabled`, `--bod-mode`, `--bod-min-bet`, `--bod-max-bet`, `--bod-default-scale-cost`, `--bod-default-fixed-cost` |
| `generate.py` | Same 6 CLI flags + kwargs on `run_generate()` |
| `__init__.py` | Exports `BODConfig`, `BODController`, `BODObservation` |
| `doctor.py` | Registers BOD fields in config/CLI registries |
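The config surface can be pictured as a small dataclass mirroring the six CLI flags. This is a hypothetical sketch inferred from the flag names; the real `BODConfig` in `dflash_mlx/bet_optimal_drafting.py` may name or default its fields differently:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BODConfigSketch:
    # Hypothetical mirror of the six CLI flags; field names and
    # defaults are assumptions, not the PR's actual BODConfig.
    enabled: bool = False            # --bod-enabled (off => zero behavioral change)
    mode: str = "chain"              # --bod-mode: "chain" (gamma) or "tree" (B)
    min_bet: int = 1                 # --bod-min-bet
    max_bet: int = 64                # --bod-max-bet
    default_scale_cost: float = 1.0  # --bod-default-scale-cost (c_scale prior)
    default_fixed_cost: float = 1.0  # --bod-default-fixed-cost (c_fixed prior)

cfg = BODConfigSketch(enabled=True, mode="tree", min_bet=16, max_bet=256)
```

The default-cost fields matter only before any observations arrive; once the controller has real cycle timings, they are superseded.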

Usage

```shell
# Serve with BOD enabled (chain mode — default)
dflash --model Qwen3.5-27B --bod-enabled

# Tree mode with custom bet range
dflash --model Qwen3.5-27B --bod-enabled --bod-mode tree --bod-min-bet 16 --bod-max-bet 256

# Generate with custom cost estimates
dflash generate --model Qwen3.5-27B --prompt "Hello" --bod-enabled --bod-default-scale-cost 5.0

# Default (BOD disabled) — zero behavioral change
dflash --model Qwen3.5-27B
```

Testing

  • All 343 existing tests pass (3 pre-existing skipped)
  • Added BOD default kwargs to test_generate_cli.py expected dict
  • Verified: import chain, data model completeness, CLI parse/normalize, BOD controller lifecycle, dynamic bet adaptation through 20 simulation cycles
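The controller lifecycle the tests exercise (query a bet per cycle, record an observation afterward) can be sketched with a toy stand-in. The class name, update rule, and constants here are assumptions for illustration; the real `BODController` uses the tiered solvers described above, not this simple smoothed sweep:

```python
import random

class ToyBetController:
    # Illustrative stand-in for the suggest/observe lifecycle.
    def __init__(self, min_bet=1, max_bet=64, alpha_guess=0.5):
        self.min_bet, self.max_bet = min_bet, max_bet
        self.alpha = alpha_guess               # smoothed acceptance-rate estimate
        self.c_fixed, self.c_scale = 1.0, 0.2  # smoothed cost estimates

    def suggest(self):
        # Argmax of T(x) = (E[tokens | x] + 1) / (c_fixed + c_scale * x)
        # under a geometric acceptance model with rate self.alpha.
        def t(x):
            e = (self.alpha - self.alpha ** (x + 1)) / (1 - self.alpha)
            return (e + 1) / (self.c_fixed + self.c_scale * x)
        return max(range(self.min_bet, self.max_bet + 1), key=t)

    def observe(self, bet, accepted, draft_time, verify_time):
        # Exponential smoothing of acceptance rate and per-cycle costs.
        rate = min(max(accepted / bet, 0.01), 0.99)
        self.alpha = 0.8 * self.alpha + 0.2 * rate
        self.c_fixed = 0.8 * self.c_fixed + 0.2 * verify_time
        self.c_scale = 0.8 * self.c_scale + 0.2 * (draft_time / bet)

random.seed(0)
ctl = ToyBetController()
for _ in range(20):  # mirror the 20 simulation cycles mentioned above
    bet = ctl.suggest()
    accepted = sum(random.random() < 0.7 for _ in range(bet))
    ctl.observe(bet, accepted, draft_time=0.05 * bet, verify_time=1.0)
assert ctl.min_bet <= ctl.suggest() <= ctl.max_bet
```

As the cost and acceptance estimates converge, the suggested bet drifts away from its prior toward the observed optimum while staying clamped to the configured range.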

Add BOD as an opt-in dynamic block-size optimizer for speculative
decoding. The draft model is the gambler; the target model is the house.
BOD finds the optimal bet size (γ for chain / B for tree) to maximize
throughput using a unified mathematical framework.

Integration points:
- EffectiveRuntimeConfig / RuntimeProfile: 6 new bod_* fields (disabled
  by default, zero behavioral change).
- CLI (serve + generate): --bod-enabled, --bod-mode, --bod-min-bet,
  --bod-max-bet, --bod-default-scale-cost, --bod-default-fixed-cost.
- spec_epoch.py: accepts optional bod_controller; queries it per-cycle
  for dynamic block sizing and records observations (bet, accepted,
  cycle_time, draft_time, verify_time).
- runtime.py: auto-creates BODController when bod_enabled=True.
- __init__.py: exports BODConfig, BODController, BODObservation.
- doctor.py: registers BOD fields in config/CLI registries.

343 tests pass, 3 skipped (pre-existing).