Wire OutputRouter into streaming chat #259
|
SOP §0–§2 review. Requesting changes: 1 P0 + 4 P1s found in codex round 1.

§0 Necessity: ✅ advances the OutputRouter migration tracked in #63. Goal is correct.

§2 Codex round 1, P0:
Since Fix: explicit allowlist gate. E.g., in

    ROUTER_ALLOWLIST = {"gemma4", "harmony"}  # extend per #64, #65 as those land

    def _create_output_router(self):
        try:
            router = OutputRouter.from_tokenizer(self.tokenizer)
            if router is None:
                return None
            if router.token_map.format_tag not in ROUTER_ALLOWLIST:
                return None
            return router
        except Exception as e:
            logger.debug("OutputRouter unavailable: %s", e)
            return None

Or invert it: pass an explicit

§2 Codex round 1, P1s (also need addressing):
Suggested next iteration:
Holding merge until P0 is fixed. P1s strongly preferred but I can take them as follow-up issues if you'd rather scope them separately. Thanks for moving the migration forward — this is the right direction, just needs the gate so it doesn't blast-radius onto Qwen3/DeepSeek users. |
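
To make the effect of the allowlist gate concrete, here is a minimal, self-contained sketch of the same check exercised against stand-in objects. `FakeRouter`, `FakeTokenMap`, `gate_router`, and the `"qwen3"` tag are hypothetical names used only for illustration; only `ROUTER_ALLOWLIST` and the `token_map.format_tag` attribute come from the snippet above, and the real `OutputRouter` may behave differently.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

# Same allowlist as in the review snippet above.
ROUTER_ALLOWLIST = {"gemma4", "harmony"}


@dataclass
class FakeTokenMap:
    """Hypothetical stand-in exposing only the attribute the gate reads."""
    format_tag: str


@dataclass
class FakeRouter:
    """Hypothetical stand-in for a router returned by a tokenizer factory."""
    token_map: FakeTokenMap


def gate_router(router: Optional[FakeRouter]) -> Optional[FakeRouter]:
    """Return the router only when its format tag is explicitly allowlisted."""
    if router is None:
        return None
    if router.token_map.format_tag not in ROUTER_ALLOWLIST:
        logger.debug("Router format %r is not allowlisted", router.token_map.format_tag)
        return None
    return router


allowed = gate_router(FakeRouter(FakeTokenMap("harmony")))
blocked = gate_router(FakeRouter(FakeTokenMap("qwen3")))  # illustrative tag, not allowlisted
missing = gate_router(None)
```

With a gate of this shape, formats outside the allowlist keep whatever behavior they had before the router was wired in, which is the property the P0 asks for.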
|
Thanks for picking this up @masonjames. Reviewed against current
The shape of the change is right (per-request router factory, channel-tagged

P0: blocker

1. Inherits #197 / #343: accumulated router state silently dropped on stream end
This is the same data-loss class as #197 and #343. Suggest either:
Either way please add a regression test for "stream ends in TOOL_CALL state": that's the exact failure mode users will hit on

P1: should fix

2. Streaming granularity changes from chunk-level to token-level

Every token that triggers a

If this is intentional (it gives the postprocessor cleaner per-token boundaries), document it in the docstring of

3. Test coverage gaps

The new test file covers the supported-tokenizer happy path (reasoning→content transition) and the unsupported-tokenizer passthrough. Missing:
4. Magic token IDs

P2: nits

Verdict

I'd love to see this land: channel routing in the engine layer is the right architecture and the refactor is small. Asking for the finalize/drain fix (#1) plus the tool_call + mid-stream-end tests (#3) before merge. Items 2 and 4 are nice-to-haves; everything in P2 is reviewer preference. Happy to pair on the finalize hook if useful; it's small enough to add as a follow-up commit on this PR.

Reviewed by @raullenchai (Rapid-MLX maintainer) with adversarial second-pass via DeepSeek. |
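
As a rough illustration of the finalize/drain idea and the "stream ends in TOOL_CALL state" regression test requested above, the following sketch uses a toy router. `ToyRouter`, `feed`, `finalize`, `route_stream`, and the `<tool>` / `</tool>` markers are all hypothetical; the real OutputRouter API and its token handling are not shown in this thread and may look quite different.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Chunk = Tuple[str, str]  # (channel, text)


class ToyRouter:
    """Hypothetical router: buffers text between <tool> and </tool> markers."""

    def __init__(self) -> None:
        self.state = "CONTENT"
        self._buffer: List[str] = []

    def feed(self, token: str) -> Optional[Chunk]:
        if token == "<tool>":
            self.state = "TOOL_CALL"
            return None
        if token == "</tool>":
            self.state = "CONTENT"
            text, self._buffer = "".join(self._buffer), []
            return ("tool_call", text)
        if self.state == "TOOL_CALL":
            self._buffer.append(token)  # held back until the call closes
            return None
        return ("content", token)

    def finalize(self) -> Optional[Chunk]:
        """Drain whatever is still buffered when the stream ends early."""
        if not self._buffer:
            return None
        text, self._buffer = "".join(self._buffer), []
        return ("tool_call", text)


def route_stream(tokens: Iterable[str]) -> Iterator[Chunk]:
    router = ToyRouter()
    for token in tokens:
        chunk = router.feed(token)
        if chunk is not None:
            yield chunk
    # Without this drain step, a stream ending mid tool call loses its buffer.
    tail = router.finalize()
    if tail is not None:
        yield tail


def test_stream_ends_in_tool_call_state() -> None:
    chunks = list(route_stream(["hi", "<tool>", '{"name":', '"f"}']))
    assert chunks == [("content", "hi"), ("tool_call", '{"name":"f"}')]
```

The test feeds a stream that never emits the closing marker and asserts that the buffered text still reaches the consumer, which is the failure mode item 1 describes.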
|
Thanks for the detailed review. I pushed a follow-up commit that keeps this PR scoped to the validated engine-router formats and addresses the stream-safety issues raised here. What changed:
Validation:
This should address the P0, the requested P1 coverage, and the P2 nits without needing a secondary PR. Excited about Rapid-MLX and happy to help! |
Summary
OutputRouter.from_tokenizer() in BatchedEngine.stream_chat()
GenerationOutput.channel chunks
output.channel is not None

Tests
uv run pytest tests/test_batched_engine_output_router.py tests/test_postprocessor.py tests/test_output_router.py
uv run ruff check vllm_mlx/engine/batched.py vllm_mlx/service/postprocessor.py tests/test_batched_engine_output_router.py tests/test_postprocessor.py
uv run ruff format --check vllm_mlx/engine/batched.py vllm_mlx/service/postprocessor.py tests/test_batched_engine_output_router.py tests/test_postprocessor.py

Addresses the engine-wiring follow-up from #63.
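
For readers unfamiliar with channel-tagged chunks, the sketch below shows one way a consumer might branch on `output.channel is not None`, as the summary mentions. It is an assumption-laden illustration: this `GenerationOutput` is a hypothetical two-field stand-in rather than the project's class, and `split_by_channel` and the `"unrouted"` key are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class GenerationOutput:
    """Hypothetical minimal shape; the real class likely carries more fields."""
    text: str
    channel: Optional[str] = None


def split_by_channel(outputs: Iterable[GenerationOutput]) -> Dict[str, str]:
    """Concatenate chunk text per channel, keeping untagged chunks separate."""
    routed: Dict[str, str] = {}
    for output in outputs:
        if output.channel is not None:
            key = output.channel  # chunk was tagged upstream by a router
        else:
            key = "unrouted"  # no router active; left for text-based parsing
        routed[key] = routed.get(key, "") + output.text
    return routed


result = split_by_channel(
    [
        GenerationOutput("think", "reasoning"),
        GenerationOutput("ing", "reasoning"),
        GenerationOutput("Hi", "content"),
        GenerationOutput("raw"),
    ]
)
```

Keeping untagged chunks on a separate path means tokenizers without a router are unaffected by the new channel field.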