[Idea] IOChain — a request/response filter pipeline for the inference layer #20545
mukesh-hai started this conversation in Ideas
SGLang currently has no extensible way to inspect or act on requests and responses at the inference layer. The existing Starlette/FastAPI middlewares only see raw HTTP — they can't access tokenised inputs, generated completions, or usage stats. There are several open requests for this (#13825, #6621) but no solution yet.
Idea: an IOChain filter pipeline
The design is inspired by the IOChain pattern in network stacks: every request passes through an ordered pipeline of filters on ingress (before inference) and egress (after the response is built). Each filter is a simple class with two async hooks, one for each direction.
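As a rough sketch of what this could look like in Python: the type names (`Request`, `Response`), the hook names (`on_request`, `on_response`), and the chain API below are all hypothetical, not SGLang internals.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical containers; field names are illustrative only,
# not SGLang's actual request/response types.
@dataclass
class Request:
    prompt: str
    meta: dict = field(default_factory=dict)

@dataclass
class Response:
    text: str
    meta: dict = field(default_factory=dict)

class Filter:
    """Base filter with one async hook per direction (assumed names)."""
    async def on_request(self, req: Request) -> Request:
        return req

    async def on_response(self, resp: Response) -> Response:
        return resp

class IOChain:
    """Ordered filter pipeline: ingress before inference, egress after."""
    def __init__(self, filters: list[Filter]):
        self.filters = list(filters)

    async def ingress(self, req: Request) -> Request:
        # Filters run in registration order on the way in.
        for f in self.filters:
            req = await f.on_request(req)
        return req

    async def egress(self, resp: Response) -> Response:
        # Unwind in reverse order on the way out, middleware-style.
        for f in reversed(self.filters):
            resp = await f.on_response(resp)
        return resp

# Example filter: redact a banned word before it reaches the model.
class Redactor(Filter):
    async def on_request(self, req: Request) -> Request:
        req.prompt = req.prompt.replace("secret", "[redacted]")
        return req

chain = IOChain([Redactor()])
req = asyncio.run(chain.ingress(Request(prompt="tell me the secret")))
print(req.prompt)
```

Running egress in reverse registration order mirrors how ASGI middleware stacks unwind, so a filter that wraps both directions sees a symmetric enter/exit ordering; whether IOChain should do the same is an open design question.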