api: initial skeleton of LLMRoute and LLMBackend #20
Conversation
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
cc @arkodg
cc @sanjeewa-malalgoda (i would appreciate it if you can ping other folks from your org)
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake We reviewed this and it looks good to us.
// APISchema specifies the API schema of the input that the target Gateway(s) will receive.
// Based on this schema, the ai-gateway will perform the necessary transformation to the
// output schema specified in the selected LLMBackend during the routing process.
APISchema LLMAPISchema `json:"inputSchema"`
Suggested change: rename the JSON tag from `inputSchema` to `apiSchema`:
APISchema LLMAPISchema `json:"inputSchema"`
APISchema LLMAPISchema `json:"apiSchema"`
Are we going to allow a different APISchema for each LLM route if we define it at the route level?
If my understanding is correct, this indicates which vendor and model the route belongs to. Therefore, a single LLMRoute can correspond to only one vendor and model.
Are we going to allow different APISchema for different LLM route if we define at the route level?
I think the answer would be no at the moment, since I am not sure why a user would want to access the API like that - clients would need to use different sets of API clients and switch between them depending on the path? idk.
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
on second thought I kept the field name
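For readers skimming the thread, here is a minimal Go sketch of how the schema type behind this field could be declared, given that OpenAI is currently the only supported input schema per the discussion in this thread. The constant name and package are illustrative assumptions, not taken from this PR.

```go
package v1alpha1 // hypothetical package name for the sketch

// LLMAPISchema names the API schema spoken on one side of the ai-gateway, e.g.
// the schema of incoming requests (LLMRoute) or of a provider (LLMBackend).
type LLMAPISchema string

const (
	// APISchemaOpenAI is the OpenAI-compatible API schema; per the thread above,
	// it is the only input schema supported at this point.
	APISchemaOpenAI LLMAPISchema = "OpenAI"
)
```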
Suggested change (addition to the doc comment):
Based on this schema, the ai-gateway will perform the necessary transformation to the
output schema specified in the selected LLMBackend during the routing process.
Currently, the only supported schema is OpenAI as the input schema.
would it be less problematic to remove this text constraint and introduce Bedrock later once it's supported?
The constraint will be enforced by a CEL validation rule evaluated at the k8s API server - will do the follow-up soon.
good comment as usual, Adrian
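As a rough illustration of the follow-up mentioned above, the constraint can be attached to the field with a kubebuilder CEL marker so the k8s API server rejects unsupported values at admission time. The exact rule and message below are assumptions for illustration, not the merged change.

```go
// APISchema specifies the API schema of the input that the target Gateway(s) will receive.
// Currently, the only supported schema is OpenAI as the input schema.
//
// The marker below asks controller-gen to emit a CEL rule into the CRD so the
// API server validates the value on create/update (sketch only).
// +kubebuilder:validation:XValidation:rule="self == 'OpenAI'",message="currently only OpenAI is supported as the input schema"
APISchema LLMAPISchema `json:"inputSchema"`
```

Once Bedrock (or another schema) is supported, only the CEL rule needs to be relaxed; the field shape stays the same.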
Co-authored-by: Adrian Cole <64215+codefromthecrypt@users.noreply.github.com>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
This commit is a follow-up on #20. Basically, this makes LLMRoute a pure "addition" to the existing standardized HTTPRoute. This makes it possible to configure something like

```
kind: LLMRoute
metadata:
  name: llm-route
spec:
  inputSchema: OpenAI
  httpRouteRef:
    name: my-llm-route
---
kind: HTTPRoute
metadata:
  name: my-llm-route
spec:
  matches:
    - headers:
        key: x-envoy-ai-gateway-llm-model
        value: llama3-70b
  backendRefs:
    - kserve:
        weight: 20
    - aws-bedrock:
        weight: 80
```

where LLMRoute purely references the HTTPRoute, and users can configure whatever routing conditions they want in a standardized way via HTTPRoute while leveraging the LLM-specific information, in this case the x-envoy-ai-gateway-llm-model header.

In the implementation, though it's not merged yet, we have to do the routing calculation in the extproc by actually analyzing the referenced HTTPRoute and emulating its behavior in order to do the transformation. The reason is that the routing decision is generally made at the very end of the filter chain, and by the time we invoke the extproc, we don't have that information. Furthermore, `x-envoy-ai-gateway-llm-model` is not available before the extproc.

As a bonus, we no longer need TargetRef at the LLMRoute level since that lives within the HTTPRoute resources. This will really simplify the PoC implementation.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
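To make the "pure addition" shape described in the commit message above concrete, here is a rough Go sketch of what the spec could look like after this change. The field names mirror the YAML example; the package name, the use of corev1.LocalObjectReference, and the reuse of the LLMAPISchema type sketched earlier are illustrative assumptions, not the merged implementation.

```go
package v1alpha1 // hypothetical package name for the sketch

import corev1 "k8s.io/api/core/v1"

// LLMRouteSpec keeps only the LLM-specific information and delegates the actual
// routing rules (matches, backendRefs, weights) to a referenced HTTPRoute.
type LLMRouteSpec struct {
	// InputSchema specifies the API schema of the requests arriving at the Gateway,
	// e.g. OpenAI. The ai-gateway transforms requests from this schema into the
	// schema of the backend selected during routing.
	InputSchema LLMAPISchema `json:"inputSchema"`

	// HTTPRouteRef names the HTTPRoute (in the same namespace) that holds the routing
	// conditions, such as matching on the x-envoy-ai-gateway-llm-model header.
	HTTPRouteRef corev1.LocalObjectReference `json:"httpRouteRef"`
}
```

With this shape, the extproc can look up the referenced HTTPRoute, evaluate its match rules against the request (including the model header), and pick the backend whose output schema drives the transformation.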
This adds the skeleton API of LLMRoute and LLMBackend.
These two resources will be the foundation for future
iterations, such as authn/z, token-based rate limiting,
schema transformation, and more advanced things like #10.
Note: we may (and likely will) break the APIs as necessity
comes up, until the initial release.
part of #13
cc @yuzisun
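Since the description names both resources, here is a hedged sketch of what the LLMBackend side could look like, based only on the doc comment earlier in the thread that says the output schema is specified in the selected LLMBackend; every name here is an assumption.

```go
package v1alpha1 // hypothetical package name for the sketch

// LLMBackendSpec describes a single LLM provider endpoint (sketch).
type LLMBackendSpec struct {
	// OutputSchema specifies the API schema this backend expects, e.g. OpenAI or
	// AWSBedrock. During routing, the ai-gateway transforms requests from the
	// LLMRoute's input schema into this schema before forwarding them.
	OutputSchema LLMAPISchema `json:"outputSchema"`
}
```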