api: initial skeleton of LLMRoute and LLMBackend #20

Merged
merged 10 commits into from
Dec 5, 2024

@mathetake (Member) commented Dec 3, 2024

This adds the skeleton API of LLMRoute and LLMBackend.
These two resources will be the foundation for future
iterations, such as authn/z, token-based rate limiting,
schema transformation, and more advanced features like #10.

Note: we may break these APIs if the need arises before
the initial release.

part of #13

cc @yuzisun

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake
Member Author

cc @arkodg

@mathetake
Member Author

cc @sanjeewa-malalgoda (i would appreciate it if you can ping other folks from your org)

Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@sanjeewa-malalgoda

@mathetake We reviewed this and it looks good to us.

@mathetake (Member, Author) left a comment

Reviewed myself🏃

// APISchema specifies the API schema of the input that the target Gateway(s) will receive.
// Based on this schema, the ai-gateway will perform the necessary transformation to the
// output schema specified in the selected LLMBackend during the routing process.
APISchema LLMAPISchema `json:"inputSchema"`
Member Author

Suggested change
- APISchema LLMAPISchema `json:"inputSchema"`
+ APISchema LLMAPISchema `json:"apiSchema"`

Contributor

Are we going to allow different APISchema for different LLM route if we define at the route level?

Contributor

If my understanding is correct, this indicates which vendor and model the route belongs to. Therefore, a single LLMRoute can correspond to only one vendor and model.

Member Author

> Are we going to allow different APISchema for different LLM route if we define at the route level?

I think the answer is no for the moment, since I'm not sure why a user would want to access the API that way: clients would need to use different sets of API clients and switch between them on their end depending on the path.

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake
Copy link
Member Author

On second thought, I kept the field names inputSchema and outputSchema for LLMRoute and LLMBackend respectively, instead of using the same apiSchema for both as I suggested in my own comment yesterday. I think it's almost ready to go.
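
For readers following the naming discussion, here is a minimal Go sketch of how the two field names would appear in serialized specs. Only the `LLMAPISchema` type name, the `APISchema` field, and the `inputSchema`/`outputSchema` JSON names come from this thread; the reduced struct shapes, the constants, and the `mustJSON` helper are assumptions made for illustration and are not the contents of api/v1alpha1/api.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LLMAPISchema names an API schema. The type name appears in the PR;
// modeling it as a string is an assumption of this sketch.
type LLMAPISchema string

// Hypothetical constants, for illustration only.
const (
	APISchemaOpenAI     LLMAPISchema = "OpenAI"
	APISchemaAWSBedrock LLMAPISchema = "AWSBedrock"
)

// LLMRouteSpec is a reduced, assumed shape of the route spec.
type LLMRouteSpec struct {
	// APISchema is the schema of the requests the Gateway receives.
	APISchema LLMAPISchema `json:"inputSchema"`
}

// LLMBackendSpec is a reduced, assumed shape of the backend spec.
type LLMBackendSpec struct {
	// APISchema is the schema the backend expects.
	APISchema LLMAPISchema `json:"outputSchema"`
}

// mustJSON marshals v and panics on error; acceptable in a sketch.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(LLMRouteSpec{APISchema: APISchemaOpenAI}))
	fmt.Println(mustJSON(LLMBackendSpec{APISchema: APISchemaAWSBedrock}))
}
```

The Go field name is the same on both structs, so the distinction between the two resources shows up only in the JSON/YAML keys that users write.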

Signed-off-by: Takeshi Yoneda <[email protected]>
@codefromthecrypt left a comment

drive-by notes

Based on this schema, the ai-gateway will perform the necessary transformation to the
output schema specified in the selected LLMBackend during the routing process.

Currently, the only supported schema is OpenAI as the input schema.

would it be less problematic to remove this text constraint and introduce bedrock later once supported?

Member Author

The constraint will be enforced by a CEL validation rule that is evaluated at the k8s API server. I will do the follow-up soon.

Member Author

good comment anyways adrian as usual
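
As a rough illustration of the CEL constraint discussed in this thread, the sketch below attaches a controller-gen `XValidation` marker to the field and pairs it with a Go-side check that mirrors the same rule. The rule text, the message, the reduced struct shape, and the `validateInputSchema` helper are all assumptions; the rule actually added in the follow-up may look different.

```go
package main

import "fmt"

// LLMAPISchema mirrors the type name used in the PR.
type LLMAPISchema string

// LLMRouteSpec is a reduced, assumed shape of the route spec.
type LLMRouteSpec struct {
	// The marker below asks controller-gen to emit a CEL rule into the CRD,
	// which the API server evaluates on create and update. The rule text and
	// message are assumptions of this sketch.
	//
	// +kubebuilder:validation:XValidation:rule="self == 'OpenAI'",message="only OpenAI is supported as the input schema"
	APISchema LLMAPISchema `json:"inputSchema"`
}

// validateInputSchema is a hypothetical Go-side mirror of the CEL rule,
// handy for unit tests that run without an API server.
func validateInputSchema(s LLMRouteSpec) error {
	if s.APISchema != "OpenAI" {
		return fmt.Errorf("unsupported inputSchema %q: only OpenAI is supported", s.APISchema)
	}
	return nil
}

func main() {
	fmt.Println(validateInputSchema(LLMRouteSpec{APISchema: "OpenAI"}))
	fmt.Println(validateInputSchema(LLMRouteSpec{APISchema: "AWSBedrock"}))
}
```

Keeping the constraint in a validation rule rather than only in the doc comment means that supporting another schema later is a one-line change to the rule.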

mathetake and others added 2 commits December 5, 2024 07:43
Co-authored-by: Adrian Cole <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake mathetake merged commit e95a824 into main Dec 5, 2024
5 checks passed
@mathetake mathetake deleted the apidefinitions-llv branch December 5, 2024 17:34
mathetake added a commit that referenced this pull request Dec 10, 2024
This commit is a follow-up to #20. It makes LLMRoute
a pure "addition" to the existing standardized HTTPRoute,
which makes it possible to configure something like
```
kind: LLMRoute
metadata:
  name: llm-route
spec:
  inputSchema: OpenAI
  httpRouteRef:
    name: my-llm-route
---
kind: HTTPRoute
metadata:
  name: my-llm-route
spec:
  rules:
    - matches:
        - headers:
            - name: x-envoy-ai-gateway-llm-model
              value: llama3-70b
      backendRefs:
        - name: kserve
          weight: 20
        - name: aws-bedrock
          weight: 80
```

where LLMRoute purely references an HTTPRoute, and users can configure
any routing condition in a standardized way via HTTPRoute while
leveraging LLM-specific information, in this case the
x-envoy-ai-gateway-llm-model header.

In the implementation, which is not merged yet, we have to do the
routing calculation in the extproc by analyzing the referenced
HTTPRoute and emulating its behavior in order to do the transformation.
The reason is that the routing decision is generally made at the very
end of the filter chain, so by the time extproc is invoked we don't
have that information. Furthermore, `x-envoy-ai-gateway-llm-model` is
not available before extproc.

As a bonus, we no longer need TargetRef at the LLMRoute level, since
that lives in the HTTPRoute resources. This will really simplify the
PoC implementation.
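
The routing emulation described above could be sketched roughly as follows. This is not the extproc implementation (which the commit message says was not merged at the time); `Rule`, `WeightedBackend`, `selectBackend`, `matches`, and `exampleRules` are hypothetical names, header matching is reduced to exact matches, and the random pick is passed in as a parameter so the behavior is deterministic.

```go
package main

import "fmt"

// WeightedBackend and Rule are simplified, hypothetical stand-ins for the
// backendRefs and matches of an HTTPRoute; they are not Gateway API types.
type WeightedBackend struct {
	Name   string
	Weight int
}

type Rule struct {
	Headers  map[string]string // exact-match header conditions
	Backends []WeightedBackend
}

// matches reports whether every header condition of r is satisfied.
func matches(r Rule, headers map[string]string) bool {
	for k, v := range r.Headers {
		if headers[k] != v {
			return false
		}
	}
	return true
}

// selectBackend returns the backend of the first rule whose header
// conditions all match. pick must be in [0, total weight) and would come
// from a random source in practice.
func selectBackend(rules []Rule, headers map[string]string, pick int) (string, bool) {
	for _, r := range rules {
		if !matches(r, headers) {
			continue
		}
		for _, b := range r.Backends {
			if pick < b.Weight {
				return b.Name, true
			}
			pick -= b.Weight
		}
		return "", false // pick was outside the total weight
	}
	return "", false // no rule matched
}

// exampleRules mirrors the HTTPRoute example in the commit message.
func exampleRules() []Rule {
	return []Rule{{
		Headers: map[string]string{"x-envoy-ai-gateway-llm-model": "llama3-70b"},
		Backends: []WeightedBackend{
			{Name: "kserve", Weight: 20},
			{Name: "aws-bedrock", Weight: 80},
		},
	}}
}

func main() {
	headers := map[string]string{"x-envoy-ai-gateway-llm-model": "llama3-70b"}
	for _, pick := range []int{0, 19, 20, 99} {
		name, ok := selectBackend(exampleRules(), headers, pick)
		fmt.Println(pick, name, ok)
	}
}
```

With the 20/80 weights from the example, picks 0 through 19 land on kserve and picks 20 through 99 land on aws-bedrock. Knowing the chosen backend at this point is what would let the extproc apply the matching output-schema transformation before the router filter runs.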

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
aabchoo pushed a commit that referenced this pull request Dec 12, 2024
7 participants