feat: add oci genai service as chat inference provider #3876
Conversation
e17bf00 to acd1008 Compare
@github-actions run precommit

⏳ Running pre-commit hooks on PR #3876...

✅ Pre-commit hooks completed successfully! 🔧 Changes have been committed and pushed to the PR branch.

Removing docs additions at request of @raghotham

cc @mattf for a review since this touches the inference system

Any updates here?
mattf left a comment
- does oci provide an openai compatible endpoint?
- please include output of the inference tests against the remote::oci provider
I think it would be much preferable if we can work against an OpenAI-compatible endpoint. Otherwise, at the very least, we need a set of recorded tests against the provider. But before recordings, let's make sure the tests at least pass "live". Here's a command to run (roughly):
@leseb @ashwinb @mattf thanks for reviewing. If it would be strongly preferable for me to use an OpenAI-compatible endpoint, I can make those changes. I'll refactor and re-request review when this is done. Apologies as well: I started the PR a few weeks ago before a conference, and when I got back, inference providers had changed significantly, though it seems for the better. I'll align with the changes and re-request!
32c0f3b to 62eed3b Compare
Hi @leseb @ashwinb @raghotham, I've updated the OCI inference provider to use OpenAI-compatible endpoints. It isn't quite using the "mixin", but I am extending OpenAI's base client to use our auth, so I am making the calls directly from the OpenAI base client. I still needed to use the ModelRegistryHelper so I could list models, as our API endpoint doesn't provide /models. Additionally, embeddings aren't supported: the service URL I've used to support the OpenAI-compatible endpoints shadows an internal API which calls a "chat" API, which leaves embeddings unsupported. However, it does work for chat completions, and I've got it skipping the tests which should not pass.

To test, I've run:

```
OCI_COMPARTMENT_OCID="ocid1.compartment.oc1..aaaaaaaa5rwhi5.......5a" \
OCI_REGION="us-chicago-1" \
OCI_AUTH_TYPE=instance_principal \
pytest -sv tests/integration/inference/ --stack-config oci \
  --text-model meta.llama-3.3-70b-instruct --inference-mode live
```

which results in (the warning is that the OTEL collector is not set):

```
===== 18 passed, 68 skipped, 1 warning in 16.92s =====
```

Additionally, functionally running the following (it returns many more models than shown here):

```
curl localhost:8321/v1/models | jq
```

```json
...
{
"id": "meta.llama-3.3-70b-instruct",
"object": "model",
"created": 1762335531,
"owned_by": "llama_stack",
"custom_metadata": {
"model_type": "llm",
"provider_id": "oci",
"provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask77ujdnqfjq6hzo2loq",
"display_name": "meta.llama-3.3-70b-instruct",
"capabilities": [
"CHAT"
],
"oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyajqi2pl7ujdnqfjq6hzo2loq"
}
},
{
"id": "openai.gpt-4o",
"object": "model",
"created": 1762335531,
"owned_by": "llama_stack",
"custom_metadata": {
"model_type": "llm",
"provider_id": "oci",
"provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask6n7caii5lnvcpjlwr2s2q",
"display_name": "openai.gpt-4o",
"capabilities": [
"CHAT"
],
"oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceya663lnvcpjlwr2s2q"
}
},
...
```

And using each of these models:

```
curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "meta.llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Write a haiku about gpus on the cloud."}]}' | jq
```

```json
{
"id": "1c653419cc3146448e5c6d3771ff1ba9",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Virtual cores rise\nCloud GPUs softly humming\nRemote power unfurls",
"refusal": null,
"role": "assistant",
"annotations": null,
"audio": null,
"function_call": null,
"tool_calls": null
},
"matched_stop": 128009
}
],
"created": 1762335708,
"model": "meta.llama-3.3-70b-instruct",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": null,
"usage": {
"completion_tokens": 14,
"prompt_tokens": 46,
"total_tokens": 60,
"completion_tokens_details": null,
"prompt_tokens_details": null
},
"metrics": [
{
"trace_id": "52ce0c3ea7c416e3744f722509107e5a",
"span_id": "28bc5e4e963c8f9d",
"timestamp": "2025-11-05T09:41:48.294940Z",
"attributes": {
"model_id": "meta.llama-3.3-70b-instruct",
"provider_id": "oci"
},
"type": "metric",
"metric": "prompt_tokens",
"value": 46,
"unit": "tokens"
},
{
"trace_id": "52ce0c3ea7c416e3744f722509107e5a",
"span_id": "28bc5e4e963c8f9d",
"timestamp": "2025-11-05T09:41:48.294952Z",
"attributes": {
"model_id": "meta.llama-3.3-70b-instruct",
"provider_id": "oci"
},
"type": "metric",
"metric": "completion_tokens",
"value": 14,
"unit": "tokens"
},
{
"trace_id": "52ce0c3ea7c416e3744f722509107e5a",
"span_id": "28bc5e4e963c8f9d",
"timestamp": "2025-11-05T09:41:48.294955Z",
"attributes": {
"model_id": "meta.llama-3.3-70b-instruct",
"provider_id": "oci"
},
"type": "metric",
"metric": "total_tokens",
"value": 60,
"unit": "tokens"
}
]
}
```

And:

```
curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "openai.gpt-4o", "messages": [{"role": "user", "content": "Write a haiku about gpus on the cloud."}]}' | jq
```

```json
{
"id": "chatcmpl-CYUKuk9bRrbwovof1PAMhiZgAEVCV",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Silicon whispers, \nIn cloud's embrace, power soars— \nDreams rendered in light.",
"refusal": null,
"role": "assistant",
"annotations": [],
"audio": null,
"function_call": null,
"tool_calls": null
}
}
],
"created": 1762335800,
"model": "openai.gpt-4o",
"object": "chat.completion",
"service_tier": "default",
"system_fingerprint": "fp_cbf1785567",
"usage": {
"completion_tokens": 21,
"prompt_tokens": 18,
"total_tokens": 39,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"metrics": [
{
"trace_id": "0c14652cb02eb4e408cfbc8bf7db4a57",
"span_id": "caba87321dc9a687",
"timestamp": "2025-11-05T09:43:21.490856Z",
"attributes": {
"model_id": "openai.gpt-4o",
"provider_id": "oci"
},
"type": "metric",
"metric": "prompt_tokens",
"value": 18,
"unit": "tokens"
},
{
"trace_id": "0c14652cb02eb4e408cfbc8bf7db4a57",
"span_id": "caba87321dc9a687",
"timestamp": "2025-11-05T09:43:21.490868Z",
"attributes": {
"model_id": "openai.gpt-4o",
"provider_id": "oci"
},
"type": "metric",
"metric": "completion_tokens",
"value": 21,
"unit": "tokens"
},
{
"trace_id": "0c14652cb02eb4e408cfbc8bf7db4a57",
"span_id": "caba87321dc9a687",
"timestamp": "2025-11-05T09:43:21.490871Z",
"attributes": {
"model_id": "openai.gpt-4o",
"provider_id": "oci"
},
"type": "metric",
"metric": "total_tokens",
"value": 39,
"unit": "tokens"
}
]
}
```
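For readers unfamiliar with the pattern described above ("extending OpenAI's base client to use our auth"), here is a minimal sketch of how a signing hook can be wired into the OpenAI Python client via a custom httpx auth class. The `signer.sign(...)` interface is hypothetical; the PR's actual implementation lives in its auth.py and OciOpenAIClient.

```python
import httpx
from openai import OpenAI


class OciRequestSigner(httpx.Auth):
    """Sketch of signing-based auth. Assumes a signer object that can
    compute the Authorization and related headers for a raw request;
    that `sign` interface is hypothetical, not the PR's real code."""

    requires_request_body = True  # the body is part of the OCI signature

    def __init__(self, signer):
        self._signer = signer

    def auth_flow(self, request: httpx.Request):
        # Delegate header computation to the signer and attach the result.
        headers = self._signer.sign(
            request.method, str(request.url), request.headers, request.content
        )
        for name, value in headers.items():
            request.headers[name] = value
        yield request


def make_client(base_url: str, signer) -> OpenAI:
    # No API key is sent; auth happens via request signing, so a placeholder
    # satisfies the OpenAI client's required api_key argument.
    return OpenAI(
        base_url=base_url,
        api_key="unused",
        http_client=httpx.Client(auth=OciRequestSigner(signer)),
    )
```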
ashwinb left a comment
This looks clean to me. Not using OpenAIMixin directly here also seems reasonable given the extensive customizations, but others may disagree. I have a couple of small inline comments. Also, please make pre-commit green.
- `build_oci_model_entries` can be `OpenAIMixin.list_provider_model_ids`
- `OciOpenAIClient` can be set up for `OpenAIMixin.get_extra_client_params`
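For context, a rough sketch of the two hooks being suggested. The method names come from the review comment, but the exact signatures and the import path are assumptions, not checked against the OpenAIMixin source.

```python
from typing import Any, Iterable

import httpx

from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


def make_signing_http_client() -> httpx.AsyncClient:
    # Hypothetical helper: would return a client that performs OCI request
    # signing (see the auth sketch earlier); stubbed here.
    raise NotImplementedError


class OciInferenceAdapter(OpenAIMixin):
    # Sketch only; signatures are assumptions.

    async def list_provider_model_ids(self) -> Iterable[str]:
        # OCI's endpoint exposes no /models, so ids are enumerated
        # out-of-band (what build_oci_model_entries previously did);
        # a static list stands in for the OCI SDK call here.
        return ["meta.llama-3.3-70b-instruct", "openai.gpt-4o"]

    def get_extra_client_params(self) -> dict[str, Any]:
        # Extra kwargs forwarded to the underlying OpenAI client constructor,
        # so requests go through the signing client instead of the stock one.
        return {"http_client": make_signing_http_client()}
```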
I'll get these in, thanks.
62eed3b to b6f9788 Compare
@mattf - made the requested changes, thanks for pointing me in the right direction!
b6f9788 to 46cd644 Compare
mattf left a comment
looking good. please clean up auth.py and models.py.
FYI, you're including rerank models but aren't implementing a rerank interface.
46cd644 to
6319cce
Compare
@mattf Good deal. I actually use auth.py for client authentication: I need it for OCI request signing, because we don't use API keys as the auth on the request, and I don't think it makes sense to smoosh it into the other files. If you are aligned on keeping auth.py, I've removed models.py and removed rerank from the models list. I can follow up with embeddings and rerank in a future PR.
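For readers unfamiliar with OCI's auth model, the documented SDK pattern that signing-based auth follows looks roughly like this. This is a generic sketch of the OCI Python SDK's requests-style signer, not the PR's actual auth.py, and the URL is hypothetical.

```python
import oci
import requests

# Load ~/.oci/config and build the SDK's request signer. With
# instance-principal auth (as in the pytest run above) one would use
# oci.auth.signers.InstancePrincipalsSecurityTokenSigner() instead.
config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

# The signer acts as a requests auth hook: it adds the Authorization and
# related headers to each call, so no API key ever appears in the request.
resp = requests.get(
    "https://example-oci-endpoint/models",  # hypothetical URL
    auth=signer,
)
```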
6319cce to 670e4ec Compare
What does this PR do?
Adds OCI GenAI PaaS models as a chat inference provider behind the OpenAI-compatible chat completion endpoints.
Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following steps (a Python sketch of the same checks follows the list):
- Use the `models` endpoint to list models after the server is running.
- Send a `/chat/completions` request with a model id taken from the `/models` endpoint.
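The same checks, scripted with the OpenAI Python client against the local server from the curl examples above. The dummy API key is an assumption: the client requires one even when the server is not configured to check it.

```python
from openai import OpenAI

# Points at the llama-stack server started above; the key is a placeholder.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="unused")

# List models and confirm the OCI-provided entries appear.
model_ids = [m.id for m in client.models.list()]
assert "meta.llama-3.3-70b-instruct" in model_ids

# Send a chat completion through the OCI provider.
resp = client.chat.completions.create(
    model="meta.llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about gpus on the cloud."}],
)
print(resp.choices[0].message.content)
```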