Commit f060f22

feat: add oci genai service as chat inference provider
1 parent 9191005 commit f060f22

20 files changed: +1719 −21 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 exclude: 'build/'

 default_language_version:
-  python: python3.12
+  python: python3.13
   node: "22"

 repos:
```

Lines changed: 180 additions & 0 deletions (new file)

---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->

# OCI Distribution

The `llamastack/distribution-oci` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::oci` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

### Environment Variables

The following environment variables can be configured:

- `OCI_AUTH_TYPE`: OCI authentication type (`instance_principal` or `config_file`) (default: `instance_principal`)
- `OCI_USER_OCID`: OCI user OCID for authentication (default: empty)
- `OCI_TENANCY_OCID`: OCI tenancy OCID for authentication (default: empty)
- `OCI_FINGERPRINT`: OCI API key fingerprint for authentication (default: empty)
- `OCI_PRIVATE_KEY`: OCI private key for authentication (default: empty)
- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: empty)
- `OCI_COMPARTMENT_OCID`: OCI compartment OCID for the Generative AI service (default: empty)
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if `OCI_AUTH_TYPE` is `config_file`) (default: `~/.oci/config`)
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from the config file (default: `DEFAULT`)

## Prerequisites

### Oracle Cloud Infrastructure Setup

Before using the OCI Generative AI distribution, ensure you have:

1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/)
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
4. **Authentication**: Configure authentication using either:
   - **Instance Principal** (recommended for cloud-hosted deployments)
   - **API Key** (for on-premises or development environments)

### Authentication Methods

#### Instance Principal Authentication (Recommended)

Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments, since no key material needs to be stored on disk.

Requirements:
- The instance must be running in an Oracle Cloud Infrastructure compartment
- The instance must have appropriate IAM policies granting access to Generative AI services
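For reference, here is a minimal sketch of what instance-principal authentication looks like with the OCI Python SDK. The provider wires this up internally from `OCI_AUTH_TYPE=instance_principal`; the snippet is illustrative, and the regional endpoint shown is an assumption you should adjust to your region:

```python
import oci

# Derive a signer from the compute instance's own identity; no API keys
# or config files are needed on disk. This only succeeds on an OCI
# instance covered by a dynamic-group IAM policy.
signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()

client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config={},  # empty config: identity comes entirely from the signer
    signer=signer,
    # Assumed regional endpoint; substitute your own region.
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)
```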
#### API Key Authentication

For development or on-premises deployments, you can use API key authentication with the following information:
- User OCID
- Tenancy OCID
- API key fingerprint
- Private key
- Region
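The equivalent `config_file` flow loads those five values from an OCI config profile. A small sketch with the OCI Python SDK (illustrative only; the provider itself reads `OCI_CONFIG_FILE_PATH` and `OCI_CLI_PROFILE` for you):

```python
import oci

# Load the DEFAULT profile from the standard config location.
config = oci.config.from_file(
    file_location="~/.oci/config",
    profile_name="DEFAULT",
)

# Raises oci.exceptions.InvalidConfig if the user OCID, tenancy OCID,
# fingerprint, key file, or region entries are missing or malformed.
oci.config.validate_config(config)

client = oci.generative_ai_inference.GenerativeAiInferenceClient(config)
```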
### Required IAM Policies

Ensure your OCI user or instance has the following policy statements:

```
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
```

## Supported Services

### Inference: OCI Generative AI

Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports the following (see the client sketch after this list):

- **Chat Completions**: Conversational AI with context awareness
- **Text Generation**: Complete prompts and generate text content
- **Embeddings**: Convert text to vector embeddings for search and retrieval
- **Multiple Model Support**: Access to various foundation models including Cohere, Meta, and custom models
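A sketch of exercising these operations through the standard `llama_stack_client` package once the stack is running. The model IDs are illustrative placeholders (use whatever is registered in your `run.yaml`), and exact client method names can vary across llama-stack versions:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Chat completion routed through the remote::oci inference provider.
chat = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "Summarize OCI Generative AI in one sentence."}],
)
print(chat.completion_message.content)

# Embeddings for semantic search and retrieval.
emb = client.inference.embeddings(
    model_id="cohere.embed-english-v3.0",  # illustrative embedding model ID
    contents=["Oracle Cloud Infrastructure"],
)
print(len(emb.embeddings[0]))  # embedding dimension
```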
#### Available Models

The OCI Generative AI model catalog includes foundation models from Meta, Cohere, OpenAI, and xAI (Grok); availability varies by region.
### Safety: Llama Guard

For content safety and moderation, this distribution uses Meta's Llama Guard model through the OCI Generative AI service to provide:
- Content filtering and moderation
- Policy compliance checking
- Harmful content detection
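A minimal sketch of running the shield through the client API; the shield ID below is an assumption, so check what your distribution actually registered:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run the configured Llama Guard shield over a user message.
result = client.safety.run_shield(
    shield_id="llama-guard",  # assumed ID; verify with client.shields.list()
    messages=[{"role": "user", "content": "Tell me something harmful."}],
    params={},
)

if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Message passed the shield.")
```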
### Vector Storage: Multiple Options

The distribution supports several vector storage providers:
- **FAISS**: Local in-memory vector search
- **ChromaDB**: Distributed vector database
- **PGVector**: PostgreSQL with vector extensions

### Additional Services

- **Dataset I/O**: Local filesystem and Hugging Face integration
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
- **Evaluation**: Meta reference evaluation framework

## Running Llama Stack with OCI

You can run the OCI distribution via Docker or a local virtual environment.

### Via Docker

This method allows you to get started quickly without building the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-oci \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OCI_AUTH_TYPE=$OCI_AUTH_TYPE \
  --env OCI_REGION=$OCI_REGION \
  --env OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID
```

### Via venv

If you've set up your local development environment, you can build and run the distribution in a local virtual environment.

```bash
OCI_GENAI_MODEL_OCID=oci.ocid1.generativeaimodel.oc1.us-chicago-1.<ocid>
llama stack build --distro oci --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OCI_AUTH_TYPE=$OCI_AUTH_TYPE \
  --env OCI_REGION=$OCI_REGION \
  --env OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID
```
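Once the server is up via either method, a quick smoke test is to list the registered models; a small sketch, assuming the default port:

```python
from llama_stack_client import LlamaStackClient

# Point at the locally running stack from either launch method above.
client = LlamaStackClient(base_url="http://localhost:8321")

# Models served through OCI should appear alongside their provider ID.
for model in client.models.list():
    print(model.identifier, "->", model.provider_id)
```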
### Configuration Examples

#### Using Instance Principal (Recommended for Production)

```bash
export OCI_AUTH_TYPE=instance_principal
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

#### Using API Key Authentication (Development)

```bash
export OCI_AUTH_TYPE=config_file
export OCI_CONFIG_FILE_PATH=~/.oci/config
export OCI_CLI_PROFILE=DEFAULT
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

## Regional Endpoints

OCI Generative AI is available in multiple regions, and the service routes to the appropriate regional endpoint based on your configuration. For the full list of regional model availability, see:

https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions

## Troubleshooting

### Common Issues

1. **Authentication Errors**: Verify your OCI credentials and IAM policies
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
3. **Permission Denied**: Check compartment permissions and Generative AI service access
4. **Region Unavailable**: Verify the specified region supports Generative AI services

### Getting Help

For additional support:
- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues)

docs/docs/providers/agents/index.mdx

Lines changed: 2 additions & 2 deletions

The changes here (and in the `index.mdx` files below) are whitespace-only cleanups:

```diff
@@ -1,7 +1,7 @@
 ---
 description: "Agents

-APIs for creating and interacting with agentic systems."
+APIs for creating and interacting with agentic systems."
 sidebar_label: Agents
 title: Agents
 ---
@@ -12,6 +12,6 @@ title: Agents

 Agents

-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.

 This section contains documentation for all available providers for the **agents** API.
```
docs/docs/providers/batches/index.mdx

Lines changed: 12 additions & 12 deletions

```diff
@@ -1,14 +1,14 @@
 ---
 description: "The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes."
+Note: This API is currently under active development and may undergo changes."
 sidebar_label: Batches
 title: Batches
 ---
@@ -18,14 +18,14 @@ title: Batches
 ## Overview

 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.

 This section contains documentation for all available providers for the **batches** API.
```

docs/docs/providers/inference/index.mdx

Lines changed: 6 additions & 6 deletions

```diff
@@ -3,9 +3,9 @@ description: "Inference

 Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search."
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search."
 sidebar_label: Inference
 title: Inference
 ---
@@ -18,8 +18,8 @@ Inference

 Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.

 This section contains documentation for all available providers for the **inference** API.
```
Lines changed: 48 additions & 0 deletions (new file)

---
description: |
  Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.

  Provider documentation
  https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
sidebar_label: Remote - Oci
title: remote::oci
---

# remote::oci

## Description

Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.

Provider documentation:
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `oci_auth_type` | `<class 'str'>` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) |
| `oci_config_file_path` | `<class 'str'>` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) |
| `oci_config_profile` | `<class 'str'>` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) |
| `oci_region` | `str \| None` | No | | OCI region (e.g., us-ashburn-1) |
| `oci_compartment_id` | `str \| None` | No | | OCI compartment ID for the Generative AI service |
| `oci_user_ocid` | `str \| None` | No | | OCI user OCID for authentication |
| `oci_tenancy_ocid` | `str \| None` | No | | OCI tenancy OCID for authentication |
| `oci_fingerprint` | `str \| None` | No | | OCI API key fingerprint for authentication |
| `oci_private_key` | `str \| None` | No | | OCI private key for authentication |
| `oci_serving_mode` | `<class 'str'>` | No | ON_DEMAND | OCI serving mode (must be one of: ON_DEMAND, DEDICATED) |

## Sample Configuration

```yaml
oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal}
oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}
oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT}
oci_region: ${env.OCI_REGION:=us-ashburn-1}
oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=}
oci_serving_mode: ${env.OCI_SERVING_MODE:=ON_DEMAND}
oci_user_ocid: ${env.OCI_USER_OCID:=}
oci_tenancy_ocid: ${env.OCI_TENANCY_OCID:=}
oci_fingerprint: ${env.OCI_FINGERPRINT:=}
oci_private_key: ${env.OCI_PRIVATE_KEY:=}
```

docs/static/llama-stack-spec.html

Lines changed: 54 additions & 0 deletions

```diff
@@ -5061,8 +5061,62 @@
           "description": "The model that was used to generate the chat completion"
         },
         "usage": {
+<<<<<<< HEAD
           "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
           "description": "Token usage information (typically included in final chunk with stream_options)"
+=======
+          "type": "object",
+          "properties": {
+            "completion_tokens": {
+              "type": "integer"
+            },
+            "prompt_tokens": {
+              "type": "integer"
+            },
+            "total_tokens": {
+              "type": "integer"
+            },
+            "completion_tokens_details": {
+              "type": "object",
+              "properties": {
+                "accepted_prediction_tokens": {
+                  "type": "integer"
+                },
+                "audio_tokens": {
+                  "type": "integer"
+                },
+                "reasoning_tokens": {
+                  "type": "integer"
+                },
+                "rejected_prediction_tokens": {
+                  "type": "integer"
+                }
+              },
+              "additionalProperties": false,
+              "title": "CompletionTokensDetails"
+            },
+            "prompt_tokens_details": {
+              "type": "object",
+              "properties": {
+                "audio_tokens": {
+                  "type": "integer"
+                },
+                "cached_tokens": {
+                  "type": "integer"
+                }
+              },
+              "additionalProperties": false,
+              "title": "PromptTokensDetails"
+            }
+          },
+          "additionalProperties": false,
+          "required": [
+            "completion_tokens",
+            "prompt_tokens",
+            "total_tokens"
+          ],
+          "description": "(Optional) Usage information for the completion"
+>>>>>>> 18b9c4c1 (feat: add oci genai service as chat inference provider)
         }
       },
       "additionalProperties": false,
```

llama_stack/apis/inference/inference.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -15,6 +15,7 @@
 )

 from fastapi import Body
+from openai.types.completion_usage import CompletionUsage
 from pydantic import BaseModel, Field, field_validator
 from typing_extensions import TypedDict

```
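`CompletionUsage` is the OpenAI Python client's pydantic model for the usage block shown in the spec diff above. A minimal sketch of constructing one, with illustrative values:

```python
from openai.types.completion_usage import CompletionUsage

# Mirrors the usage schema: the three token counts are required fields.
usage = CompletionUsage(
    completion_tokens=12,
    prompt_tokens=34,
    total_tokens=46,
)
print(usage.model_dump())
```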
