
Commit acd1008

feat: add oci genai service as chat inference provider
1 parent 9191005 commit acd1008

File tree: 19 files changed, +1384 −28 lines
Lines changed: 154 additions & 0 deletions
---
orphan: true
---

<!-- This file was auto-generated by distro_codegen.py, please edit source -->

# OCI Distribution

The `llamastack/distribution-oci` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::oci` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
### Environment Variables

The following environment variables can be configured:

- `OCI_AUTH_TYPE`: OCI authentication type (`instance_principal` or `config_file`) (default: `instance_principal`)
- `OCI_USER_OCID`: OCI user OCID for authentication (default: ``)
- `OCI_TENANCY_OCID`: OCI tenancy OCID for authentication (default: ``)
- `OCI_FINGERPRINT`: OCI API key fingerprint for authentication (default: ``)
- `OCI_PRIVATE_KEY`: OCI private key for authentication (default: ``)
- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: ``)
- `OCI_COMPARTMENT_OCID`: OCI compartment OCID for the Generative AI service (default: ``)
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if `OCI_AUTH_TYPE` is `config_file`) (default: `~/.oci/config`)
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from the config file (default: `DEFAULT`)

## Prerequisites

### Oracle Cloud Infrastructure Setup

Before using the OCI Generative AI distribution, ensure you have:

1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/)
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
4. **Authentication**: Configure authentication using either:
   - **Instance Principal** (recommended for cloud-hosted deployments)
   - **API Key** (for on-premises or development environments)

### Authentication Methods

#### Instance Principal Authentication (Recommended)

Instance Principal authentication lets OCI resources authenticate using the identity of the compute instance they run on. This is the most secure method for production deployments.

Requirements:
- The instance must be running in an Oracle Cloud Infrastructure compartment
- The instance must have appropriate IAM policies to access Generative AI services

#### API Key Authentication

For development or on-premises deployments, you can use API key authentication, which requires the following information:
- User OCID
- Tenancy OCID
- API key fingerprint
- Private key
- Region
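
With `OCI_AUTH_TYPE=config_file`, these values come from an OCI CLI-style config file. A typical `~/.oci/config` profile follows the standard INI layout shown below (all values are placeholders, not real credentials):

```ini
[DEFAULT]
user=ocid1.user.oc1..<your-user-ocid>
tenancy=ocid1.tenancy.oc1..<your-tenancy-ocid>
fingerprint=aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99
key_file=~/.oci/oci_api_key.pem
region=us-chicago-1
```

The profile name (`DEFAULT` here) is what `OCI_CLI_PROFILE` selects.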

### Required IAM Policies

Ensure your OCI user or instance has the following policy statements:

```
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
```

## Supported Services

### Inference: OCI Generative AI

Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports:

- **Chat Completions**: Conversational AI with context awareness
- **Text Generation**: Complete prompts and generate text content
- **Embeddings**: Convert text to vector embeddings for search and retrieval
- **Multiple Model Support**: Access to various foundation models, including Cohere, Meta, and custom models

#### Available Models

OCI Generative AI commonly provides access to models from Meta, Cohere, OpenAI, and xAI (Grok).
### Safety: Llama Guard

For content safety and moderation, this distribution uses Meta's Llama Guard model through the OCI Generative AI service to provide:
- Content filtering and moderation
- Policy compliance checking
- Harmful content detection

### Vector Storage: Multiple Options

The distribution supports several vector storage providers:
- **FAISS**: Local in-memory vector search
- **ChromaDB**: Distributed vector database
- **PGVector**: PostgreSQL with vector extensions

### Additional Services

- **Dataset I/O**: Local filesystem and Hugging Face integration
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
- **Evaluation**: Meta reference evaluation framework

## Running Llama Stack with OCI

You can run the OCI distribution via Docker or a local virtual environment.

### Via venv

If you've set up your local development environment, you can run the distribution from your local virtual environment:

```bash
OCI_AUTH_TYPE=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci
```

### Configuration Examples

#### Using Instance Principal (Recommended for Production)

```bash
export OCI_AUTH_TYPE=instance_principal
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

#### Using API Key Authentication (Development)

```bash
export OCI_AUTH_TYPE=config_file
export OCI_CONFIG_FILE_PATH=~/.oci/config
export OCI_CLI_PROFILE=DEFAULT
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

## Regional Endpoints

OCI Generative AI is available in multiple regions. The service automatically routes requests to the appropriate regional endpoint based on your configuration. For the full list of regions and model availability, see:

https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions
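
As an illustration of how the region setting determines the endpoint: OCI's public Generative AI inference endpoints follow a region-scoped host pattern. The sketch below assumes the commercial (OC1) realm pattern; verify the host for your realm before relying on it:

```python
def genai_inference_endpoint(region: str) -> str:
    """Build the OCI Generative AI inference endpoint for a region.

    Assumes the commercial (OC1) realm host pattern; other realms
    (e.g. government regions) use different domains.
    """
    if not region:
        raise ValueError("region must be set, e.g. via OCI_REGION")
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"


# Example: the region used in the configuration examples above
print(genai_inference_endpoint("us-chicago-1"))
# → https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
```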

## Troubleshooting

### Common Issues

1. **Authentication Errors**: Verify your OCI credentials and IAM policies
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
3. **Permission Denied**: Check compartment permissions and Generative AI service access
4. **Region Unavailable**: Verify that the specified region supports Generative AI services

### Getting Help

For additional support:
- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues)

docs/docs/providers/agents/index.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 ---
 description: "Agents
 
-APIs for creating and interacting with agentic systems."
+APIs for creating and interacting with agentic systems."
 sidebar_label: Agents
 title: Agents
 ---
@@ -12,6 +12,6 @@ title: Agents
 
 Agents
 
-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.
 
 This section contains documentation for all available providers for the **agents** API.
```
Lines changed: 12 additions & 12 deletions

```diff
@@ -1,14 +1,14 @@
 ---
 description: "The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.
 
-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.
 
-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation
 
-Note: This API is currently under active development and may undergo changes."
+Note: This API is currently under active development and may undergo changes."
 sidebar_label: Batches
 title: Batches
 ---
@@ -18,14 +18,14 @@ title: Batches
 ## Overview
 
 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.
 
-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.
 
-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation
 
-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.
 
 This section contains documentation for all available providers for the **batches** API.
```

docs/docs/providers/eval/index.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 ---
 description: "Evaluations
 
-Llama Stack Evaluation API for running evaluations on model and agent candidates."
+Llama Stack Evaluation API for running evaluations on model and agent candidates."
 sidebar_label: Eval
 title: Eval
 ---
@@ -12,6 +12,6 @@ title: Eval
 
 Evaluations
 
-Llama Stack Evaluation API for running evaluations on model and agent candidates.
+Llama Stack Evaluation API for running evaluations on model and agent candidates.
 
 This section contains documentation for all available providers for the **eval** API.
```

docs/docs/providers/files/index.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 ---
 description: "Files
 
-This API is used to upload documents that can be used with other Llama Stack APIs."
+This API is used to upload documents that can be used with other Llama Stack APIs."
 sidebar_label: Files
 title: Files
 ---
@@ -12,6 +12,6 @@ title: Files
 
 Files
 
-This API is used to upload documents that can be used with other Llama Stack APIs.
+This API is used to upload documents that can be used with other Llama Stack APIs.
 
 This section contains documentation for all available providers for the **files** API.
```
Lines changed: 8 additions & 8 deletions

```diff
@@ -1,11 +1,11 @@
 ---
 description: "Inference
 
-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.
 
-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search."
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search."
 sidebar_label: Inference
 title: Inference
 ---
@@ -16,10 +16,10 @@ title: Inference
 
 Inference
 
-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.
 
-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
 
 This section contains documentation for all available providers for the **inference** API.
```
Lines changed: 48 additions & 0 deletions

---
description: |
  Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
  Provider documentation
  https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
sidebar_label: Remote - Oci
title: remote::oci
---

# remote::oci

## Description

Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
Provider documentation
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `oci_auth_type` | `<class 'str'>` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) |
| `oci_config_file_path` | `<class 'str'>` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) |
| `oci_config_profile` | `<class 'str'>` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) |
| `oci_region` | `str \| None` | No | | OCI region (e.g., us-ashburn-1) |
| `oci_compartment_id` | `str \| None` | No | | OCI compartment ID for the Generative AI service |
| `oci_user_ocid` | `str \| None` | No | | OCI user OCID for authentication |
| `oci_tenancy_ocid` | `str \| None` | No | | OCI tenancy OCID for authentication |
| `oci_fingerprint` | `str \| None` | No | | OCI API key fingerprint for authentication |
| `oci_private_key` | `str \| None` | No | | OCI private key for authentication |
| `oci_serving_mode` | `<class 'str'>` | No | ON_DEMAND | OCI serving mode (must be one of: ON_DEMAND, DEDICATED) |

## Sample Configuration

```yaml
oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal}
oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}
oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT}
oci_region: ${env.OCI_REGION:=us-ashburn-1}
oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=}
oci_serving_mode: ${env.OCI_SERVING_MODE:=ON_DEMAND}
oci_user_ocid: ${env.OCI_USER_OCID:=}
oci_tenancy_ocid: ${env.OCI_TENANCY_OCID:=}
oci_fingerprint: ${env.OCI_FINGERPRINT:=}
oci_private_key: ${env.OCI_PRIVATE_KEY:=}
```

docs/docs/providers/safety/index.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 ---
 description: "Safety
 
-OpenAI-compatible Moderations API."
+OpenAI-compatible Moderations API."
 sidebar_label: Safety
 title: Safety
 ---
@@ -12,6 +12,6 @@ title: Safety
 
 Safety
 
-OpenAI-compatible Moderations API.
+OpenAI-compatible Moderations API.
 
 This section contains documentation for all available providers for the **safety** API.
```
Lines changed: 9 additions & 0 deletions

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from .oci import get_distribution_template

__all__ = ["get_distribution_template"]
```
Lines changed: 34 additions & 0 deletions

```yaml
version: 2
distribution_spec:
  description: Use Oracle Cloud Infrastructure (OCI) Generative AI for running LLM
    inference with scalable cloud services
  providers:
    inference:
    - provider_type: remote::oci
    vector_io:
    - provider_type: inline::faiss
    - provider_type: remote::chromadb
    - provider_type: remote::pgvector
    safety:
    - provider_type: inline::llama-guard
    agents:
    - provider_type: inline::meta-reference
    eval:
    - provider_type: inline::meta-reference
    datasetio:
    - provider_type: remote::huggingface
    - provider_type: inline::localfs
    scoring:
    - provider_type: inline::basic
    - provider_type: inline::llm-as-judge
    - provider_type: inline::braintrust
    tool_runtime:
    - provider_type: remote::brave-search
    - provider_type: remote::tavily-search
    - provider_type: remote::model-context-protocol
    files:
    - provider_type: inline::localfs
image_type: venv
additional_pip_packages:
- aiosqlite
- sqlalchemy[asyncio]
```
