Commit f060f22

feat: add oci genai service as chat inference provider
1 parent 9191005 commit f060f22

20 files changed: +1719 −21 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 exclude: 'build/'

 default_language_version:
-  python: python3.12
+  python: python3.13
   node: "22"

 repos:
```

Lines changed: 180 additions & 0 deletions (new file)

---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->

# OCI Distribution

The `llamastack/distribution-oci` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::oci` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

### Environment Variables

The following environment variables can be configured:

- `OCI_AUTH_TYPE`: OCI authentication type (`instance_principal` or `config_file`) (default: `instance_principal`)
- `OCI_USER_OCID`: OCI user OCID for authentication (default: empty)
- `OCI_TENANCY_OCID`: OCI tenancy OCID for authentication (default: empty)
- `OCI_FINGERPRINT`: OCI API key fingerprint for authentication (default: empty)
- `OCI_PRIVATE_KEY`: OCI private key for authentication (default: empty)
- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: empty)
- `OCI_COMPARTMENT_OCID`: OCI compartment OCID for the Generative AI service (default: empty)
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if `OCI_AUTH_TYPE` is `config_file`) (default: `~/.oci/config`)
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from the config file (default: `DEFAULT`)

## Prerequisites

### Oracle Cloud Infrastructure Setup

Before using the OCI Generative AI distribution, ensure you have:

1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/)
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
4. **Authentication**: Configure authentication using either:
   - **Instance Principal** (recommended for cloud-hosted deployments)
   - **API Key** (for on-premises or development environments)

### Authentication Methods

#### Instance Principal Authentication (Recommended)

Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments, since no key material needs to be stored on disk.

Requirements:
- The instance must be running in an Oracle Cloud Infrastructure compartment
- The instance must have appropriate IAM policies granting access to Generative AI services
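For reference, here is a minimal sketch of what instance-principal authentication looks like with the OCI Python SDK. The provider wires this up internally from `OCI_AUTH_TYPE=instance_principal`; the snippet is illustrative, and the regional endpoint shown is an assumption you should adjust to your region:

```python
import oci

# Derive a signer from the compute instance's own identity; no API keys
# or config files are needed on disk. This only succeeds on an OCI
# instance covered by a dynamic-group IAM policy.
signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()

client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config={},  # empty config: identity comes entirely from the signer
    signer=signer,
    # Assumed regional endpoint; substitute your own region.
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)
```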
#### API Key Authentication

For development or on-premises deployments, you can use API key authentication with the following information:
- User OCID
- Tenancy OCID
- API key fingerprint
- Private key
- Region
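The equivalent `config_file` flow loads those five values from an OCI config profile. A small sketch with the OCI Python SDK (illustrative only; the provider itself reads `OCI_CONFIG_FILE_PATH` and `OCI_CLI_PROFILE` for you):

```python
import oci

# Load the DEFAULT profile from the standard config location.
config = oci.config.from_file(
    file_location="~/.oci/config",
    profile_name="DEFAULT",
)

# Raises oci.exceptions.InvalidConfig if the user OCID, tenancy OCID,
# fingerprint, key file, or region entries are missing or malformed.
oci.config.validate_config(config)

client = oci.generative_ai_inference.GenerativeAiInferenceClient(config)
```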
### Required IAM Policies

Ensure your OCI user or instance has the following policy statements:

```
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
```

## Supported Services

### Inference: OCI Generative AI

Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports the following (see the client sketch after this list):

- **Chat Completions**: Conversational AI with context awareness
- **Text Generation**: Complete prompts and generate text content
- **Embeddings**: Convert text to vector embeddings for search and retrieval
- **Multiple Model Support**: Access to various foundation models including Cohere, Meta, and custom models
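A sketch of exercising these operations through the standard `llama_stack_client` package once the stack is running. The model IDs are illustrative placeholders (use whatever is registered in your `run.yaml`), and exact client method names can vary across llama-stack versions:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Chat completion routed through the remote::oci inference provider.
chat = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "Summarize OCI Generative AI in one sentence."}],
)
print(chat.completion_message.content)

# Embeddings for semantic search and retrieval.
emb = client.inference.embeddings(
    model_id="cohere.embed-english-v3.0",  # illustrative embedding model ID
    contents=["Oracle Cloud Infrastructure"],
)
print(len(emb.embeddings[0]))  # embedding dimension
```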
#### Available Models

The OCI Generative AI model catalog includes foundation models from Meta, Cohere, OpenAI, and xAI (Grok); availability varies by region.
### Safety: Llama Guard

For content safety and moderation, this distribution uses Meta's Llama Guard model through the OCI Generative AI service to provide:
- Content filtering and moderation
- Policy compliance checking
- Harmful content detection
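A minimal sketch of running the shield through the client API; the shield ID below is an assumption, so check what your distribution actually registered:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run the configured Llama Guard shield over a user message.
result = client.safety.run_shield(
    shield_id="llama-guard",  # assumed ID; verify with client.shields.list()
    messages=[{"role": "user", "content": "Tell me something harmful."}],
    params={},
)

if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Message passed the shield.")
```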
### Vector Storage: Multiple Options

The distribution supports several vector storage providers:
- **FAISS**: Local in-memory vector search
- **ChromaDB**: Distributed vector database
- **PGVector**: PostgreSQL with vector extensions

### Additional Services

- **Dataset I/O**: Local filesystem and Hugging Face integration
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
- **Evaluation**: Meta reference evaluation framework

## Running Llama Stack with OCI

You can run the OCI distribution via Docker or a local virtual environment.

### Via Docker

This method allows you to get started quickly without building the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-oci \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OCI_AUTH_TYPE=$OCI_AUTH_TYPE \
  --env OCI_REGION=$OCI_REGION \
  --env OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID
```

### Via venv

If you've set up your local development environment, you can build and run the distribution in a local virtual environment.

```bash
OCI_GENAI_MODEL_OCID=oci.ocid1.generativeaimodel.oc1.us-chicago-1.<ocid>
llama stack build --distro oci --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OCI_AUTH_TYPE=$OCI_AUTH_TYPE \
  --env OCI_REGION=$OCI_REGION \
  --env OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID
```
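Once the server is up via either method, a quick smoke test is to list the registered models; a small sketch, assuming the default port:

```python
from llama_stack_client import LlamaStackClient

# Point at the locally running stack from either launch method above.
client = LlamaStackClient(base_url="http://localhost:8321")

# Models served through OCI should appear alongside their provider ID.
for model in client.models.list():
    print(model.identifier, "->", model.provider_id)
```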
### Configuration Examples

#### Using Instance Principal (Recommended for Production)

```bash
export OCI_AUTH_TYPE=instance_principal
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

#### Using API Key Authentication (Development)

```bash
export OCI_AUTH_TYPE=config_file
export OCI_CONFIG_FILE_PATH=~/.oci/config
export OCI_CLI_PROFILE=DEFAULT
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

## Regional Endpoints

OCI Generative AI is available in multiple regions, and the service routes to the appropriate regional endpoint based on your configuration. For the full list of regional model availability, see:

https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions

## Troubleshooting

### Common Issues

1. **Authentication Errors**: Verify your OCI credentials and IAM policies
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
3. **Permission Denied**: Check compartment permissions and Generative AI service access
4. **Region Unavailable**: Verify the specified region supports Generative AI services

### Getting Help

For additional support:
- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues)

docs/docs/providers/agents/index.mdx

Lines changed: 2 additions & 2 deletions

The changes here (and in the `index.mdx` files below) are whitespace-only cleanups:

```diff
@@ -1,7 +1,7 @@
 ---
 description: "Agents

-APIs for creating and interacting with agentic systems."
+APIs for creating and interacting with agentic systems."
 sidebar_label: Agents
 title: Agents
 ---
@@ -12,6 +12,6 @@ title: Agents

 Agents

-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.

 This section contains documentation for all available providers for the **agents** API.
```
docs/docs/providers/batches/index.mdx

Lines changed: 12 additions & 12 deletions

```diff
@@ -1,14 +1,14 @@
 ---
 description: "The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes."
+Note: This API is currently under active development and may undergo changes."
 sidebar_label: Batches
 title: Batches
 ---
@@ -18,14 +18,14 @@ title: Batches
 ## Overview

 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.

 This section contains documentation for all available providers for the **batches** API.
```

docs/docs/providers/inference/index.mdx

Lines changed: 6 additions & 6 deletions

```diff
@@ -3,9 +3,9 @@ description: "Inference

 Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search."
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search."
 sidebar_label: Inference
 title: Inference
 ---
@@ -18,8 +18,8 @@ Inference

 Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.

 This section contains documentation for all available providers for the **inference** API.
```
Lines changed: 48 additions & 0 deletions (new file)

---
description: |
  Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.

  Provider documentation
  https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
sidebar_label: Remote - Oci
title: remote::oci
---

# remote::oci

## Description

Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.

Provider documentation:
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `oci_auth_type` | `<class 'str'>` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) |
| `oci_config_file_path` | `<class 'str'>` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) |
| `oci_config_profile` | `<class 'str'>` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) |
| `oci_region` | `str \| None` | No | | OCI region (e.g., us-ashburn-1) |
| `oci_compartment_id` | `str \| None` | No | | OCI compartment ID for the Generative AI service |
| `oci_user_ocid` | `str \| None` | No | | OCI user OCID for authentication |
| `oci_tenancy_ocid` | `str \| None` | No | | OCI tenancy OCID for authentication |
| `oci_fingerprint` | `str \| None` | No | | OCI API key fingerprint for authentication |
| `oci_private_key` | `str \| None` | No | | OCI private key for authentication |
| `oci_serving_mode` | `<class 'str'>` | No | ON_DEMAND | OCI serving mode (must be one of: ON_DEMAND, DEDICATED) |

## Sample Configuration

```yaml
oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal}
oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}
oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT}
oci_region: ${env.OCI_REGION:=us-ashburn-1}
oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=}
oci_serving_mode: ${env.OCI_SERVING_MODE:=ON_DEMAND}
oci_user_ocid: ${env.OCI_USER_OCID:=}
oci_tenancy_ocid: ${env.OCI_TENANCY_OCID:=}
oci_fingerprint: ${env.OCI_FINGERPRINT:=}
oci_private_key: ${env.OCI_PRIVATE_KEY:=}
```

docs/static/llama-stack-spec.html

Lines changed: 54 additions & 0 deletions

```diff
@@ -5061,8 +5061,62 @@
           "description": "The model that was used to generate the chat completion"
         },
         "usage": {
+<<<<<<< HEAD
           "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
           "description": "Token usage information (typically included in final chunk with stream_options)"
+=======
+          "type": "object",
+          "properties": {
+            "completion_tokens": {
+              "type": "integer"
+            },
+            "prompt_tokens": {
+              "type": "integer"
+            },
+            "total_tokens": {
+              "type": "integer"
+            },
+            "completion_tokens_details": {
+              "type": "object",
+              "properties": {
+                "accepted_prediction_tokens": {
+                  "type": "integer"
+                },
+                "audio_tokens": {
+                  "type": "integer"
+                },
+                "reasoning_tokens": {
+                  "type": "integer"
+                },
+                "rejected_prediction_tokens": {
+                  "type": "integer"
+                }
+              },
+              "additionalProperties": false,
+              "title": "CompletionTokensDetails"
+            },
+            "prompt_tokens_details": {
+              "type": "object",
+              "properties": {
+                "audio_tokens": {
+                  "type": "integer"
+                },
+                "cached_tokens": {
+                  "type": "integer"
+                }
+              },
+              "additionalProperties": false,
+              "title": "PromptTokensDetails"
+            }
+          },
+          "additionalProperties": false,
+          "required": [
+            "completion_tokens",
+            "prompt_tokens",
+            "total_tokens"
+          ],
+          "description": "(Optional) Usage information for the completion"
+>>>>>>> 18b9c4c1 (feat: add oci genai service as chat inference provider)
         }
       },
       "additionalProperties": false,
```

llama_stack/apis/inference/inference.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -15,6 +15,7 @@
 )

 from fastapi import Body
+from openai.types.completion_usage import CompletionUsage
 from pydantic import BaseModel, Field, field_validator
 from typing_extensions import TypedDict

```
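`CompletionUsage` is the OpenAI Python client's pydantic model for the usage block shown in the spec diff above. A minimal sketch of constructing one, with illustrative values:

```python
from openai.types.completion_usage import CompletionUsage

# Mirrors the usage schema: the three token counts are required fields.
usage = CompletionUsage(
    completion_tokens=12,
    prompt_tokens=34,
    total_tokens=46,
)
print(usage.model_dump())
```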
