Commit 5fa97b5

Add max_output_tokens as argument to Response API
The OpenAI Responses and Completions APIs both support a max_output_tokens field, but it was missing from both the create request and the response object in the Responses API. This PR adds it.

Fixes: #3562

Signed-off-by: Abhishek Bongale <[email protected]>
1 parent 92219fd commit 5fa97b5
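
With this change, max_output_tokens can be passed straight through the OpenAI-compatible Responses endpoint. A minimal sketch using the openai Python client; the base_url, api_key, and model id below are placeholder assumptions, not taken from this commit:

```python
# Minimal sketch: passing max_output_tokens through the OpenAI-compatible
# Responses endpoint. base_url, api_key, and model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

response = client.responses.create(
    model="llama3.2:3b",  # hypothetical model id
    input="Summarize the Responses API in one sentence.",
    max_output_tokens=64,  # upper bound on generated tokens
)
print(response.output_text)
```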

14 files changed: +127 −20 lines

docs/docs/providers/agents/index.mdx

Lines changed: 2 additions & 2 deletions

@@ -1,7 +1,7 @@
 ---
 description: "Agents

-  APIs for creating and interacting with agentic systems."
+  APIs for creating and interacting with agentic systems."
 sidebar_label: Agents
 title: Agents
 ---
@@ -12,6 +12,6 @@ title: Agents

 Agents

-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.

 This section contains documentation for all available providers for the **agents** API.

(In this and the two docs diffs below, removed and added lines appear identical; the changes to these generated docs are whitespace-only.)
Lines changed: 12 additions & 12 deletions

@@ -1,14 +1,14 @@
 ---
 description: "The Batches API enables efficient processing of multiple requests in a single operation,
-  particularly useful for processing large datasets, batch evaluation workflows, and
-  cost-effective inference at scale.
+  particularly useful for processing large datasets, batch evaluation workflows, and
+  cost-effective inference at scale.

-  The API is designed to allow use of openai client libraries for seamless integration.
+  The API is designed to allow use of openai client libraries for seamless integration.

-  This API provides the following extensions:
-  - idempotent batch creation
+  This API provides the following extensions:
+  - idempotent batch creation

-  Note: This API is currently under active development and may undergo changes."
+  Note: This API is currently under active development and may undergo changes."
 sidebar_label: Batches
 title: Batches
 ---
@@ -18,14 +18,14 @@ title: Batches
 ## Overview

 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.

 This section contains documentation for all available providers for the **batches** API.
Lines changed: 6 additions & 6 deletions

@@ -1,9 +1,9 @@
 ---
 description: "Llama Stack Inference API for generating completions, chat completions, and embeddings.

-  This API provides the raw interface to the underlying models. Two kinds of models are supported:
-  - LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
-  - Embedding models: these models generate embeddings to be used for semantic search."
+  This API provides the raw interface to the underlying models. Two kinds of models are supported:
+  - LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
+  - Embedding models: these models generate embeddings to be used for semantic search."
 sidebar_label: Inference
 title: Inference
 ---
@@ -14,8 +14,8 @@ title: Inference

 Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Two kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
+This API provides the raw interface to the underlying models. Two kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.

 This section contains documentation for all available providers for the **inference** API.

docs/static/deprecated-llama-stack-spec.html

Lines changed: 11 additions & 0 deletions

@@ -9096,6 +9096,10 @@
         "type": "string",
         "description": "(Optional) Truncation strategy applied to the response"
       },
+      "max_output_tokens": {
+        "type": "integer",
+        "description": "(Optional) Upper bound for response tokens generation"
+      },
       "input": {
         "type": "array",
         "items": {
@@ -9914,6 +9918,9 @@
       },
       "max_infer_iters": {
         "type": "integer"
+      },
+      "max_output_tokens": {
+        "type": "integer"
       }
     },
     "additionalProperties": false,
@@ -9983,6 +9990,10 @@
       "truncation": {
         "type": "string",
         "description": "(Optional) Truncation strategy applied to the response"
+      },
+      "max_output_tokens": {
+        "type": "integer",
+        "description": "(Optional) Upper bound for response tokens generation"
       }
     },
     "additionalProperties": false,

docs/static/deprecated-llama-stack-spec.yaml

Lines changed: 10 additions & 0 deletions

@@ -6740,6 +6740,10 @@ components:
       type: string
       description: >-
         (Optional) Truncation strategy applied to the response
+    max_output_tokens:
+      type: integer
+      description: >-
+        (Optional) Upper bound for response tokens generation
     input:
       type: array
       items:
@@ -7351,6 +7355,8 @@ components:
         (Optional) Additional fields to include in the response.
     max_infer_iters:
       type: integer
+    max_output_tokens:
+      type: integer
   additionalProperties: false
   required:
     - input
@@ -7414,6 +7420,10 @@ components:
       type: string
       description: >-
         (Optional) Truncation strategy applied to the response
+    max_output_tokens:
+      type: integer
+      description: >-
+        (Optional) Upper bound for response tokens generation
   additionalProperties: false
   required:
     - created_at

docs/static/llama-stack-spec.html

Lines changed: 11 additions & 0 deletions

@@ -7503,6 +7503,10 @@
         "type": "string",
         "description": "(Optional) Truncation strategy applied to the response"
       },
+      "max_output_tokens": {
+        "type": "integer",
+        "description": "(Optional) Upper bound for response tokens generation"
+      },
       "input": {
         "type": "array",
         "items": {
@@ -8009,6 +8013,9 @@
       },
       "max_infer_iters": {
         "type": "integer"
+      },
+      "max_output_tokens": {
+        "type": "integer"
       }
     },
     "additionalProperties": false,
@@ -8078,6 +8085,10 @@
       "truncation": {
         "type": "string",
         "description": "(Optional) Truncation strategy applied to the response"
+      },
+      "max_output_tokens": {
+        "type": "integer",
+        "description": "(Optional) Upper bound for response tokens generation"
       }
     },
     "additionalProperties": false,

docs/static/llama-stack-spec.yaml

Lines changed: 10 additions & 0 deletions

@@ -5660,6 +5660,10 @@ components:
       type: string
       description: >-
         (Optional) Truncation strategy applied to the response
+    max_output_tokens:
+      type: integer
+      description: >-
+        (Optional) Upper bound for response tokens generation
     input:
       type: array
       items:
@@ -6014,6 +6018,8 @@ components:
         (Optional) Additional fields to include in the response.
     max_infer_iters:
       type: integer
+    max_output_tokens:
+      type: integer
   additionalProperties: false
   required:
     - input
@@ -6077,6 +6083,10 @@ components:
       type: string
      description: >-
         (Optional) Truncation strategy applied to the response
+    max_output_tokens:
+      type: integer
+      description: >-
+        (Optional) Upper bound for response tokens generation
   additionalProperties: false
   required:
     - created_at

docs/static/stainless-llama-stack-spec.html

Lines changed: 11 additions & 0 deletions

@@ -9512,6 +9512,10 @@
         "type": "string",
         "description": "(Optional) Truncation strategy applied to the response"
       },
+      "max_output_tokens": {
+        "type": "integer",
+        "description": "(Optional) Upper bound for response tokens generation"
+      },
       "input": {
         "type": "array",
         "items": {
@@ -10018,6 +10022,9 @@
       },
       "max_infer_iters": {
         "type": "integer"
+      },
+      "max_output_tokens": {
+        "type": "integer"
       }
     },
     "additionalProperties": false,
@@ -10087,6 +10094,10 @@
      "truncation": {
         "type": "string",
         "description": "(Optional) Truncation strategy applied to the response"
+      },
+      "max_output_tokens": {
+        "type": "integer",
+        "description": "(Optional) Upper bound for response tokens generation"
       }
     },
     "additionalProperties": false,

docs/static/stainless-llama-stack-spec.yaml

Lines changed: 10 additions & 0 deletions

@@ -7105,6 +7105,10 @@ components:
       type: string
       description: >-
         (Optional) Truncation strategy applied to the response
+    max_output_tokens:
+      type: integer
+      description: >-
+        (Optional) Upper bound for response tokens generation
     input:
       type: array
       items:
@@ -7459,6 +7463,8 @@ components:
         (Optional) Additional fields to include in the response.
     max_infer_iters:
       type: integer
+    max_output_tokens:
+      type: integer
   additionalProperties: false
   required:
     - input
@@ -7522,6 +7528,10 @@ components:
       type: string
      description: >-
         (Optional) Truncation strategy applied to the response
+    max_output_tokens:
+      type: integer
+      description: >-
+        (Optional) Upper bound for response tokens generation
   additionalProperties: false
   required:
     - created_at

llama_stack/apis/agents/agents.py

Lines changed: 1 addition & 0 deletions

@@ -825,6 +825,7 @@ async def create_openai_response(
                 "List of shields to apply during response generation. Shields provide safety and content moderation."
             ),
         ] = None,
+        max_output_tokens: int | None = None,
    ) -> OpenAIResponseObject | AsyncIterator[OpenAIResponseObjectStream]:
        """Create a new OpenAI response.

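On the Python side, the new keyword flows through create_openai_response. A minimal sketch of a caller, where agents_impl stands in for whatever concrete Agents implementation the stack wires up at runtime and the model id is a placeholder:

```python
# Sketch: calling the extended method from async code. `agents_impl` is a
# stand-in for a concrete Agents implementation; the model id is a placeholder.
async def demo(agents_impl):
    response = await agents_impl.create_openai_response(
        input="Write a haiku about token limits.",
        model="llama3.2:3b",
        max_output_tokens=50,  # upper bound on generated output tokens
    )
    print(response.max_output_tokens)
```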