
Commit b54a16e

Add timeout options for run_sync (#389)
2 parents baf3c99 + 13def76 commit b54a16e

File tree

2 files changed: +103 −79 lines


serverless/endpoints/send-requests.mdx

Lines changed: 89 additions & 69 deletions
@@ -12,26 +12,34 @@ Serverless endpoints provide synchronous and asynchronous job processing with au
 
 ## How requests work
 
-After creating a Serverless [endpoint](/serverless/endpoints/overview), you can start sending it **requests** to submit jobs and retrieve results. A request can include parameters, payloads, and headers that define what the endpoint should process. For example, you can send a `POST` request to submit a job, or a `GET` request to check status of a job, retrieve results, or check endpoint health.
+After creating a Serverless [endpoint](/serverless/endpoints/overview), you can start sending it **requests** to submit jobs and retrieve results.
 
-A **job** is a unit of work containing the input data from the request, packaged for processing by your [workers](/serverless/workers/overview). If no worker is immediately available, the job is queued. Once a worker is available, the job is processed by the worker using your [handler function](/serverless/workers/handler-functions).
+A request can include parameters, payloads, and headers that define what the endpoint should process. For example, you can send a `POST` request to submit a job, or a `GET` request to check status of a job, retrieve results, or check endpoint health.
 
-When you submit a job request, it can be either synchronous or asynchronous depending on the operation you use:
+A **job** is a unit of work containing the input data from the request, packaged for processing by your [workers](/serverless/workers/overview).
 
-- `/runsync` submits a synchronous job. A response is returned as soon as the job is complete.
-- `/run` submits an asynchronous job. The job is processed in the background, and you can retrieve the result by sending a `GET` request to the `/status` endpoint.
+If no worker is immediately available, the job is queued. Once a worker is available, the job is processed using your worker's [handler function](/serverless/workers/handler-functions).
 
-Queue-based endpoints provide a fixed set of operations for submitting and managing jobs. You can find a full list of operations and examples in the [sections below](/serverless/endpoints/send-requests#operation-overview).
+Queue-based endpoints provide a fixed set of operations for submitting and managing jobs. You can find a full list of operations and sample code in the [sections below](/serverless/endpoints/send-requests#operation-overview).
 
-<Tip>
-If you need to create an endpoint that supports custom API paths, use [load balancing endpoints](/serverless/load-balancing/overview).
-</Tip>
+## Sync vs. async
 
-## Request input structure
+When you submit a job request, it can be either synchronous or asynchronous depending on the operation you use:
 
-When submitting a job with `/runsync` or `/run`, your request must include a JSON object the the key `input`, containing the parameters required by your worker's [handler function](/serverless/workers/handler-functions).
+- `/runsync` submits a synchronous job.
+  - Client waits for the job to complete before returning the result.
+  - A response is returned as soon as the job is complete.
+  - Results are available for 1 minute by default (5 minutes max).
+  - Ideal for quick responses and interactive applications.
+- `/run` submits an asynchronous job.
+  - The job is processed in the background.
+  - Retrieve the result by sending a `GET` request to the `/status` endpoint.
+  - Results are available for 30 minutes after completion.
+  - Ideal for long-running tasks and batch processing.
 
-For example:
+## Request input structure
+
+When submitting a job with `/runsync` or `/run`, your request must include a JSON object with the key `input` containing the parameters required by your worker's [handler function](/serverless/workers/handler-functions). For example:
 
 ```json
 {
@@ -41,7 +49,7 @@ For example:
 }
 ```
 
-The exact parameters inside the `input` object depend on your specific worker implementation. Check your worker's documentation for required and optional parameters.
+The exact parameters required in the `input` object depend on your specific worker implementation (e.g. `prompt` is commonly used for endpoints serving LLMs, but not all workers accept it). Check your worker's documentation for a list of required and optional parameters.
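As an aside, the documented `input` envelope is easy to build programmatically before serializing a request body. A minimal sketch (`build_job_payload` is an illustrative helper, not part of the Runpod SDK):

```python
import json

def build_job_payload(**params):
    """Wrap handler parameters in the `input` envelope that
    queue-based endpoints expect in the request body."""
    return {"input": dict(params)}

# The serialized body for a /run or /runsync POST request.
payload = build_job_payload(prompt="Hello, world!")
body = json.dumps(payload)
```

Keeping the envelope construction in one place makes it harder to accidentally POST bare parameters without the `input` key, which a worker's handler would not receive.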
 
 ## Send requests from the console

@@ -81,6 +89,10 @@ Here's a quick overview of the operations available for queue-based endpoints:
 | `/purge-queue` | POST | Clear all pending jobs from the queue without affecting jobs already in progress. |
 | `/health` | GET | Monitor the operational status of your endpoint, including worker and job statistics. |
 
+<Tip>
+If you need to create an endpoint that supports custom API paths, use [load balancing endpoints](/serverless/load-balancing/overview).
+</Tip>
+
 ## Operation reference
 
 Below you'll find detailed explanations and examples for each operation using `cURL` and the Runpod SDK.
@@ -114,11 +126,23 @@ export ENDPOINT_ID="YOUR_ENDPOINT_ID"
 
 Synchronous jobs wait for completion and return the complete result in a single response. This approach works best for shorter tasks where you need immediate results, interactive applications, and simpler client code without status polling.
 
-* **Payload limit**: 20 MB
-* **Job availability**: Results are available for 60 seconds after completion
+`/runsync` requests have a maximum payload size of 20 MB.
+
+Results are available for 1 minute by default, but you can append `?wait=x` to the request URL to extend this up to 5 minutes, where `x` is the number of milliseconds to store the results, from 1000 (1 second) to 300000 (5 minutes).
+
+For example, `?wait=120000` will keep your results available for 2 minutes:
+
+```sh
+https://api.runpod.ai/v2/$ENDPOINT_ID/runsync?wait=120000
+```
+
+<Note>
+`?wait` is only available for `cURL` and standard HTTP request libraries.
+</Note>
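The `?wait` behavior described above can be captured in a small URL helper that clamps to the documented 1000–300000 ms range. A sketch (`runsync_url` is a hypothetical helper, not an SDK function):

```python
from urllib.parse import urlencode

def runsync_url(endpoint_id, wait_ms=None):
    """Build a /runsync URL, optionally extending result
    availability with the ?wait query parameter."""
    base = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    if wait_ms is None:
        return base
    # Documented range: 1000 ms (1 second) to 300000 ms (5 minutes).
    wait_ms = max(1000, min(int(wait_ms), 300_000))
    return f"{base}?{urlencode({'wait': wait_ms})}"
```

Clamping in the client avoids sending out-of-range values and makes the retention window explicit at the call site.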
 
 <Tabs>
 <Tab title="cURL">
+
 ```sh
 curl --request POST \
   --url https://api.runpod.ai/v2/$ENDPOINT_ID/runsync \
@@ -130,6 +154,7 @@ curl --request POST \
 </Tab>
 
 <Tab title="Python">
+
 ```python
 import runpod
 import os
@@ -140,7 +165,7 @@ endpoint = runpod.Endpoint(os.getenv("ENDPOINT_ID"))
 try:
     run_request = endpoint.run_sync(
         {"prompt": "Hello, world!"},
-        timeout=60,  # Timeout in seconds
+        timeout=60,  # Client timeout in seconds
     )
     print(run_request)
 except TimeoutError:
@@ -149,6 +174,7 @@ except TimeoutError:
 </Tab>
 
 <Tab title="JavaScript">
+
 ```javascript
 const { RUNPOD_API_KEY, ENDPOINT_ID } = process.env;
 import runpodSdk from "runpod-sdk";
@@ -160,13 +186,16 @@ const result = await endpoint.runSync({
   "input": {
     "prompt": "Hello, World!",
   },
+  timeout: 60000, // Client timeout in milliseconds
+});
 });
 
 console.log(result);
 ```
 </Tab>
 
 <Tab title="Go">
+
 ```go
 package main

@@ -199,7 +228,7 @@ func main() {
       "prompt": "Hello World",
     },
   },
-  Timeout: sdk.Int(120),
+  Timeout: sdk.Int(60), // Client timeout in seconds
 }
 
 output, err := endpoint.RunSync(&jobInput)
@@ -212,10 +241,9 @@ func main() {
 }
 ```
 </Tab>
+</Tabs>
 
-<Tab title="Response">
-
-`/runsync` requests return a response as soon as the job is complete:
+`/runsync` returns a response as soon as the job is complete:
 
 ```json
 {
@@ -231,15 +259,14 @@ func main() {
   "status": "COMPLETED"
 }
 ```
-</Tab>
-</Tabs>
 
 ### `/run`
 
 Asynchronous jobs process in the background and return immediately with a job ID. This approach works best for longer-running tasks that don't require immediate results, operations requiring significant processing time, and managing multiple concurrent jobs.
 
-* **Payload limit**: 10 MB
-* **Job availability**: Results are available for 30 minutes after completion
+`/run` requests have a maximum payload size of 10 MB.
+
+Job results are available for 30 minutes after completion.
 
 <Tabs>
 <Tab title="cURL">
@@ -341,23 +368,32 @@ func main() {
 ```
 </Tab>
 
-<Tab title="Response">
+</Tabs>
+
+`/run` returns a response with the job ID and status:
+
 ```json
 {
   "id": "eaebd6e7-6a92-4bb8-a911-f996ac5ea99d",
   "status": "IN_QUEUE"
 }
 ```
-</Tab>
-</Tabs>
+
+Further results must be retrieved using the `/status` operation.
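The submit-then-poll flow that `/run` and `/status` imply can be sketched as a small loop. `poll_status` is a hypothetical helper: it accepts any callable returning the parsed `/status` JSON (e.g. one that GETs `https://api.runpod.ai/v2/$ENDPOINT_ID/status/$JOB_ID` with an `Authorization: Bearer` header), so no live endpoint is assumed; terminal-state names beyond `COMPLETED`/`FAILED`/`CANCELLED` are assumptions:

```python
import time

# Assumed terminal states; COMPLETED/FAILED/CANCELLED appear in the
# documented responses, TIMED_OUT is an assumption.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def poll_status(fetch_status, interval_s=2.0, max_attempts=30):
    """Poll a job until it reaches a terminal state.

    fetch_status: callable returning the parsed /status JSON.
    """
    for attempt in range(max_attempts):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state in time")
```

Injecting the fetch callable keeps the polling logic testable and independent of any particular HTTP client.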
 
 ### `/status`
 
-Check the current state, execution statistics, and results of previously submitted jobs. The status endpoint provides the current job state, execution statistics like queue delay and processing time, and job output if completed.
+Check the current state, execution statistics, and results of previously submitted jobs. The status operation provides the current job state, execution statistics like queue delay and processing time, and job output if completed.
+
+<Tip>
+You can configure time-to-live (TTL) for individual jobs by appending a TTL parameter to the request URL.
+
+For example, `https://api.runpod.ai/v2/$ENDPOINT_ID/status/YOUR_JOB_ID?ttl=6000` sets the TTL to 6 seconds.
+</Tip>
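The TTL parameter from the tip above can be folded into a URL builder alongside the job ID. A sketch (`status_url` is illustrative, not part of any SDK; TTL is in milliseconds per the `ttl=6000` → 6 seconds example):

```python
def status_url(endpoint_id, job_id, ttl_ms=None):
    """Build a /status URL, optionally setting a per-job TTL
    (in milliseconds) via the ?ttl query parameter."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    return f"{url}?ttl={int(ttl_ms)}" if ttl_ms is not None else url
```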
 
 <Tabs>
 <Tab title="cURL">
-Replace `YOUR_JOB_ID` with the actual job ID you received in the response to the `/run` request.
+Replace `YOUR_JOB_ID` with the actual job ID you received in the response to the `/run` operation.
 
 ```sh
 curl --request GET \
@@ -476,9 +512,9 @@ func main() {
 ```
 </Tab>
 
-<Tab title="Response">
+</Tabs>
 
-`/status` requests return a JSON response with the job status (e.g. `IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`, `FAILED`), and an optional `output` field if the job is completed:
+`/status` returns a JSON response with the job status (e.g. `IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`, `FAILED`), and an optional `output` field if the job is completed:
 
 ```json
 {
@@ -493,12 +529,6 @@ func main() {
   "status": "COMPLETED"
 }
 ```
-</Tab>
-</Tabs>
-
-<Tip>
-You can configure time-to-live (TTL) for individual jobs by appending a TTL parameter: `https://api.runpod.ai/v2/$ENDPOINT_ID/status/YOUR_JOB_ID?ttl=6000` sets the TTL to 6 seconds.
-</Tip>
 
 ### `/stream`

@@ -629,7 +659,14 @@ func main() {
 ```
 </Tab>
 
-<Tab title="Response">
+</Tabs>
+
+<Info>
+The maximum size for a single streamed payload chunk is 1 MB. Larger outputs will be split across multiple chunks.
+</Info>
+
+Streaming response format:
+
 ```json
 [
   {
@@ -654,12 +691,6 @@ func main() {
   }
 ]
 ```
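Since outputs larger than the 1 MB per-chunk limit arrive split across several chunks, a client typically reassembles them. A minimal sketch, assuming each streamed chunk carries a string `output` field (an assumption about the handler's output shape, not guaranteed by the API):

```python
def collect_stream(chunks):
    """Concatenate the `output` of each parsed /stream chunk
    into the full result; chunks without output are skipped."""
    return "".join(chunk.get("output", "") for chunk in chunks)
```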
-</Tab>
-</Tabs>
-
-<Info>
-The maximum size for a single streamed payload chunk is 1 MB. Larger outputs will be split across multiple chunks.
-</Info>
 
 ### `/cancel`

@@ -794,15 +825,18 @@ func main() {
 ```
 </Tab>
 
-<Tab title="Response">
+</Tabs>
+
+`/cancel` requests return a JSON response with the status of the cancel operation:
+
 ```json
 {
   "id": "724907fe-7bcc-4e42-998d-52cb93e1421f-u1",
   "status": "CANCELLED"
 }
 ```
-</Tab>
-</Tabs>
+
 
 ### `/retry`
@@ -826,7 +860,7 @@ You'll see the job status updated to `IN_QUEUE` when the job is retried:
 ```
 
 <Note>
-Job results expire after a set period. Asynchronous jobs (`/run`) results are available for 30 minutes, while synchronous jobs (`/runsync`) results are available for 1 minute. Once expired, jobs cannot be retried.
+Job results expire after a set period. Asynchronous jobs (`/run`) results are available for 30 minutes, while synchronous jobs (`/runsync`) results are available for 1 minute (up to 5 minutes with `?wait=t`). Once expired, jobs cannot be retried.
 </Note>
 
 ### `/purge-queue`
@@ -881,7 +915,11 @@ main();
 ```
 </Tab>
 
-<Tab title="Response">
+</Tabs>
+
+<Warning>
+`/purge-queue` operation only affects jobs waiting in the queue. Jobs already in progress will continue to run.
+</Warning>
 
 `/purge-queue` requests return a JSON response with the number of jobs removed from the queue and the status of the purge operation:

@@ -891,12 +929,6 @@ main();
   "status": "completed"
 }
 ```
-</Tab>
-</Tabs>
-
-<Warning>
-`/purge-queue` operation only affects jobs waiting in the queue. Jobs already in progress will continue to run.
-</Warning>
 
 ### `/health`
@@ -940,7 +972,7 @@ console.log(health);
 ```
 </Tab>
 
-<Tab title="Response">
+</Tabs>
 
 `/health` requests return a JSON response with the current status of the endpoint, including the number of jobs completed, failed, in progress, in queue, and retried, as well as the status of workers.

@@ -959,8 +991,6 @@ console.log(health);
   }
 }
 ```
-</Tab>
-</Tabs>
 
 ## vLLM and OpenAI requests

@@ -1097,14 +1127,4 @@ Here are some common issues and suggested solutions:
 | Rate limiting | Too many requests in short time | Implement backoff strategy, batch requests when possible |
 | Missing results | Results expired | Retrieve results within expiration window (30 min for async, 1 min for sync) |
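The backoff strategy suggested for rate limiting can be sketched as a small retry wrapper (`with_backoff` is illustrative, not an SDK function):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay_s=1.0, sleep=time.sleep):
    """Retry `call` with exponential backoff plus jitter.
    `call` should raise on failure (e.g. an HTTP 429)."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the last error
            # Exponential backoff: 1 s, 2 s, 4 s, ... plus random jitter.
            sleep(base_delay_s * (2 ** attempt) + random.random() * 0.1)
```

The injectable `sleep` keeps the wrapper testable; in production code you would leave the default and tune `base_delay_s` to the endpoint's rate limits.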
 
-Implementing proper error handling and retry logic will make your integrations more robust and reliable.
-
-## Related resources
-
-* [Endpoint configurations](/serverless/endpoints/endpoint-configurations)
-* [Python SDK for endpoints](/sdks/python/endpoints)
-* [JavaScript SDK for endpoints](/sdks/javascript/endpoints)
-* [Go SDK for endpoints](/sdks/go/endpoints)
-* [Handler functions](/serverless/workers/handler-functions)
-* [Local testing](/serverless/development/local-testing)
-* [GitHub integration](/serverless/workers/github-integration)
+Implementing proper error handling and retry logic will make your integrations more robust and reliable.

0 commit comments