serverless/endpoints/send-requests.mdx
## How requests work
After creating a Serverless [endpoint](/serverless/endpoints/overview), you can start sending it **requests** to submit jobs and retrieve results.
A request can include parameters, payloads, and headers that define what the endpoint should process. For example, you can send a `POST` request to submit a job, or a `GET` request to check status of a job, retrieve results, or check endpoint health.
A **job** is a unit of work containing the input data from the request, packaged for processing by your [workers](/serverless/workers/overview).
If no worker is immediately available, the job is queued. Once a worker is available, the job is processed using your worker's [handler function](/serverless/workers/handler-functions).
Queue-based endpoints provide a fixed set of operations for submitting and managing jobs. You can find a full list of operations and sample code in the [sections below](/serverless/endpoints/send-requests#operation-overview).
## Sync vs. async
When you submit a job request, it can be either synchronous or asynchronous depending on the operation you use:
- `/runsync` submits a synchronous job.
  - The client waits for the job to complete, and a response is returned as soon as it finishes.
  - Results are available for 1 minute by default (5 minutes max).
  - Ideal for quick responses and interactive applications.
- `/run` submits an asynchronous job.
  - The job is processed in the background.
  - Retrieve the result by sending a `GET` request to the `/status` operation.
  - Results are available for 30 minutes after completion.
  - Ideal for long-running tasks and batch processing.
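As a sketch of how the two modes map onto requests, here's a minimal Python helper. The endpoint ID and payload are placeholders, and the helper name is illustrative; the URL pattern follows the examples later on this page:

```python
API_BASE = "https://api.runpod.ai/v2"

def build_job_request(endpoint_id: str, payload: dict, synchronous: bool) -> tuple[str, dict]:
    """Return the URL and JSON body for a /runsync (synchronous) or /run (asynchronous) job."""
    operation = "runsync" if synchronous else "run"
    url = f"{API_BASE}/{endpoint_id}/{operation}"
    # All handler parameters must be nested under the "input" key.
    return url, {"input": payload}

url, body = build_job_request("my-endpoint-id", {"prompt": "Hello, World!"}, synchronous=True)
```

The only difference on the wire is the operation path; the body shape is identical for both modes.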
## Request input structure
When submitting a job with `/runsync` or `/run`, your request must include a JSON object with the key `input` containing the parameters required by your worker's [handler function](/serverless/workers/handler-functions). For example:
```json
{
  "input": {
    "prompt": "Hello, World!"
  }
}
```
The exact parameters required in the `input` object depend on your specific worker implementation (for example, `prompt` is commonly used by endpoints serving LLMs, but not all workers accept it). Check your worker's documentation for a list of required and optional parameters.
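Whatever parameters your worker takes, they always travel inside the `input` envelope. A small illustrative sketch (the `prompt` and `seed` parameters here are hypothetical):

```python
import json

def make_request_body(**handler_params) -> str:
    """Serialize handler parameters inside the required {"input": ...} envelope."""
    return json.dumps({"input": handler_params})

body = make_request_body(prompt="Hello, World!", seed=42)
```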
## Send requests from the console
## Operation overview

Here's a quick overview of the operations available for queue-based endpoints:
| Operation | Method | Description |
| --- | --- | --- |
| `/purge-queue` | POST | Clear all pending jobs from the queue without affecting jobs already in progress. |
| `/health` | GET | Monitor the operational status of your endpoint, including worker and job statistics. |
<Tip>
If you need to create an endpoint that supports custom API paths, use [load balancing endpoints](/serverless/load-balancing/overview).
</Tip>
## Operation reference
Below you'll find detailed explanations and examples for each operation using `cURL` and the Runpod SDK.
### `/runsync`

Synchronous jobs wait for completion and return the complete result in a single response. This approach works best for shorter tasks where you need immediate results, interactive applications, and simpler client code without status polling.
`/runsync` requests have a maximum payload size of 20 MB.
Results are available for 1 minute by default, but you can append `?wait=x` to the request URL to extend this up to 5 minutes, where `x` is the number of milliseconds to store the results, from 1000 (1 second) to 300000 (5 minutes).
For example, `?wait=120000` will keep your results available for 2 minutes:
<Tabs>

<Tab title="JavaScript">

```javascript
const result = await endpoint.runSync({
  "input": {
    "prompt": "Hello, World!",
  },
  timeout: 60000, // Client timeout in milliseconds
});

console.log(result);
```
</Tab>

<Tab title="Go">
```go
package main
// ...
			"prompt": "Hello World",
		},
	},
	Timeout: sdk.Int(60), // Client timeout in seconds
}

output, err := endpoint.RunSync(&jobInput)
// ...
}
```
</Tab>
</Tabs>

`/runsync` returns a response as soon as the job is complete:

```json
{
  "status": "COMPLETED"
}
```
### `/run`
Asynchronous jobs process in the background and return immediately with a job ID. This approach works best for longer-running tasks that don't require immediate results, operations requiring significant processing time, and managing multiple concurrent jobs.
`/run` requests have a maximum payload size of 10 MB.
Job results are available for 30 minutes after completion.
<Tabs>
<Tab title="cURL">
```sh
curl --request POST \
  --url https://api.runpod.ai/v2/$ENDPOINT_ID/run \
  --header "Authorization: Bearer $RUNPOD_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{"input": {"prompt": "Hello, World!"}}'
```
</Tab>
</Tabs>

`/run` returns a response with the job ID and status:

```json
{
  "id": "eaebd6e7-6a92-4bb8-a911-f996ac5ea99d",
  "status": "IN_QUEUE"
}
```
Job results must be retrieved using the `/status` operation.
### `/status`
Check the current state, execution statistics, and results of previously submitted jobs. The status operation provides the current job state, execution statistics like queue delay and processing time, and job output if completed.
<Tip>
You can configure time-to-live (TTL) for individual jobs by appending a TTL parameter to the request URL.
For example, `https://api.runpod.ai/v2/$ENDPOINT_ID/status/YOUR_JOB_ID?ttl=6000` sets the TTL to 6 seconds.
</Tip>
<Tabs>
<Tab title="cURL">
Replace `YOUR_JOB_ID` with the actual job ID you received in the response to the `/run` operation.
```sh
curl --request GET \
  --url https://api.runpod.ai/v2/$ENDPOINT_ID/status/YOUR_JOB_ID \
  --header "Authorization: Bearer $RUNPOD_API_KEY"
```
</Tab>
</Tabs>
`/status` returns a JSON response with the job status (e.g. `IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`, `FAILED`), and an optional `output` field if the job is completed:
```json
{
  "status": "COMPLETED"
}
```
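Clients typically poll `/status` until the job reaches a terminal state. Here's a minimal polling sketch; `fetch_status` stands in for whatever HTTP call you use, and the set of terminal states is an assumption based on the statuses listed above (plus `CANCELLED` and `TIMED_OUT`):

```python
import time

# Assumed terminal states; IN_QUEUE and IN_PROGRESS are non-terminal.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def wait_for_job(fetch_status, poll_interval: float = 1.0, max_polls: int = 60) -> dict:
    """Call fetch_status() until the job reaches a terminal state, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state within the polling budget")
```

With a fake fetcher that returns `IN_QUEUE`, then `IN_PROGRESS`, then `COMPLETED`, the loop returns on the third call.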
### `/stream`
<Info>
The maximum size for a single streamed payload chunk is 1 MB. Larger outputs will be split across multiple chunks.
</Info>
Streaming response format:
```json
[
  {
  }
]
```
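Since a single streamed chunk is capped at 1 MB, larger outputs arrive as several chunks that the client reassembles. A sketch, assuming each chunk carries its text under an `output` key (the actual field name will depend on your handler):

```python
def assemble_stream(chunks: list[dict]) -> str:
    """Concatenate the (assumed) "output" field of each streamed chunk, in order."""
    return "".join(chunk.get("output", "") for chunk in chunks)
```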
### `/cancel`
`/cancel` requests return a JSON response with the status of the cancel operation:
```json
{
  "id": "724907fe-7bcc-4e42-998d-52cb93e1421f-u1",
  "status": "CANCELLED"
}
```
### `/retry`
You'll see the job status updated to `IN_QUEUE` when the job is retried:

```json
{
  "status": "IN_QUEUE"
}
```
<Note>
Job results expire after a set period: asynchronous job (`/run`) results are available for 30 minutes, while synchronous job (`/runsync`) results are available for 1 minute (up to 5 minutes with `?wait=x`). Once expired, jobs cannot be retried.
</Note>
### `/purge-queue`
<Warning>
The `/purge-queue` operation only affects jobs waiting in the queue. Jobs already in progress will continue to run.
</Warning>
`/purge-queue` requests return a JSON response with the number of jobs removed from the queue and the status of the purge operation:
```json
{
  "status": "completed"
}
```
### `/health`
`/health` requests return a JSON response with the current status of the endpoint, including the number of jobs completed, failed, in progress, in queue, and retried, as well as the status of workers.
```json
{
  "jobs": {
    "completed": 1,
    "failed": 0,
    "inProgress": 0,
    "inQueue": 0,
    "retried": 0
  },
  "workers": {
    "idle": 1,
    "running": 0
  }
}
```
## vLLM and OpenAI requests
## Troubleshooting

Here are some common issues and suggested solutions:

| Issue | Cause | Solution |
| --- | --- | --- |
| Rate limiting | Too many requests in short time | Implement backoff strategy, batch requests when possible |
| Missing results | Results expired | Retrieve results within expiration window (30 min for async, 1 min for sync) |
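For the rate-limiting case, one common backoff strategy is exponential delay with full jitter; a minimal sketch:

```python
import random

def backoff_delays(base: float = 1.0, cap: float = 30.0, retries: int = 5) -> list[float]:
    """Exponential backoff schedule: delay n is drawn from [0, min(cap, base * 2**n)]."""
    return [random.uniform(0.0, min(cap, base * 2 ** attempt)) for attempt in range(retries)]
```

Sleeping for each delay in turn between retries spreads repeated requests out and avoids synchronized retry bursts from many clients.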