Commit 7f21b67

feat: add BLS decoupled response iterator's cancel() method for the request cancellation (#398)
Adds a cancel() method to the BLS decoupled response iterator so that the stub process can cancel the corresponding Triton server inference request once it has received enough responses from the iterator. Because each stub InferenceRequest object can create multiple BLS Triton server inference requests, putting cancel() on the response iterator makes it possible to cancel an individual request rather than cancelling all requests generated from the stub InferenceRequest object. More details can be found in the changes to README.md.
1 parent 1b797d6 commit 7f21b67
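The commit message describes cancelling a BLS request once the stub has received enough responses from its iterator. A minimal plain-Python sketch of that control flow (`DummyResponseIterator` and `consume` are hypothetical stand-ins for illustration, not the real `pb_utils` API):

```python
# Sketch of the early-stopping pattern this commit enables: a response
# iterator exposes cancel(), and the consumer stops pulling responses
# once it has enough. DummyResponseIterator is a hypothetical stand-in
# for the real BLS decoupled response iterator.

class DummyResponseIterator:
    def __init__(self, responses):
        self._responses = responses
        self.cancelled = False

    def __iter__(self):
        for r in self._responses:
            if self.cancelled:
                break  # the server would stop producing responses
            yield r

    def cancel(self):
        # In the real backend this cancels only the one Triton server
        # inference request backing this iterator.
        self.cancelled = True


def consume(iterator, needed):
    got = []
    for response in iterator:
        got.append(response)
        if len(got) >= needed:
            iterator.cancel()  # got enough; cancel the rest
    return got


print(consume(DummyResponseIterator(range(10)), 3))  # -> [0, 1, 2]
```

Because `cancel()` lives on the iterator rather than on the stub `InferenceRequest`, each of the possibly many BLS requests spawned from one stub request can be cancelled individually.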

18 files changed: +405 −19 lines

Diff for: .gitignore (+1)

```diff
@@ -140,3 +140,4 @@ dmypy.json
 
 # vscode
 .vscode/settings.json
+.vscode/c_cpp_properties.json
```

Diff for: CMakeLists.txt (+2)

```diff
@@ -241,6 +241,8 @@ set(
     src/pb_response_iterator.cc
     src/pb_cancel.cc
     src/pb_cancel.h
+    src/pb_bls_cancel.cc
+    src/pb_bls_cancel.h
 )
 
 list(APPEND
```

Diff for: README.md (+40, −2)

````diff
@@ -1409,14 +1409,52 @@ class TritonPythonModel:
 A complete example for sync and async BLS for decoupled models is included in
 the [Examples](#examples) section.
 
+Note: Async BLS is not supported on Python 3.6 or lower due to the `async`
+keyword and `asyncio.run` being introduced in Python 3.7.
+
 Starting from the 22.04 release, the lifetime of the BLS output tensors have
 been improved such that if a tensor is no longer needed in your Python model it
 will be automatically deallocated. This can increase the number of BLS requests
 that you can execute in your model without running into the out of GPU or
 shared memory error.
 
-Note: Async BLS is not supported on Python 3.6 or lower due to the `async`
-keyword and `asyncio.run` being introduced in Python 3.7.
+### Cancelling decoupled BLS requests
+A decoupled BLS inference request may be cancelled by calling the `cancel()`
+method on the response iterator returned from the method executing the BLS
+inference request. For example,
+
+```python
+import triton_python_backend_utils as pb_utils
+
+class TritonPythonModel:
+    ...
+    def execute(self, requests):
+        ...
+        bls_response_iterator = bls_request.exec(decoupled=True)
+        ...
+        bls_response_iterator.cancel()
+        ...
+```
+
+You may also call the `cancel()` method on the response iterator returned from
+the `async_exec()` method of the inference request. For example,
+
+```python
+import triton_python_backend_utils as pb_utils
+
+class TritonPythonModel:
+    ...
+    async def execute(self, requests):
+        ...
+        bls_response_iterator = await bls_request.async_exec(decoupled=True)
+        ...
+        bls_response_iterator.cancel()
+        ...
+```
+
+Note: Whether the decoupled model returns a cancellation error and stops executing
+the request depends on the model's backend implementation. Please refer to the
+documentation for more details: [Handling in Backend](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/request_cancellation.md#handling-in-backend)
 
 ## Model Loading API
 
````

Diff for: src/infer_payload.cc (+30, −2)

```diff
@@ -1,4 +1,4 @@
-// Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2023-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -31,7 +31,8 @@ namespace triton { namespace backend { namespace python {
 InferPayload::InferPayload(
     const bool is_decoupled,
     std::function<void(std::unique_ptr<InferResponse>)> callback)
-    : is_decoupled_(is_decoupled), is_promise_set_(false), callback_(callback)
+    : is_decoupled_(is_decoupled), is_promise_set_(false), callback_(callback),
+      request_address_(reinterpret_cast<intptr_t>(nullptr))
 {
   promise_.reset(new std::promise<std::unique_ptr<InferResponse>>());
 }
@@ -91,4 +92,31 @@ InferPayload::ResponseAllocUserp()
   return response_alloc_userp_;
 }
 
+void
+InferPayload::SetRequestAddress(intptr_t request_address)
+{
+  std::unique_lock<std::mutex> lock(request_address_mutex_);
+  request_address_ = request_address;
+}
+
+void
+InferPayload::SetRequestCancellationFunc(
+    const std::function<void(intptr_t)>& request_cancel_func)
+{
+  request_cancel_func_ = request_cancel_func;
+}
+
+void
+InferPayload::SafeCancelRequest()
+{
+  std::unique_lock<std::mutex> lock(request_address_mutex_);
+  if (request_address_ == 0L) {
+    return;
+  }
+
+  if (request_cancel_func_) {
+    request_cancel_func_(request_address_);
+  }
+}
+
 }}}  // namespace triton::backend::python
```
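`SafeCancelRequest` reads the request address under a mutex and treats address `0` as "no live request", so a cancellation that races with request completion becomes a no-op. A plain-Python sketch of that guard (assumed semantics; this `InferPayload` class is a hypothetical stand-in using `threading.Lock` in place of `std::mutex`):

```python
# Sketch of the mutex-guarded cancellation guard: cancellation is skipped
# when no request address has been published (or it was reset to 0 after
# the request completed).
import threading


class InferPayload:
    def __init__(self):
        self._lock = threading.Lock()
        self._request_address = 0  # 0 means "no live request"
        self._cancel_func = None

    def set_request_address(self, addr):
        with self._lock:
            self._request_address = addr

    def set_request_cancellation_func(self, fn):
        self._cancel_func = fn

    def safe_cancel_request(self):
        # Address is read under the lock so a concurrent completion that
        # resets it to 0 cannot slip in between the check and the call.
        with self._lock:
            if self._request_address == 0:
                return False  # nothing to cancel: no-op
            if self._cancel_func:
                self._cancel_func(self._request_address)
                return True
        return False


cancelled = []
p = InferPayload()
p.set_request_cancellation_func(cancelled.append)
assert p.safe_cancel_request() is False  # no live request yet
p.set_request_address(0x1234)
assert p.safe_cancel_request() is True
assert cancelled == [0x1234]
```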

Diff for: src/infer_payload.h (+8, −1)

```diff
@@ -1,4 +1,4 @@
-// Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2023-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -62,6 +62,10 @@ class InferPayload : public std::enable_shared_from_this<InferPayload> {
   void SetResponseAllocUserp(
       const ResponseAllocatorUserp& response_alloc_userp);
   std::shared_ptr<ResponseAllocatorUserp> ResponseAllocUserp();
+  void SetRequestAddress(intptr_t request_address);
+  void SetRequestCancellationFunc(
+      const std::function<void(intptr_t)>& request_cancel_func);
+  void SafeCancelRequest();
 
  private:
   std::unique_ptr<std::promise<std::unique_ptr<InferResponse>>> promise_;
@@ -70,6 +74,9 @@ class InferPayload : public std::enable_shared_from_this<InferPayload> {
   bool is_promise_set_;
   std::function<void(std::unique_ptr<InferResponse>)> callback_;
   std::shared_ptr<ResponseAllocatorUserp> response_alloc_userp_;
+  std::mutex request_address_mutex_;
+  intptr_t request_address_;
+  std::function<void(intptr_t)> request_cancel_func_;
 };
 
 }}}  // namespace triton::backend::python
```

Diff for: src/infer_response.cc (+1, −1)

```diff
@@ -91,6 +91,7 @@ InferResponse::SaveToSharedMemory(
   response_shm_ptr->is_error_set = false;
   shm_handle_ = response_shm_.handle_;
   response_shm_ptr->is_last_response = is_last_response_;
+  response_shm_ptr->id = id_;
 
   // Only save the output tensors to shared memory when the inference response
   // doesn't have error.
@@ -113,7 +114,6 @@ InferResponse::SaveToSharedMemory(
     tensor_handle_shm_ptr[j] = output_tensor->ShmHandle();
     j++;
   }
-  response_shm_ptr->id = id_;
 
   parameters_shm_ = PbString::Create(shm_pool, parameters_);
   response_shm_ptr->parameters = parameters_shm_->ShmHandle();
```

Diff for: src/ipc_message.h (+3, −2)

```diff
@@ -1,4 +1,4 @@
-// Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -67,7 +67,8 @@ typedef enum PYTHONSTUB_commandtype_enum {
   PYTHONSTUB_LoadModelRequest,
   PYTHONSTUB_UnloadModelRequest,
   PYTHONSTUB_ModelReadinessRequest,
-  PYTHONSTUB_IsRequestCancelled
+  PYTHONSTUB_IsRequestCancelled,
+  PYTHONSTUB_CancelBLSInferRequest
 } PYTHONSTUB_CommandType;
 
 ///
```

Diff for: src/pb_bls_cancel.cc (new file, +92)

```cpp
// Copyright 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//  * Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
//  * Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
//  * Neither the name of NVIDIA CORPORATION nor the names of its
//    contributors may be used to endorse or promote products derived
//    from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#include "pb_bls_cancel.h"

#include "pb_stub.h"

namespace triton { namespace backend { namespace python {

void
PbBLSCancel::SaveToSharedMemory(std::unique_ptr<SharedMemoryManager>& shm_pool)
{
  cancel_shm_ = shm_pool->Construct<CancelBLSRequestMessage>();
  new (&(cancel_shm_.data_->mu)) bi::interprocess_mutex;
  new (&(cancel_shm_.data_->cv)) bi::interprocess_condition;
  cancel_shm_.data_->waiting_on_stub = false;
  cancel_shm_.data_->infer_payload_id = infer_playload_id_;
  cancel_shm_.data_->is_cancelled = is_cancelled_;
}

bi::managed_external_buffer::handle_t
PbBLSCancel::ShmHandle()
{
  return cancel_shm_.handle_;
}

CancelBLSRequestMessage*
PbBLSCancel::ShmPayload()
{
  return cancel_shm_.data_.get();
}

void
PbBLSCancel::Cancel()
{
  // Release the GIL. Python objects are not accessed during the check.
  py::gil_scoped_release gil_release;

  std::unique_lock<std::mutex> lk(mu_);
  // The cancelled flag can only move from false to true, not the other way, so
  // it is checked on each query until cancelled and then implicitly cached.
  if (is_cancelled_) {
    return;
  }
  if (!updating_) {
    std::unique_ptr<Stub>& stub = Stub::GetOrCreateInstance();
    if (!stub->StubToParentServiceActive()) {
      LOG_ERROR << "Cannot communicate with parent service";
      return;
    }

    stub->EnqueueCancelBLSRequest(this);
    updating_ = true;
  }
  cv_.wait(lk, [this] { return !updating_; });
}

void
PbBLSCancel::ReportIsCancelled(bool is_cancelled)
{
  {
    std::lock_guard<std::mutex> lk(mu_);
    is_cancelled_ = is_cancelled;
    updating_ = false;
  }
  cv_.notify_all();
}

}}}  // namespace triton::backend::python
```

Diff for: src/pb_bls_cancel.h (new file, +63)

```cpp
// Copyright 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//  * Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
//  * Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
//  * Neither the name of NVIDIA CORPORATION nor the names of its
//    contributors may be used to endorse or promote products derived
//    from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#pragma once

#include <condition_variable>
#include <mutex>

#include "pb_utils.h"

namespace triton { namespace backend { namespace python {

class PbBLSCancel {
 public:
  PbBLSCancel(void* infer_playload_id)
      : updating_(false), infer_playload_id_(infer_playload_id),
        is_cancelled_(false)
  {
  }
  DISALLOW_COPY_AND_ASSIGN(PbBLSCancel);

  void SaveToSharedMemory(std::unique_ptr<SharedMemoryManager>& shm_pool);
  bi::managed_external_buffer::handle_t ShmHandle();
  CancelBLSRequestMessage* ShmPayload();

  void Cancel();
  void ReportIsCancelled(bool is_cancelled);

 private:
  AllocatedSharedMemory<CancelBLSRequestMessage> cancel_shm_;

  std::mutex mu_;
  std::condition_variable cv_;
  bool updating_;

  void* infer_playload_id_;
  bool is_cancelled_;
};

}}};  // namespace triton::backend::python
```

Diff for: src/pb_response_iterator.cc (+10, −1)

```diff
@@ -1,4 +1,4 @@
-// Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2023-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -40,6 +40,7 @@ ResponseIterator::ResponseIterator(
     : id_(response->Id()), is_finished_(false), is_cleared_(false), idx_(0)
 {
   response_buffer_.push(response);
+  pb_bls_cancel_ = std::make_shared<PbBLSCancel>(response->Id());
 }
 
 ResponseIterator::~ResponseIterator()
@@ -159,4 +160,12 @@ ResponseIterator::GetExistingResponses()
   return responses;
 }
 
+void
+ResponseIterator::Cancel()
+{
+  if (!is_finished_) {
+    pb_bls_cancel_->Cancel();
+  }
+}
+
 }}}  // namespace triton::backend::python
```

Diff for: src/pb_response_iterator.h (+4, −1)

```diff
@@ -1,4 +1,4 @@
-// Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2023-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -29,6 +29,7 @@
 #include <queue>
 
 #include "infer_response.h"
+#include "pb_bls_cancel.h"
 
 namespace triton { namespace backend { namespace python {
 
@@ -43,6 +44,7 @@ class ResponseIterator {
   void* Id();
   void Clear();
   std::vector<std::shared_ptr<InferResponse>> GetExistingResponses();
+  void Cancel();
 
  private:
   std::vector<std::shared_ptr<InferResponse>> responses_;
@@ -53,6 +55,7 @@ class ResponseIterator {
   bool is_finished_;
   bool is_cleared_;
   size_t idx_;
+  std::shared_ptr<PbBLSCancel> pb_bls_cancel_;
 };
 
 }}}  // namespace triton::backend::python
```
