[Feature Request] Global Threadpool in Python API #23523

Open
alex-halpin opened this issue Jan 28, 2025 · 5 comments
Labels
feature request request for unsupported feature or enhancement

Comments

@alex-halpin

Describe the feature request

Expose the ability to use a global thread pool for inference sessions in the Python API.

Describe scenario use case

My current use case requires instantiating many (thousands of) small ONNX models in memory at once. Doing so causes too many threads to be spawned, which halts the program. The functionality for a global thread pool exists in the C++ source but is not exposed in the Python bindings.
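For context, a minimal sketch of the current behavior that causes the problem (the model list is a stand-in for the thousands of small models described above):

import onnxruntime as ort

# With default SessionOptions, every InferenceSession creates its own
# intra-op and inter-op thread pools, so the total thread count grows
# with the number of resident sessions.
model_blobs: list[bytes] = []  # placeholder: thousands of small serialized ONNX models
sessions = [ort.InferenceSession(blob) for blob in model_blobs]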

@alex-halpin alex-halpin added the feature request request for unsupported feature or enhancement label Jan 28, 2025
@alex-halpin
Author

I have linked a fork that I have tested and confirmed working on my MacBook, with the following implementation as an example use case:

import numpy as np
import onnxruntime as ort

# New functionality from the fork: size the shared global thread pools
# once for the whole process.
ort.set_global_thread_pool_sizes(64, 64)

class OnnxRunner:

    # Newly exposed to the Python API: opt this session out of per-session
    # thread pools so it uses the global pools configured above.
    sess_options = ort.SessionOptions()
    sess_options.use_per_session_threads = False

    def __init__(self, model: bytes):
        self.session = ort.InferenceSession(model, sess_options=self.sess_options)

    def predict(self, x: np.ndarray) -> np.ndarray:
        x = x.astype("float32")
        y = self.session.run(["output"], {"input": x})
        return y[0]
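Usage would then look roughly like the sketch below; the model list and input shape are placeholders, and every runner shares the global pools sized above rather than spawning its own:

# Hypothetical usage: all runners share the process-wide thread pools.
runners = [OnnxRunner(blob) for blob in model_blobs]  # model_blobs: placeholder list of serialized models
result = runners[0].predict(np.zeros((1, 4), dtype=np.float32))  # input shape is a placeholder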

@yuslepukhin
Member

Would you be willing to submit a PR?

@alex-halpin
Author

Would you be willing to submit a PR?

Yes, I have an open PR here.

@khoover

khoover commented Mar 21, 2025

+1 to this. We're in a similar situation, with thousands of small models resident simultaneously, and we don't want that many thread pools spun up and sitting idle.

@alex-halpin
Author

+1 to this. We're in a similar situation, with thousands of small models resident simultaneously, and we don't want that many thread pools spun up and sitting idle.

My PR is still open, but I unfortunately don't have the knowledge or bandwidth to push it further at the moment. I threw it together as a POC for my team, but we ended up just going with:

inter_op_num_threads = 1
intra_op_num_threads = 1

In local testing this worked to pin the inference sessions to a single shared thread, which was sufficient for our use case.
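A minimal sketch of that workaround using the SessionOptions fields that already exist in the released Python API (the model path here is a placeholder):

import onnxruntime as ort

# Force each session to run single-threaded so thousands of resident
# sessions don't each spin up their own thread pools.
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 1
sess_options.inter_op_num_threads = 1

session = ort.InferenceSession("model.onnx", sess_options=sess_options)  # placeholder path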
