swolchok
Contributor

@swolchok swolchok commented Sep 4, 2025

Description

We don't have access to llvm::SmallVector or similar, but given the limited subset of the std::vector API that
function_call::args{,_convert} need and the "reserve-then-fill" usage pattern, it is relatively straightforward to implement custom containers that get the job done.
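
For readers who want the "reserve-then-fill" idea made concrete, here is a minimal sketch of such a container. It is not the code from this PR: `small_vector_sketch`, the `fake_handle` stand-in for `py::handle`, and the method set are all hypothetical and chosen for illustration. The sketch assumes the element type is trivially copyable and trivially destructible, so growth can use `memcpy` and no destructors need to run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-in for py::handle: a plain wrapper around one pointer.
struct fake_handle {
    void *ptr;
};

// Sketch only: keeps up to N elements inline and moves to the heap beyond that.
template <typename T, std::size_t N>
class small_vector_sketch {
    static_assert(N > 0, "inline capacity must be nonzero");
    static_assert(std::is_trivially_copyable<T>::value, "growth copies with memcpy");
    static_assert(std::is_trivially_destructible<T>::value, "destructors are never run");

public:
    small_vector_sketch() : data_(inline_), size_(0), capacity_(N) {}
    small_vector_sketch(const small_vector_sketch &) = delete;
    small_vector_sketch &operator=(const small_vector_sketch &) = delete;
    ~small_vector_sketch() {
        if (!is_inline()) {
            std::free(data_);
        }
    }

    // Expected to be called once, before filling; only allocates when n > N.
    void reserve(std::size_t n) {
        if (n <= capacity_) {
            return;
        }
        T *fresh = static_cast<T *>(std::malloc(n * sizeof(T)));
        if (fresh == nullptr) {
            throw std::bad_alloc();
        }
        std::memcpy(fresh, data_, size_ * sizeof(T));
        if (!is_inline()) {
            std::free(data_);
        }
        data_ = fresh;
        capacity_ = n;
    }

    void push_back(const T &value) {
        if (size_ == capacity_) {
            reserve(capacity_ * 2); // fallback for callers that skipped reserve()
        }
        data_[size_++] = value;
    }

    const T &operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }
    bool is_inline() const { return data_ == inline_; }

private:
    T inline_[N];
    T *data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

With this shape, a call that reserves no more than `N` slots never touches the heap, which is the effect the description is after.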

This seems to improve the time to call the collatz function in pybind/pybind11_benchmark significantly. The numbers are a little noisy, but there's a clear improvement from "about 60 ns per call" to "about 45 ns per call" on my machine (M4 Max Mac), as measured with timeit.repeat('collatz(4)', 'from pybind11_benchmark import collatz').

Suggested changelog entry:

  • Dispatching functions with 5 or fewer arguments no longer requires a heap allocation for their C++ argument array.

@swolchok swolchok requested a review from henryiii as a code owner September 4, 2025 05:38
@swolchok
Contributor Author

swolchok commented Sep 4, 2025

Didn't fully realize pybind11 continues to support C++11 and C++14, hence failing CI jobs. Can remove std::variant, which I think will also save memory and allow inline size increase. Would be nice to have directional review feedback if folks see this before I get that done, though.

@rwgk
Collaborator

rwgk commented Sep 4, 2025

Awesome.

Can remove std::variant, which I think will also save memory and allow inline size increase.

That'd be great.

Could you please move struct argument_vector to a separate file under include/pybind11/detail?

WDYT about a more generic name, e.g. hybrid_vector? I think that'd fit in nicely between SmallVector and vector.

You'll have to update the top-level CMakeLists.txt and tests/extra_python_package/test_files.py for the new file in include/pybind11/detail. Look for e.g. dynamic_raw_ptr_cast_if_possible.h to pin-point where to add the new filename.

@swolchok
Contributor Author

swolchok commented Sep 4, 2025

more generic name

This was feasible mostly because we can take a bunch of implementation shortcuts knowing that the value type is py::handle, which is trivially copyable and trivially destructible, and knowing that we only have to do a limited subset of the vector interface. I think it would be best to be more specific, not less; I've thought about diverging from the vector interface entirely by requiring that the maximum size be specified up front (mandatory reserve) and not being able to grow in push_back at all, but I haven't done the legwork necessary to make sure that would actually be workable.
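
The "mandatory reserve" idea mentioned above could look roughly like the following. This is only a sketch of the idea, not something the PR implements: the class name `fixed_capacity_sketch` and the choice to throw `std::length_error` on overflow are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>
#include <type_traits>

// Hypothetical sketch: the capacity is fixed at construction and never grows.
template <typename T, std::size_t N>
class fixed_capacity_sketch {
    static_assert(N > 0, "inline capacity must be nonzero");
    static_assert(std::is_trivially_copyable<T>::value, "elements are copied bitwise");
    static_assert(std::is_trivially_destructible<T>::value, "destructors are never run");

public:
    // The maximum size must be given up front; the heap is used only when it exceeds N.
    explicit fixed_capacity_sketch(std::size_t capacity)
        : data_(inline_), size_(0), capacity_(capacity) {
        if (capacity > N) {
            data_ = static_cast<T *>(std::malloc(capacity * sizeof(T)));
            if (data_ == nullptr) {
                throw std::bad_alloc();
            }
        }
    }
    fixed_capacity_sketch(const fixed_capacity_sketch &) = delete;
    fixed_capacity_sketch &operator=(const fixed_capacity_sketch &) = delete;
    ~fixed_capacity_sketch() {
        if (data_ != inline_) {
            std::free(data_);
        }
    }

    // No growth path: exceeding the declared capacity is reported as an error.
    void push_back(const T &value) {
        if (size_ == capacity_) {
            throw std::length_error("capacity was fixed at construction");
        }
        data_[size_++] = value;
    }

    const T &operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    T inline_[N];
    T *data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Dropping the growth path removes the copy-on-grow code entirely, at the cost of every caller having to know the final size before the first `push_back`.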

@rwgk
Collaborator

rwgk commented Sep 4, 2025 via email

@swolchok swolchok force-pushed the argument-small-vectors branch from 7c09a6b to 9415686 Compare September 4, 2025 19:49
@rwgk
Collaborator

rwgk commented Sep 4, 2025

(Please don't force push, because that makes it more difficult for me to follow along. Just keep merging, that works really well for me.)

@swolchok
Contributor Author

swolchok commented Sep 4, 2025

Hmm, looks like I managed to somehow break exactly IO redirection on exactly windows-latest but not windows-2022. Breakage includes clang-latest, so it's not an MSVC issue. Nothing jumps out at me as to the cause yet.

@swolchok
Contributor Author

swolchok commented Sep 4, 2025

Looks like windows-latest might have changed to windows-2025 very recently. Example job from yesterday where windows-latest meant windows-2022: https://github.com/pybind/pybind11/actions/runs/17443220310/job/49540880643 . GitHub does seem to say that the rollout started 2 days ago, so this is plausible: actions/runner-images#12677

I guess we should check whether the windows-latest workflows are broken on master without any changes?

@swolchok
Contributor Author

swolchok commented Sep 4, 2025

check whether the windows-latest workflows are broken on master without any changes?

Sent #5825

Collaborator

@rwgk rwgk left a comment

Some super-quick comments, I can only spend 5 minutes right now. I'll probably get to it over the weekend.

@@ -2045,10 +2046,12 @@ struct function_call {
const function_record &func;

/// Arguments passed to the function:
std::vector<handle> args;
/// (Inline size chosen mostly arbitrarily; 5 should pad function_call out to two cache lines
Collaborator

6 should?

Maybe make this a constexpr unsigned argument_vector_small_size = 6; or similar, because you're using the constant in three (at least) places?

if (overloaded) {
// We're in the first no-convert pass, so swap out the conversion flags for a
// set of all-false flags. If the call fails, we'll swap the flags back in for
// the conversion-allowed call below.
second_pass_convert.resize(func.nargs, false);
second_pass_convert = args_convert_vector<6>(func.nargs, false);
Collaborator

Could you add a terse comment to hint that creating a new object is better than some sort of resize semantics?

Contributor Author

The reason I chose not to implement resize is that it's more general than we need: you can resize containers to be smaller or larger, and cross (or not cross) the inline size limit in either direction. We can just rewrite this in a way that more or less removes the question, though:

second_pass_convert = std::move(call.args_convert);
call.args_convert = args_convert_vector<argument_vector_small_size>(func.nargs, false);

(call.args_convert is moved-from, so we're not necessarily sure about its state.)
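
The same move-then-reassign pattern can be tried out in isolation. The sketch below uses `std::vector<bool>` in place of the PR's `args_convert_vector`, and `call_sketch` and `begin_no_convert_pass` are hypothetical names introduced only for this illustration. It shows why assigning a freshly constructed object sidesteps any question about the moved-from state.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for function_call; only the flag vector matters here.
struct call_sketch {
    std::vector<bool> args_convert;
};

// Stash the caller's conversion flags and install all-false flags for the
// no-convert pass. Returns the stashed flags so they can be swapped back in.
std::vector<bool> begin_no_convert_pass(call_sketch &call, std::size_t nargs) {
    std::vector<bool> saved = std::move(call.args_convert);
    // A moved-from vector is valid but its contents are unspecified, so assign
    // a brand-new object rather than resizing or reading the old one.
    call.args_convert = std::vector<bool>(nargs, false);
    return saved;
}
```

If the no-convert pass fails, the saved flags can be restored with `call.args_convert.swap(saved)` before retrying with conversions allowed.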

@@ -0,0 +1,84 @@
#include "pybind11/pybind11.h"
Collaborator

Maybe create a new directory, test_low_level or similar?

Putting this in test_embed seems misleading.

@henryiii for opinion

Contributor Author

This doesn't qualify for test/pure_cpp because it needs CPython around to compile py::handle, so I put it in test_embed because I was hoping not to have to duplicate the CMake configuration for C++ tests that need CPython around. I'll see if there's another way to do that, like creating a utility CMake file.

Contributor Author

what if instead we renamed test_embed?

Collaborator

This doesn't qualify for test/pure_cpp

Agreed/realized.

what if instead we renamed test_embed?

I'd be OK with that.

@henryiii?

Contributor Author

renamed test_embed

done

@rwgk
Collaborator

rwgk commented Sep 5, 2025

🐍 (macos-latest, 3.14t, -DCMAKE_CXX_STANDARD=20) / 🧪
cancelled now in 5h 55m 47s

It was hanging here:

Run cmake --build build --target cpptest
Change Dir: '/Users/runner/work/pybind11/pybind11/build'
Run Build Command(s): /opt/homebrew/bin/ninja -v cpptest
[0/2] /opt/homebrew/bin/cmake -P /Users/runner/work/pybind11/pybind11/build/CMakeFiles/VerifyGlobs.cmake

I've seen this once before, a few days ago.

Rerun triggered.

@rwgk
Collaborator

rwgk commented Sep 8, 2025

Small heads-up: I'll only get to fully reviewing this sometime between Thursday and Sunday.

How about test_with_catch as the new directory name?

"Low level" doesn't fit too well for the actual embedding tests.

I think this accurately reflects that we're grouping tests together for a technical reason (I checked, all three existing .cpp files include catch.hpp).
