Skip to content

Conversation

eramongodb
Copy link
Contributor

@eramongodb eramongodb commented Aug 20, 2025

Summary

Resolves CXX-3234. Followup to #1430 and #1445.

This is 4 out of an estimated 7 major PRs which in total are expected to resolve CXX-2745 and CXX-3320

This PR refactors the bsoncxx v_noabi interfaces and implementations to:

  • reduce code duplication between v1 and v_noabi,
  • document-as-code the discrepancies between v1 and v_noabi behavior,
  • indirectly test v1 API using the existing v_noabi test suite, and
  • support v1 <-> v_noabi conversion functions to allow incremental migration from v_noabi to v1.

Important

There should be no API breaking changes in this refactor!

Unlike prior bsoncxx v1 PRs thus far, this PR is the first case where the changes to v_noabi and v1 are not an example of how similar changes would be implemented for v1 and v2, since v1 is a stable ABI namespace. This refactor takes advantage of v_noabi's lack of ABI stability guarantees to reduce code duplication and re-implement v_noabi in terms of v1 as much as possible. However, when v2 API is introduced, v1 will not be able to be refactored to the same extent, although ABI+API compatible refactors would of course still be permitted.

Note

Extending support for BSONCXX_API_OVERRIDE_DEFAULT_ABI is not in scope for this PR. All changes to the v_noabi test suite in this PR are additions only (excluding CXX-2120-related error message assertions), meaning there should be no API breaking changes. Extending support for BSONCXX_API_OVERRIDE_DEFAULT_ABI will necessarily need to be accompanied by corresponding changes to test code to support the API breaking change caused by the changes to root namespace redeclarations.

Per local testing, this PR reduces the ABI footprint for bsoncxx::v_noabi from 326 symbols to 169 (-48.2%). Excluding the symbols corresponding to components refactored by this PR (which have v1 counterparts), remaining symbols include:

  • bsoncxx::v_noabi::builder::*: pending CXX-3275
  • bsoncxx::v_noabi::*_json() + related: out-of-scope for initial v1 implementation.
  • bsoncxx::v_noabi::*view_or_value: to be removed in favor of CXX-1546, CXX-1827, etc.
  • bsoncxx::v_noabi::validate + related: out-of-scope for initial v1 implementation.
  • bsoncxx::v_noabi::vector: out-of-scope for initial v1 implementation.

@eramongodb eramongodb self-assigned this Aug 20, 2025
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining error condition mappings for v_noabi::error_code <-> v1::*::errc was considered but rejected due to too much effort and complexity for an API (v_noabi) which we intend to deprecate and remove (in favor of v1). It will also be more reasonable to consider defining mappings between component-specific v1::*::errc error codes and also component-specific v2::*::errc in the future, than to define mappings to a library-wide v_noabi::error_code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there is no way to general-case-extendably arrange for the v_noabi exception class to IS-A the v1 exception class (or for v1 exceptions to inherit v2 exceptions, etc.). As demonstrated by changes in this PR, any v_noabi exception boundaries must manually catch and rethrow potential v1 exceptions as a v_noabi exception instead to honor their documented "exception thrown by <abi>" contract. Users writing against both v_noabi and v1 API will need to catch std::system_error instead of either v_noabi::exception or v1::exception to handle both cases. Perhaps we can/should consider removing these exception classes in favor of throwing generic std::system_error with error codes instead...

Comment on lines +37 to +41
template <bsoncxx::detail::endian = bsoncxx::detail::endian::native>
void construct(float value);

template <bsoncxx::detail::endian = bsoncxx::detail::endian::native>
float convert() const;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new v1/detail/bit.hpp header is used to provide endianness checks. Instead of preprocessor macros, the compile-time conditional branch is implemented as template specializations (hopefully eventually replaced with C++17 if constexpr).

Comment on lines +41 to +42
/// @par Includes
/// - @ref bsoncxx/v1/decimal128-fwd.hpp
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to CXX-3119: all v_noabi headers publically document the inclusion of v1 equivalent headers. This is not only due to the refactor, but also to set precedent for v_noabi headers to continue to provide root namespace redeclarations for users who do not require ABI stability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding. Is including the v1 forwarding header needed? Removing the v1 include still compiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not strictly necessary, yet. This is implementing in advance the header structure necessary to support declaring+providing v1 API via v_noabi headers for when root namespace redeclaration will be updated to v_noabi -> v1 (e.g. bsoncxx::decimal128 -> bsoncxx::v1::decimal128).

Comment on lines +17 to +22
#include <bsoncxx/decimal128-fwd.hpp>

//

#include <bsoncxx/v1/decimal128.hpp>
#include <bsoncxx/v1/detail/type_traits.hpp>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the scope of potential breaking changes minimal, this PR refrains from changing (removing) include directives even when undocumented. Following releases which publically document transitive includes, the next API major release will hopefully take the opportunity to perform a more thorough header hygiene cleanup.

That being said, component headers are reordered so they are always included first before others.

auto const deleter = ptr.get_deleter(); // bson_free_deleter

SECTION("default") {
(void)v_noabi{data, size, deleter}; // Unused.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to reacquire ownership the released data above for consistency with other test sections.

bool operator!=(types::bson_value::view const& v, element const& elem) {
return !(elem == v);
}
// MSVC: `std::is_constructible<T, Args...>` does not work with using-declared conversion functions to class type...?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An odd MSVC bug where a using-declared conversion function to a class type does not seem to inherit the explicit specifier. See: https://godbolt.org/z/9jTxMan5W

@@ -313,6 +314,38 @@ TEST_CASE("b_types", "[bsoncxx][v1][types][value]") {
#pragma pop_macro("X")
}

TEST_CASE("get_bson_value", "[bsoncxx][v1][types][value][internal]") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases were added to document and compare the subtly different requirements concerning empty string behavior by v1's internal implementation vs. what is required by mongocxx's scoped_bson_value utility (which is refactored to use v1::types::value's internal bson_value_t directly) when interfacing with CSFLE API.

Comment on lines +35 to +38
// CSFLE API requires empty strings to be not-null.
if (v.value_type == BSON_TYPE_UTF8 && v.value.v_utf8.str == nullptr) {
v.value.v_utf8.str = static_cast<char*>(bson_malloc0(1u));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requirement is both unexpected and, when not satisfied, a major pain to diagnose. This problem manifests as inscrutible CSFLE errors such as:

-------------------------------------------------------------------------------
Corpus
-------------------------------------------------------------------------------
../src/mongocxx/test/v_noabi/client_side_encryption.cpp:1059
...............................................................................

../src/mongocxx/test/v_noabi/client_side_encryption.cpp:979: FAILED:
explicitly with message:
  caught an exception for encrypting an allowed field payload=0,algo=rand: BSON
  type invalid for encryption: generic server error

with stack trace (libmongocrypt 1.13.0, mongoc 912209d, ignoring mocks) pointing here (with bson_type == BSON_TYPE_EOD):

_permitted_for_encryption (src/mongocrypt-ctx-encrypt.c:1739)
explicit_encrypt_init (src/mongocrypt-ctx-encrypt.c:1962)
mongocrypt_ctx_explicit_encrypt_init (src/mongocrypt-ctx-encrypt.c)
_mongoc_crypt_explicit_encrypt (src/libmongoc/src/mongoc/mongoc-crypt.c:1772)
mongoc_client_encryption_encrypt (src/libmongoc/src/mongoc/mongoc-client-side-encryption.c:2671)
mongocxx::v_noabi::client_encryption::impl::encrypt (src/mongocxx/lib/mongocxx/private/client_encryption.hh:111)
mongocxx::v_noabi::client_encryption::encrypt (src/mongocxx/lib/mongocxx/v_noabi/mongocxx/client_encryption:43)
_run_corpus_test (src/mongocxx/lib/mongocxx/v_noabi/mongocxx/client_encryption:971)

There may be insufficient input validation by mongoc, libmongocrypt, or both, but atm I am not sure which may be at fault. For now, this PR assumes it is mongocxx's responsibility and applies the necessary workaround to ensure v_utf8.str != nullptr is always true even for empty strings. This issue is only observable by mongocxx due to accessing bsoncxx::v1::types::value's bson_value_t directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which may be at fault

I expect this is related to this documented quirk in bson_append_utf8:

Due to legacy behavior, passing NULL for value appends a null value, not a UTF8 value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming: am I correct that this results in no observed behavior change? The previous make_copy_for_libbson appears to also allocate for an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should not have any observable behavior changes, since mongoc (and mongocrypt) should not assume null-termination for a bson_value_t UTF-8 with explicit length. The bug described above was due to passing a null pointer despite length == 0 due to the legacy behavior you identified and linked above, which I believe to be the correct root cause. This allocation (formerly unconditional, now only on-demand when passed to CSFLE API) preserves the not-null guarantee to avoid this legacy behavior.

Comment on lines +51 to +56
value v{b_string{bsoncxx::stdx::string_view{}}};
REQUIRE(v.get()->value_type == BSON_TYPE_UTF8);
auto const& v_utf8 = v.get()->value.v_utf8;
CHECK(v_utf8.len == 0u);
CHECK(static_cast<void const*>(v_utf8.str) != nullptr);
CHECK(v_utf8.str[0] == '\0');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases were added to document and compare the subtly different requirements concerning empty string behavior by v1's internal implementation vs. what is required by mongocxx's scoped_bson_value utility (which is refactored to use v1::types::value's internal bson_value_t directly) when interfacing with CSFLE API.

@eramongodb eramongodb marked this pull request as ready for review August 20, 2025 15:47
@eramongodb eramongodb requested a review from a team as a code owner August 20, 2025 15:47
Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting initial comments.

Comment on lines +42 to +43
v1::document::view _view;
std::size_t _length;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the current behavior allows passing a larger buffer length:

// Document is 12 bytes { 'x': 1}. Buffer is 13 bytes (extra 0xFF).
std::uint8_t const data[] = {12, 0, 0, 0, 16, 'x', '\0', 1, 0, 0, 0, 0, 0xFF};
auto v = bsoncxx::v_noabi::document::view(data, sizeof(data));
// Returns the passed length:
REQUIRE(v.length() == 13);

Though I would rather discourage it. Iterating quietly fails:

REQUIRE(v.begin() == v.end());

Consider adding a warning to the v_noabi view constructors accepting a length:

/// @warning For backward compatibility, this function does NOT check if @p length equals the embedded length in the BSON bytes.

bson_iter_t iter{};
bson_iter_init_from_data_at_offset(&iter, e.raw(), e.length(), e.offset(), e.keylen());
(void)bson_iter_init_from_data_at_offset(&iter, e.raw(), e.length(), e.offset(), e.keylen());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly prefer checking the return. I expect the unchecked return could result in quiet failures. Example:

// Document is {'x': 'y'}, except length for 'y' is incorrect.
std::uint8_t const data[] = {
    14, 0, 0, 0,  // Document length.
    2, 'x', '\0', // String and key.
    123, 0, 0, 0, // Bad length for value.
    'y', 0,
    0
};
auto v = bsoncxx::v_noabi::document::view(data, sizeof(data));
REQUIRE(v.begin() == v.end()); // No error indicated!?

Though I would rather fix in a separate PR to avoid including behavior changes with the refactor. Reported: CXX-3333.

Comment on lines +35 to +38
// CSFLE API requires empty strings to be not-null.
if (v.value_type == BSON_TYPE_UTF8 && v.value.v_utf8.str == nullptr) {
v.value.v_utf8.str = static_cast<char*>(bson_malloc0(1u));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which may be at fault

I expect this is related to this documented quirk in bson_append_utf8:

Due to legacy behavior, passing NULL for value appends a null value, not a UTF8 value.

Comment on lines +35 to +38
// CSFLE API requires empty strings to be not-null.
if (v.value_type == BSON_TYPE_UTF8 && v.value.v_utf8.str == nullptr) {
v.value.v_utf8.str = static_cast<char*>(bson_malloc0(1u));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming: am I correct that this results in no observed behavior change? The previous make_copy_for_libbson appears to also allocate for an empty string.

Comment on lines +41 to +42
/// @par Includes
/// - @ref bsoncxx/v1/decimal128-fwd.hpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding. Is including the v1 forwarding header needed? Removing the v1 include still compiles.

@eramongodb
Copy link
Contributor Author

A closer inspection of abi-compliance-checker's API compatibility report identified some unintended API breaking changes.


First, the return type for v_noabi::types::bson_value::view::get_<type>() functions were incorrectly changed from T const& to T (note: v1's equivalent API returns by-value):

Change: Type of return value has been changed from bsoncxx::v_noabi::types::b_int32 const& to bsoncxx::v_noabi::types::b_int32 of different format.
Effect: Recompilation of a client program may be broken.

This can break existing code in the following manner:

auto example(v_noabi::types::bson_value::view const& v)
  -> v_noabi::types::b_int32 const&
{
  return v.get_int32(); // Before: reference to this->_b_int32.
                        // After: dangling reference to temporary b_int32. :(
}

Therefore, partially reverted the initial changes and refactored the implementation so that get_<type>() can once again returns a T const& without undefined behavior. Because v_noabi::b_<type> classes do not wrap v1::b_<type> classes (which would be an API breaking change due to public data members), v1::types::view cannot be directly used as the underlying representation for v_noabi::types::bson_value::view without reinterpret-casting from the v1::types::view::_b_<type> data member(s) to v_noabi::types::b_<type>, which is not consistently possible for all b_<type> classes (they are not all layout-compatible). Therefore, v_noabi::types::bson_value::view is once again given its own _id + _b_<type> union data members.


The v_noabi::types::bson_value::value ctor parameters were incorrectly changed from T to T const&:

Change: Parameters list has been changed.
Effect: Recompilation of a client program may be broken.

This one is comparatively trivial in both scope and fix. These are simply changed back to T const&.


There are two "private" (and undocumented) hidden friend functions which were overlooked and omitted by the initial changes:

  • bsoncxx::v_noabi::types::bson_value::make_owning_bson(void* internal_value)
  • bsoncxx::v_noabi::oid_compare(oid const& lhs, oid const& rhs)

These are clearly intended to be private API, and I was misled by their declaration in private access scope to interpret they are also publically inaccessible. However, as hidden friend functions, their actual declaration scope is in the enclosing namespace of the class. Therefore, they are just as eligible for ADL as if it were declared in public access scope: the access specifier makes no programmable difference, it is only aesthetic. These functions are therefore vulnerable to Hyrum's Law.

Due to oid_compare() being very useful (which motivated its inclusion in v1 API as v1::oid::compare(v1::oid const& other)), and given it is already an exported symbol, it is likely enough that users may be using this API despite it being "private" and undocumented. Therefore, this "private" hidden friend function is restored, but it remains undocumented do discourage further usage. Instead, a v1-equivalent compare() member will be added in a followup PR to aid in v_noabi -> v1 transitions along with other renamed entities.

On the other hand, make_owning_bson() is much less likely to be currently used by users given it requires a void* parameter, which resolves to a bson_value_t const* (C Driver API). There is no equivalent v1 public API for this operation (which is superceded by v1::types::value's b_type constructors and value::internal::get_bson_value()). it is not an exported symbol either (a la v1's internal API; the value constructor is exported instead). Therefore, this "private" and undocumented hidden friend function remains omitted from this PR's proposed changes.


I do not believe the remaining issues in the API compatibility report should be problematic given our circumstances, as they mostly concern user-provided ctors/dtors and implicit defaulted ctors which are now made trivial (reported as "removed"...?) as well as changes to set of implicit template instantiations and specializations (whose template definitions are visible and/or not owned by us).

There is a lot of noise in the v_noabi API compatibility report due to v1 API being included in the unstable ABI report despite the list of headers to analyze being limited to v_noabi headers. The XML descriptors may need to be updated to better filter results to v_noabi API only.

@eramongodb eramongodb requested a review from kevinAlbs September 9, 2025 17:31
Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

A closer inspection of abi-compliance-checker's API compatibility report

Great catch. I am happy to see the compatibility tasks helped to identify.

Copy link
Contributor

@vector-of-bool vector-of-bool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with only a few very minor comments.

Comment on lines 196 to 198
struct uninit_type {};

oid(uninit_type) : _bytes{} {} // Required by bsoncxx::v_noabi::oid::oid().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: rename to zeroinit_type? This value-initiailizes the _bytes to all zeroes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero-initialization is meant to be an implementation detail that avoids language-level undefined behavior (in the spirit of -ftrivial-auto-var-init=zero and to avoid triggering static analysis warnings for uninitialized data members), but it's still meant to be library-level undefined behavior to access _bytes after initializing with uninit_type, since the elements are expected to be unconditionally overwritten by v_noabi::oid::oid() anyways. I expect/hope the compiler should be able to see this unconditional write-without-read and optimize accordingly (ignoring exception paths):

v_noabi::oid::oid()
  // Zeroed bytes are never read after this initialization.
  : _oid{v1::oid::internal::make_oid_without_init()}
{
  // Unconditionally overwritten.
  _oid = v1::oid{};
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, to avoid the "uninit" confusion, adopted the *_for_overwrite suffix from std::make_unique_for_overwrite, which hopefully makes the intention much clearer.

value(std::vector<unsigned char> v, binary_sub_type const sub_type = {});
/* explicit(false) */
value(std::vector<unsigned char> v, v_noabi::binary_sub_type const sub_type = {})
: _value{reinterpret_cast<std::uint8_t const*>(v.data()), v.size(), to_v1(sub_type)} {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be taking the vector parameter by-value, or by const&?

Comment on lines 74 to +78

template <>
inline void float32::construct<bsoncxx::detail::endian::little>(float value) {
std::memcpy(bytes, &value, sizeof value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are template specializations required, or can this just be a regular if branch in a single function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specializations are to avoid "conditional expression is constant" compiler warnings without needing to use warning suppressions. This should really be an if constexpr, but alas...

Comment on lines +79 to +82
template <typename From, typename To>
struct is_explicitly_convertible : bsoncxx::detail::conjunction<
std::is_constructible<To, From>,
bsoncxx::detail::negation<std::is_convertible<From, To>>> {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweak name: implicit-convertbile should imply is_explicit_convertible, but this checks that it is also not implicit-convertible. Maybe is_only_explicitly_convertible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of these type traits is to more accurately assert the presence or absence of explicit for a given constructor or conversion function while also avoiding the To vs. From ordering inconsistency (which I frequently mixed up when writing the relevant static assertions prior to defining these helpers):

struct From {};
struct To { /* explicit(false) */ To(From); };

static_assert(std::is_constructible_v<To, From>, "OK...?"); // Implies nothing about `explicit`.
static_assert(std::is_convertible_v<From, To>, "there is no `explicit`");

static_assert(is_explicitly_convertible_v<From, To>, "there is an `explicit`"); // Fails: a useful check!
static_assert(is_implicitly_convertible_v<From, To>, "there is no `explicit`");

I considered naming them has_(implicit|explicit)_ctor, but this would not account for conversion functions (indistinguishable from ctors by usage syntax alone). I also considered has_explicit + has_no_explicit, but then there'd be a no_ in a type trait identifier... 😑

Therefore I ended up with the current names for the best symmetry: is_only_explicitly_convertible would imply an is_explicitly_convertible ("not only explicit")... but that would just be equivalent to std::is_constructible<To, From> (providing no additional value). I don't think we want/need an is_only_implicitly_convertible either (is that even possible?).

That being said, added additional notes to their documentation to clarify that std::is_constructible and std::is_convertible should be used if the additional implications are not desirable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants