
Commit 0298e58

[libc++] Optimize input_iterator-pair insert for std::vector (#113768)
As a follow-up to #113852, this PR optimizes the performance of the `insert(const_iterator pos, InputIt first, InputIt last)` function for `input_iterator`-pair inputs in `std::vector` for cases where reallocation occurs during insertion. Additionally, this optimization enhances exception safety by replacing the traditional `try`-`catch` mechanism in `insert` with a modern exception guard.

The optimization targets cases where insertion triggers reallocation. In scenarios without reallocation, the implementation remains unchanged.

Previous implementation
-----------------------

The previous implementation of `insert` is inefficient in reallocation scenarios because it performs the following steps separately:

- `reserve()`: This leads to the first round of relocating the old elements to new memory.
- `rotate()`: This leads to the second round of reorganizing the existing elements.
- Move-forward: Moves the elements after the insertion position to their final positions.
- Insert: Performs the actual insertion.

This approach results in many redundant operations, requiring the elements to undergo three rounds of relocations/reorganizations before they reach their final positions.

Proposed implementation
-----------------------

The proposed implementation jointly optimizes the four steps above so that each element is placed in its final position in just one round of relocation. Specifically, this optimization reduces the total cost from 2 relocations + 1 `std::rotate` call to just 1 relocation, with no call to `std::rotate`, thereby significantly improving overall performance.
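The single-relocation strategy described above can be modelled with standard containers. This is a simplified sketch, not libc++'s code: `insert_single_pass` is a hypothetical helper, a temporary `std::vector` stands in for libc++'s internal `__split_buffer`, and allocators, the `__recommend` growth policy, and exception safety are ignored. As in the new `__insert_with_sentinel`, input is first appended into spare capacity; if it all fits, one `std::rotate` finishes the job, otherwise the rest of the input is buffered and every element is moved exactly once into new storage in its final order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not libc++'s implementation: inserts the single-pass
// range [first, last) before index `off` of `v`, following the two cases of the
// new __insert_with_sentinel. Allocators, the growth policy, and exception
// safety are deliberately left out.
template <class T, class InputIt>
typename std::vector<T>::iterator insert_single_pass(std::vector<T>& v, std::size_t off, InputIt first,
                                                     InputIt last) {
  assert(off <= v.size());
  const std::size_t old_size = v.size();
  const auto pos = static_cast<std::ptrdiff_t>(off);

  // Append into spare capacity while it lasts; push_back cannot reallocate here.
  for (; v.size() != v.capacity() && first != last; ++first)
    v.push_back(*first);

  // Case 1: all input fit, so one rotate moves the appended tail into place.
  if (first == last) {
    (void)std::rotate(v.begin() + pos, v.begin() + static_cast<std::ptrdiff_t>(old_size), v.end());
    return v.begin() + pos;
  }

  // Case 2: the input can only be read once, so buffer what is left of it
  // (this plays the role of the __split_buffer `__v` in the patch).
  std::vector<T> rest;
  for (; first != last; ++first)
    rest.push_back(*first);

  // Move every element exactly once into new storage, already in final order:
  // prefix [0, off), appended tail [old_size, size()), buffered rest, suffix [off, old_size).
  std::vector<T> merged;
  merged.reserve(v.size() + rest.size());
  for (std::size_t i = 0; i < off; ++i)
    merged.push_back(std::move(v[i]));
  for (std::size_t i = old_size; i < v.size(); ++i)
    merged.push_back(std::move(v[i]));
  for (T& x : rest)
    merged.push_back(std::move(x));
  for (std::size_t i = off; i < old_size; ++i)
    merged.push_back(std::move(v[i]));

  v.swap(merged);  // the old storage is released when `merged` goes out of scope
  return v.begin() + pos;
}
```

Because an input iterator can be traversed only once, the final size is unknown until the input is exhausted. That is why the sketch, like the patch, buffers the remaining input first and only then allocates the merged storage.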
1 parent 091adb8 commit 0298e58


6 files changed, +291 -185 lines changed


libcxx/docs/ReleaseNotes/20.rst

+7-3
@@ -69,9 +69,13 @@ Improvements and New Features
 - The ``_LIBCPP_ABI_BOUNDED_ITERATORS_IN_STD_ARRAY`` ABI configuration was added, which allows storing valid bounds
   in ``std::array::iterator`` and detecting OOB accesses when the appropriate hardening mode is enabled.
 
-- The input iterator overload of `assign(_InputIterator, _InputIterator)` in `std::vector<_Tp, _Allocator>` has been
-  optimized, resulting in a performance improvement of up to 2x for trivial element types (e.g., `std::vector<int>`),
-  and up to 3.4x for non-trivial element types (e.g., `std::vector<std::vector<int>>`).
+- The ``input_iterator``-pair overload of ``void assign(InputIt, InputIt)`` has been optimized for ``std::vector``,
+  resulting in a performance improvement of up to 2x for trivial element types (e.g., ``std::vector<int>``), and up
+  to 3.4x for non-trivial element types (e.g., ``std::vector<std::vector<int>>``).
+
+- The ``input_iterator``-pair overload of ``iterator insert(const_iterator, InputIt, InputIt)`` has been optimized
+  for ``std::vector``, resulting in a performance improvement of up to 10x for ``std::vector<int>``, and up to 2.3x
+  for ``std::vector<std::vector<int>>``.
 
 - On Windows, ``<system_error>``'s ``std::system_category`` is now distinct from ``std::generic_category``. The behavior
   on other operating systems is unchanged.

libcxx/include/__vector/vector.h

+22-22
@@ -1250,30 +1250,30 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu
   difference_type __off = __position - begin();
   pointer __p = this->__begin_ + __off;
   pointer __old_last = this->__end_;
-  for (; this->__end_ != this->__cap_ && __first != __last; ++__first) {
+  for (; this->__end_ != this->__cap_ && __first != __last; ++__first)
     __construct_one_at_end(*__first);
+
+  if (__first == __last)
+    (void)std::rotate(__p, __old_last, this->__end_);
+  else {
+    __split_buffer<value_type, allocator_type&> __v(__alloc_);
+    auto __guard = std::__make_exception_guard(
+        _AllocatorDestroyRangeReverse<allocator_type, pointer>(__alloc_, __old_last, this->__end_));
+    __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
+    __split_buffer<value_type, allocator_type&> __merged(
+        __recommend(size() + __v.size()), __off, __alloc_); // has `__off` positions available at the front
+    std::__uninitialized_allocator_relocate(
+        __alloc_, std::__to_address(__old_last), std::__to_address(this->__end_), std::__to_address(__merged.__end_));
+    __guard.__complete(); // Release the guard once objects in [__old_last_, __end_) have been successfully relocated.
+    __merged.__end_ += this->__end_ - __old_last;
+    this->__end_ = __old_last;
+    std::__uninitialized_allocator_relocate(
+        __alloc_, std::__to_address(__v.__begin_), std::__to_address(__v.__end_), std::__to_address(__merged.__end_));
+    __merged.__end_ += __v.size();
+    __v.__end_ = __v.__begin_;
+    __p = __swap_out_circular_buffer(__merged, __p);
   }
-  __split_buffer<value_type, allocator_type&> __v(this->__alloc_);
-  if (__first != __last) {
-#if _LIBCPP_HAS_EXCEPTIONS
-    try {
-#endif // _LIBCPP_HAS_EXCEPTIONS
-      __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
-      difference_type __old_size = __old_last - this->__begin_;
-      difference_type __old_p = __p - this->__begin_;
-      reserve(__recommend(size() + __v.size()));
-      __p = this->__begin_ + __old_p;
-      __old_last = this->__begin_ + __old_size;
-#if _LIBCPP_HAS_EXCEPTIONS
-    } catch (...) {
-      erase(__make_iter(__old_last), end());
-      throw;
-    }
-#endif // _LIBCPP_HAS_EXCEPTIONS
-  }
-  __p = std::rotate(__p, __old_last, this->__end_);
-  insert(__make_iter(__p), std::make_move_iterator(__v.begin()), std::make_move_iterator(__v.end()));
-  return begin() + __off;
+  return __make_iter(__p);
 }
 
 template <class _Tp, class _Allocator>
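The patch replaces the `try`/`catch` block with `std::__make_exception_guard`, an internal libc++ helper. The pattern behind it can be illustrated with a small RAII type. The sketch below is hypothetical (`exception_guard`, `make_exception_guard`, and `append_then` are illustrative names, not libc++'s definitions): the guard runs a rollback action when it is destroyed unless `complete()` was called first, so the cleanup that used to live in the `catch` handler happens automatically during stack unwinding.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical RAII guard illustrating the idea behind libc++'s internal
// std::__make_exception_guard (not its actual definition): the rollback runs
// when the guard is destroyed, unless complete() was called first.
template <class Rollback>
class exception_guard {
public:
  explicit exception_guard(Rollback rollback) : rollback_(std::move(rollback)) {}
  exception_guard(exception_guard&& other)
      : rollback_(std::move(other.rollback_)), completed_(other.completed_) {
    other.completed_ = true;  // the moved-from guard must never roll back
  }
  exception_guard(const exception_guard&) = delete;
  exception_guard& operator=(const exception_guard&) = delete;
  exception_guard& operator=(exception_guard&&) = delete;

  ~exception_guard() {
    if (!completed_)
      rollback_();
  }

  void complete() noexcept { completed_ = true; }

private:
  Rollback rollback_;
  bool completed_ = false;
};

template <class Rollback>
exception_guard<Rollback> make_exception_guard(Rollback rollback) {
  return exception_guard<Rollback>(std::move(rollback));
}

// Hypothetical usage in the spirit of the new insert: the elements appended to
// `v` are destroyed again if the later step `next_step` throws.
template <class T, class F>
void append_then(std::vector<T>& v, const std::vector<T>& extra, F next_step) {
  const std::size_t old_size = v.size();
  auto guard = make_exception_guard([&v, old_size] {
    if (v.size() > old_size)
      v.erase(v.begin() + static_cast<std::ptrdiff_t>(old_size), v.end());
  });
  v.insert(v.end(), extra.begin(), extra.end());
  next_step();       // may throw; the guard then removes the appended tail
  guard.complete();  // success: keep the new elements
}
```

Unlike a `try`/`catch` block, the guard contains no exception-handling syntax, so it compiles unchanged when exceptions are disabled and needs no `#if _LIBCPP_HAS_EXCEPTIONS` conditionals.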

libcxx/test/benchmarks/GenerateInput.h

+8-1
@@ -134,7 +134,14 @@ std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, s
   return inputs;
 }
 
-inline std::vector<std::string> getPrefixedRandomStringInputs(std::size_t N) {
+inline std::vector<std::string> getSSORandomStringInputs(size_t N) {
+  std::vector<std::string> inputs;
+  for (size_t i = 0; i < N; ++i)
+    inputs.push_back(getRandomString(10)); // SSO
+  return inputs;
+}
+
+inline std::vector<std::string> getPrefixedRandomStringInputs(size_t N) {
   std::vector<std::string> inputs;
   inputs.reserve(N);
   constexpr int kSuffixLength = 32;
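The `// SSO` comment on `getSSORandomStringInputs` indicates that the 10-character strings are meant to fit in `std::string`'s small-string (inline) buffer. Whether a given string does is implementation-specific; the hypothetical helper below, which is not part of the benchmark suite, is a heuristic probe that checks it on a particular standard library by testing whether `data()` points inside the string object.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper, and an implementation-specific heuristic rather than
// anything the C++ standard guarantees: returns true if the characters of `s`
// live inside the std::string object itself (small-string optimization)
// instead of in a separately allocated heap buffer.
inline bool stored_inline(const std::string& s) {
  const char* obj_begin = reinterpret_cast<const char*>(&s);
  const char* obj_end = obj_begin + sizeof(std::string);
  const char* data = s.data();
  // std::less yields a total order even for pointers into unrelated objects,
  // so these comparisons are well-defined either way.
  std::less<const char*> less;
  return !less(data, obj_begin) && less(data, obj_end);
}
```

On the common 64-bit implementations (libc++, libstdc++, and MSVC's STL), a 10-character string is typically stored inline while a 100-character one is not; other implementations may differ.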

libcxx/test/benchmarks/containers/ContainerBenchmarks.h

+60
@@ -135,6 +135,66 @@ void BM_InsertValueRehash(benchmark::State& st, Container c, GenInputs gen) {
   }
 }
 
+template <class Container, class GenInputs>
+void BM_Insert_InputIterIter_NoRealloc(benchmark::State& st, Container c, GenInputs gen) {
+  auto in = gen(st.range(0));
+  DoNotOptimizeData(in);
+  const auto size = c.size();
+  const auto beg = cpp17_input_iterator(in.begin());
+  const auto end = cpp17_input_iterator(in.end());
+  c.reserve(size + in.size()); // force no reallocation
+  for (auto _ : st) {
+    benchmark::DoNotOptimize(&(*c.insert(c.begin(), beg, end)));
+    st.PauseTiming();
+    c.erase(c.begin() + size, c.end()); // avoid the container to grow indefinitely
+    st.ResumeTiming();
+    DoNotOptimizeData(c);
+    benchmark::ClobberMemory();
+  }
+}
+
+template <class Container, class GenInputs>
+void BM_Insert_InputIterIter_Realloc_HalfFilled(benchmark::State& st, Container, GenInputs gen) {
+  const auto size = st.range(0);
+  Container a = gen(size);
+  Container in = gen(size + 10);
+  DoNotOptimizeData(a);
+  DoNotOptimizeData(in);
+  const auto beg = cpp17_input_iterator(in.begin());
+  const auto end = cpp17_input_iterator(in.end());
+  for (auto _ : st) {
+    st.PauseTiming();
+    Container c;
+    c.reserve(size * 2); // Reallocation with half-filled container
+    c = a;
+    st.ResumeTiming();
+    benchmark::DoNotOptimize(&(*c.insert(c.begin(), beg, end)));
+    DoNotOptimizeData(c);
+    benchmark::ClobberMemory();
+  }
+}
+
+template <class Container, class GenInputs>
+void BM_Insert_InputIterIter_Realloc_NearFull(benchmark::State& st, Container, GenInputs gen) {
+  const auto size = st.range(0);
+  Container a = gen(size);
+  Container in = gen(10);
+  DoNotOptimizeData(a);
+  DoNotOptimizeData(in);
+  const auto beg = cpp17_input_iterator(in.begin());
+  const auto end = cpp17_input_iterator(in.end());
+  for (auto _ : st) {
+    st.PauseTiming();
+    Container c;
+    c.reserve(size + 5); // Reallocation almost-full container
+    c = a;
+    st.ResumeTiming();
+    benchmark::DoNotOptimize(&(*c.insert(c.begin(), beg, end)));
+    DoNotOptimizeData(c);
+    benchmark::ClobberMemory();
+  }
+}
+
 template <class Container, class GenInputs>
 void BM_InsertDuplicate(benchmark::State& st, Container c, GenInputs gen) {
   auto in = gen(st.range(0));
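The new benchmarks wrap the source iterators in `cpp17_input_iterator` from libc++'s test support so that `insert` sees a genuine input-iterator pair and takes the input-only path (`__insert_with_sentinel` in the diff above) rather than the forward-iterator one. A minimal adapter with the same intent is sketched below; `input_only_iterator` and `insert_front_as_input` are hypothetical names, and this is not the test-support implementation.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical single-pass adapter in the spirit of libc++'s test-support
// cpp17_input_iterator (not its actual definition). It forwards to an
// underlying iterator but advertises only std::input_iterator_tag, so
// std::vector::insert must treat [first, last) as a one-shot input range.
template <class It>
class input_only_iterator {
public:
  using iterator_category = std::input_iterator_tag;
  using value_type = typename std::iterator_traits<It>::value_type;
  using difference_type = typename std::iterator_traits<It>::difference_type;
  using pointer = void;  // operator-> is deliberately not provided
  using reference = typename std::iterator_traits<It>::reference;

  explicit input_only_iterator(It it) : it_(it) {}

  reference operator*() const { return *it_; }
  input_only_iterator& operator++() {
    ++it_;
    return *this;
  }
  input_only_iterator operator++(int) {
    input_only_iterator old = *this;
    ++it_;
    return old;
  }

  friend bool operator==(const input_only_iterator& a, const input_only_iterator& b) { return a.it_ == b.it_; }
  friend bool operator!=(const input_only_iterator& a, const input_only_iterator& b) { return !(a == b); }

private:
  It it_;
};

static_assert(std::is_same<std::iterator_traits<input_only_iterator<int*>>::iterator_category,
                           std::input_iterator_tag>::value,
              "the adapter must expose only input_iterator_tag");

// Mirrors the benchmarked call: insert all of `in` at the front of `c`,
// with `in` visible only through input iterators.
template <class T>
typename std::vector<T>::iterator insert_front_as_input(std::vector<T>& c, const std::vector<T>& in) {
  using It = input_only_iterator<typename std::vector<T>::const_iterator>;
  return c.insert(c.begin(), It(in.begin()), It(in.end()));
}
```

Because the category is only `std::input_iterator_tag`, the library cannot compute `std::distance(first, last)` up front and must grow the vector as it reads, which is exactly the situation the reallocation benchmarks exercise.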

libcxx/test/benchmarks/containers/vector_operations.bench.cpp

+14
@@ -91,4 +91,18 @@ BENCHMARK_CAPTURE(BM_AssignInputIterIter<100>,
     getRandomIntegerInputsWithLength<int>)
     ->Args({TestNumInputs, TestNumInputs});
 
+BENCHMARK_CAPTURE(BM_Insert_InputIterIter_NoRealloc, vector_int, std::vector<int>(100, 1), getRandomIntegerInputs<int>)
+    ->Arg(514048);
+BENCHMARK_CAPTURE(
+    BM_Insert_InputIterIter_Realloc_HalfFilled, vector_int, std::vector<int>{}, getRandomIntegerInputs<int>)
+    ->Arg(514048);
+BENCHMARK_CAPTURE(BM_Insert_InputIterIter_Realloc_NearFull, vector_int, std::vector<int>{}, getRandomIntegerInputs<int>)
+    ->Arg(514048);
+BENCHMARK_CAPTURE(
+    BM_Insert_InputIterIter_Realloc_HalfFilled, vector_string, std::vector<std::string>{}, getSSORandomStringInputs)
+    ->Arg(514048);
+BENCHMARK_CAPTURE(
+    BM_Insert_InputIterIter_Realloc_NearFull, vector_string, std::vector<std::string>{}, getSSORandomStringInputs)
+    ->Arg(514048);
+
 BENCHMARK_MAIN();
