Optimize input_iterator-pair insert for std::vector #113768
@llvm/pr-subscribers-libcxx
Author: Peng Liu (winner245)
Changes
This PR includes optimizations and enhancements to the
Details:
Testing:
Full diff: https://github.com/llvm/llvm-project/pull/113768.diff
2 Files Affected:
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..02f91b537c7e22 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1360,27 +1360,27 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu
for (; this->__end_ != this->__end_cap() && __first != __last; ++__first) {
__construct_one_at_end(*__first);
}
- __split_buffer<value_type, allocator_type&> __v(__a);
- if (__first != __last) {
-#if _LIBCPP_HAS_EXCEPTIONS
- try {
-#endif // _LIBCPP_HAS_EXCEPTIONS
- __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
- difference_type __old_size = __old_last - this->__begin_;
- difference_type __old_p = __p - this->__begin_;
- reserve(__recommend(size() + __v.size()));
- __p = this->__begin_ + __old_p;
- __old_last = this->__begin_ + __old_size;
-#if _LIBCPP_HAS_EXCEPTIONS
- } catch (...) {
- erase(__make_iter(__old_last), end());
- throw;
- }
-#endif // _LIBCPP_HAS_EXCEPTIONS
+
+ if (__first == __last)
+ (void)std::rotate(__p, __old_last, this->__end_);
+ else {
+ __split_buffer<value_type, allocator_type&> __v(__a);
+ auto __guard =
+ std::__make_exception_guard(_AllocatorDestroyRangeReverse<allocator_type, pointer>(__a, __old_last, __end_));
+ __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
+ __split_buffer<value_type, allocator_type&> __merged(__recommend(size() + __v.size()), __off, __a);
+ std::__uninitialized_allocator_relocate(
+ __a, std::__to_address(__old_last), std::__to_address(__end_), std::__to_address(__merged.__end_));
+ __merged.__end_ += __end_ - __old_last;
+ __end_ = __old_last;
+ __guard.__complete();
+ std::__uninitialized_allocator_relocate(
+ __a, std::__to_address(__v.__begin_), std::__to_address(__v.__end_), std::__to_address(__merged.__end_));
+ __merged.__end_ += __v.size();
+ __v.__begin_ = __v.__end_;
+ __p = __swap_out_circular_buffer(__merged, __p);
}
- __p = std::rotate(__p, __old_last, this->__end_);
- insert(__make_iter(__p), std::make_move_iterator(__v.begin()), std::make_move_iterator(__v.end()));
- return begin() + __off;
+ return __make_iter(__p);
}
template <class _Tp, class _Allocator>
diff --git a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
index 934b85ce01c67b..8dce6e5c1a690e 100644
--- a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
+++ b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
@@ -46,6 +46,46 @@ TEST_CONSTEXPR_CXX20 bool tests()
for (; j < 105; ++j)
assert(v[j] == 0);
}
+ { // Vector may or may not need to reallocate because of the insertion -- test both cases.
+ { // The input range is shorter than the remaining capacity of the vector -- ensure no reallocation happens.
+ typedef std::vector<int> V;
+ V v(100);
+ v.reserve(v.size() + 10);
+ int a[] = {1, 2, 3, 4, 5};
+ const int N = sizeof(a) / sizeof(a[0]);
+ V::iterator i =
+ v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+ assert(v.size() == 100 + N);
+ assert(is_contiguous_container_asan_correct(v));
+ assert(i == v.begin() + 10);
+ int j;
+ for (j = 0; j < 10; ++j)
+ assert(v[j] == 0);
+ for (std::size_t k = 0; k < N; ++j, ++k)
+ assert(v[j] == a[k]);
+ for (; j < 105; ++j)
+ assert(v[j] == 0);
+ }
+ { // The input range is longer than the remaining capacity of the vector -- ensure reallocation happens.
+ typedef std::vector<int> V;
+ V v(100);
+ v.reserve(v.size() + 2);
+ int a[] = {1, 2, 3, 4, 5};
+ const int N = sizeof(a) / sizeof(a[0]);
+ V::iterator i =
+ v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+ assert(v.size() == 100 + N);
+ assert(is_contiguous_container_asan_correct(v));
+ assert(i == v.begin() + 10);
+ int j;
+ for (j = 0; j < 10; ++j)
+ assert(v[j] == 0);
+ for (std::size_t k = 0; k < N; ++j, ++k)
+ assert(v[j] == a[k]);
+ for (; j < 105; ++j)
+ assert(v[j] == 0);
+ }
+ }
{
typedef std::vector<int> V;
V v(100);
Thanks for the patch, this is a nice find! It looks like a really worthwhile optimization. I have some comments and questions.
✅ With the latest revision this PR passed the C/C++ code formatter.
Can you rebase this onto main?
@@ -119,6 +119,90 @@ void BM_InsertValueRehash(benchmark::State& st, Container c, GenInputs gen) {
}
}

// Wrap any Iterator into an input iterator
template <typename Iterator>
class InputIterator {
I think you could reuse the type from test_iterators.h like you've done in some of your recent patches.
Yeah, I totally agree. This is one of my earliest PRs. At that time, I was not aware of the test utilities we already have. Now I have a better understanding of the test framework libc++ offers, and will reuse the code as much as possible. In the meantime, I will try to improve this early work and provide an updated benchmark result.
Summary of the current updates to this PR:
This LGTM with a small nitpick about adding a comment, and a suggestion to optimize further. Given I have no more feedback, I actually suggest that we land the patch after my nitpick comment and investigate further optimizations in later patches, to unblock this work and make sure we hit LLVM 20.
Thanks for the great patch!
if (__first == __last)
  (void)std::rotate(__p, __old_last, this->__end_);
else {
  __split_buffer<value_type, allocator_type&> __v(__alloc_);
If we could have a heuristic of some kind here, we could pre-allocate __v to a given size and hope that we get lucky. That way, we might not have to reallocate memory for __v. Without a heuristic though, I think allocating the smallest amount of memory (what you have right now) is probably the best approach. It's worth thinking about whether such a heuristic exists.
Suggested change:
- __split_buffer<value_type, allocator_type&> __v(__alloc_);
+ __split_buffer<value_type, allocator_type&> __v(__alloc_);
+ __v.reserve(__recommend(capacity() + heuristic-maybe));
+ // now if we got lucky, perhaps __v.capacity() is enough to hold (size() + (last-first)) elements
Since the CI was already passing and I only added a comment, merging.
As a follow-up to #113852, this PR optimizes the performance of the insert(const_iterator pos, InputIt first, InputIt last) function for input_iterator-pair inputs in std::vector for cases where reallocation occurs during insertion. Additionally, this optimization enhances exception safety by replacing the traditional try-catch mechanism with a modern exception guard for the insert function.

1. Performance Improvement:
The optimization targets cases where insertion triggers reallocation. In scenarios without reallocation, the implementation remains unchanged.
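To make the affected overload concrete, the following is a minimal sketch of a call that takes the input-iterator path. std::istream_iterator is a single-pass input iterator, so the vector cannot compute the number of incoming elements up front and may discover partway through that it has to reallocate. The helper name insert_from_text is hypothetical and exists only for illustration; it is not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): inserts the whitespace-separated
// integers found in `text` at index `pos` of `v`.
// Because std::istream_iterator models only an input iterator, this call
// resolves to the input_iterator-pair overload of std::vector::insert.
inline std::vector<int>::iterator
insert_from_text(std::vector<int>& v, std::ptrdiff_t pos, const std::string& text) {
  assert(pos >= 0 && pos <= static_cast<std::ptrdiff_t>(v.size()));
  std::istringstream in(text);
  return v.insert(v.cbegin() + pos,
                  std::istream_iterator<int>(in),   // begin of the single-pass range
                  std::istream_iterator<int>());    // end-of-stream sentinel
}
```

Calling this on a vector whose spare capacity is smaller than the number of parsed integers exercises the reallocation case that the PR targets; with enough spare capacity it exercises the unchanged path.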
Previous implementation:
The previous implementation of insert is inefficient in reallocation scenarios because it performs the following steps separately:
- reserve(): This leads to the first round of relocating old elements to new memory;
- rotate(): This leads to the second round of reorganizing the existing elements;
This approach results in a lot of redundant operations, requiring the elements to undergo three rounds of relocations/reorganizations to be placed in their final positions.
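The shape of the removed code path can be approximated with the public std::vector API. This is only a sketch inferred from the removed lines of the diff (append into spare capacity, buffer the rest, reserve, rotate, insert); the function name insert_old_style and the use of a second std::vector as the temporary buffer are assumptions, since libc++ itself uses the internal __split_buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Illustrative only: mirrors the structure of the pre-patch code path.
template <class T, class InputIt>
typename std::vector<T>::iterator
insert_old_style(std::vector<T>& v, std::ptrdiff_t pos, InputIt first, InputIt last) {
  const std::ptrdiff_t old_size = static_cast<std::ptrdiff_t>(v.size());
  assert(pos >= 0 && pos <= old_size);

  // Append directly while spare capacity remains (never reallocates).
  for (; v.size() != v.capacity() && first != last; ++first)
    v.push_back(*first);

  // Buffer whatever input is left; its length is only known afterwards.
  std::vector<T> rest;
  for (; first != last; ++first)
    rest.push_back(*first);

  // Round 1: grow the storage, which relocates every existing element.
  v.reserve(v.size() + rest.size());

  // Round 2: rotate the appended elements back to the insertion point.
  auto p = std::rotate(v.begin() + pos, v.begin() + old_size, v.end());

  // Round 3: insert the buffered remainder, shifting the tail once more.
  v.insert(p, std::make_move_iterator(rest.begin()), std::make_move_iterator(rest.end()));
  return v.begin() + pos;
}
```

Elements after the insertion point are touched in all three rounds, which is the redundancy the description refers to.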
Proposed implementation:
The proposed implementation jointly optimizes the above 4 steps of the previous implementation so that each element is placed in its final position in just one round of relocation. Specifically, this optimization reduces the total cost from 2 relocations + 1 std::rotate call to just 1 relocation, without needing to call std::rotate, thereby significantly improving overall performance.

2. Exception Safety:
Replaced the traditional try-catch mechanism with a modern exception guard to enhance exception safety.
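Both ideas, a single relocation into fresh storage and a guard in place of try-catch, can be sketched with public facilities. This is a simplified approximation rather than the libc++ code: ScopeGuard is a hypothetical stand-in for the internal exception guard, insert_single_pass is a made-up name, and a plain std::vector plays the role of the merged __split_buffer, so elements are moved rather than relocated in raw memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical guard: runs `rollback` on scope exit unless complete() was called.
template <class Rollback>
class ScopeGuard {
public:
  explicit ScopeGuard(Rollback rollback) : rollback_(std::move(rollback)) {}
  ScopeGuard(const ScopeGuard&)            = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;
  ~ScopeGuard() {
    if (!completed_)
      rollback_();
  }
  void complete() noexcept { completed_ = true; }

private:
  Rollback rollback_;
  bool completed_ = false;
};

template <class T, class InputIt>
typename std::vector<T>::iterator
insert_single_pass(std::vector<T>& v, std::ptrdiff_t pos, InputIt first, InputIt last) {
  const std::ptrdiff_t old_size = static_cast<std::ptrdiff_t>(v.size());
  assert(pos >= 0 && pos <= old_size);

  // Append while spare capacity remains (never reallocates).
  for (; v.size() != v.capacity() && first != last; ++first)
    v.push_back(*first);

  if (first == last) {
    // Everything fit: no reallocation, so a rotate is all that is needed.
    std::rotate(v.begin() + pos, v.begin() + old_size, v.end());
    return v.begin() + pos;
  }

  // If anything below throws, drop the appended elements again.
  auto undo = [&v, old_size] { v.erase(v.begin() + old_size, v.end()); };
  ScopeGuard<decltype(undo)> guard(undo);

  // Buffer the remaining input; its length is only known afterwards.
  std::vector<T> rest;
  for (; first != last; ++first)
    rest.push_back(*first);

  // Build the final layout once, directly in fresh storage.
  std::vector<T> merged;
  merged.reserve(v.size() + rest.size());
  auto move_range = [&merged](auto b, auto e) {
    merged.insert(merged.end(), std::make_move_iterator(b), std::make_move_iterator(e));
  };
  move_range(v.begin(), v.begin() + pos);             // elements before the insertion point
  move_range(v.begin() + old_size, v.end());          // inserted elements that fit in place
  move_range(rest.begin(), rest.end());               // inserted elements that did not fit
  move_range(v.begin() + pos, v.begin() + old_size);  // elements after the insertion point

  guard.complete();  // success: the rollback must not run
  v.swap(merged);
  return v.begin() + pos;
}
```

The guard keeps the rollback next to the state it protects and removes the need for a separate code path when exceptions are disabled, which is roughly the motivation for replacing the #if _LIBCPP_HAS_EXCEPTIONS try-catch block in the diff.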
Benchmark Testing:
Before
After
Observations:
The optimized implementation maintains the same performance in non-reallocation scenarios while achieving significant improvements during reallocation:
- For std::vector<int>, the optimized implementation achieves performance improvements of 1.3x and 10x for a half-filled and almost-full vector, respectively.
- For std::vector<std::string>, the optimized implementation achieves performance improvements of 1.4x and 2.3x for a half-filled and almost-full vector, respectively.