
Optimize input_iterator-pair insert for std::vector #113768

Merged: 5 commits, merged on Jan 14, 2025


winner245 (Contributor) commented Oct 26, 2024

As a follow-up to #113852, this PR optimizes the performance of insert(const_iterator pos, InputIt first, InputIt last) in std::vector for input-iterator ranges, in the cases where the insertion triggers a reallocation. Additionally, it replaces the traditional try-catch mechanism in this code path with a modern exception guard to enhance exception safety.

1. Performance Improvement:

The optimization targets cases where the insertion triggers a reallocation. In scenarios without reallocation, the implementation remains unchanged.
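For reference, the unchanged non-reallocating path amounts to "append at the end, then rotate into place". The following is a minimal user-level sketch of that idea using only public std::vector operations; the helper name `append_then_rotate_insert` is hypothetical and this is not the libc++ source. It simply lets emplace_back reallocate when the input is longer than the spare capacity, which is exactly the case this PR optimizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: "append at the end, then rotate into place".
// Elements are constructed in the spare capacity at the end of the vector;
// if the input is longer than the spare capacity, emplace_back reallocates
// (the case this PR optimizes). Returns an iterator to the first inserted element.
template <class T, class InputIt>
typename std::vector<T>::iterator
append_then_rotate_insert(std::vector<T>& v, std::size_t off, InputIt first, InputIt last) {
  const std::size_t old_size = v.size();
  for (; first != last; ++first)
    v.emplace_back(*first);
  // Bring the new elements in [old_size, size()) in front of the old
  // elements in [off, old_size) with a single std::rotate.
  std::rotate(v.begin() + off, v.begin() + old_size, v.end());
  return v.begin() + off;
}
```

When the spare capacity suffices (for example after a reserve()), no reallocation occurs and only the single rotate moves elements.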

Previous implementation:

The previous implementation of insert is inefficient in reallocation scenarios because it performs the following steps separately:

  • reserve(): relocates the existing elements to new memory (first round);
  • rotate(): reorganizes the existing elements (second round);
  • Move-forward: moves the elements after the insertion position to their final positions (third round);
  • Insert: performs the actual insertion.

This approach performs many redundant operations: the existing elements undergo three rounds of relocation or reorganization before reaching their final positions.
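The previous strategy can be reconstructed at user level roughly as follows. This is a sketch under stated assumptions: `old_style_insert` is a hypothetical name, a plain std::vector stands in for libc++'s internal __split_buffer, and reserve() is given the exact required size rather than libc++'s growth policy. The comments mark where each round of element movement happens.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical user-level reconstruction of the previous strategy.
template <class T, class InputIt>
typename std::vector<T>::iterator
old_style_insert(std::vector<T>& v, std::size_t off, InputIt first, InputIt last) {
  const std::size_t old_size = v.size();
  // Fill the spare capacity at the end first.
  for (; v.size() < v.capacity() && first != last; ++first)
    v.emplace_back(*first);
  // Collect whatever did not fit into a temporary buffer.
  std::vector<T> rest;
  for (; first != last; ++first)
    rest.emplace_back(*first);
  // Round 1: reserve() relocates every existing element to new memory.
  if (!rest.empty())
    v.reserve(v.size() + rest.size());
  // Round 2: rotate the in-place appended elements to the insertion point.
  auto pos = std::rotate(v.begin() + off, v.begin() + old_size, v.end());
  // Round 3: insert() shifts the tail right, then moves the buffered elements in.
  v.insert(pos, std::make_move_iterator(rest.begin()), std::make_move_iterator(rest.end()));
  return v.begin() + off;
}
```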

Proposed implementation:

The proposed implementation combines the four steps above so that each element is placed in its final position in a single round of relocation. Specifically, it reduces the total cost from 2 relocations + 1 std::rotate call to just 1 relocation with no std::rotate call, which significantly improves overall performance.
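The single-relocation idea can be sketched at user level: after collecting the leftover input, build the final layout in fresh storage by moving the prefix, the elements appended in place, the buffered remainder, and the old tail exactly once each. This is a hypothetical illustration (`single_pass_insert` is not a libc++ name); the actual patch does this with __split_buffer, std::__uninitialized_allocator_relocate and __swap_out_circular_buffer, and the doubling growth below is only an assumption meant to resemble libc++'s internal __recommend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical sketch: every element is moved into its final slot exactly once.
template <class T, class InputIt>
typename std::vector<T>::iterator
single_pass_insert(std::vector<T>& v, std::size_t off, InputIt first, InputIt last) {
  const std::size_t old_size = v.size();
  for (; v.size() < v.capacity() && first != last; ++first)
    v.emplace_back(*first);
  if (first == last) {  // input fit: the unchanged rotate path
    std::rotate(v.begin() + off, v.begin() + old_size, v.end());
    return v.begin() + off;
  }
  std::vector<T> rest;  // leftover input that did not fit
  for (; first != last; ++first)
    rest.emplace_back(*first);
  // Assumed growth policy: at least double the old capacity.
  std::vector<T> merged;
  merged.reserve(std::max(2 * v.capacity(), v.size() + rest.size()));
  auto move_append = [&merged](auto b, auto e) {
    merged.insert(merged.end(), std::make_move_iterator(b), std::make_move_iterator(e));
  };
  move_append(v.begin(), v.begin() + off);             // prefix before the insertion point
  move_append(v.begin() + old_size, v.end());          // elements appended in place
  move_append(rest.begin(), rest.end());               // buffered remainder
  move_append(v.begin() + off, v.begin() + old_size);  // old tail after the insertion point
  v.swap(merged);  // the moved-from old storage is released with `merged`
  return v.begin() + off;
}
```

Because `merged` is reserved up front, each move_append call constructs elements at the end without reallocating, so no element is moved twice and no rotate is needed.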

2. Exception Safety:

Replaced the traditional try-catch mechanism with a modern exception guard to enhance exception safety.
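The exception-guard pattern replaces an explicit try/catch with an RAII object whose destructor performs the rollback unless the operation has been marked complete. A minimal sketch, assuming a hypothetical `exception_guard` class modeled on the idea behind libc++'s internal std::__make_exception_guard (not its actual interface), plus a hypothetical `append_all_or_nothing` helper that removes the partially appended elements if constructing one of them throws. It requires C++17 for class template argument deduction.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical RAII guard: runs `rollback` in its destructor unless complete() was called.
template <class Rollback>
class exception_guard {
 public:
  explicit exception_guard(Rollback rollback) : rollback_(std::move(rollback)) {}
  exception_guard(const exception_guard&) = delete;
  exception_guard& operator=(const exception_guard&) = delete;
  // Call once the guarded operation has succeeded; disables the rollback.
  void complete() noexcept { completed_ = true; }
  ~exception_guard() {
    if (!completed_)
      rollback_();
  }

 private:
  Rollback rollback_;
  bool completed_ = false;
};

// Appends [first, last); if any construction throws, the guard pops the
// elements appended so far and the exception keeps propagating.
template <class T, class InputIt>
void append_all_or_nothing(std::vector<T>& v, InputIt first, InputIt last) {
  const std::size_t old_size = v.size();
  exception_guard guard([&v, old_size] {
    while (v.size() > old_size)
      v.pop_back();
  });
  for (; first != last; ++first)
    v.emplace_back(*first);
  guard.complete();
}
```

Compared with try/catch plus rethrow, the cleanup is written once next to the state it protects and cannot be skipped by an early return.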

Benchmark Testing:

Before

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_Insert_InputIterIter_NoRealloc/vector_int/514048                1005980 ns      1097760 ns          616
BM_Insert_InputIterIter_Realloc_HalfFilled/vector_int/514048        693841 ns       757618 ns          927
BM_Insert_InputIterIter_Realloc_NearFull/vector_int/514048          740592 ns       808204 ns          863
BM_Insert_InputIterIter_Realloc_HalfFilled/vector_string/514048    8008397 ns      8736346 ns           79
BM_Insert_InputIterIter_Realloc_NearFull/vector_string/514048      3238763 ns      3494092 ns          201

After

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_Insert_InputIterIter_NoRealloc/vector_int/514048                1004259 ns      1095871 ns          616
BM_Insert_InputIterIter_Realloc_HalfFilled/vector_int/514048        541093 ns       590598 ns         1099
BM_Insert_InputIterIter_Realloc_NearFull/vector_int/514048           69801 ns        76075 ns         8902
BM_Insert_InputIterIter_Realloc_HalfFilled/vector_string/514048    5739119 ns      6260323 ns          115
BM_Insert_InputIterIter_Realloc_NearFull/vector_string/514048      1434683 ns      1419476 ns          489

Observations:

The optimized implementation maintains the same performance in non-reallocation scenarios while achieving significant improvements when reallocation occurs:

  • For std::vector<int>, the optimized implementation is about 1.3x and 10x faster for a half-filled and a nearly full vector, respectively.
  • For std::vector<std::string>, the optimized implementation is about 1.4x and 2.3x faster for a half-filled and a nearly full vector, respectively.
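The quoted speedups are ratios of the Time column (Before divided by After, in ns), in the order int half-filled, int near-full, string half-filled, string near-full, and the no-reallocation case:

```latex
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}:\qquad
\frac{693841}{541093} \approx 1.28,\quad
\frac{740592}{69801} \approx 10.6,\quad
\frac{8008397}{5739119} \approx 1.40,\quad
\frac{3238763}{1434683} \approx 2.26,\quad
\frac{1005980}{1004259} \approx 1.00
```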

winner245 requested a review from a team as a code owner October 26, 2024 17:38
llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Oct 26, 2024
llvmbot (Member) commented Oct 26, 2024

@llvm/pr-subscribers-libcxx

Author: Peng Liu (winner245)

Changes

This PR includes optimizations and enhancements to the __insert_with_sentinel function in the std::vector implementation, focusing on performance improvements and better exception handling.

Details:

  1. Performance Improvement:

    • The existing implementation triggers a reserve() operation, causing a round of memory relocation, followed by an additional std::rotate operation, and finally another memory relocation for elements after the insertion point.
    • The improved version eliminates redundant operations by directly relocating both existing and new elements to their final positions within the allocated __split_buffer.
    • This optimization reduces the total cost from 2 relocations + 1 std::rotate to 1 relocation without any std::rotate, improving overall performance. Note that std::rotate is still needed when the vector has enough spare capacity for the insertion, i.e., when no reallocation happens.
  2. Exception Safety:

    • Replaced the traditional try-catch mechanism with the modern approach of exception guard.

Testing:
New test cases are added in libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp, covering both the case where reallocation happens and the case where it does not. All existing libcxx tests for std::vector pass in my local testing.
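The two new test cases can also be reproduced outside the libc++ test suite. A minimal sketch, assuming std::istream_iterator<int> as the single-pass iterator in place of cpp17_input_iterator from test_iterators.h; `insert_five_at_ten` and `InsertResult` are hypothetical names. Reallocation is inferred from a change in capacity(): after reserve(), the standard guarantees no reallocation until the size would exceed capacity(), while the second case also relies on reserve() allocating exactly the requested amount, as mainstream implementations do.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

struct InsertResult {
  bool contents_ok;  // size, returned iterator and element values as expected
  bool reallocated;  // capacity() changed during the insert
};

// Mirrors the new test cases: 100 zeros, room for `extra_capacity` more elements,
// then insert {1, 2, 3, 4, 5} at index 10 through a single-pass iterator.
InsertResult insert_five_at_ten(std::size_t extra_capacity) {
  std::vector<int> v(100);
  v.reserve(v.size() + extra_capacity);
  const std::size_t cap_before = v.capacity();
  std::istringstream in("1 2 3 4 5");
  auto it = v.insert(v.cbegin() + 10, std::istream_iterator<int>(in), std::istream_iterator<int>());
  InsertResult r{true, v.capacity() != cap_before};
  if (v.size() != 105 || it != v.begin() + 10)
    r.contents_ok = false;
  for (std::size_t j = 0; j < v.size(); ++j) {
    const int expected = (j >= 10 && j < 15) ? static_cast<int>(j - 9) : 0;
    if (v[j] != expected)
      r.contents_ok = false;
  }
  return r;
}
```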


Full diff: https://github.com/llvm/llvm-project/pull/113768.diff

2 Files Affected:

  • (modified) libcxx/include/__vector/vector.h (+20-20)
  • (modified) libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp (+40)
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..02f91b537c7e22 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1360,27 +1360,27 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu
   for (; this->__end_ != this->__end_cap() && __first != __last; ++__first) {
     __construct_one_at_end(*__first);
   }
-  __split_buffer<value_type, allocator_type&> __v(__a);
-  if (__first != __last) {
-#if _LIBCPP_HAS_EXCEPTIONS
-    try {
-#endif // _LIBCPP_HAS_EXCEPTIONS
-      __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
-      difference_type __old_size = __old_last - this->__begin_;
-      difference_type __old_p    = __p - this->__begin_;
-      reserve(__recommend(size() + __v.size()));
-      __p        = this->__begin_ + __old_p;
-      __old_last = this->__begin_ + __old_size;
-#if _LIBCPP_HAS_EXCEPTIONS
-    } catch (...) {
-      erase(__make_iter(__old_last), end());
-      throw;
-    }
-#endif // _LIBCPP_HAS_EXCEPTIONS
+
+  if (__first == __last)
+    (void)std::rotate(__p, __old_last, this->__end_);
+  else {
+    __split_buffer<value_type, allocator_type&> __v(__a);
+    auto __guard =
+        std::__make_exception_guard(_AllocatorDestroyRangeReverse<allocator_type, pointer>(__a, __old_last, __end_));
+    __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
+    __split_buffer<value_type, allocator_type&> __merged(__recommend(size() + __v.size()), __off, __a);
+    std::__uninitialized_allocator_relocate(
+        __a, std::__to_address(__old_last), std::__to_address(__end_), std::__to_address(__merged.__end_));
+    __merged.__end_ += __end_ - __old_last;
+    __end_ = __old_last;
+    __guard.__complete();
+    std::__uninitialized_allocator_relocate(
+        __a, std::__to_address(__v.__begin_), std::__to_address(__v.__end_), std::__to_address(__merged.__end_));
+    __merged.__end_ += __v.size();
+    __v.__begin_ = __v.__end_;
+    __p          = __swap_out_circular_buffer(__merged, __p);
   }
-  __p = std::rotate(__p, __old_last, this->__end_);
-  insert(__make_iter(__p), std::make_move_iterator(__v.begin()), std::make_move_iterator(__v.end()));
-  return begin() + __off;
+  return __make_iter(__p);
 }
 
 template <class _Tp, class _Allocator>
diff --git a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
index 934b85ce01c67b..8dce6e5c1a690e 100644
--- a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
+++ b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
@@ -46,6 +46,46 @@ TEST_CONSTEXPR_CXX20 bool tests()
         for (; j < 105; ++j)
             assert(v[j] == 0);
     }
+    {   // Vector may or may not need to reallocate because of the insertion -- test both cases.
+      { // The input range is shorter than the remaining capacity of the vector -- ensure no reallocation happens.
+        typedef std::vector<int> V;
+        V v(100);
+        v.reserve(v.size() + 10);
+        int a[]     = {1, 2, 3, 4, 5};
+        const int N = sizeof(a) / sizeof(a[0]);
+        V::iterator i =
+            v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+        assert(v.size() == 100 + N);
+        assert(is_contiguous_container_asan_correct(v));
+        assert(i == v.begin() + 10);
+        int j;
+        for (j = 0; j < 10; ++j)
+          assert(v[j] == 0);
+        for (std::size_t k = 0; k < N; ++j, ++k)
+          assert(v[j] == a[k]);
+        for (; j < 105; ++j)
+          assert(v[j] == 0);
+      }
+      { // The input range is longer than the remaining capacity of the vector -- ensure reallocation happens.
+        typedef std::vector<int> V;
+        V v(100);
+        v.reserve(v.size() + 2);
+        int a[]     = {1, 2, 3, 4, 5};
+        const int N = sizeof(a) / sizeof(a[0]);
+        V::iterator i =
+            v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+        assert(v.size() == 100 + N);
+        assert(is_contiguous_container_asan_correct(v));
+        assert(i == v.begin() + 10);
+        int j;
+        for (j = 0; j < 10; ++j)
+          assert(v[j] == 0);
+        for (std::size_t k = 0; k < N; ++j, ++k)
+          assert(v[j] == a[k]);
+        for (; j < 105; ++j)
+          assert(v[j] == 0);
+      }
+    }
     {
         typedef std::vector<int> V;
         V v(100);

ldionne (Member) left a comment


Thanks for the patch, this is a nice find! It looks like a really worthwhile optimization. I have some comments and questions.

winner245 force-pushed the winner245/vector-insert_with_sentinel branch from be9ddb6 to a31c874, November 7, 2024 22:11
github-actions bot commented Nov 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

winner245 force-pushed the winner245/vector-insert_with_sentinel branch from a31c874 to 9397cb4, November 7, 2024 22:35
ldionne (Member) left a comment


Can you rebase this onto main?

@@ -119,6 +119,90 @@ void BM_InsertValueRehash(benchmark::State& st, Container c, GenInputs gen) {
}
}

// Wrap any Iterator into an input iterator
template <typename Iterator>
class InputIterator {
Member

I think you could reuse the type from test_iterators.h like you've done in some of your recent patches.

Contributor Author

Yeah, I totally agree. This is one of my earliest PRs, and at that time I was not aware of the test utilities we already have. Now I have a better understanding of the test framework libc++ offers and will reuse its code as much as possible. In the meantime, I will try to improve this early work and provide updated benchmark results.
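For readers outside the libc++ tree, the benchmark's "wrap any iterator into an input iterator" idea can be sketched as follows. `input_only` is a hypothetical stand-in, not the cpp17_input_iterator type from test_iterators.h that the reviewer suggested reusing; it only advertises std::input_iterator_tag, so generic code that dispatches on the iterator category treats the range as single-pass.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical wrapper that downgrades any iterator to the input-iterator category.
template <class It>
class input_only {
 public:
  using iterator_category = std::input_iterator_tag;
  using value_type        = typename std::iterator_traits<It>::value_type;
  using difference_type   = typename std::iterator_traits<It>::difference_type;
  using pointer           = typename std::iterator_traits<It>::pointer;
  using reference         = typename std::iterator_traits<It>::reference;

  explicit input_only(It it) : it_(it) {}
  reference operator*() const { return *it_; }
  input_only& operator++() {
    ++it_;
    return *this;
  }
  input_only operator++(int) {
    input_only tmp = *this;
    ++it_;
    return tmp;
  }
  friend bool operator==(const input_only& a, const input_only& b) { return a.it_ == b.it_; }
  friend bool operator!=(const input_only& a, const input_only& b) { return !(a == b); }

 private:
  It it_;  // the wrapped iterator
};
```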

winner245 force-pushed the winner245/vector-insert_with_sentinel branch 3 times, most recently from 39373a0 to 19f0d7f, December 20, 2024 14:42
winner245 changed the title from "Optimize __insert_with_sentinel Function in std::vector" to "Optimize iterator-pair insert for std::vector" Dec 20, 2024
winner245 changed the title from "Optimize iterator-pair insert for std::vector" to "Optimize input_iterator-pair insert for std::vector" Dec 20, 2024
winner245 (Contributor Author) commented Dec 20, 2024

Summary of the current updates to this PR:

  • PR description: rewritten to better explain the improvements introduced by this PR.
  • Benchmarks: the benchmark functions have been slightly revised, and all benchmarks have been re-run with larger input sizes to make the comparisons more reliable and stable.
  • Code reusability: test_iterators.h is now reused to avoid code duplication (the increase in changed lines is due to the required clang-format pass).
  • Release notes: a new entry about the performance improvement for insert has been added, and an earlier entry of mine has been slightly modified to fix some formatting issues.

winner245 force-pushed the winner245/vector-insert_with_sentinel branch from ce4ac8e to d57c222, December 31, 2024 01:09
winner245 force-pushed the winner245/vector-insert_with_sentinel branch from d57c222 to 8ab512d, January 12, 2025 12:52
ldionne (Member) left a comment


This LGTM with a small nitpick about adding a comment, and a suggestion to optimize further. Given I have no more feedback, I actually suggest that we land the patch after my nitpick comment and investigate further optimizations in later patches, to unblock this work and make sure we hit LLVM 20.

Thanks for the great patch!

if (__first == __last)
(void)std::rotate(__p, __old_last, this->__end_);
else {
__split_buffer<value_type, allocator_type&> __v(__alloc_);
Member

If we could have a heuristic of some kind here, we could pre-allocate __v to a given size and hope that we get lucky. That way, we might not have to reallocate memory for __v. Without a heuristic though, I think allocating the smallest amount of memory (what you have right now) is probably the best approach. It's worth thinking about whether such a heuristic exists.

Suggested change
__split_buffer<value_type, allocator_type&> __v(__alloc_);
__split_buffer<value_type, allocator_type&> __v(__alloc_);
__v.reserve(__recommend(capacity() + heuristic-maybe));
// now if we got lucky, perhaps __v.capacity() is enough to hold (size() + (last-first)) elements
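To make the reviewer's idea concrete, the leftover input could be collected into a buffer pre-reserved from some guess. The sketch below is hypothetical: `hint` stands in for the heuristic the reviewer asks about (none is proposed in this PR), and `collect_with_hint` and `Collected` are illustrative names. It counts how often the buffer grows, showing that a good guess avoids reallocations while a poor guess simply falls back to normal growth.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

template <class T>
struct Collected {
  std::vector<T> items;
  std::size_t reallocations = 0;  // how many times `items` had to grow
};

// Hypothetical: gather a single-pass range into a buffer pre-sized by `hint`.
template <class T, class InputIt>
Collected<T> collect_with_hint(InputIt first, InputIt last, std::size_t hint) {
  Collected<T> out;
  out.items.reserve(hint);  // the guess; 0 means "no heuristic"
  for (; first != last; ++first) {
    const std::size_t cap = out.items.capacity();
    out.items.emplace_back(*first);
    if (out.items.capacity() != cap)
      ++out.reallocations;
  }
  return out;
}
```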

ldionne (Member) commented Jan 14, 2025

Since the CI was already passing and I only added a comment, merging.

ldionne merged commit 0298e58 into llvm:main Jan 14, 2025
12 of 22 checks passed
winner245 deleted the winner245/vector-insert_with_sentinel branch January 14, 2025 22:32
Labels: libc++, performance
3 participants