[libc++] Refactor the sequence container benchmarks #119763
@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

Rewrite the sequence container benchmarks to only rely on the actual operations specified in SequenceContainer requirements and add benchmarks for std::list, which is also considered a sequence container.

One of the major goals of this refactoring is also to make these container benchmarks run faster so that they can be run more frequently. The existing benchmarks have the significant problem that they take so long to run that they must basically be run overnight. This patch reduces the size of inputs such that the rewritten benchmarks each take at most a minute to run.

This patch doesn't touch the string benchmarks, which were not using the generic container benchmark functions previously.

Patch is 32.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119763.diff

8 Files Affected:
diff --git a/libcxx/test/benchmarks/Utilities.h b/libcxx/test/benchmarks/Utilities.h
deleted file mode 100644
index fed16ba51f995f..00000000000000
--- a/libcxx/test/benchmarks/Utilities.h
+++ /dev/null
@@ -1,37 +0,0 @@
-// -*- C++ -*-
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef BENCHMARK_UTILITIES_H
-#define BENCHMARK_UTILITIES_H
-
-#include <cassert>
-#include <type_traits>
-
-#include "benchmark/benchmark.h"
-
-namespace UtilitiesInternal {
-template <class Container>
-auto HaveDataImpl(int) -> decltype((std::declval<Container&>().data(), std::true_type{}));
-template <class Container>
-auto HaveDataImpl(long) -> std::false_type;
-template <class T>
-using HasData = decltype(HaveDataImpl<T>(0));
-} // namespace UtilitiesInternal
-
-template <class Container, std::enable_if_t<UtilitiesInternal::HasData<Container>::value>* = nullptr>
-void DoNotOptimizeData(Container& c) {
- benchmark::DoNotOptimize(c.data());
-}
-
-template <class Container, std::enable_if_t<!UtilitiesInternal::HasData<Container>::value>* = nullptr>
-void DoNotOptimizeData(Container& c) {
- benchmark::DoNotOptimize(&c);
-}
-
-#endif // BENCHMARK_UTILITIES_H
diff --git a/libcxx/test/benchmarks/containers/ContainerBenchmarks.h b/libcxx/test/benchmarks/containers/ContainerBenchmarks.h
deleted file mode 100644
index 6d21e12896ec9e..00000000000000
--- a/libcxx/test/benchmarks/containers/ContainerBenchmarks.h
+++ /dev/null
@@ -1,272 +0,0 @@
-// -*- C++ -*-
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef BENCHMARK_CONTAINER_BENCHMARKS_H
-#define BENCHMARK_CONTAINER_BENCHMARKS_H
-
-#include <cassert>
-#include <iterator>
-#include <utility>
-
-#include "benchmark/benchmark.h"
-#include "../Utilities.h"
-#include "test_iterators.h"
-
-namespace ContainerBenchmarks {
-
-template <class Container>
-void BM_ConstructSize(benchmark::State& st, Container) {
- auto size = st.range(0);
- for (auto _ : st) {
- Container c(size);
- DoNotOptimizeData(c);
- }
-}
-
-template <class Container>
-void BM_CopyConstruct(benchmark::State& st, Container) {
- auto size = st.range(0);
- Container c(size);
- for (auto _ : st) {
- auto v = c;
- DoNotOptimizeData(v);
- }
-}
-
-template <class Container>
-void BM_Assignment(benchmark::State& st, Container) {
- auto size = st.range(0);
- Container c1;
- Container c2(size);
- for (auto _ : st) {
- c1 = c2;
- DoNotOptimizeData(c1);
- DoNotOptimizeData(c2);
- }
-}
-
-template <std::size_t... sz, typename Container, typename GenInputs>
-void BM_AssignInputIterIter(benchmark::State& st, Container c, GenInputs gen) {
- auto v = gen(1, sz...);
- c.resize(st.range(0), v[0]);
- auto in = gen(st.range(1), sz...);
- benchmark::DoNotOptimize(&in);
- benchmark::DoNotOptimize(&c);
- for (auto _ : st) {
- c.assign(cpp17_input_iterator(in.begin()), cpp17_input_iterator(in.end()));
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container>
-void BM_ConstructSizeValue(benchmark::State& st, Container, typename Container::value_type const& val) {
- const auto size = st.range(0);
- for (auto _ : st) {
- Container c(size, val);
- DoNotOptimizeData(c);
- }
-}
-
-template <class Container, class GenInputs>
-void BM_ConstructIterIter(benchmark::State& st, Container, GenInputs gen) {
- auto in = gen(st.range(0));
- const auto begin = in.begin();
- const auto end = in.end();
- benchmark::DoNotOptimize(&in);
- while (st.KeepRunning()) {
- Container c(begin, end);
- DoNotOptimizeData(c);
- }
-}
-
-template <class Container, class GenInputs>
-void BM_ConstructFromRange(benchmark::State& st, Container, GenInputs gen) {
- auto in = gen(st.range(0));
- benchmark::DoNotOptimize(&in);
- while (st.KeepRunning()) {
- Container c(std::from_range, in);
- DoNotOptimizeData(c);
- }
-}
-
-template <class Container>
-void BM_Pushback_no_grow(benchmark::State& state, Container c) {
- int count = state.range(0);
- c.reserve(count);
- while (state.KeepRunningBatch(count)) {
- c.clear();
- for (int i = 0; i != count; ++i) {
- c.push_back(i);
- }
- benchmark::DoNotOptimize(c.data());
- }
-}
-
-template <class Container, class GenInputs>
-void BM_InsertValue(benchmark::State& st, Container c, GenInputs gen) {
- auto in = gen(st.range(0));
- const auto end = in.end();
- while (st.KeepRunning()) {
- c.clear();
- for (auto it = in.begin(); it != end; ++it) {
- benchmark::DoNotOptimize(&(*c.insert(*it).first));
- }
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_InsertValueRehash(benchmark::State& st, Container c, GenInputs gen) {
- auto in = gen(st.range(0));
- const auto end = in.end();
- while (st.KeepRunning()) {
- c.clear();
- c.rehash(16);
- for (auto it = in.begin(); it != end; ++it) {
- benchmark::DoNotOptimize(&(*c.insert(*it).first));
- }
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_InsertDuplicate(benchmark::State& st, Container c, GenInputs gen) {
- auto in = gen(st.range(0));
- const auto end = in.end();
- c.insert(in.begin(), in.end());
- benchmark::DoNotOptimize(&c);
- benchmark::DoNotOptimize(&in);
- while (st.KeepRunning()) {
- for (auto it = in.begin(); it != end; ++it) {
- benchmark::DoNotOptimize(&(*c.insert(*it).first));
- }
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_EmplaceDuplicate(benchmark::State& st, Container c, GenInputs gen) {
- auto in = gen(st.range(0));
- const auto end = in.end();
- c.insert(in.begin(), in.end());
- benchmark::DoNotOptimize(&c);
- benchmark::DoNotOptimize(&in);
- while (st.KeepRunning()) {
- for (auto it = in.begin(); it != end; ++it) {
- benchmark::DoNotOptimize(&(*c.emplace(*it).first));
- }
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_erase_iter_in_middle(benchmark::State& st, Container, GenInputs gen) {
- auto in = gen(st.range(0));
- Container c(in.begin(), in.end());
- assert(c.size() > 2);
- for (auto _ : st) {
- auto mid = std::next(c.begin(), c.size() / 2);
- auto tmp = *mid;
- auto result = c.erase(mid); // erase an element in the middle
- benchmark::DoNotOptimize(result);
- c.push_back(std::move(tmp)); // and then push it back at the end to avoid needing a new container
- }
-}
-
-template <class Container, class GenInputs>
-void BM_erase_iter_at_start(benchmark::State& st, Container, GenInputs gen) {
- auto in = gen(st.range(0));
- Container c(in.begin(), in.end());
- assert(c.size() > 2);
- for (auto _ : st) {
- auto it = c.begin();
- auto tmp = *it;
- auto result = c.erase(it); // erase the first element
- benchmark::DoNotOptimize(result);
- c.push_back(std::move(tmp)); // and then push it back at the end to avoid needing a new container
- }
-}
-
-template <class Container, class GenInputs>
-void BM_Find(benchmark::State& st, Container c, GenInputs gen) {
- auto in = gen(st.range(0));
- c.insert(in.begin(), in.end());
- benchmark::DoNotOptimize(&(*c.begin()));
- const auto end = in.data() + in.size();
- while (st.KeepRunning()) {
- for (auto it = in.data(); it != end; ++it) {
- benchmark::DoNotOptimize(&(*c.find(*it)));
- }
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_FindRehash(benchmark::State& st, Container c, GenInputs gen) {
- c.rehash(8);
- auto in = gen(st.range(0));
- c.insert(in.begin(), in.end());
- benchmark::DoNotOptimize(&(*c.begin()));
- const auto end = in.data() + in.size();
- while (st.KeepRunning()) {
- for (auto it = in.data(); it != end; ++it) {
- benchmark::DoNotOptimize(&(*c.find(*it)));
- }
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_Rehash(benchmark::State& st, Container c, GenInputs gen) {
- auto in = gen(st.range(0));
- c.max_load_factor(3.0);
- c.insert(in.begin(), in.end());
- benchmark::DoNotOptimize(c);
- const auto bucket_count = c.bucket_count();
- while (st.KeepRunning()) {
- c.rehash(bucket_count + 1);
- c.rehash(bucket_count);
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_Compare_same_container(benchmark::State& st, Container, GenInputs gen) {
- auto in = gen(st.range(0));
- Container c1(in.begin(), in.end());
- Container c2 = c1;
-
- benchmark::DoNotOptimize(&(*c1.begin()));
- benchmark::DoNotOptimize(&(*c2.begin()));
- while (st.KeepRunning()) {
- bool res = c1 == c2;
- benchmark::DoNotOptimize(&res);
- benchmark::ClobberMemory();
- }
-}
-
-template <class Container, class GenInputs>
-void BM_Compare_different_containers(benchmark::State& st, Container, GenInputs gen) {
- auto in1 = gen(st.range(0));
- auto in2 = gen(st.range(0));
- Container c1(in1.begin(), in1.end());
- Container c2(in2.begin(), in2.end());
-
- benchmark::DoNotOptimize(&(*c1.begin()));
- benchmark::DoNotOptimize(&(*c2.begin()));
- while (st.KeepRunning()) {
- bool res = c1 == c2;
- benchmark::DoNotOptimize(&res);
- benchmark::ClobberMemory();
- }
-}
-
-} // namespace ContainerBenchmarks
-
-#endif // BENCHMARK_CONTAINER_BENCHMARKS_H
diff --git a/libcxx/test/benchmarks/containers/container_benchmarks.h b/libcxx/test/benchmarks/containers/container_benchmarks.h
new file mode 100644
index 00000000000000..5d476877b0a878
--- /dev/null
+++ b/libcxx/test/benchmarks/containers/container_benchmarks.h
@@ -0,0 +1,435 @@
+// -*- C++ -*-
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef TEST_BENCHMARKS_CONTAINERS_CONTAINER_BENCHMARKS_H
+#define TEST_BENCHMARKS_CONTAINERS_CONTAINER_BENCHMARKS_H
+
+#include <cstddef>
+#include <iterator> // for std::next
+#include <ranges> // for std::from_range
+#include <string>
+#include <vector>
+
+#include "benchmark/benchmark.h"
+#include "test_iterators.h"
+#include "test_macros.h"
+
+namespace ContainerBenchmarks {
+
+template <class Container>
+void DoNotOptimizeData(Container& c) {
+ if constexpr (requires { c.data(); }) {
+ benchmark::DoNotOptimize(c.data());
+ } else {
+ benchmark::DoNotOptimize(&c);
+ }
+}
+
+//
+// Sequence container operations
+//
+template <class Container>
+void BM_ctor_size(benchmark::State& st) {
+ auto size = st.range(0);
+ char buffer[sizeof(Container)];
+ for (auto _ : st) {
+ std::construct_at(reinterpret_cast<Container*>(buffer), size);
+ benchmark::DoNotOptimize(buffer);
+ st.PauseTiming();
+ std::destroy_at(reinterpret_cast<Container*>(buffer));
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void BM_ctor_size_value(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const auto size = st.range(0);
+ ValueType value{};
+ benchmark::DoNotOptimize(value);
+ char buffer[sizeof(Container)];
+ for (auto _ : st) {
+ std::construct_at(reinterpret_cast<Container*>(buffer), size, value);
+ benchmark::DoNotOptimize(buffer);
+ st.PauseTiming();
+ std::destroy_at(reinterpret_cast<Container*>(buffer));
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void BM_ctor_iter_iter(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const auto size = st.range(0);
+ std::vector<ValueType> in(size);
+ const auto begin = in.begin();
+ const auto end = in.end();
+ benchmark::DoNotOptimize(in);
+ char buffer[sizeof(Container)];
+ for (auto _ : st) {
+ std::construct_at(reinterpret_cast<Container*>(buffer), begin, end);
+ benchmark::DoNotOptimize(buffer);
+ st.PauseTiming();
+ std::destroy_at(reinterpret_cast<Container*>(buffer));
+ st.ResumeTiming();
+ }
+}
+
+#if TEST_STD_VER >= 23
+template <class Container>
+void BM_ctor_from_range(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const auto size = st.range(0);
+ std::vector<ValueType> in(size);
+ benchmark::DoNotOptimize(in);
+ char buffer[sizeof(Container)];
+ for (auto _ : st) {
+ std::construct_at(reinterpret_cast<Container*>(buffer), std::from_range, in);
+ benchmark::DoNotOptimize(buffer);
+ st.PauseTiming();
+ std::destroy_at(reinterpret_cast<Container*>(buffer));
+ st.ResumeTiming();
+ }
+}
+#endif
+
+template <class Container>
+void BM_ctor_copy(benchmark::State& st) {
+ auto size = st.range(0);
+ Container c(size);
+ char buffer[sizeof(Container)];
+ for (auto _ : st) {
+ std::construct_at(reinterpret_cast<Container*>(buffer), c);
+ benchmark::DoNotOptimize(buffer);
+ st.PauseTiming();
+ std::destroy_at(reinterpret_cast<Container*>(buffer));
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void BM_assignment(benchmark::State& st) {
+ auto size = st.range(0);
+ Container c1;
+ Container c2(size);
+ for (auto _ : st) {
+ c1 = c2;
+ DoNotOptimizeData(c1);
+ DoNotOptimizeData(c2);
+ }
+}
+
+template <typename Container>
+void BM_assign_inputiter(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ auto size = st.range(0);
+ std::vector<ValueType> inputs(size);
+ Container c(inputs.begin(), inputs.end());
+ DoNotOptimizeData(c);
+ DoNotOptimizeData(inputs);
+ ValueType* first = inputs.data();
+ ValueType* last = inputs.data() + inputs.size();
+
+ for (auto _ : st) {
+ c.assign(cpp17_input_iterator(first), cpp17_input_iterator(last));
+ benchmark::ClobberMemory();
+ }
+}
+
+template <class Container>
+void BM_insert_middle(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const int count = st.range(0);
+ std::vector<ValueType> inputs(count);
+ Container c(inputs.begin(), inputs.end());
+ DoNotOptimizeData(c);
+
+ ValueType value{};
+ benchmark::DoNotOptimize(value);
+
+ auto mid = std::next(c.begin(), count / 2);
+ for (auto _ : st) {
+ auto inserted = c.insert(mid, value);
+ DoNotOptimizeData(c);
+
+ st.PauseTiming();
+ mid = c.erase(inserted);
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void BM_insert_start(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const int count = st.range(0);
+ std::vector<ValueType> inputs(count);
+ Container c(inputs.begin(), inputs.end());
+ DoNotOptimizeData(c);
+
+ ValueType value{};
+ benchmark::DoNotOptimize(value);
+
+ for (auto _ : st) {
+ auto inserted = c.insert(c.begin(), value);
+ DoNotOptimizeData(c);
+
+ st.PauseTiming();
+ c.erase(inserted);
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void BM_erase_middle(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const int count = st.range(0);
+ std::vector<ValueType> inputs(count);
+ Container c(inputs.begin(), inputs.end());
+ DoNotOptimizeData(c);
+
+ ValueType value{};
+ benchmark::DoNotOptimize(value);
+
+ auto mid = std::next(c.begin(), count / 2);
+ for (auto _ : st) {
+ c.erase(mid);
+ DoNotOptimizeData(c);
+
+ st.PauseTiming();
+ c.insert(c.end(), value); // re-insert an element at the end to avoid needing a new container
+ mid = std::next(c.begin(), c.size() / 2);
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void BM_erase_start(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const int count = st.range(0);
+ std::vector<ValueType> inputs(count);
+ Container c(inputs.begin(), inputs.end());
+ DoNotOptimizeData(c);
+
+ ValueType value{};
+ benchmark::DoNotOptimize(value);
+ for (auto _ : st) {
+ c.erase(c.begin());
+ DoNotOptimizeData(c);
+
+ st.PauseTiming();
+ c.insert(c.end(), value); // re-insert an element at the end to avoid needing a new container
+ st.ResumeTiming();
+ }
+}
+
+template <class Container>
+void sequence_container_benchmarks(std::string container) {
+ benchmark::RegisterBenchmark(container + "::ctor(size)", BM_ctor_size<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::ctor(size, value_type)", BM_ctor_size_value<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::ctor(Iterator, Iterator)", BM_ctor_iter_iter<Container>)->Arg(1024);
+#if TEST_STD_VER >= 23
+ benchmark::RegisterBenchmark(container + "::ctor(Range)", BM_ctor_from_range<Container>)->Arg(1024);
+#endif
+ benchmark::RegisterBenchmark(container + "::ctor(const&)", BM_ctor_copy<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::operator=", BM_assignment<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::assign(input-iter, input-iter)", BM_assign_inputiter<Container>)
+ ->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::insert(start)", BM_insert_start<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::insert(middle)", BM_insert_middle<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::erase(start)", BM_erase_start<Container>)->Arg(1024);
+ benchmark::RegisterBenchmark(container + "::erase(middle)", BM_erase_middle<Container>)->Arg(1024);
+}
+
+//
+// "Back-insertable" sequence container operations
+//
+template <class Container>
+void BM_push_back(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const int count = st.range(0);
+ std::vector<ValueType> inputs(count);
+ benchmark::DoNotOptimize(inputs);
+
+ Container c;
+ DoNotOptimizeData(c);
+ while (st.KeepRunningBatch(count)) {
+ c.clear();
+ for (int i = 0; i != count; ++i) {
+ c.push_back(inputs[i]);
+ }
+ DoNotOptimizeData(c);
+ }
+}
+
+template <class Container>
+void BM_push_back_with_reserve(benchmark::State& st) {
+ using ValueType = typename Container::value_type;
+ const int count = st.range(0);
+ std::vector<ValueType> inputs(count);
+ benchmark::DoNotOptimize(inputs);
+
+ Container c;
+ c.reserve(count);
+ DoNotOptimizeData(c);
+ while (st.KeepRunningBatch(count)) {
+ c.clear();
+ for (int i = 0; i != count; ++i) {
+ c.push_back(inputs[i]);
+ }
+ DoNotOptimizeData(c);
+ }
+}
+
+template <class Container>
+void back_insertable_container_benchmarks(std::string container) {
+ sequence_container_benchmarks<Container>(container);
+ benchmark::RegisterBenchmark(container + "::push_back()", BM_push_back<Container>)->Arg(1024);
+ if constexpr (requires(Container c) { c.reserve(0); }) {
+ benchmark::RegisterBenchmark(container + "::push_back() (with reserve)", BM_push_back_with_reserve<Container>)
+ ->Arg(1024);
+ }
+}
+
+//
+// Misc operations
+//
+template <class Container, class GenInputs>
+void BM_InsertValue(benchmark::State& st, Container c, GenInputs gen) {
+ auto in = gen(st.range(0));
+ const auto end = in.end();
+ while (st.KeepRunning()) {
+ c.clear();
+ for (auto it = in.begin(); it != end; ++it) {
+ benchmark::DoNotOptimize(&(*c.insert(*it).first));
+ }
+ benchmark::ClobberMemory();
+ }
+}
+
+template <class Container, class GenInputs>
+void BM_InsertValueRehash(benchmark::State& st, Container c, GenInputs gen) {
+ auto in = gen(st.range(0));
+ const auto end = in.end();
+ while (st.KeepRunning()) {
+ c.clear();
+ c.rehash(16);
+ for (auto it = in.begin(); it != end; ++it) {
+ benchmark::DoNotOptimize(&(*c.insert(*it).first));
+ }
+ benchmark::ClobberMemory();
+ }
+}
+
+template <class Container, class GenInputs>
+void BM_InsertDuplica...
[truncated]
a4d1696 to 06709b9
}

template <class Container>
void BM_insert_input_iter_with_reserve_half_filled(benchmark::State& st) {
@winner245 For some reason this gives me essentially the same timing as the no_realloc case above. Does my mistake jump out at you?
It looks like BM_insert_input_iter_with_reserve_half_filled is slightly different from what I had before. Your current implementation would only reserve space for c, but will not insert half of the elements. This means that c is empty and the subsequent insert would append to the tail (which would run faster than my test). In the test I wrote, I first reserved twice the space for c, and then assigned values to half of that space. My insert would then lead to more element relocations than yours.
My version of the benchmark does insert count / 2 elements when it constructs the vector in the first place:
    Container c(count / 2);
I then reserve space for count elements, which ensures the vector has count / 2 elements at the front but enough capacity for count elements total. That is, unless I've made a big mistake.
In contrast, your benchmark was reserving and then assigning, which will not take advantage of the additional capacity. That is because assignment will replace the underlying buffer, in this case effectively shrinking the vector. Do you agree?
If that's the case, then I think perhaps this version of the benchmark is more correct than the previous one, but I'd like to know your thoughts on this.
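The construct-then-reserve setup described in this comment can be checked in isolation. The following is a minimal sketch, assuming std::vector<int> as the container; the `Shape` struct and the `half_filled_shape` helper are hypothetical names introduced for illustration and are not part of the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type (not from the patch): records the size and
// capacity of the container after the setup steps.
struct Shape {
  std::size_t size;
  std::size_t capacity;
};

// Mirrors the setup described above for std::vector<int>:
// construct with count / 2 elements, then reserve room for count elements.
Shape half_filled_shape(std::size_t count) {
  std::vector<int> c(count / 2); // count / 2 value-initialized elements
  c.reserve(count);              // capacity becomes at least count; size is unchanged
  return Shape{c.size(), c.capacity()};
}
```

Calling `half_filled_shape(1024)` should report a size of 512 and a capacity of at least 1024, which is the "half filled" state the benchmark name refers to.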
Sorry, I might have misread your code. Yes, it does insert count / 2 elements during construction. I am trying to understand why your test leads to a different performance measurement than mine. I can see two major differences which might have caused it:
- Test setup. In my test, I first reserved space for 2*n elements, and then assigned n elements to container c, so I am left with free space for n elements. Next, I inserted n + 10 elements, where the first n elements were inserted into the free space, and the extra 10 were inserted into a temporary __split_buffer. I intentionally chose n + 10 to favor the scenario for the improvement in my PR. This setting tests reallocation without the extra buffer significantly affecting performance. The primary improvement comes from avoiding unnecessary rotation and movement of ranges, which are unrelated to the extra buffer. If the extra buffer takes too much time to process, it would offset my improvement. Therefore, under the specific choice of n + a for some constant a = O(1), my performance improvement is maximized.
  In your test, let's denote n = count / 2. Your code first constructs a container c with n elements and then reserves space for 2n elements. This is similar to my test so far. Then, you insert 2*n elements, where the first n fit into the free space, and the remaining n require a __split_buffer of size n (compared to the constant 10 in my test). Hence, the buffer processing time in your test is linear (compared to constant in my test), which takes a significant portion of the time.
- Input size. I noticed that the input size in your tests is 1024. I used a much larger input size to obtain a reliable result.
In summary, my original tests were performed under a setting favorable to the proposed changes in my PR, and they used a much larger input size, which might have led to the performance difference.
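The arithmetic behind the two setups can be written out as a small sketch. It only counts how many inserted elements exceed the spare capacity; it does not model how any particular standard library handles them. All function names here are hypothetical and introduced for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of inserted elements that do not fit into the
// spare capacity of a container holding `size` elements with room for
// `capacity` elements in total.
constexpr std::size_t overflow_count(std::size_t size, std::size_t capacity, std::size_t inserted) {
  const std::size_t spare = capacity - size;
  return inserted > spare ? inserted - spare : 0;
}

// Setup from the original test as described above:
// n existing elements, capacity 2n, insert n + 10 elements.
constexpr std::size_t original_setup_overflow(std::size_t n) {
  return overflow_count(n, 2 * n, n + 10);
}

// Setup from the refactored test as described above:
// n existing elements, capacity 2n, insert 2n elements.
constexpr std::size_t refactored_setup_overflow(std::size_t n) {
  return overflow_count(n, 2 * n, 2 * n);
}
```

With n = 512, the first setup leaves 10 elements beyond the spare capacity regardless of n, while the second leaves n = 512 of them, which matches the constant versus linear distinction drawn in the comment.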
> In contrast, your benchmark was reserving and then assigning, which will not take advantage of the additional capacity. That is because assignment will replace the underlying buffer, in this case effectively shrinking the vector. Do you agree?

For std::deque, yes. The assignment operator calls __maybe_remove_back_spare() to remove spare space under certain conditions, which might lead to shrinking of the container following assignment. For vector, assignment leads to reallocation only when the LHS and RHS vectors have incompatible allocators. If the allocators are compatible (as in our test), the assignment reuses the LHS's current space (the reserved 2n space in my test). Hence, my tests work well for vector (as my PR dealt with vector), but might not for deque. Since your refactoring aims to generalize these tests for sequence containers, I agree that the changes you made are necessary. Please go ahead and apply the changes.
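The vector behavior described in this reply can be observed with a short sketch. It assumes std::vector<int> with the default allocator on both sides; the `Capacities` struct and `assign_into_reserved` helper are hypothetical names. The C++ standard does not spell out this capacity behavior for copy assignment, so the result reflects what common implementations such as libc++ and libstdc++ typically do rather than a guarantee.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: capacity before and after the assignment,
// plus the resulting size.
struct Capacities {
  std::size_t before;
  std::size_t after;
  std::size_t size_after;
};

// Reserve room for 2n elements, then copy-assign a vector of n elements.
Capacities assign_into_reserved(std::size_t n) {
  std::vector<int> source(n, 42);
  std::vector<int> c;
  c.reserve(2 * n);
  const std::size_t before = c.capacity();
  c = source; // copy assignment; both vectors use std::allocator<int>
  return Capacities{before, c.capacity(), c.size()};
}
```

On the implementations mentioned above, `assign_into_reserved(512)` is expected to report a size of 512 while the capacity stays at 1024 or more, leaving spare room for n further elements.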
Ah! Thanks a lot for the analysis, that makes sense. This got me thinking and I actually changed the benchmarks for insert back to something that, I think, should be closer to what you were originally trying to achieve. I think that makes more sense. I also fixed the benchmark to avoid destroying the whole container every time, which was probably screwing with the results.
Sample output with the latest version of this patch:
# | ---------------------------------------------------------------------------------------------------------------------------------------------
# | Benchmark Time CPU Iterations
# | ---------------------------------------------------------------------------------------------------------------------------------------------
# | std::deque<int>::ctor(size)/1024 157 ns 157 ns 4239906
# | std::deque<int>::ctor(size, value_type) (cheap elements)/1024 149 ns 149 ns 4312708
# | std::deque<int>::ctor(Iterator, Iterator) (cheap elements)/1024 155 ns 155 ns 4268006
# | std::deque<int>::ctor(Range) (cheap elements)/1024 134 ns 134 ns 3759903
# | std::deque<int>::ctor(const&) (cheap elements)/1024 724 ns 724 ns 973669
# | std::deque<int>::operator=(const&) (cheap elements)/1024 59.9 ns 59.9 ns 11704903
# | std::deque<int>::assign(input-iter, input-iter) (full container) (cheap elements)/1024 487 ns 487 ns 1436499
# | std::deque<int>::insert(begin) (cheap elements)/1024 11.0 ns 11.0 ns 64027514
# | std::deque<int>::insert(middle) (cheap elements)/1024 53.6 ns 53.6 ns 13210040
# | std::deque<int>::erase(start) (cheap elements)/1024 11.5 ns 11.5 ns 63959651
# | std::deque<int>::erase(middle) (cheap elements)/1024 45.1 ns 45.1 ns 15448480
# | std::deque<int>::push_back() (cheap elements)/1024 0.754 ns 0.754 ns 929171456
# | std::deque<std::string>::ctor(size)/1024 1121 ns 1121 ns 621079
# | std::deque<std::string>::ctor(size, value_type) (cheap elements)/1024 1480 ns 1479 ns 472201
# | std::deque<std::string>::ctor(size, value_type) (expensive elements)/1024 57387 ns 57309 ns 12082
# | std::deque<std::string>::ctor(Iterator, Iterator) (cheap elements)/1024 1400 ns 1399 ns 514615
# | std::deque<std::string>::ctor(Iterator, Iterator) (expensive elements)/1024 56631 ns 56629 ns 12424
# | std::deque<std::string>::ctor(Range) (cheap elements)/1024 1361 ns 1361 ns 506905
# | std::deque<std::string>::ctor(Range) (expensive elements)/1024 56507 ns 56496 ns 12438
# | std::deque<std::string>::ctor(const&) (cheap elements)/1024 1583 ns 1583 ns 447324
# | std::deque<std::string>::ctor(const&) (expensive elements)/1024 57127 ns 57113 ns 12361
# | std::deque<std::string>::operator=(const&) (cheap elements)/1024 727 ns 727 ns 961169
# | std::deque<std::string>::operator=(const&) (expensive elements)/1024 10100 ns 10099 ns 69139
# | std::deque<std::string>::assign(input-iter, input-iter) (full container) (cheap elements)/1024 1002 ns 1002 ns 698032
# | std::deque<std::string>::assign(input-iter, input-iter) (full container) (expensive elements)/1024 10277 ns 10277 ns 68167
# | std::deque<std::string>::insert(begin) (cheap elements)/1024 15.2 ns 15.2 ns 45803425
# | std::deque<std::string>::insert(begin) (expensive elements)/1024 72.2 ns 72.2 ns 9785420
# | std::deque<std::string>::insert(middle) (cheap elements)/1024 452 ns 452 ns 1550041
# | std::deque<std::string>::insert(middle) (expensive elements)/1024 510 ns 510 ns 1379365
# | std::deque<std::string>::erase(start) (cheap elements)/1024 11.1 ns 11.1 ns 63154096
# | std::deque<std::string>::erase(start) (expensive elements)/1024 60.7 ns 60.7 ns 11413292
# | std::deque<std::string>::erase(middle) (cheap elements)/1024 422 ns 422 ns 1678730
# | std::deque<std::string>::erase(middle) (expensive elements)/1024 459 ns 459 ns 1520440
# | std::deque<std::string>::push_back() (cheap elements)/1024 2.97 ns 2.97 ns 232748032
# | std::deque<std::string>::push_back() (expensive elements)/1024 55.1 ns 55.1 ns 12525568
# | ------------------------------------------------------------------------------------------------------------------------------------------------------------
# | Benchmark Time CPU Iterations
# | ------------------------------------------------------------------------------------------------------------------------------------------------------------
# | std::vector<int>::ctor(size)/1024 62.5 ns 62.5 ns 9578675
# | std::vector<int>::ctor(size, value_type) (cheap elements)/1024 61.6 ns 61.6 ns 11241007
# | std::vector<int>::ctor(Iterator, Iterator) (cheap elements)/1024 76.0 ns 76.0 ns 9375460
# | std::vector<int>::ctor(Range) (cheap elements)/1024 76.3 ns 76.3 ns 9188884
# | std::vector<int>::ctor(const&) (cheap elements)/1024 75.6 ns 75.6 ns 9316812
# | std::vector<int>::operator=(const&) (cheap elements)/1024 52.9 ns 52.9 ns 13133208
# | std::vector<int>::assign(input-iter, input-iter) (full container) (cheap elements)/1024 334 ns 334 ns 2109005
# | std::vector<int>::insert(begin) (cheap elements)/1024 56.3 ns 56.3 ns 12439579
# | std::vector<int>::insert(middle) (cheap elements)/1024 29.2 ns 29.2 ns 23941200
# | std::vector<int>::insert(input-iter, input-iter) (insert at front, no realloc) (cheap elements)/1024 3288 ns 3289 ns 214065
# | std::vector<int>::insert(input-iter, input-iter) (insert at front, half filled) (cheap elements)/1024 1640 ns 1640 ns 445732
# | std::vector<int>::insert(input-iter, input-iter) (insert at front, near full) (cheap elements)/1024 1674 ns 1676 ns 422420
# | std::vector<int>::erase(start) (cheap elements)/1024 61.6 ns 57.8 ns 12691966
# | std::vector<int>::erase(middle) (cheap elements)/1024 29.7 ns 29.6 ns 23644016
# | std::vector<int>::push_back() (cheap elements)/1024 0.324 ns 0.324 ns 2161321984
# | std::vector<int>::push_back() (with reserve) (cheap elements)/1024 0.321 ns 0.321 ns 2143887360
# | std::vector<std::string>::ctor(size)/1024 620 ns 619 ns 1128341
# | std::vector<std::string>::ctor(size, value_type) (cheap elements)/1024 843 ns 843 ns 810917
# | std::vector<std::string>::ctor(size, value_type) (expensive elements)/1024 51571 ns 51558 ns 13618
# | std::vector<std::string>::ctor(Iterator, Iterator) (cheap elements)/1024 975 ns 975 ns 730498
# | std::vector<std::string>::ctor(Iterator, Iterator) (expensive elements)/1024 52427 ns 52413 ns 13518
# | std::vector<std::string>::ctor(Range) (cheap elements)/1024 960 ns 960 ns 730201
# | std::vector<std::string>::ctor(Range) (expensive elements)/1024 51903 ns 51899 ns 13290
# | std::vector<std::string>::ctor(const&) (cheap elements)/1024 968 ns 968 ns 728196
# | std::vector<std::string>::ctor(const&) (expensive elements)/1024 52606 ns 52523 ns 13495
# | std::vector<std::string>::operator=(const&) (cheap elements)/1024 1289 ns 1288 ns 543757
# | std::vector<std::string>::operator=(const&) (expensive elements)/1024 10153 ns 10153 ns 68801
# | std::vector<std::string>::assign(input-iter, input-iter) (full container) (cheap elements)/1024 804 ns 804 ns 872296
# | std::vector<std::string>::assign(input-iter, input-iter) (full container) (expensive elements)/1024 10333 ns 10329 ns 68142
# | std::vector<std::string>::insert(begin) (cheap elements)/1024 806 ns 774 ns 949165
# | std::vector<std::string>::insert(begin) (expensive elements)/1024 810 ns 809 ns 852038
# | std::vector<std::string>::insert(middle) (cheap elements)/1024 372 ns 372 ns 1885131
# | std::vector<std::string>::insert(middle) (expensive elements)/1024 430 ns 430 ns 1642329
# | std::vector<std::string>::insert(input-iter, input-iter) (insert at front, no realloc) (cheap elements)/1024 2633 ns 2635 ns 266754
# | std::vector<std::string>::insert(input-iter, input-iter) (insert at front, no realloc) (expensive elements)/1024 30652 ns 30642 ns 22873
# | std::vector<std::string>::insert(input-iter, input-iter) (insert at front, half filled) (cheap elements)/1024 3683 ns 3686 ns 193378
# | std::vector<std::string>::insert(input-iter, input-iter) (insert at front, half filled) (expensive elements)/1024 67043 ns 67040 ns 10477
# | std::vector<std::string>::insert(input-iter, input-iter) (insert at front, near full) (cheap elements)/1024 5045 ns 5049 ns 139478
# | std::vector<std::string>::insert(input-iter, input-iter) (insert at front, near full) (expensive elements)/1024 79903 ns 79890 ns 8747
# | std::vector<std::string>::erase(start) (cheap elements)/1024 768 ns 767 ns 914782
# | std::vector<std::string>::erase(start) (expensive elements)/1024 825 ns 825 ns 857391
# | std::vector<std::string>::erase(middle) (cheap elements)/1024 387 ns 387 ns 1815847
# | std::vector<std::string>::erase(middle) (expensive elements)/1024 437 ns 437 ns 1542081
# | std::vector<std::string>::push_back() (cheap elements)/1024 0.995 ns 0.994 ns 700705792
# | std::vector<std::string>::push_back() (expensive elements)/1024 50.7 ns 50.7 ns 13876224
# | std::vector<std::string>::push_back() (with reserve) (cheap elements)/1024 1.00 ns 1.00 ns 705978368
# | std::vector<std::string>::push_back() (with reserve) (expensive elements)/1024 50.4 ns 50.4 ns 13864960
# `-----------------------------
# | --------------------------------------------------------------------------------------------------------------------------------------------
# | Benchmark Time CPU Iterations
# | --------------------------------------------------------------------------------------------------------------------------------------------
# | std::list<int>::ctor(size)/1024 19568 ns 19557 ns 35065
# | std::list<int>::ctor(size, value_type) (cheap elements)/1024 19848 ns 19757 ns 35019
# | std::list<int>::ctor(Iterator, Iterator) (cheap elements)/1024 20111 ns 19880 ns 34990
# | std::list<int>::ctor(Range) (cheap elements)/1024 19918 ns 19650 ns 36775
# | std::list<int>::ctor(const&) (cheap elements)/1024 19820 ns 19805 ns 34853
# | std::list<int>::operator=(const&) (cheap elements)/1024 1048 ns 1047 ns 662578
# | std::list<int>::assign(input-iter, input-iter) (full container) (cheap elements)/1024 950 ns 950 ns 743408
# | std::list<int>::insert(begin) (cheap elements)/1024 16.2 ns 16.2 ns 43460175
# | std::list<int>::erase(start) (cheap elements)/1024 15.7 ns 15.7 ns 44239958
# | std::list<int>::push_back() (cheap elements)/1024 19.5 ns 19.4 ns 35828736
# | std::list<std::string>::ctor(size)/1024 19835 ns 19826 ns 33362
# | std::list<std::string>::ctor(size, value_type) (cheap elements)/1024 21075 ns 20823 ns 35994
# | std::list<std::string>::ctor(size, value_type) (expensive elements)/1024 73713 ns 73696 ns 9104
# | std::list<std::string>::ctor(Iterator, Iterator) (cheap elements)/1024 20894 ns 20821 ns 33653
# | std::list<std::string>::ctor(Iterator, Iterator) (expensive elements)/1024 76080 ns 76065 ns 9481
# | std::list<std::string>::ctor(Range) (cheap elements)/1024 20677 ns 20675 ns 33837
# | std::list<std::string>::ctor(Range) (expensive elements)/1024 77519 ns 77359 ns 9388
# | std::list<std::string>::ctor(const&) (cheap elements)/1024 21527 ns 21519 ns 33310
# | std::list<std::string>::ctor(const&) (expensive elements)/1024 78045 ns 78042 ns 8960
# | std::list<std::string>::operator=(const&) (cheap elements)/1024 2194 ns 2192 ns 365202
# | std::list<std::string>::operator=(const&) (expensive elements)/1024 11145 ns 11141 ns 63478
# | std::list<std::string>::assign(input-iter, input-iter) (full container) (cheap elements)/1024 1040 ns 1040 ns 671566
# | std::list<std::string>::assign(input-iter, input-iter) (full container) (expensive elements)/1024 10672 ns 10672 ns 66023
# | std::list<std::string>::insert(begin) (cheap elements)/1024 17.5 ns 17.5 ns 40064332
# | std::list<std::string>::insert(begin) (expensive elements)/1024 67.4 ns 67.4 ns 10386219
# | std::list<std::string>::erase(start) (cheap elements)/1024 20.6 ns 20.6 ns 35532093
# | std::list<std::string>::erase(start) (expensive elements)/1024 67.1 ns 67.1 ns 10526632
# | std::list<std::string>::push_back() (cheap elements)/1024 22.0 ns 22.0 ns 30441472
# | std::list<std::string>::push_back() (expensive elements)/1024 72.5 ns 72.5 ns 9376768
# `-----------------------------
}

template <class Container>
void BM_insert_input_iter_with_reserve_half_filled(benchmark::State& st) {
My version of the benchmark inserts `count / 2` elements when it constructs the vector in the first place: `Container c(count / 2);`. I then reserve space for `count` elements, which ensures the vector has `count / 2` elements at the front but enough capacity for `count` elements total. That is, unless I've made a big mistake.
In contrast, your benchmark was reserving and then assigning, which will not take advantage of the additional capacity. That is because assignment will replace the underlying buffer, in this case effectively shrinking the vector. Do you agree?
If that's the case, then I think perhaps this version of the benchmark is more correct than the previous one, but I'd like to know your thoughts on this.
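To make the two setups concrete, here is a minimal sketch. The `Shape` struct and both helper names are hypothetical and not part of the patch, and the second helper is only a guess at what "reserving and then assigning" looked like (it assumes a move assignment from a temporary). The standard guarantees the size and minimum capacity for the first setup; the capacity left after the assignment in the second setup is implementation-dependent, so the sketch only reports it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the observable state of a container after setup.
struct Shape {
  std::size_t size;
  std::size_t capacity;
};

// Setup described in the comment above: construct with count / 2 elements,
// then reserve room for count elements in total.
template <class Container>
Shape half_filled_shape(std::size_t count) {
  Container c(count / 2);
  c.reserve(count);
  return Shape{c.size(), c.capacity()};
}

// Assumed reconstruction of "reserve, then assign": the reserved buffer is
// allocated first, then the container is move-assigned from a temporary.
// Implementations typically adopt the temporary's buffer here, in which case
// the extra reserved capacity is lost, but this is not guaranteed.
template <class Container>
Shape reserved_then_move_assigned_shape(std::size_t count) {
  Container c;
  c.reserve(count);
  c = Container(count / 2);
  return Shape{c.size(), c.capacity()};
}
```

Calling `half_filled_shape<std::vector<int>>(1024)` yields a size of 512 and a capacity of at least 1024. Comparing that capacity with the one reported by `reserved_then_move_assigned_shape` on a given standard library shows whether the reserved space survived the assignment.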
…rks (with the goal of benchmarking SSO vs non-SSO values mainly)
8ce6d7f to 345ac67
LGTM assuming the answer to the comment is yes.
    c.insert(c.begin(), cpp17_input_iterator(first), cpp17_input_iterator(last));
    DoNotOptimizeData(c);

    st.PauseTiming();
Forgot to remove? Same in the next two.
No, I purposefully left these calls to `PauseTiming()` here because we are both inserting and erasing more than one element. Thus, the operation being measured (`c.insert(c.begin(), cpp17_input_iterator(first), cpp17_input_iterator(last));`) should have a reasonably significant duration, making `PauseTiming()`'s latency negligible in comparison. Similarly, the call to `erase` will remove a lot of elements from the vector, so failure to ignore that latency with pause/resume would add a lot of noise.
For benchmarks where we measure an operation on a single element (such as `insert` above) and where we then erase a single element to shrink the container back, I have omitted `PauseTiming()` and `ResumeTiming()` for the reasons we previously discussed.
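The idea of timing only the bulk insertion while leaving the bulk cleanup out of the measurement can be sketched without the benchmark library. The following is a stand-in that uses `std::chrono` instead of `PauseTiming()`/`ResumeTiming()`, and plain container iterators instead of `cpp17_input_iterator`; the function name `time_bulk_insert_only` is hypothetical and does not appear in the patch.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <vector>

// Accumulates only the time spent in the bulk insert. The bulk erase that
// restores the container between iterations runs outside the timed region,
// which is the role PauseTiming()/ResumeTiming() play in the real benchmark.
template <class Container>
std::chrono::nanoseconds time_bulk_insert_only(Container& c, const Container& in, int iterations) {
  using clock = std::chrono::steady_clock;
  std::chrono::nanoseconds measured{0};
  for (int i = 0; i < iterations; ++i) {
    const auto start = clock::now();
    c.insert(c.begin(), in.begin(), in.end()); // operation under measurement
    measured += std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);

    // Untimed cleanup: remove the elements that were just inserted at the front.
    c.erase(c.begin(), std::next(c.begin(), static_cast<std::ptrdiff_t>(in.size())));
  }
  return measured;
}
```

Because the timed region covers many elements, the cost of reading the clock twice per iteration is small relative to the work measured. For a single-element operation the same bookkeeping could dominate the measurement, which matches the reasoning given above for omitting pause/resume in those benchmarks.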
LGTM with green CI.
template <class Container, class GenInputs>
void BM_Find(benchmark::State& st, Container c, GenInputs gen) {
  auto in = gen(st.range(0));
  c.insert(in.begin(), in.end());
  benchmark::DoNotOptimize(&(*c.begin()));
  const auto end = in.data() + in.size();
  while (st.KeepRunning()) {
    for (auto it = in.data(); it != end; ++it) {
      benchmark::DoNotOptimize(&(*c.find(*it)));
    }
    benchmark::ClobberMemory();
  }
}

template <class Container, class GenInputs>
void BM_FindRehash(benchmark::State& st, Container c, GenInputs gen) {
  c.rehash(8);
  auto in = gen(st.range(0));
  c.insert(in.begin(), in.end());
  benchmark::DoNotOptimize(&(*c.begin()));
  const auto end = in.data() + in.size();
  while (st.KeepRunning()) {
    for (auto it = in.data(); it != end; ++it) {
      benchmark::DoNotOptimize(&(*c.find(*it)));
    }
    benchmark::ClobberMemory();
  }
}
Is there a specific reason to use `in.data()` instead of `in.begin()`? Note that `data()` is available only in `std::vector` and `std::basic_string`. Many other sequence containers, such as `std::deque` and `std::vector<bool>`, do not have `data()`.
First, these benchmarks were actually pre-existing and will eventually be refactored when I look into unordered containers.
As to your question, I think the reason these benchmarks were originally written with `.data()` is to get a raw pointer instead of a container iterator, which is free to be more complex than a raw pointer and may not optimize the same (although in practice they almost always should).
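As a rough illustration of the trade-off discussed in this thread, the sketch below contrasts a raw-pointer traversal obtained through `data()` with a generic iterator traversal, and checks at compile time which containers offer `data()` at all. The names `has_data`, `sum_via_pointer`, and `sum_via_iterator` are hypothetical helpers written for this example and are not part of the benchmarks.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical detection helper: true_type if C has a callable data() member.
template <class C>
auto has_data_impl(int) -> decltype(std::declval<C&>().data(), std::true_type{});
template <class C>
auto has_data_impl(...) -> std::false_type;
template <class C>
using has_data = decltype(has_data_impl<C>(0));

static_assert(has_data<std::vector<int>>::value, "std::vector<int> provides data()");
static_assert(has_data<std::string>::value, "std::string provides data()");
static_assert(!has_data<std::deque<int>>::value, "std::deque does not provide data()");

// Raw-pointer traversal, in the style of the pre-existing benchmarks.
inline long sum_via_pointer(const std::vector<int>& in) {
  long total = 0;
  const int* const end = in.data() + in.size();
  for (const int* it = in.data(); it != end; ++it)
    total += *it;
  return total;
}

// Iterator traversal: works for any container with begin()/end().
template <class Container>
long sum_via_iterator(const Container& in) {
  long total = 0;
  for (auto it = in.begin(); it != in.end(); ++it)
    total += *it;
  return total;
}
```

Both traversals visit the same elements for a `std::vector<int>`, while only the iterator version can be instantiated for `std::deque<int>`. Whether the two forms optimize identically depends on the compiler and the standard library's iterator type.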
Merging. @philnik777, let me know if you want me to change something post-commit per the discussion above.