Commit dae346d

fixes is_avalanching default, adds documentation for hashing
Removes is_avalanching from the default implementation that uses std::hash; otherwise a user-created specialization would inherit is_avalanching. Adds an example test with custom hashes. Describes hashing in a dedicated section of README.md, with examples.
1 parent 1371371 commit dae346d

8 files changed, +229 -26 lines changed

CMakeLists.txt

+1 -1
@@ -1,6 +1,6 @@
 cmake_minimum_required(VERSION 3.12)
 project("unordered_dense"
-    VERSION 1.3.1
+    VERSION 1.3.2
     DESCRIPTION "A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion"
     HOMEPAGE_URL "https://github.com/martinus/unordered_dense")

README.md

+130 -16
@@ -16,14 +16,20 @@ The classes `ankerl::unordered_dense::map` and `ankerl::unordered_dense::set` ar
 - [2. Installation](#2-installation)
   - [2.1. Installing using cmake](#21-installing-using-cmake)
 - [3. Extensions](#3-extensions)
-  - [3.1. Container API](#31-container-api)
-    - [3.1.1. `auto extract() && -> value_container_type`](#311-auto-extract----value_container_type)
-    - [3.1.2. `[[nodiscard]] auto values() const noexcept -> value_container_type const&`](#312-nodiscard-auto-values-const-noexcept---value_container_type-const)
-    - [3.1.3. `auto replace(value_container_type&& container)`](#313-auto-replacevalue_container_type-container)
-  - [3.2. Custom Container Types](#32-custom-container-types)
-  - [3.3. Custom Bucket Tyeps](#33-custom-bucket-tyeps)
-    - [3.3.1. `ankerl::unordered_dense::bucket_type::standard`](#331-ankerlunordered_densebucket_typestandard)
-    - [3.3.2. `ankerl::unordered_dense::bucket_type::big`](#332-ankerlunordered_densebucket_typebig)
+  - [3.1. Hash](#31-hash)
+    - [A Simple Hash](#a-simple-hash)
+    - [A High Quality Hash](#a-high-quality-hash)
+    - [Specialize `ankerl::unordered_dense::hash`](#specialize-ankerlunordered_densehash)
+    - [Automatic Fallback to `std::hash`](#automatic-fallback-to-stdhash)
+    - [Hash the Whole Memory](#hash-the-whole-memory)
+  - [3.2. Container API](#32-container-api)
+    - [3.2.1. `auto extract() && -> value_container_type`](#321-auto-extract----value_container_type)
+    - [3.2.2. `[[nodiscard]] auto values() const noexcept -> value_container_type const&`](#322-nodiscard-auto-values-const-noexcept---value_container_type-const)
+    - [3.2.3. `auto replace(value_container_type&& container)`](#323-auto-replacevalue_container_type-container)
+  - [3.3. Custom Container Types](#33-custom-container-types)
+  - [3.4. Custom Bucket Tyeps](#34-custom-bucket-tyeps)
+    - [3.4.1. `ankerl::unordered_dense::bucket_type::standard`](#341-ankerlunordered_densebucket_typestandard)
+    - [3.4.2. `ankerl::unordered_dense::bucket_type::big`](#342-ankerlunordered_densebucket_typebig)
 - [4. Design](#4-design)
   - [4.1. Inserts](#41-inserts)
   - [4.2. Lookups](#42-lookups)
@@ -79,37 +85,145 @@ target_link_libraries(your_project_name unordered_dense::unordered_dense)

 ## 3. Extensions

-### 3.1. Container API
+### 3.1. Hash
+
+`ankerl::unordered_dense::hash` is a fast and high quality hash, based on [wyhash](https://github.com/wangyi-fudan/wyhash). The `ankerl::unordered_dense` map/set differentiates between hashes of high quality (good [avalanching effect](https://en.wikipedia.org/wiki/Avalanche_effect)) and bad quality. Hashes with good quality contain a special marker:
+
+```cpp
+using is_avalanching = void;
+```
+
+This is the case for the specializations `bool`, `char`, `signed char`, `unsigned char`, `char8_t`, `char16_t`, `char32_t`, `wchar_t`, `short`, `unsigned short`, `int`, `unsigned int`, `long`, `long long`, `unsigned long`, `unsigned long long`, `T*`, `std::unique_ptr<T>`, `std::shared_ptr<T>`, `enum`, `std::basic_string<C>`, and `std::basic_string_view<C>`.
+
+Hashes that do not contain such a marker are assumed to be of bad quality and receive an additional mixing step inside the map/set implementation.
+
+#### A Simple Hash
+
+Consider a simple custom key type:
+
+```cpp
+struct id {
+    uint64_t value{};
+
+    auto operator==(id const& other) const -> bool {
+        return value == other.value;
+    }
+};
+```
+
+The simplest implementation of a hash is this:
+
+```cpp
+struct custom_hash_simple {
+    auto operator()(id const& x) const noexcept -> uint64_t {
+        return x.value;
+    }
+};
+```
+This can be used e.g. with
+
+```cpp
+auto ids = ankerl::unordered_dense::set<id, custom_hash_simple>();
+```
+
+Since `custom_hash_simple` doesn't have a `using is_avalanching = void;` marker it is considered to be of bad quality and additional mixing of `x.value` is automatically provided inside the set.
+
+#### A High Quality Hash
+
+Back to the `id` example, we can easily implement a higher quality hash:
+
+```cpp
+struct custom_hash_avalanching {
+    using is_avalanching = void;
+
+    auto operator()(id const& x) const noexcept -> uint64_t {
+        return ankerl::unordered_dense::detail::wyhash::hash(x.value);
+    }
+};
+```
+
+We know `wyhash::hash` is of high quality, so we can add `using is_avalanching = void;` which makes the map/set directly use the returned value.
+
+
+#### Specialize `ankerl::unordered_dense::hash`
+
+Instead of creating a new class you can also specialize `ankerl::unordered_dense::hash`:
+
+```cpp
+template <>
+struct ankerl::unordered_dense::hash<id> {
+    using is_avalanching = void;
+
+    [[nodiscard]] auto operator()(id const& x) const noexcept -> uint64_t {
+        return detail::wyhash::hash(x.value);
+    }
+};
+```
+
+#### Automatic Fallback to `std::hash`
+
+When an implementation for `std::hash` of a custom type is available, this is automatically used and assumed to be of bad quality (thus `std::hash` is used, but an additional mixing step is performed).
+
+
+#### Hash the Whole Memory
+
+When the type [has a unique object representation](https://en.cppreference.com/w/cpp/types/has_unique_object_representations) (no padding, trivially copyable), one can just hash the object's memory. Consider a simple class
+
+```cpp
+struct point {
+    int x{};
+    int y{};
+
+    auto operator==(point const& other) const -> bool {
+        return x == other.x && y == other.y;
+    }
+};
+```
+
+A fast and high quality hash can be easily provided like so:
+
+```cpp
+struct custom_hash_unique_object_representation {
+    using is_avalanching = void;
+
+    [[nodiscard]] auto operator()(point const& f) const noexcept -> uint64_t {
+        static_assert(std::has_unique_object_representations_v<point>);
+        return ankerl::unordered_dense::detail::wyhash::hash(&f, sizeof(f));
+    }
+};
+```
+
+### 3.2. Container API

 In addition to the standard `std::unordered_map` API (see https://en.cppreference.com/w/cpp/container/unordered_map) we have additional API leveraging the fact that we're using a random access container internally:

-#### 3.1.1. `auto extract() && -> value_container_type`
+#### 3.2.1. `auto extract() && -> value_container_type`

 Extracts the internally used container. `*this` is emptied.

-#### 3.1.2. `[[nodiscard]] auto values() const noexcept -> value_container_type const&`
+#### 3.2.2. `[[nodiscard]] auto values() const noexcept -> value_container_type const&`

 Exposes the underlying values container.

-#### 3.1.3. `auto replace(value_container_type&& container)`
+#### 3.2.3. `auto replace(value_container_type&& container)`

 Discards the internally held container and replaces it with the one passed. Non-unique elements are
 removed, and the container will be partly reordered when non-unique elements are found.

-### 3.2. Custom Container Types
+### 3.3. Custom Container Types

 `unordered_dense` accepts a custom allocator, but you can also specify a custom container for that template argument. That way it is possible to replace the internally used `std::vector` with e.g. `std::deque` or any other container like `boost::interprocess::vector`. This supports fancy pointers (e.g. [offset_ptr](https://www.boost.org/doc/libs/1_80_0/doc/html/interprocess/offset_ptr.html)), so the container can be used with e.g. shared memory provided by `boost::interprocess`.

-### 3.3. Custom Bucket Tyeps
+### 3.4. Custom Bucket Tyeps

 The map/set supports two different bucket types. The default should be good for pretty much everyone.

-#### 3.3.1. `ankerl::unordered_dense::bucket_type::standard`
+#### 3.4.1. `ankerl::unordered_dense::bucket_type::standard`

 * Up to 2^32 = 4.29 billion elements.
 * 8 bytes overhead per bucket.

-#### 3.3.2. `ankerl::unordered_dense::bucket_type::big`
+#### 3.4.2. `ankerl::unordered_dense::bucket_type::big`

 * up to 2^63 = 9223372036854775808 elements.
 * 12 bytes overhead per bucket.
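
The per-bucket overheads and element limits quoted in the README bullets above can be checked with a small sketch. The struct and field names below are hypothetical and are not taken from the library; they only show one layout that would produce 8 and 12 bytes, assuming the compiler supports `#pragma pack` (GCC, Clang and MSVC do).

```cpp
#include <cstdint>

// Hypothetical bucket layouts, for illustration only. The real library
// may lay out its buckets differently.
#pragma pack(push, 1)
struct bucket_standard_sketch {
    std::uint32_t dist_and_fingerprint; // assumed: probe distance plus some hash bits
    std::uint32_t value_idx;            // assumed: 32-bit index into the value container
};

struct bucket_big_sketch {
    std::uint32_t dist_and_fingerprint; // assumed: same metadata as above
    std::uint64_t value_idx;            // assumed: wider index for very large containers
};
#pragma pack(pop)

// Matches the "8 bytes" and "12 bytes" overhead figures from the README.
static_assert(sizeof(bucket_standard_sketch) == 8);
static_assert(sizeof(bucket_big_sketch) == 12);

// The element limits quoted in the README, written out as arithmetic.
constexpr std::uint64_t max_standard_elements = std::uint64_t{1} << 32;
constexpr std::uint64_t max_big_elements = std::uint64_t{1} << 63;
static_assert(max_standard_elements == UINT64_C(4294967296));          // about 4.29 billion
static_assert(max_big_elements == UINT64_C(9223372036854775808));
```

Without the packing pragma, the second struct would typically be padded to 16 bytes, which is why a 12-byte bucket needs a packed layout of some kind.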

include/ankerl/unordered_dense.h

+3 -4
@@ -1,7 +1,7 @@
 ///////////////////////// ankerl::unordered_dense::{map, set} /////////////////////////

 // A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion.
-// Version 1.3.1
+// Version 1.3.2
 // https://github.com/martinus/unordered_dense
 //
 // Licensed under the MIT License <http://opensource.org/licenses/MIT>.
@@ -32,7 +32,7 @@
 // see https://semver.org/spec/v2.0.0.html
 #define ANKERL_UNORDERED_DENSE_VERSION_MAJOR 1 // NOLINT(cppcoreguidelines-macro-usage) incompatible API changes
 #define ANKERL_UNORDERED_DENSE_VERSION_MINOR 3 // NOLINT(cppcoreguidelines-macro-usage) backwards compatible functionality
-#define ANKERL_UNORDERED_DENSE_VERSION_PATCH 1 // NOLINT(cppcoreguidelines-macro-usage) backwards compatible bug fixes
+#define ANKERL_UNORDERED_DENSE_VERSION_PATCH 2 // NOLINT(cppcoreguidelines-macro-usage) backwards compatible bug fixes

 // API versioning with inline namespace, see https://www.foonathan.net/2018/11/inline-namespaces/
 #define ANKERL_UNORDERED_DENSE_VERSION_CONCAT1(major, minor, patch) v##major##_##minor##_##patch
@@ -214,10 +214,9 @@ static inline void mum(uint64_t* a, uint64_t* b) {

 template <typename T, typename Enable = void>
 struct hash {
-    using is_avalanching = void;
     auto operator()(T const& obj) const noexcept(noexcept(std::declval<std::hash<T>>().operator()(std::declval<T const&>())))
         -> uint64_t {
-        return detail::wyhash::hash(std::hash<T>{}(obj));
+        return std::hash<T>{}(obj);
     }
 };
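
With the marker removed from the primary template, the container has to decide per hash type whether to mix. A minimal sketch of that decision follows, assuming a `std::void_t` based detector. The names `has_avalanching_marker`, `mix_sketch` and `table_hash_sketch` are hypothetical, and the mixer is the MurmurHash3 64-bit finalizer used purely as a stand-in; the library's own mixing step is not shown in this diff.

```cpp
#include <cstdint>
#include <type_traits>

// Detects a nested `is_avalanching` type via the std::void_t idiom.
template <typename Hash, typename = void>
struct has_avalanching_marker : std::false_type {};

template <typename Hash>
struct has_avalanching_marker<Hash, std::void_t<typename Hash::is_avalanching>> : std::true_type {};

// Stand-in mixer (MurmurHash3 64-bit finalizer). Assumption: this only
// plays the role of the "additional mixing step"; it is not the library's code.
constexpr auto mix_sketch(std::uint64_t h) noexcept -> std::uint64_t {
    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h *= UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return h;
}

// Uses the hash value directly when the marker is present, mixes it otherwise.
template <typename Hash, typename Key>
auto table_hash_sketch(Hash const& hasher, Key const& key) -> std::uint64_t {
    auto const h = static_cast<std::uint64_t>(hasher(key));
    if constexpr (has_avalanching_marker<Hash>::value) {
        return h;
    } else {
        return mix_sketch(h);
    }
}

// Two toy hashes that differ only in the marker.
struct identity_hash {
    auto operator()(std::uint64_t x) const noexcept -> std::uint64_t { return x; }
};

struct marked_identity_hash {
    using is_avalanching = void;
    auto operator()(std::uint64_t x) const noexcept -> std::uint64_t { return x; }
};

static_assert(!has_avalanching_marker<identity_hash>::value);
static_assert(has_avalanching_marker<marked_identity_hash>::value);
```

Under this scheme a hash that forwards to `std::hash` and carries no marker is mixed once by the container, and a marked hash is trusted as-is.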

meson.build

+1 -1
@@ -18,7 +18,7 @@
 #

 project('unordered_dense', 'cpp',
-    version: '1.3.1',
+    version: '1.3.2',
     license: 'MIT',
     default_options : ['cpp_std=c++17', 'warning_level=3', 'werror=true'])

test/meson.build

+1
@@ -28,6 +28,7 @@ test_sources = [
     'unit/ctors.cpp',
     'unit/custom_container_boost.cpp',
     'unit/custom_container.cpp',
+    'unit/custom_hash.cpp',
     'unit/deduction_guides.cpp',
     'unit/diamond.cpp',
     'unit/empty.cpp',

test/unit/custom_hash.cpp

+87
@@ -0,0 +1,87 @@
+#include <ankerl/unordered_dense.h>
+
+#include <doctest.h>
+#include <type_traits>
+
+namespace {
+
+struct id {
+    uint64_t value{};
+
+    auto operator==(id const& other) const -> bool {
+        return value == other.value;
+    }
+};
+
+struct custom_hash_simple {
+    [[nodiscard]] auto operator()(id const& x) const noexcept -> uint64_t {
+        return x.value;
+    }
+};
+
+struct custom_hash_avalanching {
+    using is_avalanching = void;
+
+    auto operator()(id const& x) const noexcept -> uint64_t {
+        return ankerl::unordered_dense::detail::wyhash::hash(x.value);
+    }
+};
+
+struct point {
+    int x{};
+    int y{};
+
+    auto operator==(point const& other) const -> bool {
+        return x == other.x && y == other.y;
+    }
+};
+
+struct custom_hash_unique_object_representation {
+    using is_avalanching = void;
+
+    [[nodiscard]] auto operator()(point const& f) const noexcept -> uint64_t {
+        static_assert(std::has_unique_object_representations_v<point>);
+        return ankerl::unordered_dense::detail::wyhash::hash(&f, sizeof(f));
+    }
+};
+
+} // namespace
+
+template <>
+struct ankerl::unordered_dense::hash<id> {
+    using is_avalanching = void;
+
+    [[nodiscard]] auto operator()(id const& x) const noexcept -> uint64_t {
+        return detail::wyhash::hash(x.value);
+    }
+};
+
+TEST_CASE("custom_hash") {
+    {
+        auto set = ankerl::unordered_dense::set<id, custom_hash_simple>();
+        set.insert(id{124});
+    }
+    {
+        auto set = ankerl::unordered_dense::set<id, custom_hash_avalanching>();
+        set.insert(id{124});
+    }
+    {
+        auto set = ankerl::unordered_dense::set<point, custom_hash_unique_object_representation>();
+        set.insert(point{123, 321});
+    }
+    {
+        auto set = ankerl::unordered_dense::set<id>();
+        set.insert(id{124});
+    }
+}
+
+static_assert(
+    !ankerl::unordered_dense::detail::is_detected_v<ankerl::unordered_dense::detail::detect_avalanching, custom_hash_simple>);
+
+static_assert(ankerl::unordered_dense::detail::is_detected_v<ankerl::unordered_dense::detail::detect_avalanching,
+                                                             custom_hash_avalanching>);
+static_assert(ankerl::unordered_dense::detail::is_detected_v<ankerl::unordered_dense::detail::detect_avalanching,
+                                                             custom_hash_unique_object_representation>);
+
+static_assert(!ankerl::unordered_dense::detail::is_detected_v<ankerl::unordered_dense::detail::detect_avalanching,
+                                                              ankerl::unordered_dense::hash<point>>);
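
The `static_assert` inside `custom_hash_unique_object_representation` above matters because hashing raw memory is only safe when equal values always have identical bytes. The sketch below illustrates that guard without depending on the library. `padded` and `hash_object_bytes_sketch` are hypothetical names, the byte hash is FNV-1a used as a stand-in for `wyhash::hash`, and the padding claim assumes a mainstream ABI where `int` is 4-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct point {
    int x{};
    int y{};
};

// Hypothetical counter-example: on common ABIs there are 3 padding bytes
// after `tag`, and their content is unspecified.
struct padded {
    char tag{};
    int value{};
};

static_assert(std::has_unique_object_representations_v<point>);
static_assert(!std::has_unique_object_representations_v<padded>); // assumes int is 4-byte aligned

// Hashes the object's bytes with FNV-1a. Stand-in only: FNV-1a is not known
// to avalanche well, so a hash built on it should not carry is_avalanching.
template <typename T>
auto hash_object_bytes_sketch(T const& obj) noexcept -> std::uint64_t {
    static_assert(std::has_unique_object_representations_v<T>,
                  "padding or non-unique encodings would make equal values hash differently");
    auto const* bytes = reinterpret_cast<unsigned char const*>(&obj);
    std::uint64_t h = UINT64_C(0xcbf29ce484222325); // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        h ^= bytes[i];
        h *= UINT64_C(0x100000001b3); // FNV-1a 64-bit prime
    }
    return h;
}
```

Calling `hash_object_bytes_sketch` with a `padded` object fails to compile, which is the intended behaviour: the guard turns a silent correctness bug into a build error.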

test/unit/namespace.cpp

+3 -3
@@ -2,10 +2,10 @@

 #include <doctest.h>

-static_assert(std::is_same_v<ankerl::unordered_dense::v1_3_1::map<int, int>, ankerl::unordered_dense::map<int, int>>);
-static_assert(std::is_same_v<ankerl::unordered_dense::v1_3_1::hash<int>, ankerl::unordered_dense::hash<int>>);
+static_assert(std::is_same_v<ankerl::unordered_dense::v1_3_2::map<int, int>, ankerl::unordered_dense::map<int, int>>);
+static_assert(std::is_same_v<ankerl::unordered_dense::v1_3_2::hash<int>, ankerl::unordered_dense::hash<int>>);

 TEST_CASE("version_namespace") {
-    auto map = ankerl::unordered_dense::v1_3_1::map<int, int>{};
+    auto map = ankerl::unordered_dense::v1_3_2::map<int, int>{};
     REQUIRE(map.empty());
 }

test/unit/std_hash.cpp

+3 -1
@@ -26,6 +26,8 @@ TEST_CASE("std_hash") {
     auto f = foo{12345};
     REQUIRE(std::hash<foo>{}(f) == 12346U);
     // unordered_dense::hash blows that up to 64bit!
-    REQUIRE(ankerl::unordered_dense::hash<foo>{}(f) == UINT64_C(0x3F645BE4CE24110C));
+
+    // Just wraps std::hash
+    REQUIRE(ankerl::unordered_dense::hash<foo>{}(f) == UINT64_C(12346));
     REQUIRE(ankerl::unordered_dense::hash<uint64_t>{}(12346U) == UINT64_C(0x3F645BE4CE24110C));
 }
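
The changed expectation in this test (12346 instead of a mixed 64-bit value) follows from the primary template now forwarding to `std::hash` unchanged. A standalone sketch of that forwarding is given below. `foo_sketch` and `forwarding_hash_sketch` are hypothetical names; the definition of `foo` is not part of this diff, so the `value + 1` hash is an assumption chosen to reproduce the 12345 to 12346 mapping seen in the test.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical key type standing in for the test's `foo`.
struct foo_sketch {
    std::uint64_t value{};
};

namespace std {
// Assumed std::hash specialization: deliberately weak, returns value + 1.
template <>
struct hash<foo_sketch> {
    auto operator()(foo_sketch const& f) const noexcept -> std::size_t {
        return static_cast<std::size_t>(f.value + 1);
    }
};
} // namespace std

// Forwards to std::hash<T> with no mixing and no is_avalanching marker,
// propagating noexcept from the wrapped call like the primary template in the diff.
template <typename T>
struct forwarding_hash_sketch {
    auto operator()(T const& obj) const
        noexcept(noexcept(std::declval<std::hash<T>>().operator()(std::declval<T const&>())))
            -> std::uint64_t {
        return std::hash<T>{}(obj);
    }
};

// The wrapper is noexcept exactly because the wrapped std::hash call is.
static_assert(noexcept(forwarding_hash_sketch<foo_sketch>{}(std::declval<foo_sketch const&>())));
```

Because the wrapper carries no marker, any mixing of such a weak value is left to the map/set rather than to the hash object itself.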
