What do you do when you need to embed a large amount of queryable data statically in your application?
A common style is to embed JSON data directly and then parse it on first access. But this has some disadvantages: the data still has to be parsed and allocated at runtime, and the first access pays an initialization cost.
Here we introduce some techniques to construct a completely static and efficient Map:
No initialization, no parsing, no memory allocation
Fast compilation and a small binary size
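For contrast, here is a minimal sketch of the embed-and-parse style described above (the serde_json crate and a LazyLock are assumed here purely for illustration, not taken from the original):

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// The data ships as text inside the binary and still has to be parsed
// and allocated on the heap the first time it is touched.
static RAW: &str = r#"{ "alice": 1, "bob": 2 }"#;

static NAMES: LazyLock<HashMap<String, u32>> =
    LazyLock::new(|| serde_json::from_str(RAW).expect("embedded json is valid"));

fn lookup(name: &str) -> Option<u32> {
    NAMES.get(name).copied()
}

fn main() {
    assert_eq!(lookup("bob"), Some(2));
}
```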
Efficient static queryable Map
MPHF: Minimal Perfect Hash Function, a technique for constructing efficient static Maps
A hashmap has O(1) lookup time and takes up no extra space,
but this is only true when there are no key conflicts.
And a PHF can guarantee exactly that, making the hashmap always perfect and compact.
The simplest MPHF
An MPHF is defined as a fully injective function (in other words, a bijection)
that maps a specified set of keys one-to-one onto a set of values of the same size.
The simplest MPHF can be constructed as follows:
f(x) = h(seed, x) mod n
Rotate (retry) the seed
until a seed is found for which all the values h(seed, x) mod n are distinct, i.e., conflict-free.
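A minimal sketch of this brute-force construction (the hash function and keys here are illustrative; any seedable hash works):

```rust
use std::collections::{hash_map::DefaultHasher, HashSet};
use std::hash::{Hash, Hasher};

fn h(seed: u64, x: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    x.hash(&mut hasher);
    hasher.finish()
}

/// Rotate the seed until f(x) = h(seed, x) mod n has no conflicts.
fn find_seed(keys: &[&str]) -> u64 {
    let n = keys.len() as u64;
    (0u64..)
        .find(|&seed| {
            let slots: HashSet<u64> = keys.iter().map(|&k| h(seed, k) % n).collect();
            slots.len() == keys.len() // all slots distinct => perfect and minimal
        })
        .expect("some seed works")
}

fn main() {
    let keys = ["alice", "bob", "carol"];
    let seed = find_seed(&keys);
    for &k in &keys {
        println!("{k} -> {}", h(seed, k) % keys.len() as u64);
    }
}
```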
But for a large number of keys, finding such a seed in a reasonable amount of time is practically impossible.
The practical MPHF
Practical MPHF implementations usually use buckets to resolve conflicts:
Divide the keys into multiple buckets, and record one value per bucket, called the displacement (displace).
When a bucket conflicts, modify its displacement value.
Use the displacement value to map the keys in the bucket to their slots.
Repeat until no bucket has a conflict.
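A much-simplified sketch of this bucket-and-displacement construction (greedy, with no kick-out; real implementations such as PTHash/PtrHash are far more careful):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn h(seed: u64, x: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    x.hash(&mut hasher);
    hasher.finish() as usize
}

/// Greedy construction: every bucket retries displacement values until all of
/// its keys land on free, distinct slots. It may spin for a long time on
/// unlucky inputs; a real implementation bounds the search and falls back.
fn build_displacements(keys: &[&str], m: usize) -> Vec<u64> {
    let n = keys.len();
    let mut buckets: Vec<Vec<&str>> = vec![Vec::new(); m];
    for &k in keys {
        buckets[h(0, k) % m].push(k);
    }

    // Place the largest buckets first: they are the hardest to fit.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by_key(|&b| std::cmp::Reverse(buckets[b].len()));

    let mut displace = vec![0u64; m];
    let mut taken = vec![false; n];
    for b in order {
        for d in 1u64.. {
            let slots: Vec<usize> = buckets[b].iter().map(|&k| h(d, k) % n).collect();
            let mut distinct = slots.clone();
            distinct.sort_unstable();
            distinct.dedup();
            if distinct.len() == slots.len() && slots.iter().all(|&s| !taken[s]) {
                slots.iter().for_each(|&s| taken[s] = true);
                displace[b] = d;
                break;
            }
        }
    }
    displace
}

/// Lookup: the displacement of the key's bucket decides its final slot.
fn index(displace: &[u64], n: usize, key: &str) -> usize {
    let b = h(0, key) % displace.len();
    h(displace[b], key) % n
}

fn main() {
    let keys = ["alice", "bob", "carol", "dave"];
    let displace = build_displacements(&keys, 2);
    for &k in &keys {
        println!("{k} -> {}", index(&displace, keys.len(), k));
    }
}
```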
The newest MPHF: PtrHash
PtrHash has many improvements over PTHash,
the most interesting of which is the use of a cuckoo-hashing strategy for bucket adjustment.
Cuckoo hashing is a classic hashmap conflict-resolution algorithm:
When a key conflict occurs, the new key kicks out the old key, and the old key is assigned a new index using another hash function.
If a conflict occurs while the old key is being reassigned, the above step is repeated.
This sounds like the idiom 鸠占鹊巢 ("occupying another bird's nest"), which is why it is called cuckoo hashing.
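A minimal sketch of classic cuckoo-hash insertion (a plain hash table illustrating the kick-out idea, not PtrHash itself):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn h(which: u64, key: u64) -> usize {
    let mut s = DefaultHasher::new();
    which.hash(&mut s);
    key.hash(&mut s);
    s.finish() as usize
}

/// Every key has exactly two candidate slots: h(0, key) and h(1, key).
struct Cuckoo {
    slots: Vec<Option<u64>>,
}

impl Cuckoo {
    fn new(capacity: usize) -> Self {
        Cuckoo { slots: vec![None; capacity] }
    }

    fn insert(&mut self, mut key: u64) -> bool {
        let n = self.slots.len();
        let mut which: u64 = 0; // which of the two hash functions to try next
        for _ in 0..n {
            let i = h(which, key) % n;
            match self.slots[i].replace(key) {
                // Free nest found: done.
                None => return true,
                // The new key kicked the old one out; the evicted key must
                // now move to the slot given by its *other* hash function.
                Some(evicted) => {
                    which = if h(0, evicted) % n == i { 1 } else { 0 };
                    key = evicted;
                }
            }
        }
        false // too many kicks: a real table would grow or pick new hashes
    }

    fn contains(&self, key: u64) -> bool {
        let n = self.slots.len();
        self.slots[h(0, key) % n] == Some(key) || self.slots[h(1, key) % n] == Some(key)
    }
}

fn main() {
    let mut table = Cuckoo::new(8);
    for key in [3u64, 11, 42, 7] {
        table.insert(key);
    }
    println!("contains 42: {}", table.contains(42));
}
```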
This efficient collision-resolution strategy allows PtrHash to use a lower-quality (but faster) hash function
and only 1 byte per displacement value, which reduces both time and space overhead compared to traditional solutions.
Benefits of Cuckoo hash strategy
Kicking old keys out and rearranging them sounds very inefficient, so why is this a good construction strategy?
The basic idea is:
Under an appropriate load factor, there is a high probability that an arrangement exists which accommodates all keys.
But in a traditional hashmap, a bucket's position never changes once it has been assigned.
That is probably not the best arrangement; in particular, the displacement of buckets inserted early is almost always 0.
The kick-out strategy of cuckoo hashing gives old buckets a chance to be rearranged.
Slow and bloated...
The MPHF solves the data-query problem,
but if we write code like the following, we will find that it compiles very slowly and the binary is very large.
```rust
static NAMES: &[&str] = &[
    "alice",
    "bob",
    // ...
];
```
What's the problem?
&[&str] is bad
About &str
If you have some understanding of Rust's memory layout, it's easy to notice that &str is essentially (*const u8, usize).
This means that in a &[&str], each string carries an additional 16 bytes of index data;
for short strings, the binary-size overhead of the index is even greater than the string itself.
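This is easy to confirm (the 16-byte figure assumes a 64-bit target):

```rust
fn main() {
    // A &str is a fat pointer: (data pointer, length in bytes).
    assert_eq!(
        std::mem::size_of::<&str>(),
        std::mem::size_of::<*const u8>() + std::mem::size_of::<usize>()
    );
    // 16 bytes of index per entry on a 64-bit target, even for a 3-byte name.
    println!("{}", std::mem::size_of::<&str>()); // 16
}
```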
But that’s not all
About relocation
I constructed a program that uses a lot of &[&str] and printed its section headers:
```
  9 .rela.dyn     00499248  0000000000000fd8  DATA
 11 .rodata       00419c10  000000000049a240  DATA
 24 .data.rel.ro  00496b80  00000000009042a8  DATA
```
The largest section is not .rodata, which stores constant data, but .rela.dyn and .data.rel.ro!
Take a look at the global variable NAMES we defined. It is a list of &str, which contains pointers to the strings;
but before the shared object (.so) is loaded, its base address is still unknown.
How can the compiler generate a list of pointers whose addresses are not yet known?
The answer is that the dynamic linker adjusts the data when the shared object is loaded, a process called relocation.
At link time, the linker generates the .data.rel.ro and .rela.dyn sections:
.data.rel.ro stores read-only data that is only ever changed during relocation,
and .rela.dyn stores the metadata describing the data that needs relocation.
When the shared object is loaded, the dynamic linker walks the entire .rela.dyn section to patch the .data.rel.ro section (and others).
This means that each string in &[&str] incurs an additional 24 bytes of size overhead (the size of one relocation entry), and it also affects the startup performance of the program!
String pack
Position index
Since the compiler-generated string indexes have so much overhead, we can pack the strings and create the indexes manually.
A simple solution is to concatenate all the strings and then use each string's end position as its index.
This makes each string index cost only 4 bytes and completely eliminates the relocation overhead:
everything is in .rodata and can be accessed directly through relative addresses.
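A hand-written sketch of the position-index idea (in a real project these tables would be produced by a build script or macro; the names here are illustrative):

```rust
// All names concatenated into one string, plus the end offset of each entry.
// The per-string index is a plain u32, so at most one relocation is needed
// for the whole table instead of one per string.
static NAMES: &str = "alicebobcarol";
static ENDS: &[u32] = &[5, 8, 13];

fn name(i: usize) -> &'static str {
    let start = if i == 0 { 0 } else { ENDS[i - 1] as usize };
    &NAMES[start..ENDS[i] as usize]
}

fn main() {
    assert_eq!(name(0), "alice");
    assert_eq!(name(1), "bob");
    assert_eq!(name(2), "carol");
}
```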
String pool
The position-index solution is compact, and we can use the Elias-Fano encoding to make it even more compact.
However, simply concatenating the strings prevents the compiler from merging repeated strings,
which makes it perform worse in certain cases.
Looking at the following code, we can observe that the addresses of the two strings S[0] and S[1] are exactly the same,
proving that the compiler merged them.
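The snippet referred to is not reproduced here; a minimal stand-in showing the same effect might be:

```rust
// Two identical literals: rustc typically stores the bytes only once, so
// both elements point at the same data.
static S: &[&str] = &["example", "example"];

fn main() {
    // On typical builds the two printed addresses are identical.
    println!("{:p} {:p}", S[0].as_ptr(), S[1].as_ptr());
}
```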
We can use a string pool to manually merge repeated strings and avoid this shortcoming.
As a space-saving trick, we can pack the offset and length into a single u32 using bit packing.
This gives the same index overhead as the position-index solution, only 4 bytes per string,
but at the cost of limiting the maximum length of a single string to 255.
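A hand-written sketch of the pooled, bit-packed index (24 bits of offset plus 8 bits of length, matching the 255-byte limit above; again, real tables would be generated):

```rust
// Distinct strings are stored once in the pool; every entry packs
// (offset << 8) | length into one u32.
static POOL: &str = "alicebob";
static ENTRIES: &[u32] = &[
    (0 << 8) | 5, // "alice"
    (5 << 8) | 3, // "bob"
    (0 << 8) | 5, // "alice" again: reuses the same bytes in the pool
];

fn get(i: usize) -> &'static str {
    let entry = ENTRIES[i];
    let offset = (entry >> 8) as usize;
    let len = (entry & 0xff) as usize; // at most 255
    &POOL[offset..offset + len]
}

fn main() {
    assert_eq!(get(2), "alice");
    // The repeated entry points at the exact same bytes: manually merged.
    assert_eq!(get(0).as_ptr(), get(2).as_ptr());
}
```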
Compile speed
As an added benefit, string packing also significantly improves compile time and memory usage.
The three phases on which rustc spends a lot of time with the naive &[&str] solution
are reduced to negligible amounts of time after string packing.
String packing saves the compiler from having to deal with a huge number of objects and from performing a lot of unnecessary optimizations.
Conclusions
Traditionally, embedding static hashmaps has various costs,
but the MPHF and string-packing techniques introduced here perform well in all respects.