What do you do when you need to embed a large amount of queryable data statically in your application?
A common style is to embed JSON data directly and then parse it on first access. But this has some disadvantages: the data still has to be parsed and allocated at runtime, and the first access pays an initialization cost.
Here we introduce some techniques to construct a completely static and efficient Map:
No initialization, no parsing, no memory allocation
Fast compilation and a small binary size
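For contrast, here is a minimal sketch of the embed-and-parse style described above (the serde_json crate and a LazyLock are assumed here purely for illustration, not taken from the original):

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// The data ships as text inside the binary and still has to be parsed
// and allocated on the heap the first time it is touched.
static RAW: &str = r#"{ "alice": 1, "bob": 2 }"#;

static NAMES: LazyLock<HashMap<String, u32>> =
    LazyLock::new(|| serde_json::from_str(RAW).expect("embedded json is valid"));

fn lookup(name: &str) -> Option<u32> {
    NAMES.get(name).copied()
}

fn main() {
    assert_eq!(lookup("bob"), Some(2));
}
```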
Efficient static queryable Map
MPHF: Minimal Perfect Hash Function, a technique for constructing efficient static Maps
A hashmap has O(1) lookup time and takes up no extra space,
but this is only true when there are no key conflicts.
And a PHF can guarantee exactly that, making the hashmap always perfect and compact.
The simplest MPHF
An MPHF is defined as a fully injective function (in other words, a bijection)
that maps a specified set of keys one-to-one onto a set of values of the same size.
The simplest MPHF can be constructed as follows:
f(x) = h(seed, x) mod n
Rotate (retry) the seed
until a seed is found for which all the values h(seed, x) mod n are distinct, i.e., conflict-free.
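A minimal sketch of this brute-force construction (the hash function and keys here are illustrative; any seedable hash works):

```rust
use std::collections::{hash_map::DefaultHasher, HashSet};
use std::hash::{Hash, Hasher};

fn h(seed: u64, x: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    x.hash(&mut hasher);
    hasher.finish()
}

/// Rotate the seed until f(x) = h(seed, x) mod n has no conflicts.
fn find_seed(keys: &[&str]) -> u64 {
    let n = keys.len() as u64;
    (0u64..)
        .find(|&seed| {
            let slots: HashSet<u64> = keys.iter().map(|&k| h(seed, k) % n).collect();
            slots.len() == keys.len() // all slots distinct => perfect and minimal
        })
        .expect("some seed works")
}

fn main() {
    let keys = ["alice", "bob", "carol"];
    let seed = find_seed(&keys);
    for &k in &keys {
        println!("{k} -> {}", h(seed, k) % keys.len() as u64);
    }
}
```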
But for a large number of keys, finding such a seed in a reasonable amount of time is practically impossible.
The practical MPHF
Practical MPHF implementations usually use buckets to resolve conflicts:
Divide the keys into multiple buckets, and record one value per bucket, called the displacement (displace).
When a bucket conflicts, modify its displacement value.
Use the displacement value to map the keys in the bucket to their slots.
Repeat until no bucket has a conflict.
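A much-simplified sketch of this bucket-and-displacement construction (greedy, with no kick-out; real implementations such as PTHash/PtrHash are far more careful):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn h(seed: u64, x: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    x.hash(&mut hasher);
    hasher.finish() as usize
}

/// Greedy construction: every bucket retries displacement values until all of
/// its keys land on free, distinct slots. It may spin for a long time on
/// unlucky inputs; a real implementation bounds the search and falls back.
fn build_displacements(keys: &[&str], m: usize) -> Vec<u64> {
    let n = keys.len();
    let mut buckets: Vec<Vec<&str>> = vec![Vec::new(); m];
    for &k in keys {
        buckets[h(0, k) % m].push(k);
    }

    // Place the largest buckets first: they are the hardest to fit.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by_key(|&b| std::cmp::Reverse(buckets[b].len()));

    let mut displace = vec![0u64; m];
    let mut taken = vec![false; n];
    for b in order {
        for d in 1u64.. {
            let slots: Vec<usize> = buckets[b].iter().map(|&k| h(d, k) % n).collect();
            let mut distinct = slots.clone();
            distinct.sort_unstable();
            distinct.dedup();
            if distinct.len() == slots.len() && slots.iter().all(|&s| !taken[s]) {
                slots.iter().for_each(|&s| taken[s] = true);
                displace[b] = d;
                break;
            }
        }
    }
    displace
}

/// Lookup: the displacement of the key's bucket decides its final slot.
fn index(displace: &[u64], n: usize, key: &str) -> usize {
    let b = h(0, key) % displace.len();
    h(displace[b], key) % n
}

fn main() {
    let keys = ["alice", "bob", "carol", "dave"];
    let displace = build_displacements(&keys, 2);
    for &k in &keys {
        println!("{k} -> {}", index(&displace, keys.len(), k));
    }
}
```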
The newest MPHF: PtrHash
PtrHash has many improvements over PTHash,
the most interesting of which is the use of a cuckoo-hashing strategy for bucket adjustment.
Cuckoo hashing is a classic hashmap conflict-resolution algorithm:
When a key conflict occurs, the new key kicks out the old key, and the old key is assigned a new index using another hash function.
If a conflict occurs while the old key is being reassigned, the above step is repeated.
This sounds like the idiom 鸠占鹊巢 ("occupying another bird's nest"), which is why it is called cuckoo hashing.
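A minimal sketch of classic cuckoo-hash insertion (a plain hash table illustrating the kick-out idea, not PtrHash itself):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn h(which: u64, key: u64) -> usize {
    let mut s = DefaultHasher::new();
    which.hash(&mut s);
    key.hash(&mut s);
    s.finish() as usize
}

/// Every key has exactly two candidate slots: h(0, key) and h(1, key).
struct Cuckoo {
    slots: Vec<Option<u64>>,
}

impl Cuckoo {
    fn new(capacity: usize) -> Self {
        Cuckoo { slots: vec![None; capacity] }
    }

    fn insert(&mut self, mut key: u64) -> bool {
        let n = self.slots.len();
        let mut which: u64 = 0; // which of the two hash functions to try next
        for _ in 0..n {
            let i = h(which, key) % n;
            match self.slots[i].replace(key) {
                // Free nest found: done.
                None => return true,
                // The new key kicked the old one out; the evicted key must
                // now move to the slot given by its *other* hash function.
                Some(evicted) => {
                    which = if h(0, evicted) % n == i { 1 } else { 0 };
                    key = evicted;
                }
            }
        }
        false // too many kicks: a real table would grow or pick new hashes
    }

    fn contains(&self, key: u64) -> bool {
        let n = self.slots.len();
        self.slots[h(0, key) % n] == Some(key) || self.slots[h(1, key) % n] == Some(key)
    }
}

fn main() {
    let mut table = Cuckoo::new(8);
    for key in [3u64, 11, 42, 7] {
        table.insert(key);
    }
    println!("contains 42: {}", table.contains(42));
}
```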
This efficient collision-resolution strategy allows PtrHash to use a lower-quality (but faster) hash function
and only 1 byte per displacement value, which reduces both time and space overhead compared to traditional solutions.
Benefits of Cuckoo hash strategy
Kicking old keys out and rearranging them sounds very inefficient, so why is this a good construction strategy?
The basic idea is:
Under an appropriate load factor, there is a high probability that an arrangement exists which accommodates all keys.
But in a traditional hashmap, a bucket's position never changes once it has been assigned.
That is probably not the best arrangement; in particular, the displacement of buckets inserted early is almost always 0.
The kick-out strategy of cuckoo hashing gives old buckets a chance to be rearranged.
Slow and bloated...
The MPHF solves the data-query problem,
but if we write code like the following, we will find that it compiles very slowly and the binary is very large.
```rust
static NAMES: &[&str] = &[
    "alice",
    "bob",
    // ...
];
```
What's the problem?
&[&str] is bad
About &str
If you have some understanding of Rust's memory layout, it's easy to notice that &str is essentially (*const u8, usize).
This means that in a &[&str], each string carries an additional 16 bytes of index data;
for short strings, the binary-size overhead of the index is even greater than the string itself.
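This is easy to confirm (the 16-byte figure assumes a 64-bit target):

```rust
fn main() {
    // A &str is a fat pointer: (data pointer, length in bytes).
    assert_eq!(
        std::mem::size_of::<&str>(),
        std::mem::size_of::<*const u8>() + std::mem::size_of::<usize>()
    );
    // 16 bytes of index per entry on a 64-bit target, even for a 3-byte name.
    println!("{}", std::mem::size_of::<&str>()); // 16
}
```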
But that’s not all
About relocation
I constructed a program that uses a lot of &[&str] and printed its section headers:
```
  9 .rela.dyn     00499248  0000000000000fd8  DATA
 11 .rodata       00419c10  000000000049a240  DATA
 24 .data.rel.ro  00496b80  00000000009042a8  DATA
```
The largest section is not .rodata, which stores constant data, but .rela.dyn and .data.rel.ro!
Take a look at the global variable NAMES we defined. It is a list of &str, which contains pointers to the strings;
but before the shared object (.so) is loaded, its base address is still unknown.
How can the compiler generate a list of pointers whose addresses are not yet known?
The answer is that the dynamic linker adjusts the data when the shared object is loaded, a process called relocation.
At link time, the linker generates the .data.rel.ro and .rela.dyn sections:
.data.rel.ro stores read-only data that is only ever changed during relocation,
and .rela.dyn stores the metadata describing the data that needs relocation.
When the shared object is loaded, the dynamic linker walks the entire .rela.dyn section to patch the .data.rel.ro section (and others).
This means that each string in &[&str] incurs an additional 24 bytes of size overhead (the size of one relocation entry), and it also affects the startup performance of the program!
String pack
Position index
Since the compiler-generated string indexes have so much overhead, we can pack the strings and create the indexes manually.
A simple solution is to concatenate all the strings and then use each string's end position as its index.
This makes each string index cost only 4 bytes and completely eliminates the relocation overhead:
everything is in .rodata and can be accessed directly through relative addresses.
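A hand-written sketch of the position-index idea (in a real project these tables would be produced by a build script or macro; the names here are illustrative):

```rust
// All names concatenated into one string, plus the end offset of each entry.
// The per-string index is a plain u32, so at most one relocation is needed
// for the whole table instead of one per string.
static NAMES: &str = "alicebobcarol";
static ENDS: &[u32] = &[5, 8, 13];

fn name(i: usize) -> &'static str {
    let start = if i == 0 { 0 } else { ENDS[i - 1] as usize };
    &NAMES[start..ENDS[i] as usize]
}

fn main() {
    assert_eq!(name(0), "alice");
    assert_eq!(name(1), "bob");
    assert_eq!(name(2), "carol");
}
```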
String pool
The position-index solution is compact, and we can use the Elias-Fano encoding to make it even more compact.
However, simply concatenating the strings prevents the compiler from merging repeated strings,
which makes it perform worse in certain cases.
Looking at the following code, we can observe that the addresses of the two strings S[0] and S[1] are exactly the same,
proving that the compiler merged them.
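The snippet referred to is not reproduced here; a minimal stand-in showing the same effect might be:

```rust
// Two identical literals: rustc typically stores the bytes only once, so
// both elements point at the same data.
static S: &[&str] = &["example", "example"];

fn main() {
    // On typical builds the two printed addresses are identical.
    println!("{:p} {:p}", S[0].as_ptr(), S[1].as_ptr());
}
```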
We can use a string pool to manually merge repeated strings and avoid this shortcoming.
As a space-saving trick, we can pack the offset and length into a single u32 using bit packing.
This gives the same index overhead as the position-index solution, only 4 bytes per string,
but at the cost of limiting the maximum length of a single string to 255.
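A hand-written sketch of the pooled, bit-packed index (24 bits of offset plus 8 bits of length, matching the 255-byte limit above; again, real tables would be generated):

```rust
// Distinct strings are stored once in the pool; every entry packs
// (offset << 8) | length into one u32.
static POOL: &str = "alicebob";
static ENTRIES: &[u32] = &[
    (0 << 8) | 5, // "alice"
    (5 << 8) | 3, // "bob"
    (0 << 8) | 5, // "alice" again: reuses the same bytes in the pool
];

fn get(i: usize) -> &'static str {
    let entry = ENTRIES[i];
    let offset = (entry >> 8) as usize;
    let len = (entry & 0xff) as usize; // at most 255
    &POOL[offset..offset + len]
}

fn main() {
    assert_eq!(get(2), "alice");
    // The repeated entry points at the exact same bytes: manually merged.
    assert_eq!(get(0).as_ptr(), get(2).as_ptr());
}
```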
Compile speed
As an added benefit, string packing also significantly improves compile time and memory usage.
The three phases on which rustc spends a lot of time with the naive &[&str] solution
are reduced to negligible amounts of time after string packing.
String packing saves the compiler from having to deal with a huge number of objects and from performing a lot of unnecessary optimizations.
Conclusions
Traditionally, embedding static hashmaps has various costs,
but the MPHF and string-packing techniques introduced here perform well in all respects.