
Conversation


@jukofyork (Collaborator) commented Oct 24, 2025

This PR replaces the very slow FNV-1a hash (which processes one byte at a time) with XXHash64, which processes blocks of 4×8 bytes (four 64-bit words) in a loop that the compiler should be able to unroll, consuming the bulk of the data in 32-byte stripes:

// one 32-byte stripe: each of the four 64-bit lanes absorbs one word,
// then is rotated left by 31 bits and multiplied by prime1
const uint64_t* block = reinterpret_cast<const uint64_t*>(p);
for (int i = 0; i < 4; ++i) {
    uint64_t v = state[i] + block[i] * prime2;
    state[i] = ((v << 31) | (v >> (64 - 31))) * prime1; // rotl64(v, 31) * prime1
}
p += 32;
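
For reference, here is a minimal self-contained sketch of the complete XXH64 round structure this stripe loop sits inside (lane convergence, tail handling, and the final avalanche). It is not the PR's actual diff: the xxh64_sketch and rotl64 names are mine, the seed defaults to 0, and it carries the same endianness/alignment assumptions noted below:

#include <cstddef>
#include <cstdint>

static inline uint64_t rotl64(uint64_t x, int r) {
    return (x << r) | (x >> (64 - r));
}

static uint64_t xxh64_sketch(const uint8_t* p, size_t len, uint64_t seed = 0) {
    const uint64_t prime1 = 0x9E3779B185EBCA87ULL;
    const uint64_t prime2 = 0xC2B2AE3D27D4EB4FULL;
    const uint64_t prime3 = 0x165667B19E3779F9ULL;
    const uint64_t prime4 = 0x85EBCA77C2B2AE63ULL;
    const uint64_t prime5 = 0x27D4EB2F165667C5ULL;

    const uint8_t* end = p + len;
    uint64_t h;

    if (len >= 32) {
        uint64_t state[4] = { seed + prime1 + prime2, seed + prime2, seed, seed - prime1 };
        do { // the 32-byte stripe loop shown above
            const uint64_t* block = reinterpret_cast<const uint64_t*>(p);
            for (int i = 0; i < 4; ++i) {
                uint64_t v = state[i] + block[i] * prime2;
                state[i] = rotl64(v, 31) * prime1;
            }
            p += 32;
        } while (p + 32 <= end);
        // converge the four lanes into one 64-bit value
        h = rotl64(state[0], 1) + rotl64(state[1], 7) + rotl64(state[2], 12) + rotl64(state[3], 18);
        for (int i = 0; i < 4; ++i) {
            h = (h ^ (rotl64(state[i] * prime2, 31) * prime1)) * prime1 + prime4;
        }
    } else {
        h = seed + prime5; // short inputs never touch the stripe loop
    }
    h += static_cast<uint64_t>(len);

    // tail: leftover 8-byte words, then an optional 4-byte word, then single bytes
    for (; p + 8 <= end; p += 8) {
        uint64_t k = *reinterpret_cast<const uint64_t*>(p);
        h = rotl64(h ^ (rotl64(k * prime2, 31) * prime1), 27) * prime1 + prime4;
    }
    if (p + 4 <= end) {
        h = rotl64(h ^ (*reinterpret_cast<const uint32_t*>(p) * prime1), 23) * prime2 + prime3;
        p += 4;
    }
    for (; p < end; ++p) {
        h = rotl64(h ^ (*p * prime5), 11) * prime1;
    }

    // final avalanche mixes the high bits back down
    h ^= h >> 33;  h *= prime2;
    h ^= h >> 29;  h *= prime3;
    h ^= h >> 32;
    return h;
}

The four independent lanes are what let the compiler and CPU overlap the multiplies, which is likely why the manual unroll below didn't buy anything.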

NOTE: I did try manually unrolling the loop:

Spoiler
const uint64_t* w      = reinterpret_cast<const uint64_t*>(p);
const uint64_t* wLimit = reinterpret_cast<const uint64_t*>(end - 32); // last start for a full 32-byte stripe
for (; w <= wLimit; w += 4) {
    // unrolled lanes
    uint64_t v0 = state[0] + w[0] * prime2;
    state[0] = ((v0 << 31) | (v0 >> (64 - 31))) * prime1;

    uint64_t v1 = state[1] + w[1] * prime2;
    state[1] = ((v1 << 31) | (v1 >> (64 - 31))) * prime1;

    uint64_t v2 = state[2] + w[2] * prime2;
    state[2] = ((v2 << 31) | (v2 >> (64 - 31))) * prime1;

    uint64_t v3 = state[3] + w[3] * prime2;
    state[3] = ((v3 << 31) | (v3 >> (64 - 31))) * prime1;
}
// advance the byte pointer past the processed stripes (32 bytes per iteration)
p = reinterpret_cast<const uint8_t*>(w);

but it didn't seem to help.


I haven't done any formal benchmarking, but it is clearly many times faster in practice, and in line with the reported benchmarks:

[image: reported XXHash64 benchmark figures]

A few notes:

  • It assumes all hosts use the same endianness.
  • It assumes the input uint8_t* data is aligned to 8-byte boundaries (a portable memcpy-based load is sketched after this list).
  • It was only really possible to figure out using this blog post, as the original author's code is near impenetrable...
  • Any tensors hashed with the old FNV-1a will still live on until the .cache/llama.cpp/rpc/ folder is cleared, but otherwise I don't expect any problems from 64-bit FNV-1a <--> XXHash64 collisions (i.e. the chance of one is minuscule...).
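
Since the first two notes are assumptions baked into the fast path, here is a hedged sketch (mine, not part of the PR) of the usual portable alternative: a memcpy-based load, which avoids the undefined behaviour of a misaligned reinterpret_cast and normalizes big-endian hosts (the byte-swap intrinsic shown is GCC/Clang-specific):

#include <cstdint>
#include <cstring>

static inline uint64_t read_u64_le(const uint8_t* p) {
    uint64_t v;
    std::memcpy(&v, p, sizeof(v)); // alignment-safe; compiles to a single load on x86-64
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap64(v);      // normalize lanes to little-endian
#endif
    return v;
}

Whether the extra portability is worth it here depends on whether RPC peers can ever disagree on endianness or hand over unaligned buffers.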

NOTE: I haven't had time yet to compare against the reference implementation, so I'm leaving this as a draft for now.
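
For when that comparison happens, a hypothetical self-test (assuming the xxh64_sketch helper from the sketch above, plus the reference xxHash library from https://github.com/Cyan4973/xxHash) could look like:

#include <xxhash.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::vector<uint8_t> buf(1 << 20);
    for (size_t i = 0; i < buf.size(); ++i) {
        buf[i] = static_cast<uint8_t>(i * 2654435761u); // arbitrary filler
    }
    // cover the <32-byte path, the stripe boundary, and all tail cases
    for (size_t len : {size_t(0), size_t(1), size_t(7), size_t(31),
                       size_t(32), size_t(33), size_t(1000), buf.size()}) {
        assert(xxh64_sketch(buf.data(), len) == XXH64(buf.data(), len, 0));
    }
    std::printf("all lengths match the reference XXH64\n");
    return 0;
}

Building with something like g++ -O2 test.cpp -lxxhash and sweeping lengths around the 32-byte stripe boundary should exercise every code path.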

@rgerganov can you give this a try and see what you think?

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Oct 24, 2025
@rgerganov (Collaborator) commented:

Thanks for the patch, I will benchmark this on several RPC setups and get back to you.
