rpc: use XXHash64 instead of FNV-1a for hashing tensors
#16753 (+85 −9)
This PR replaces the very slow FNV-1a hash, which processes each byte separately, with XXHash64, which consumes four 8-byte (64-bit) words per iteration. The compiler should be able to unroll that loop so the bulk of the data is processed in 32-byte blocks.
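To illustrate the difference in structure, here is a minimal sketch of both hashes. It is not the code from this PR: the function names (`fnv1a64`, `xxhash64`, and the helpers) are hypothetical, the XXHash64 part follows the published XXH64 algorithm as I understand it, and it assumes a little-endian host. Reads go through `memcpy`, so this sketch does not depend on the input being aligned.

```cpp
#include <cassert>   // only used by the checks mentioned in the usage note
#include <cstddef>
#include <cstdint>
#include <cstring>

// FNV-1a (64-bit): one XOR and one multiply per input byte.
static uint64_t fnv1a64(const uint8_t * data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;          // FNV offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;                   // FNV prime
    }
    return h;
}

// XXH64 prime constants.
static const uint64_t P1 = 0x9E3779B185EBCA87ULL;
static const uint64_t P2 = 0xC2B2AE3D27D4EB4FULL;
static const uint64_t P3 = 0x165667B19E3779F9ULL;
static const uint64_t P4 = 0x85EBCA77C2B2AE63ULL;
static const uint64_t P5 = 0x27D4EB2F165667C5ULL;

static inline uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

// Unaligned-safe loads (native byte order; little-endian assumed).
static inline uint64_t read_u64(const uint8_t * p) { uint64_t v; std::memcpy(&v, p, 8); return v; }
static inline uint32_t read_u32(const uint8_t * p) { uint32_t v; std::memcpy(&v, p, 4); return v; }

// Mix one 64-bit word into one accumulator lane.
static inline uint64_t xxh_round(uint64_t acc, uint64_t in) {
    acc += in * P2;
    acc  = rotl64(acc, 31);
    return acc * P1;
}

// Fold one accumulator lane into the running hash.
static inline uint64_t xxh_merge(uint64_t h, uint64_t v) {
    h ^= xxh_round(0, v);
    return h * P1 + P4;
}

static uint64_t xxhash64(const uint8_t * data, size_t len, uint64_t seed = 0) {
    const uint8_t * p   = data;
    const uint8_t * end = data + len;
    uint64_t h;

    if (len >= 32) {
        // Four independent lanes, each consuming 8 bytes per iteration.
        uint64_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
        const uint8_t * limit = end - 32;
        do {
            v1 = xxh_round(v1, read_u64(p));
            v2 = xxh_round(v2, read_u64(p + 8));
            v3 = xxh_round(v3, read_u64(p + 16));
            v4 = xxh_round(v4, read_u64(p + 24));
            p += 32;
        } while (p <= limit);
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18);
        h = xxh_merge(h, v1);
        h = xxh_merge(h, v2);
        h = xxh_merge(h, v3);
        h = xxh_merge(h, v4);
    } else {
        h = seed + P5;
    }
    h += (uint64_t) len;

    // Tail: remaining 8-byte words, then one 4-byte word, then single bytes.
    while (end - p >= 8) {
        h ^= xxh_round(0, read_u64(p));
        h  = rotl64(h, 27) * P1 + P4;
        p += 8;
    }
    if (end - p >= 4) {
        h ^= (uint64_t) read_u32(p) * P1;
        h  = rotl64(h, 23) * P2 + P3;
        p += 4;
    }
    while (p < end) {
        h ^= (uint64_t) (*p++) * P5;
        h  = rotl64(h, 11) * P1;
    }

    // Final avalanche.
    h ^= h >> 33; h *= P2;
    h ^= h >> 29; h *= P3;
    h ^= h >> 32;
    return h;
}
```

The FNV-1a loop carries a dependency from every byte to the next, whereas the four XXHash64 lanes are independent of each other inside the main loop, which is what gives the compiler and CPU room to overlap work. A quick sanity check is that `xxhash64` of an empty input with seed 0 should return `0xEF46DB3751D8E999`, the commonly published XXH64 test value.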
NOTE: I did try manually unrolling the loop, but it didn't seem to help.
I haven't done any benchmarking of my own, but it is clearly many times faster, in line with the reported benchmarks.
A few notes:
- `uint8_t* data` is aligned to 8-byte boundaries.
- `FNV-1a`-hashed tensors will still live on until the `.cache/llama.cpp/rpc/` folder is cleared, but otherwise I don't think there will be any problems with 64-bit `FNV-1a <--> XXHash64` collisions (i.e. the chance of this will be minuscule).

NOTE: I haven't had time yet to compare against the reference implementation, so leaving this as a draft for now.
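As a rough sanity check on the collision remark above, here is a back-of-the-envelope bound. It assumes both hashes behave like independent, uniformly random 64-bit values, which is an idealisation and not something measured here. The symbols $n$ (stale FNV-1a-named cache entries) and $m$ (distinct tensors looked up by their XXHash64 value) are introduced only for this estimate:

$$
P(\text{any false match}) \;\le\; \frac{n \cdot m}{2^{64}}
$$

Even for a deliberately large $n = m = 10^{6}$, this gives

$$
\frac{10^{6} \cdot 10^{6}}{2^{64}} \;\approx\; \frac{10^{12}}{1.8 \times 10^{19}} \;\approx\; 5.4 \times 10^{-8}.
$$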
@rgerganov can you give this a try and see what you think?