
MSM performance benefits for Groth16 #54

Open
waamm opened this issue Dec 26, 2024 · 4 comments

Comments


waamm commented Dec 26, 2024

Hi! After quickly replacing the VariableBaseMSM::multi_scalar_mul invocations here with msm_cuda::multi_scalar_mult_arkworks, I am not noticing any performance difference. Does that make sense, or am I using it incorrectly? I'm using a Tesla T4.
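
For context, the swap was essentially the following (a minimal sketch assuming the arkworks v0.3 API; the exact signature of `multi_scalar_mult_arkworks` in poc/msm-cuda may differ):

```rust
use ark_bn254::{Fr, G1Affine, G1Projective};
use ark_ec::msm::VariableBaseMSM;
use ark_ff::PrimeField;

type FrRepr = <Fr as PrimeField>::BigInt;

// Before: arkworks' CPU multi-scalar multiplication.
fn msm_cpu(bases: &[G1Affine], scalars: &[FrRepr]) -> G1Projective {
    VariableBaseMSM::multi_scalar_mul(bases, scalars)
}

// After: the GPU MSM from poc/msm-cuda (signature assumed, as noted above).
fn msm_gpu(bases: &[G1Affine], scalars: &[FrRepr]) -> G1Projective {
    msm_cuda::multi_scalar_mult_arkworks(bases, scalars)
}
```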

Edit: Running `cargo build --release --features=bn254 -vv` shows a lot of `stable-x86_64-unknown-linux-gnu`... and after installing nvidia-cuda-toolkit, I think I am having gcc incompatibility issues.

Edit2: It compiled after updating CUDA, but the benchmark is failing:

```
     Running benches/msm.rs (target/release/deps/msm-933742322995d626)
Benchmarking CUDA/2**23: Warming up for 3.0000 s
error: bench failed, to rerun pass `--bench msm`

Caused by:
  process didn't exit successfully: `/home/wicher/sppark/poc/msm-cuda/target/release/deps/msm-933742322995d626 --bench` (signal: 11, SIGSEGV: invalid memory reference)
```

Edit3: Seems to work now for some reason. Is there a specific reason this library uses v0.3 of arkworks?

@waamm waamm changed the title MSM performance benefits MSM performance benefits for Groth16 Dec 26, 2024
dot-asm (Collaborator) commented Jan 9, 2025

> Seems to work now for some reason. Is there a specific reason this library uses v0.3 of arkworks?

The premise isn't true: this library has literally no dependencies (build-dependencies don't count). Arkworks is used only in the test suite, purely for verification purposes.

dot-asm (Collaborator) commented Jan 9, 2025

> After quickly replacing the VariableBaseMSM::multi_scalar_mul invocations here with msm_cuda::multi_scalar_mult_arkworks, I am not noticing any performance difference.

The trouble is that any "quick" swap is likely to fail to deliver the expected improvement, at least based on what we've seen. Orchestrating the data flow and tighter integration with the application is the key. See https://github.com/supranational/supra_seal/tree/main/c2 for an example. Well, "see" is a bit of a misnomer, because you're unlikely to figure it out just like that; the point is rather that it takes over a bigger section of the prover in order to do the "orchestrating and tight integration" thing.
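
To illustrate the shape of the difference (a hypothetical sketch; `DeviceBases`, `upload_bases`, `msm_on_device`, and `gpu_msm` are illustrative names, not this library's API):

```rust
use ark_bn254::{Fr, G1Affine, G1Projective};

// Hypothetical device handle and helpers -- illustrative names only,
// not sppark's API. Only the shape of the integration matters here.
struct DeviceBases; // imagine: bases resident in GPU memory

fn upload_bases(_bases: &[G1Affine]) -> DeviceBases { unimplemented!() }
fn msm_on_device(_bases: &DeviceBases, _scalars: &[Fr]) -> G1Projective { unimplemented!() }
fn gpu_msm(_bases: &[G1Affine], _scalars: &[Fr]) -> G1Projective { unimplemented!() }

// Naive drop-in: every call re-ships the large, fixed bases to the GPU,
// so host-to-device transfers can eat the kernel's speedup.
fn prove_naive(bases: &[G1Affine], batches: &[Vec<Fr>]) -> Vec<G1Projective> {
    batches.iter().map(|s| gpu_msm(bases, s)).collect()
}

// Integrated: upload the bases once, then stream only the scalars per MSM,
// leaving room to overlap transfers with kernel execution.
fn prove_integrated(bases: &[G1Affine], batches: &[Vec<Fr>]) -> Vec<G1Projective> {
    let dev = upload_bases(bases);
    batches.iter().map(|s| msm_on_device(&dev, s)).collect()
}
```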

dot-asm (Collaborator) commented Jan 9, 2025

Though there is another caveat. The current MSM implementation is oversensitive to repeating bit patterns in the scalars. That is, if the scalars have some bit structure, the current implementation will tend to underperform, and some provers have been observed to produce more repetitive patterns than others. The reason the issue hasn't been resolved is that there has been no strong motivating factor so far...
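
One way to observe this (a sketch assuming arkworks v0.3 and poc/msm-cuda's `multi_scalar_mult_arkworks`; drawing the "structured" scalars from only 16 distinct values is just a stand-in for a repetitive bit pattern):

```rust
use ark_bn254::{Fr, G1Affine, G1Projective};
use ark_ec::ProjectiveCurve;
use ark_ff::{PrimeField, UniformRand};
use std::time::Instant;

fn main() {
    let mut rng = ark_std::test_rng();
    let n = 1usize << 20; // keep modest so the random point setup stays quick
    let points: Vec<G1Affine> = (0..n)
        .map(|_| G1Projective::rand(&mut rng).into_affine())
        .collect();

    // Uniformly random scalars: the favorable case for a bucket-method MSM.
    let random: Vec<_> = (0..n).map(|_| Fr::rand(&mut rng).into_repr()).collect();
    // Repetitive structure: only 16 distinct scalar values across the vector.
    let structured: Vec<_> = (0..n)
        .map(|i| Fr::from((i % 16) as u64).into_repr())
        .collect();

    for (label, scalars) in [("random", &random), ("structured", &structured)] {
        let t = Instant::now();
        let _ = msm_cuda::multi_scalar_mult_arkworks(&points, scalars);
        println!("{}: {:?}", label, t.elapsed());
    }
}
```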

dot-asm (Collaborator) commented Jan 9, 2025

> See https://github.com/supranational/supra_seal/tree/main/c2 for an example.

Just in case: that example is Groth16, a demanding one, operating on vectors of ~2**27 elements. And the payoff was absolutely worth the effort.
