Argon2 to JS style scoring #47

Merged: 3 commits, Feb 26, 2025

kmiller68
Contributor

Second try at this.

This version rebuilds the benchmark without the FFI code originally used, which avoids manually instrumenting the bundle code to make it work in JetStream. I verified that neither the old SIMD version nor this version spends significant time in the glue code, so switching from the FFI to calling the wasm instance directly doesn't change the workload much. Per our discussions, I also didn't recreate the non-SIMD version of the benchmark.

I also made a couple of other changes:

  • Salts are now "randomly" generated rather than static strings.
  • We default to a time cost (tCost) of 2; previously it was 1, though a few subtests overrode it. I picked 2 because 3+ was quite slow, although 3+ is what's recommended to prevent side-channel attacks.
  • I reduced the number of passwords checked against, simply because the test takes very long with a lot of passwords.
  • I added more non-ASCII passwords.
  • Emscripten doesn't wrap strings in an object that frees them on finalization, which IMO is what I would do, so I added that. If folks don't like this, I'm OK with going back to manually freeing. The finalizers do run, but I didn't check whether that happens inside the benchmark window (my assumption is no, since FinalizationRegistry callbacks run on the event loop IIRC).
  • The test also iterates through the three argon2 modes. The old test did this for a few runs, so I tried to at least somewhat replicate the behavior. I'm a bit unsure about this, since I doubt anyone would do this in the wild; you'd probably pick a mode based on the use case and stick with it. On the other hand, maybe it's useful to test each of the different backends.
  • Added a drive-by change to the driver so that if a startDelay is set it will automatically run the benchmark.
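
The finalization-based string wrapper mentioned in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: `makeFakeHeap`, `makeStringFactory`, `WasmString`, and the `malloc`/`free` shape of the allocator are hypothetical names chosen for illustration, and the fake heap only exists so the sketch runs without a wasm module.

```javascript
// Hypothetical stand-in for a wasm allocator, so the sketch runs without a wasm module.
function makeFakeHeap(size) {
    const memory = new Uint8Array(size);
    const live = new Set();
    let next = 8; // keep 0 free to act as a null pointer
    return {
        memory,
        live,
        malloc(length) {
            const ptr = next;
            next += length;
            if (next > size)
                throw new Error("out of fake heap memory");
            live.add(ptr);
            return ptr;
        },
        free(ptr) {
            live.delete(ptr);
        },
    };
}

// Assumed design: one registry per heap. The held value is only the pointer,
// so the registry never keeps the wrapper object itself alive.
function makeStringFactory(heap) {
    const registry = new FinalizationRegistry((ptr) => heap.free(ptr));
    const encoder = new TextEncoder();

    return class WasmString {
        constructor(string) {
            const bytes = encoder.encode(string);
            this.length = bytes.length;
            this.ptr = heap.malloc(bytes.length + 1); // +1 for a NUL terminator
            heap.memory.set(bytes, this.ptr);
            heap.memory[this.ptr + bytes.length] = 0;
            // Free the buffer some time after this wrapper becomes unreachable.
            registry.register(this, this.ptr);
        }
    };
}

const heap = makeFakeHeap(1024);
const WasmString = makeStringFactory(heap);
const password = new WasmString("pässwörd");
```

When (or whether) the cleanup callback runs is up to the engine, so the timing of the `free` call is not under the benchmark's control with this design.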

For reference on the last point https://en.wikipedia.org/wiki/Argon2 has a good summary of each of the different modes. If I were to pick only one I would probably go with "Argon2id".

One last thing to note is that the old "startup time" is actually slower than the new "first iteration". This is because the old "startup time" actually ran the "Run time" work too.

Contributor

@danleh danleh left a comment

We could remove wasm/argon2/build-wasm.sh now, right?

const tCost = 2;
const mCost = 1024;
const parallelism = 1;
const argon2NumberOfTypes = 2;
Contributor

Could we make it more self-explanatory what the types are, maybe simply with a comment:

// There are three argon2 algorithm variants. We test all three.
// See https://github.com/P-H-C/phc-winner-argon2/blob/f57e61e19229e23c4445b85494dbf7c07de721cb/include/argon2.h#L221. 

Contributor

Or if we only test a single variant, we could change this to simply

// See https://en.wikipedia.org/wiki/Argon2 for the three algorithm variants.
const argon2idType = 2
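
To make the numeric types self-explanatory, the constants could also be spelled out by name. The values below mirror the `argon2_type` enum in the reference implementation's `argon2.h`; the `Argon2Type` object and the `argon2TypeName` helper are hypothetical names for illustration, not identifiers from this PR.

```javascript
// Values follow the argon2_type enum in the reference implementation's argon2.h.
const Argon2Type = Object.freeze({
    Argon2d: 0,  // data-dependent addressing
    Argon2i: 1,  // data-independent addressing
    Argon2id: 2, // hybrid of the two
});

// Hypothetical helper: map a numeric type back to its name, e.g. for logging.
function argon2TypeName(type) {
    const entry = Object.entries(Argon2Type).find(([, value]) => value === type);
    if (!entry)
        throw new Error(`unknown argon2 type: ${type}`);
    return entry[0];
}
```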

./clean-cmake.sh
EMCC_SDK_PATH=$EMCC_SDK_PATH ARGON_JS_BUILD_BUILD_WITH_SIMD=0 ./build-wasm.sh
mv dist/argon2.wasm ../argon2.wasm
# FIXME: Redownload the source if argon2 ever has source updates. At the time of writing it was last changed 5 years ago so this is probably not a high priority.
Contributor

Could you add a comment naming the repository the source files come from? Is it https://github.com/P-H-C/phc-winner-argon2?

@danleh
Contributor

danleh commented Feb 12, 2025

Very cool, thanks for the recompile and refactoring, much cleaner/more streamlined now!

To your questions/points:

> Emscripten doesn't wrap strings in an object that frees them on finalization, which IMO is what I would do, so I added that. If folks don't like this, I'm OK with going back to manually freeing. The finalizers do run, but I didn't check whether that happens inside the benchmark window (my assumption is no, since FinalizationRegistry callbacks run on the event loop IIRC).

It is a bit unfortunate that we cannot control whether free runs inside the benchmark window or not, so I would prefer manual freeing (and for that, assume we stay on the happy path, i.e., not worry about early returns from throw new Error(...)). However, free accounts for <0.01% of the CPU cycles in the profile, so in practice it doesn't matter much.
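
Manual freeing on the happy path, as preferred above, might look like the sketch below. The `allocator` object and `hashWithManualFree` are hypothetical; `hashFn` stands in for whatever call into the wasm instance does the actual hashing.

```javascript
// Hypothetical allocator that counts outstanding allocations.
const allocator = {
    outstanding: 0,
    nextPtr: 16,
    malloc(length) {
        this.outstanding++;
        const ptr = this.nextPtr;
        this.nextPtr += length;
        return ptr;
    },
    free(ptr) {
        this.outstanding--;
    },
};

// Happy path only: if hashFn throws, the buffers leak, which is the
// trade-off accepted in the comment above.
function hashWithManualFree(alloc, passwordLength, saltLength, hashFn) {
    const passwordPtr = alloc.malloc(passwordLength);
    const saltPtr = alloc.malloc(saltLength);
    const result = hashFn(passwordPtr, saltPtr);
    alloc.free(saltPtr);
    alloc.free(passwordPtr);
    return result;
}

const result = hashWithManualFree(allocator, 8, 16, (p, s) => `hash(${p},${s})`);
```

With this shape the `free` calls always execute inside the measured code, at a deterministic point, rather than whenever the engine schedules finalizers.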

Regarding crypto algorithm/parameter choices:

> We default to a time cost (tCost) of 2; previously it was 1, though a few subtests overrode it. I picked 2 because 3+ was quite slow, although 3+ is what's recommended to prevent side-channel attacks.

I am not a cryptographer, but I trust the default choices of the well-vetted libsodium library. They only allow a time factor of >=3 for argon2i, so let's change to const tCost = 3. This increases the wall time by ~50% on my machine, but it still stays under 3s, so I think it's fine.

> The test also iterates through the three argon2 modes. The old test did this for a few runs, so I tried to at least somewhat replicate the behavior. I'm a bit unsure about this, since I doubt anyone would do this in the wild; you'd probably pick a mode based on the use case and stick with it. On the other hand, maybe it's useful to test each of the different backends.
> [...]
> For reference on the last point https://en.wikipedia.org/wiki/Argon2 has a good summary of each of the different modes. If I were to pick only one I would probably go with "Argon2id".

Libsodium also picks argon2id by default, so I agree. I also profiled iterating over all three types vs. a single fixed one (2 == argon2id); in both cases the hottest function, with >90% of the self CPU cycles, is fill_block (function 22), so I don't think it matters much.

@kmiller68
Contributor Author

> It is a bit unfortunate that we cannot control whether free runs inside the benchmark window or not, so I would prefer manual freeing (and for that, assume we stay on the happy path, i.e., not worry about early returns from throw new Error(...)). However, free accounts for <0.01% of the CPU cycles in the profile, so in practice it doesn't matter much.

Yeah, I kinda agree. On the other hand this is likely how I would have written the API were I to write it myself. In theory we could use https://github.com/tc39/proposal-explicit-resource-management if that ever ships. I'll switch it to free if that's what folks prefer.

> I am not a cryptographer, but I trust the default choices of the well-vetted libsodium library. They only allow a time factor of >=3 for argon2i, so let's change to const tCost = 3. This increases the wall time by ~50% on my machine, but it still stays under 3s, so I think it's fine.

If we switch to tCost = 3, I'd like to cut the iterations down. That said, if we're switching to argon2id, every pass after the first behaves the same way (data-dependent addressing). See this line in fill_segment:
data_independent_addressing = (instance->type == Argon2_i) || (instance->type == Argon2_id && (position.pass == 0) && (position.slice < ARGON2_SYNC_POINTS / 2));
So it would cover both addressing modes even with tCost = 2.
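
The quoted condition from `fill_segment` can be tabulated to see which slices use data-independent addressing. The sketch below transcribes the C expression into JavaScript; the function name `usesDataIndependentAddressing` is made up here, the type values follow the `argon2_type` enum, and `ARGON2_SYNC_POINTS` is taken to be 4 as in the reference implementation.

```javascript
const Argon2_d = 0, Argon2_i = 1, Argon2_id = 2;
const ARGON2_SYNC_POINTS = 4; // slices per pass in the reference implementation

// Direct transcription of the condition quoted from fill_segment.
function usesDataIndependentAddressing(type, pass, slice) {
    return type === Argon2_i
        || (type === Argon2_id && pass === 0 && slice < ARGON2_SYNC_POINTS / 2);
}

// Enumerate every (pass, slice) pair for Argon2id with tCost = 2.
const tCost = 2;
const rows = [];
for (let pass = 0; pass < tCost; pass++) {
    for (let slice = 0; slice < ARGON2_SYNC_POINTS; slice++)
        rows.push({ pass, slice, independent: usesDataIndependentAddressing(Argon2_id, pass, slice) });
}
const independentCount = rows.filter((row) => row.independent).length;
console.log(`${independentCount} of ${rows.length} slices are data-independent`);
```

For Argon2id only the first two slices of pass 0 are data-independent; every later slice, including all of pass 1, takes the data-dependent path, so two passes already exercise both.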

> Libsodium also picks argon2id by default, so I agree. I also profiled iterating over all three types vs. a single fixed one (2 == argon2id); in both cases the hottest function, with >90% of the self CPU cycles, is fill_block (function 22), so I don't think it matters much.

Sounds good to me, I'll switch to just argon2id.

@eqrion Do you have thoughts?

1) Move to an explicit `free` call.
2) Only do `Argon2id`
3) Use `tCost = 3` rather than 2 per best practices.
4) Lower iteration count from 50 -> 30 and worst case from 4 -> 3
@kmiller68
Contributor Author

Just gonna merge this. We can address further feedback later.

@kmiller68 kmiller68 merged commit 3d36c5b into WebKit:main Feb 26, 2025
3 checks passed