Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate "Balloon Hash" as alternative for Argon2 or Scrypt #40

Open
cipriancraciun opened this issue Nov 15, 2024 · 15 comments
Open

Comments

@cipriancraciun
Copy link
Member

cipriancraciun commented Nov 15, 2024

@hakavlad
Copy link

https://github.com/samuel-lucas6/Cryptography-Guidelines?tab=readme-ov-file#password-hashingpassword-based-key-derivation

Avoid (not in order because they’re all bad):

Balloon hashing: arguably better than Argon2 since it's similar in strength whilst having a more impressive design (e.g. no separate variants, resistance to cache attacks, easy to implement with standard cryptographic hash functions, and performant). Unfortunately, it has seen virtually no adoption. There seems to be no information on recommended parameters, the reference implementation is henrycg/balloon#5 (comment), there are no official test vectors, there's no RFC draft, and only a handful of people have implemented the algorithm, with it not being in any popular libraries. Therefore, just use Argon2, which has now been standardised and widely adopted.

@cipriancraciun
Copy link
Member Author

I'll take one-by-one your arguments:

arguably better than Argon2 [...]

  • at this moment, I don't think any PBKDF algorithm can be called "the best" based on actual facts and research; yes, Argon2 has won the password hashing contest, but since then there have been many more contenders (yescrypt for example, Balloon hashing another one), there have been some research papers, but nothing definitive...
  • to conclude, all we have is mainly suggestions and best-practices;

[...] easy to implement with standard cryptographic hash functions, and performant [...]

  • and this is exactly why I consider Balloon hashing better than Argon2:
    • it can use any hash function (and I do like Blake3), thus I can reduce the amount of cryptographic primitives in my code;
    • I can actually understand what is happening in there;
    • as opposed to Argon2 which (before being standardized in a RFC) had quite a few missing details from the paper, and is very convoluted;

Unfortunately, it has seen virtually no adoption.

  • the same was true for both Bcrypt, Scrypt, and Argon2 before they became best practices;

there seems to be no information on recommended parameters

  • I don't see this as a problem, because in the end it doesn't matter what the "recommended parameters" are, if I can't run it on a RaspberryPi Zero (which has only 512 MiB of RAM), or it takes half a minute to execute;
  • the best parameters are thus identified through trial and error, until they "work" for a given use-case;

the reference implementation is [no-longer maintained]
[...] and only a handful of people have implemented the algorithm, with it not being in any popular libraries.

  • it doesn't matter, as the algorithm is quite trivial to implement;

there are no official test vectors

  • it doesn't matter, as there isn't yet a standard one;

there's no RFC draft

  • indeed, but there is active work on one;
  • most of my tool doesn't follow an RFC, thus there isn't much additional risk; (i.e. I already have a lot of experimental constructs;)

As such, I don't see yet any compelling reason why not to use Balloon Hash.

@hakavlad
Copy link

I'll take one-by-one your arguments:

BTW it's arguments of Samuel Lucas

@hakavlad
Copy link

@samuel-lucas6 Could you comments this, please?

@samuel-lucas6
Copy link

I'd recommend sticking with Argon2, at least for the time being. It's among the best we have in practice but not perfect. It should be stronger and significantly faster than Balloon/BKDF. I don't see any reason to use scrypt over Argon2.

I don't think it's a good idea to use algorithms that don't have a proper specification or that have a specification that's subject to change. This doesn't have to be an RFC, but it needs to be documented in some fashion with rationale.

I technically did do a version of the Internet-Draft for the Rust Crypto implementation of Balloon, but it's not the ideal way to specify Balloon. Then BKDF is a work in progress that I'd like further feedback on, just cryptographers/engineers aren't particularly interested. There doesn't seem to be as much research/reading on password hashing compared to some other topics, and it feels like most people have given up after Argon2. The non-academic, cache-hardness proposals like Pufferfish2 and bscrypt have the same issue with no specification and a lack of peer review. They shouldn't be considered suitable for production.

I agree with the remainder of what you've said.

@cipriancraciun
Copy link
Member Author

This doesn't have to be an RFC, but it needs to be documented in some fashion with rationale.

Well, I assume that the original paper describing the algorithm does fit this bill, right?

@samuel-lucas6
Copy link

Well, I assume that the original paper describing the algorithm does fit this bill, right?

Nope because they didn't take the time to properly specify the algorithm. It leaves lots of implementation details unspecified, meaning you'll end up with non-interoperable implementations that potentially offer different security properties/levels of security.

@cipriancraciun
Copy link
Member Author

[...] It leaves lots of implementation details unspecified [...]

Besides not specifying which hash algorithm to use, I don't see any meaningful missing details. Could you point to some?

[...] you'll end up with non-interoperable implementations [...]

To put things into perspective, this project offers an "experimental" encryption scheme based on ChaCha20, Blake3, X25519, and Argon2, but it doesn't implement any existing standard, outside of "best practices". Thus, interoperability is not an issue.

@samuel-lucas6
Copy link

Besides not specifying which hash algorithm to use, I don't see any meaningful missing details. Could you point to some?

The paper pseudocode doesn't define things like ints_to_block(), the size of the counter, the max parameters, the encoding, and how to avoid canonicalization attacks. Then there's Balloon and Balloon-M as well as other variants of Balloon in older versions of the paper (e.g., Double-Buffer).

To put things into perspective, this project offers an "experimental" encryption scheme based on ChaCha20, Blake3, X25519, and Argon2, but it doesn't implement any existing standard, outside of "best practices". Thus, interoperability is not an issue.

Those are all algorithms with a specification. That sounds more like a protocol than a scheme. I don't think there's anything wrong with doing something like ChaCha20-BLAKE3 as an AEAD with justification, but there's very little justification for using Balloon currently. If you're certain you want to implement it, I'd recommend writing a specification for your version of Balloon.

@cipriancraciun
Copy link
Member Author

[I find all of the issues a bit like nit-picking...]

The paper pseudocode doesn't define things like ints_to_block()

Well, let's assume they are 64 bits in size, and just represent them as big-endian bytes and concatenate them.

the size of the counter

64 bit big-endian again.

the max parameters

Which "max parameters"? The rounds and memory size can be arbitrarily large without any issues.

the encoding

The "encoding" of what?

and how to avoid canonicalization attacks.

I believe this is a solved problem, especially since the algorithm works only with counters or indices (64 bit thus) or full blocks.


Then there's Balloon and Balloon-M as well as other variants of Balloon in older versions of the paper (e.g., Double-Buffer).

Indeed, this is the biggest issue, but one could use the latest variant described in the paper, because most other papers have chosen the latest variant to review.

With regard to the parallel version (I'm assuming it's the -M) version, I would just skip it for the purpose of this project, because my target includes low-power devices (think of RaspberryPi Zero), thus parallelism doesn't actually work.

@samuel-lucas6
Copy link

[I find all of the issues a bit like nit-picking...]

In that case, I'm going to unsubscribe from this conversation as there's no point going around in circles. I will just leave you with the following:

  • Little-endian is generally preferred. Ascon is getting changed from big-endian to little-endian.
  • The rounds and memory size can't be arbitrarily large because the counter will overflow, even with a 64-bit counter.
  • The counter and any integer values need to be converted to bytes.
  • The password and salt are variable-length so canonicalization attacks aren't a solved problem.
  • to_int() also isn't defined, which I skimmed over.
  • Using the Balloon paper suffers from other issues like salt-based memory accesses, modulo bias, poor performance, a lack of variable-length output, etc.

@cipriancraciun
Copy link
Member Author

In that case, I'm going to unsubscribe from this conversation as there's no point going around in circles.

I am sorry, I didn't meant to offend you.


The rounds and memory size can't be arbitrarily large because the counter will overflow, even with a 64-bit counter.

Here by "arbitrarily" I meant some limit that does make sense in the real world. One would have to apply limits in a practical sense, else the actual code wouldn't be able to run (wouldn't be able to actually allocate memory), or wouldn't finish (if the iterations are too large).

With regard to the 64bit counter overflow, given what I've read from the paper, I believe its main purpose is to aid the security proof, because wherever hash is used, it's inputs already depend on a previous step, and thus can't be parallelized even without the counter.

However, the counter can be then increased to a 128 bit integer, or even let to wrap, as I don't believe the whole strength of the algorithm lies in the counter. (Even with 128 bits, if one chooses the max 64 bit value for both memory and size, it can overflow, as for each iteration it's incremented 1+delta*2 times.)


Using the Balloon paper suffers from other issues like salt-based memory accesses

Indeed, but there are two choices, neither of which are good:

  • depend on salt -- side-channels might leak information about the salt;
  • depend on password -- side-channels might leak information about the password;
  • plus, do both -- side-channels might leak some (less) information about both;

modulo bias

With regard to modulo bias, mod is used only twice:

  • once to choose the "previous" block wrapping around the blocks vector if needed, thus no modulo bias here;
  • and once when choosing the other blocks, where indeed modulo bias might be an issue, but the inputs to mod is a hash, and the outcome of the mod is again hashed, thus perhaps evening the odds; (thus I doubt it really impacts the overall quality;)

poor performance

Compared to what?

a lack of variable-length output, etc.

Not an issue for my particular use-case. (I need a function that takes a 32 byte key (salt), some data (password), munge on it to thwart brute-force guessing attacks, and spit out another 32 bytes.)

@hakavlad
Copy link

neither of which are good

depend on salt -- side-channels might leak information about the salt;

Salt is not secret. Then what is the problem in this case?

@cipriancraciun
Copy link
Member Author

Salt is not secret. Then what is the problem in this case?

From what I gather, the general consensus is that an attacker could observe which salt is being used, kind of fingerprinting if not exactly the exact value of the salt, for example by noticing patterns in memory access.

Thus, it can expose "which" user is login-in, or which key is being encrypted / decrypted.

@hakavlad
Copy link

IMHO revealing non-secret data is better than revealing secret data.

or which key is being encrypted

This is like metadata leakage, not secret data leakage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants