WIP Password auth design doc #32005
WIP password auth design doc
## Open questions

1. **Where exactly are passwords stored**: While it makes sense to store passwords in the catalog, I have not yet identified exactly where they should be stored. Should they go alongside `mz_roles`, or in a new `mz_auth_id` table? The latter makes sense as it could store password metadata, and might let us store multiple passwords down the road.

2. **Audit Logging and Brute Force Detection**: Do we need to log all failed password attempts and the IPs of those attempts? Do we need to take action to lock an IP or User when an attempt threshold is met?
Any action taken should be user configurable. We don't want to allow DOS attacks by a user just spamming incorrect passwords, for example.
Agree with @alex-hunt-materialize - there's too much risk to availability from DOS or automatic retries with an incorrect password.
Our default behavior should log all login attempts, and we should aim to replicate PostgreSQL log messages. This will allow customers to send the data to their SIEM and/or use fail2ban or another tool to act on any failures.
We'd need to add a way for customers to lock users with fail2ban or a script, but I think we can leave that out for now unless it is very simple to implement.
Here's some discussion on how to configure fail2ban with Postgres, for example: https://gist.github.com/rc9000/fd1be13b5c8820f63d982d0bf8154db1
Regarding closing off the internal port for Cloud in the cloud sync:
|
Option A) - preferred Option B) - backup
|
Teleport prefers mutual TLS, and we've actually gone out of our way to disable it. It's been on the roadmap for a long time to get that working, but never got to it. |
Here are my thoughts on implementation:
The rest of it as laid out in the doc makes sense to me |
On system users... We might be able to pass in a bootstrap user that is also considered internal/system. This user should be considered a superuser and internal user, and its password should come from a k8s secret. The k8s secret passed in can be used as its password secret reference in the catalog.

### 4. Configurable admin system login:

Passwords for `mz_system` and `mz_support` roles will be settable via environment variables `MZ_MZ_<USER>_EXTERNAL_LOGIN_PASSWORD`. Orchestratord should provide a set of parameters to set these variables via a Kubernetes secret. Additionally, we should enable login of system users through the external ports when they have external login passwords set. Login through the external port must not be possible unless this flag is set; this logic should not rely on whether the internal user has a password.
Nit: Should be `MZ_<USER>_EXTERNAL_LOGIN_PASSWORD`. Given `mz_system`, the pattern in the doc would give `MZ_MZ_MZ_SYSTEM_EXTERNAL_LOGIN_PASSWORD`.
Maybe the user should be last? MZ_EXTERNAL_LOGIN_PASSWORD_<USER>
That also gives a consistent prefix when grepping.
Sounds good! `EXTERNAL_LOGIN_PASSWORD_<USER>`, with an env variable of `MZ_EXTERNAL_LOGIN_PASSWORD_<USER>`.
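To make the naming agreed in this thread concrete, here is a minimal sketch of how the variable name could be derived and how the "external login only when the variable is set" rule could be checked. It assumes `<USER>` is the upper-cased role name (so `mz_system` becomes `MZ_SYSTEM`); the function names are hypothetical and not part of the design doc.

```python
import os
from typing import Mapping, Optional

# Prefix agreed in the review thread: MZ_EXTERNAL_LOGIN_PASSWORD_<USER>.
ENV_PREFIX = "MZ_EXTERNAL_LOGIN_PASSWORD_"
# The two system roles named in the design doc.
SYSTEM_ROLES = ("mz_system", "mz_support")


def external_login_env_var(role: str) -> str:
    """Build the env variable name for a system role (hypothetical helper)."""
    if role not in SYSTEM_ROLES:
        raise ValueError(f"{role!r} is not a system role")
    # Assumption: <USER> is the role name upper-cased, prefix included.
    return ENV_PREFIX + role.upper()


def external_login_password(
    role: str, environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the external login password, or None if external login is off.

    The decision depends only on whether the variable is set and non-empty,
    never on whether the role has an internal password.
    """
    value = environ.get(external_login_env_var(role))
    return value or None
```

With this shape, `grep MZ_EXTERNAL_LOGIN_PASSWORD_` finds every configured system login, which is the consistent-prefix benefit mentioned above.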
Still not clear to me the flow of setting up auth. In my head, it looks like:
Does that sound correct? Then we just make it a limitation that a user can't use the Console unless there's a registered role. In the Console, we'll most likely need the following information
Should also consider the migration path from self managed customers without auth to customers with auth. |

**Password Rotation or Expiration**: While not currently in scope, the design should allow for additions in the future, such as password rotation and expiration.

**Abuse Detection**: Action should be taken when brute force attempts are detected, either by locking the user or blocking the requesting client IP.
If we implement this, it should be optional. We don't want to enable DOS attacks just by spamming broken passwords at us.
We should write logs and let customers use fail2ban or whatever solutions they use elsewhere.

- **Password Dating**: The creation date of passwords will be recorded along with the hashed and salted password. When the password changes, the creation date should be reset. The creation date of passwords should be queryable.

- **OPTIONAL: Password Versioning**: Updates to the password hashing mechanisms may be required. As long as we are receiving passwords in plain text, we should be able to take a validated password and replace the existing hash with a new securely hashed value. This may require prefixing the password with data about the hash algorithm or parameters.
This is trivial so we should just do it non-optionally. We don't have to implement any replacement now, just store the metadata about the algorithm and parameters.
EDIT: To clarify, the trivial part is to store things like the algorithm used and parameters like number of iterations, not implementing the full replacement process.
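As a rough illustration of the dating and versioning bullets, the sketch below keeps the algorithm name, iteration count, and creation date next to the hash, and flags records that fall behind the current policy. The record layout, field names, and the `CURRENT_MIN_ITERATIONS` value are assumptions for illustration, not the catalog schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical policy values; the real ones are still under discussion.
CURRENT_ALGORITHM = "scram-sha-256"
CURRENT_MIN_ITERATIONS = 400_000


@dataclass(frozen=True)
class PasswordRecord:
    """Stored password plus the metadata needed to re-hash it later."""
    algorithm: str
    iterations: int
    salt: bytes
    hashed: bytes
    created_at: datetime


def replace_password(
    salt: bytes, hashed: bytes, iterations: int, now: Optional[datetime] = None
) -> PasswordRecord:
    """Create the record for a new password; the creation date is reset."""
    return PasswordRecord(
        algorithm=CURRENT_ALGORITHM,
        iterations=iterations,
        salt=salt,
        hashed=hashed,
        created_at=now or datetime.now(timezone.utc),
    )


def needs_rehash(record: PasswordRecord) -> bool:
    """True if the record was produced under an older algorithm or parameters."""
    return (
        record.algorithm != CURRENT_ALGORITHM
        or record.iterations < CURRENT_MIN_ITERATIONS
    )
```

Only the metadata is stored at this stage; the replacement step itself can be added later, as the comment above suggests.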

### 4. Configurable admin system login:

Passwords for `mz_system` and `mz_support` roles will be settable via environment variables `MZ_MZ_<USER>_EXTERNAL_LOGIN_PASSWORD`. Orchestratord should provide a set of parameters to set these variables via a Kubernetes secret. Additionally, we should enable login of system users through the external ports when they have external login passwords set. Login through the external port must not be possible unless this flag is set; this logic should not rely on whether the internal user has a password.
This may need an updated Materialize CRD to add a new field for the name of the secret.

3. **Store Passwords in K8s Secrets**: It would be viable to store passwords in Kubernetes secrets.

- **Reasons Not Chosen**: This may create an untenable number of Kubernetes secrets.
I thought we just discussed this as what we were going to do for our initial implementation. This is also just a detail of the implementation (where we store things).
Yeah I think this is from the original pass. Let's remove
After thinking about this some more and talking it over, I think we should just store passwords in the catalog. This is considered user metadata, it seems like it should be relatively simple to keep there, and it may simplify security reviews.
See: https://materializeinc.slack.com/archives/C07PN7KSB0T/p1743173985961509
@SangJunBak +1 to everything you said.
Maybe we can pass that to the console also? Maybe through an env var? That gets a bit tricky, since I think we update the balancers and console immediately upon getting an updated Materialize, while only updating environmentd if we trigger a rollout. Perhaps this logic needs to change, but that may be difficult too. |
Your initial flow is correct!
Unless they have what? The standard auth for postgres has login and non login users see pg_authid. For logins, one can first check that the role allows login then compare passwords. This would be done fully on the server side. I imagine we want to enable rolcanlogin if a password is set, and by that just check to see |
@alex-hunt-materialize and with the current design, an env variable won't do much good given the Console's already built by the time orchestratord installs the console pod. There's currently no way for orchestratord to edit build variables for the Console, which I think we should fix! |
Console shouldn't need to query pg_authid. It should just need to attempt to connect using username and password. If the role isn't a login role or there's a password mismatch that should all be handled server side. To determine if auth is enabled it could just attempt to run a query and, on failed login, route to the login box, or we can install a configurable endpoint on NGINX that console can hit to tell it what kind of auth it should attempt to use. |
291b450 to 2c5754d
LGTM on my end! Thanks for writing all this out!
@@ -127,7 +130,7 @@ The authentication mechanism for an environment must be configurable. A new flag | |||

### Note on Console builds:

We currently re-use the same Console build for the emulator and self-managed. For practical purposes we can build both with password auth enabled with no option to disable.
Console does not currently support runtime or startup configuration. Configuration is handled only at build time. To resolve this we should add a `config.json` or `config.js` file which can be mounted directly into the Nginx container assets. This file should come from a materialize-console config map which must be set up by Orchestratord. We will also need changes to the console to support reading in configuration from this map. The initial config value here should be `authentication_type: password`; in cloud we should use `authentication_type: frontegg` or `authentication_type: jwt`. The console build process can still be used to set default values for this config file.
In the case where someone wants to bump the Console version, or if we ever need to reinstall the Console service, we can most likely just swap out the config file, given the Console will fetch it at runtime.
We'll most likely just block the UI until the config map is loaded in (i.e. via a loading screen). This is fine given loading it in should be really quick.
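The config merge described in this thread could look roughly like the sketch below: build-time defaults, overridden by whatever the mounted `config.json` contains, with the `authentication_type` value validated. Python is used only to illustrate the logic (the Console itself would implement this in its own codebase), and the function name and default table are hypothetical.

```python
import json
from typing import Optional

# Values named in the design doc for `authentication_type`.
ALLOWED_AUTH_TYPES = {"password", "frontegg", "jwt"}
# Assumed build-time defaults; the doc says the build can still set these.
BUILD_DEFAULTS = {"authentication_type": "password"}


def load_console_config(raw: Optional[str]) -> dict:
    """Merge the mounted config file contents over the build-time defaults.

    `raw` is the text of config.json, or None if the file is not mounted.
    """
    config = dict(BUILD_DEFAULTS)
    if raw:
        config.update(json.loads(raw))
    auth_type = config["authentication_type"]
    if auth_type not in ALLOWED_AUTH_TYPES:
        raise ValueError(f"unsupported authentication_type: {auth_type!r}")
    return config
```

Rejecting unknown values at load time means a mistyped config map surfaces on the loading screen rather than as a confusing login failure later.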
My "hard requirements"
- Set the minimum iteration count >=200,000.
- Do not allow lockouts by default.
- Log successful and failed login attempts by default.
I've made some additional comments and recommendations for discussion / consideration. Thanks for writing this all up and the consideration in this design.
a. The coordinator layer will check the password against the stored hash. If the password is correct, the user will be authenticated.
b. If the password is incorrect, the user will be denied access.

3. **OPTIONAL: Book-Keeping**: Login and failed login requests should be tracked. Metrics should be created to monitor potential security events.
We should log successful and failed login attempts, as closely to what Postgres does as possible.
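The check-and-log flow above can be sketched as follows. This assumes the stored value is a SCRAM-style `StoredKey` (as defined in RFC 5802) derived with PBKDF2-HMAC-SHA-256, skips SASLprep normalization for brevity, and uses log messages modeled on Postgres wording; the function names and logger name are hypothetical.

```python
import hashlib
import hmac
import logging

log = logging.getLogger("auth")


def stored_key(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive the SCRAM StoredKey: H(HMAC(SaltedPassword, "Client Key"))."""
    salted = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    return hashlib.sha256(client_key).digest()


def authenticate(
    user: str, password: str, salt: bytes, iterations: int, expected: bytes
) -> bool:
    """Compare the derived key to the stored one and log the outcome."""
    candidate = stored_key(password, salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    ok = hmac.compare_digest(candidate, expected)
    if ok:
        log.info("connection authorized: user=%s", user)
    else:
        log.warning('password authentication failed for user "%s"', user)
    return ok
```

Both outcomes are logged and nothing is locked, which matches the "log by default, no lockouts by default" requirements listed earlier in this review.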

### 3. Hashed Password Storage

- **Secure Hashing**: For future compatibility with SCRAM, passwords should be hashed using `scram-sha-256`. The RFC for [SCRAM](https://datatracker.ietf.org/doc/html/rfc5802) dictates an iteration count of at least 4096. However, the RFC for SCRAM-SHA-256 dictates that "[the hash iteration-count should be such that a modern machine will take 0.1 seconds to perform the complete algorithm](https://www.rfc-editor.org/rfc/rfc7677)". This also dictates at least 4096 with a recommendation of 15000. We should aim for 15000 if it does not negatively impact connection times or load.
This RFC is 10 years old.
I ran a quick benchmark on two computers:
- t4g.nano: 200,000 iterations in 0.100s
- MacBook Pro M3 Max: 450,000 iterations in 0.102s
I recommend we plan on a minimum of 400,000 iterations for now. I would also like us to store the iteration count in a column in the database. If practical, it would be very nice to randomly generate an iteration count for each password, perhaps in the range of 400,000-450,000 iterations. This would significantly reduce the probability of birthday attacks and the usefulness of rainbow tables. We will want to track the iteration count for passwords regardless, because we may need to increase the number of iterations as computers get faster.
I will try and do some more research and might adjust my recommended iteration count.
(note: edited to increase iterations given the 2021 draft RFC https://www.ietf.org/archive/id/draft-ietf-kitten-password-storage-07.txt)
This (expired) draft RFC from 2021 recommends a minimum of 310,000 iterations.
https://www.ietf.org/archive/id/draft-ietf-kitten-password-storage-07.txt
We should also specify a length on salts. We should use at least 32 bytes of cryptographically random data for salts.
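To show how the recommendations in this thread fit together (32-byte random salts, a per-password iteration count in the 400,000-450,000 range, and the count stored alongside the hash), here is a minimal sketch that produces a verifier string in the `SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>` layout that PostgreSQL uses. The constants are the reviewer's proposed values, not settled decisions, and SASLprep normalization of the password is omitted.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional

SALT_BYTES = 32           # salt length proposed in the review
MIN_ITERATIONS = 400_000  # proposed lower bound
MAX_ITERATIONS = 450_000  # proposed upper bound


def random_iterations() -> int:
    """Pick a per-password iteration count within the proposed range."""
    return MIN_ITERATIONS + secrets.randbelow(MAX_ITERATIONS - MIN_ITERATIONS + 1)


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_verifier(
    password: str, salt: Optional[bytes] = None, iterations: Optional[int] = None
) -> str:
    """Return a SCRAM-SHA-256 verifier with its parameters embedded."""
    salt = salt if salt is not None else secrets.token_bytes(SALT_BYTES)
    iterations = iterations if iterations is not None else random_iterations()
    salted = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    return (
        f"SCRAM-SHA-256${iterations}:{_b64(salt)}"
        f"${_b64(stored_key)}:{_b64(server_key)}"
    )
```

Because the iteration count and salt travel with the hash, the minimum can be raised later without invalidating existing passwords.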

3. **Store Passwords in K8s Secrets**: It would be viable to store passwords in Kubernetes secrets.

- **Reasons Not Chosen**: This may create an untenable number of Kubernetes secrets, and may cause issues with passing security reviews.
Using Kubernetes secrets also creates some additional complexity for backup / restore / migration. There's an advantage to keeping the same mental model as Postgres where possible.

2. **Audit Logging and Brute Force Detection**: Do we need to log all failed password attempts and the IPs of those attempts? Do we need to take action to lock an IP or User when an attempt threshold is met?

3. **Password Strength Validation**: Should Materialize include built-in password strength validation, or should it be handled outside the system (e.g., via Kubernetes or other security tools)?
I'm not sure if this is necessary yet. If we included the requirements, we would want to include an override setting for developer mode.
If we were to implement this, I would suggest copying the behavior of Postgres passwordcheck without cracklib + requiring an entropy score of 3+ from zxcvbn.
https://crates.io/crates/zxcvbn
Docs on Postgres passwordcheck
https://www.postgresql.org/docs/current/passwordcheck.html
https://github.com/postgres/postgres/blob/master/contrib/passwordcheck/passwordcheck.c
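For discussion, the non-cracklib rules of Postgres `passwordcheck` (minimum length of 8, must not contain the user name, must mix letters and non-letters) can be sketched as below, together with the developer-mode override suggested above. The zxcvbn score requirement is left out because it needs the external library; the function name and the `developer_mode` flag are hypothetical.

```python
from typing import List

MIN_PASSWORD_LENGTH = 8  # same minimum as Postgres passwordcheck


def password_problems(
    username: str, password: str, developer_mode: bool = False
) -> List[str]:
    """Return a list of rule violations; an empty list means acceptable."""
    if developer_mode:
        # Assumed override: skip all strength checks in developer mode.
        return []
    problems = []
    if len(password) < MIN_PASSWORD_LENGTH:
        problems.append("password is too short")
    if username and username in password:
        problems.append("password must not contain user name")
    has_letter = any(c.isalpha() for c in password)
    has_nonletter = any(not c.isalpha() for c in password)
    if not (has_letter and has_nonletter):
        problems.append("password must contain both letters and nonletters")
    return problems
```

Returning every violation at once, rather than failing on the first, lets a client such as the Console show all problems in a single round trip.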