WIP Password auth design doc #32005

Merged
jubrad merged 7 commits into MaterializeInc:main from password-auth-design-doc
May 5, 2025

Conversation

jubrad
Contributor

@jubrad jubrad commented Mar 25, 2025

WIP password auth design doc
@SangJunBak
Contributor

SangJunBak commented Mar 26, 2025

Regarding closing off the internal port for Cloud in the cloud sync:

  • We should consider Teleport too, since I think it uses some internal port?
  • Are we able to easily disable this for the emulator?

@jubrad
Contributor Author

jubrad commented Mar 26, 2025

Option A) - preferred
We should consider adding ConfigMap syncing as a self-managed replacement for LD for system vars. If we do this, we can probably drop the need for mz_system (in self-managed) and just close off the internal port for self-managed, perhaps not even start it up.

Option B) - backup

  • We need to evaluate other mz_system capabilities and ensure that it is not needed for any other purposes.
    If mz_system is needed then we may need to optionally allow external port login for mz_system with a password.

@alex-hunt-materialize
Contributor

We should consider Teleport too since i think it uses some internal port?

Teleport prefers mutual TLS, and we've actually gone out of our way to disable it. It's been on the roadmap for a long time to get that working, but never got to it.

@DAlperin
Member

Here are my thoughts on implementation:

  • A system table for storing sensitive role data like the (hashed and salted) password of a role
    • In the future we could transition this to secrets once those are supported for self managed
  • We will expand the adapter::Client surface area to expose an API to the protocol layer which can be used to exchange a username and password for a role
    • when running in cloud we will continue to use frontegg and this codepath will be unexercised
  • we will do the same adapter call or equivalent in tower middlewares for the http endpoint

The rest of it as laid out in the doc makes sense to me
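
A minimal sketch of the first bullet, for illustration only. It assumes PBKDF2-HMAC-SHA256 as the hash (this comment does not name an algorithm), an illustrative iteration count, and a plain dict standing in for the system table. `ROLE_AUTH`, `hash_password`, `create_role`, and `verify_password` are hypothetical names, not Materialize APIs.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical stand-in for a system table of sensitive role data.
# Maps role name -> (salt, iterations, digest); the plaintext is never stored.
ROLE_AUTH = {}

ITERATIONS = 200_000  # assumed value for illustration


def hash_password(
    password: str, salt: Optional[bytes] = None, iterations: int = ITERATIONS
) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest) suitable for storage."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def create_role(name: str, password: str) -> None:
    """Store only the salted hash for the role."""
    ROLE_AUTH[name] = hash_password(password)


def verify_password(name: str, password: str) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    record = ROLE_AUTH.get(name)
    if record is None:
        return False
    salt, iterations, expected = record
    _, _, actual = hash_password(password, salt, iterations)
    return hmac.compare_digest(actual, expected)
```

Storing the iteration count next to the salt lets the default be raised later without invalidating existing rows.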

@jubrad
Contributor Author

jubrad commented Mar 27, 2025

On system users: we might be able to pass in a bootstrap user that is also considered internal/system. This user should be considered both a superuser and an internal user, and its password should come from a k8s secret. The k8s secret passed in can be used as its password secrets reference in the catalog.

@SangJunBak
Contributor

SangJunBak commented Mar 27, 2025

Still not clear to me the flow of setting up auth. In my head, it looks like:

  1. As they initially set up via the helm chart, orchestratord will have the env variable to set the password of the mz_system user. This is probably configurable via the helm chart?
  2. Once the user deploys the materialize CR, orchestratord will pass that env variable in order to start the environment with an mz_system type user
  3. The user can now login via that user and add more roles

Does that sound correct? Then we just make it a limitation that a user can't use the Console unless there's a registered role. In the Console, we'll most likely need the following information

  • Is (password) auth enabled?
  • Does there exist an authenticated user?
    But I'm also confused how the Console might receive this information if we can't query the database to begin with.

Should also consider the migration path from self managed customers without auth to customers with auth.


### 4. Configurable admin system login:

Passwords for `mz_system` and `mz_support` roles will be settable via environment variables `MZ_MZ_<USER>_EXTERNAL_LOGIN_PASSWORD`. Orchestratord should provide a set of parameters to set these variables via a Kubernetes secret. Additionally, we should enable login of system users through the external ports when they have external login passwords set. Login through the external port must not be possible unless this flag is set; this logic should not rely on whether the internal user has a password.
Contributor

This may need an updated Materialize CRD to add a new field for the name of the secret.
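
The gating rule in section 4 can be sketched as follows, under stated assumptions. The variable pattern comes from the design text. Reading `<USER>` as the role name without its `mz_` prefix, upper-cased, is a guess. `env_var_name` and `external_login_allowed` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional


def env_var_name(role: str) -> str:
    """Build the external-login variable name for a system role."""
    # Assumption: <USER> is the role name without "mz_", upper-cased.
    user = role[3:] if role.startswith("mz_") else role
    return f"MZ_MZ_{user.upper()}_EXTERNAL_LOGIN_PASSWORD"


def external_login_allowed(
    role: str,
    env: Mapping[str, str] = os.environ,
    internal_password: Optional[str] = None,
) -> bool:
    """Allow external-port login only when the dedicated variable is set."""
    # internal_password is deliberately ignored: the decision must not depend
    # on whether the internal user happens to have a password.
    return bool(env.get(env_var_name(role)))
```

Taking `env` as a parameter keeps the check testable without mutating the process environment.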

@alex-hunt-materialize
Contributor

@SangJunBak +1 to everything you said.

But I'm also confused how the Console might receive this information if we can't query the database to begin with.

Maybe we can pass that to the console also? Maybe through an env var? That gets a bit tricky, since I think we update the balancers and console immediately upon getting an updated Materialize, while only updating environmentd if we trigger a rollout. Perhaps this logic needs to change, but that may be difficult too.

@jubrad
Contributor Author

jubrad commented Mar 27, 2025

Still not clear to me the flow of setting up auth. In my head, it looks like:

  1. As they initially set up via the helm chart, orchestratord will have the env variable to set the password of the mz_system user. This is probably configurable via the helm chart?
  2. Once the user deploys the materialize CR, orchestratord will pass that env variable in order to start the environment with an mz_system type user
  3. The user can now login via that user and add more roles

Does that sound correct? Then we just make it a limitation that a user can't use the Console unless they have . In the Console, we'll most likely need the following information

  • Is (password) auth enabled?
  • Does there exist an authenticated user?
    But I'm also confused how the Console might receive this information if we can't query the database to begin with.

Should also consider the migration path from self managed customers without auth to customers with auth.

Your initial flow is correct!

Then we just make it a limitation that a user can't use the Console unless they have .

Unless they have what?

The standard auth for Postgres has login and non-login users; see pg_authid. For logins, one can first check that the role allows login, then compare passwords. This would be done fully on the server side. I imagine we want to enable rolcanlogin if a password is set, and for that just check password.is_some() or something.
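
A rough sketch of that server-side check, assuming PBKDF2-HMAC-SHA256 for the stored hash and an illustrative iteration count. `Role`, `make_password`, and `authenticate` are hypothetical names. `can_login` is derived from whether a password is present, mirroring the `password.is_some()` idea.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Role:
    name: str
    # (salt, iterations, digest), or None when no password has been set.
    password: Optional[Tuple[bytes, int, bytes]] = None

    @property
    def can_login(self) -> bool:
        # Analogue of rolcanlogin, derived from the presence of a password.
        return self.password is not None


def make_password(plaintext: str, iterations: int = 200_000) -> Tuple[bytes, int, bytes]:
    """Hash a plaintext password with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", plaintext.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def authenticate(role: Optional[Role], plaintext: str) -> bool:
    """First check that the role may log in, then compare passwords."""
    if role is None or not role.can_login:
        return False
    salt, iterations, expected = role.password
    actual = hashlib.pbkdf2_hmac("sha256", plaintext.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(actual, expected)
```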

@SangJunBak
Contributor

@jubrad

Unless they have what?
Ah my bad. Unless there's a registered role! Edited.

The standard auth for postgres has login and non login users see pg_authid.

But I mean in order for the Console to query pg_authid via the HTTP API, wouldn't they need to be authenticated?

@SangJunBak
Contributor

Maybe we can pass that to the console also? Maybe through an env var? That gets a bit tricky, since I think we update the balancers and console immediately upon getting an updated Materialize

@alex-hunt-materialize and with the current design, an env variable won't do much good given the Console's already built by the time orchestratord installs the console pod. There's currently no way for orchestratord to edit build variables for the Console, which I think we should fix!

@jubrad
Contributor Author

jubrad commented Mar 27, 2025

Unless they have what?
Ah my bad. Unless there's a registered role! Edited.

The standard auth for postgres has login and non login users see pg_authid.

But I mean in order for the Console to query pg_authid via the HTTP API, wouldn't they need to be authenticated?

Console shouldn't need to query pg_authid. It should just need to attempt to connect using username and password. If the role isn't a login role or there's a password mismatch that should all be handled server side. To determine if auth is enabled it could just attempt to run a query and, on failed login, route to the login box, or we can install a configurable endpoint on NGINX that console can hit to tell it what kind of auth it should attempt to use.
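
The "attempt a query and route on failure" idea could look roughly like this. It is a sketch only: it assumes the HTTP endpoint answers a failed login with 401 or 403, which this thread does not confirm. `View` and `next_view` are hypothetical names.

```python
from enum import Enum


class View(Enum):
    APP = "app"      # probe query succeeded, proceed to the Console
    LOGIN = "login"  # credentials missing or rejected, show the login box
    ERROR = "error"  # anything else, e.g. the environment is unreachable


def next_view(status_code: int) -> View:
    """Pick the Console view from the HTTP status of a probe query."""
    if 200 <= status_code < 300:
        return View.APP
    if status_code in (401, 403):  # assumed auth-failure statuses
        return View.LOGIN
    return View.ERROR
```

With this shape the Console never needs to know in advance whether auth is enabled: an unauthenticated deployment simply answers the probe successfully.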

@jubrad jubrad force-pushed the password-auth-design-doc branch from 291b450 to 2c5754d Compare March 28, 2025 18:52
Contributor

@SangJunBak SangJunBak left a comment

LGTM on my end! Thanks for writing all this out!

@jubrad jubrad marked this pull request as ready for review April 1, 2025 01:00
Contributor

@jasonhernandez jasonhernandez left a comment

My "hard requirements"

  1. Set the minimum iteration count >=200,000.
  2. Do not allow lockouts by default.
  3. Log successful and failed login attempts by default.

I've made some additional comments and recommendations for discussion / consideration. Thanks for writing this all up and the consideration in this design.
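
One way the three requirements might translate into code, as a sketch under assumptions. The 200,000 floor is from the list above. `MIN_ITERATIONS`, `validate_iterations`, `record_login_attempt`, and the `auth` logger name are hypothetical.

```python
import logging

MIN_ITERATIONS = 200_000  # floor taken from requirement 1

logger = logging.getLogger("auth")  # hypothetical logger name


def validate_iterations(n: int) -> int:
    """Reject any configured iteration count below the floor."""
    if n < MIN_ITERATIONS:
        raise ValueError(f"iteration count {n} is below the minimum {MIN_ITERATIONS}")
    return n


def record_login_attempt(role: str, success: bool) -> None:
    """Log both outcomes by default (requirement 3)."""
    # No failure counter is kept, so repeated failures never lock a role out
    # (requirement 2).
    if success:
        logger.info("login succeeded for role %s", role)
    else:
        logger.warning("login failed for role %s", role)
```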

@DAlperin DAlperin mentioned this pull request Apr 8, 2025
5 tasks
@jubrad jubrad force-pushed the password-auth-design-doc branch from 6671220 to 80398a2 Compare April 10, 2025 03:01
@ptravers
Contributor

should this include a threat analysis?

DAlperin added a commit that referenced this pull request Apr 14, 2025
This is the bones of self managed password auth. This is missing
integration tests and documentation which will come next. You can test
this locally like so:
```shell
# Start Materialize with auth enabled on the external port but not the internal port
$ bin/environmentd --bazel --reset -- --all-features --unsafe-mode --enable-self-hosted-auth

# Start an mz_system session, turn on SQL support, and create a user
$ psql -U mz_system -h localhost -p 6877 materialize
NOTICE:  connected to Materialize v0.139.0-dev.0
  Org ID: 1bd2c405-c638-44cc-b917-6d05dfb832ac
  Region: local/az1
  User: mz_system
  Cluster: mz_system
  Database: materialize
  Schema: public
  Session UUID: fe6d2dbe-3a94-430d-82e1-5e79cb14b91f

Issue a SQL query to get started. Need help?
  View documentation: https://materialize.com/s/docs
  Join our Slack community: https://materialize.com/s/chat
    
psql (14.17 (Homebrew), server 9.5.0)
Type "help" for help.

materialize=> alter system set enable_self_managed_auth=true;
NOTICE:  variable "enable_self_managed_auth" was updated for the system, this will have no effect on the current session
ALTER SYSTEM
materialize=> create role foo with superuser password 'bar';
CREATE ROLE

# Now connect over the port with auth enabled
$ psql -U foo -h localhost -p 6875 materialize
Password for user foo: 

```

This begins implementing
#32005
@jubrad jubrad force-pushed the password-auth-design-doc branch from 80398a2 to a91ab68 Compare April 16, 2025 01:58
@jubrad
Contributor Author

jubrad commented Apr 16, 2025

should this include a threat analysis?

This will be a self-managed-only feature, which will be scoped differently for security audits. I'll leave it up to @jasonhernandez to sort out how we deal with this.

Contributor

@jasonhernandez jasonhernandez left a comment

Thanks for setting a 400k iteration count as the default. This looks like a good v1 design to ship.

@jubrad jubrad merged commit c094b21 into MaterializeInc:main May 5, 2025
7 checks passed
@jubrad jubrad deleted the password-auth-design-doc branch May 5, 2025 19:48

6 participants