Add selfhosted ingress identity forwarding and organizations service#350
Add selfhosted ingress identity forwarding and organizations service#350aviator-app[bot] merged 5 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the selfhosted controlplane Helm chart to ensure intra-cluster DP→CP traffic is routed through the same ingress auth flow as external traffic, forwards identity headers to gRPC backends, and introduces a new organizations controlplane service required by newer API flows.
Changes:
- Add nginx ingress
server-aliasfor selfhosted intra-cluster deployments (AWS/GCP) so internal service DNS matches ingress rules. - Add nginx
configuration-snippeton protected gRPC ingresses to forward auth subrequest identity headers to gRPC upstreams. - Add a new
organizationsservice configuration (including DB + migration/init settings) and its connect port wiring.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| charts/controlplane/values.yaml | Adds gRPC identity-forwarding snippet under protected gRPC ingress annotations; introduces services.organizations default config. |
| charts/controlplane/values.gcp.selfhosted-intracluster.yaml | Adds ingress server-alias annotation for intra-cluster host matching. |
| charts/controlplane/values.aws.selfhosted-intracluster.yaml | Adds ingress server-alias annotation and expands inline rationale around ingress auth behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| organizations: | ||
| fullnameOverride: "organizations" | ||
| sharedService: | ||
| connectPort: 8081 | ||
| initContainers: |
There was a problem hiding this comment.
This PR changes the rendered Helm output (new organizations service + new protectedIngressAnnotationsGrpc snippet). The repo has snapshot-style Helm output tests under tests/generated driven by tests/run.sh; these fixtures will need to be regenerated/updated or CI diffs will fail for the controlplane.* test cases.
There was a problem hiding this comment.
Claude confirms this is why the CI check is failing.
| nginx.ingress.kubernetes.io/configuration-snippet: | | ||
| auth_request_set $user_id $upstream_http_x_user_subject; | ||
| proxy_set_header X-User-Subject $user_id; | ||
| grpc_set_header X-User-Subject $user_id; | ||
|
|
There was a problem hiding this comment.
The gRPC identity-forwarding configuration-snippet currently forwards X-User-Subject and X-User-Claim-Identitytype, but it does not forward X-User-Token or X-User-Claim-Preferred-Username even though they are included in auth-response-headers. If the intent is that gRPC backends receive the same identity context as HTTP backends, add grpc_set_header/proxy_set_header entries for these headers as well (sourced from the auth subrequest response).
There was a problem hiding this comment.
Are these headers actually needed?
| auth_request_set $user_handle $upstream_http_x_user_claim_userhandle; | ||
| proxy_set_header X-User-Claim-userhandle $user_handle; | ||
| grpc_set_header X-User-Claim-userhandle $user_handle; | ||
|
|
||
| auth_request_set $groups $upstream_http_x_user_claim_groups; | ||
| proxy_set_header X-User-Claim-groups $groups; | ||
| grpc_set_header X-User-Claim-groups $groups; | ||
|
|
There was a problem hiding this comment.
The snippet reads $upstream_http_x_user_claim_userhandle and $upstream_http_x_user_claim_groups, but those headers are not included in nginx.ingress.kubernetes.io/auth-response-headers for this ingress. As written they will always be empty; either add the corresponding headers to auth-response-headers (if /me returns them) or drop these auth_request_set/proxy/grpc_set_header lines to avoid a misleading no-op.
| auth_request_set $user_handle $upstream_http_x_user_claim_userhandle; | |
| proxy_set_header X-User-Claim-userhandle $user_handle; | |
| grpc_set_header X-User-Claim-userhandle $user_handle; | |
| auth_request_set $groups $upstream_http_x_user_claim_groups; | |
| proxy_set_header X-User-Claim-groups $groups; | |
| grpc_set_header X-User-Claim-groups $groups; |
|
According to Claude, you just need to run |
|
How much of this will change when Laura's Envoy change fully lands? |
5afab1b to
351da2b
Compare
8c55d5f to
7a40222
Compare
## Overview Adds authorizer configuration defaults for selfhosted deployments and fixes dashboard panel issues. - **defaultIdentityToSubject**: Config for non-Okta IdPs (FAB-189) that don't include `identitytype` claim natively. When enabled, subjects without identitytype are treated as users. - **UserClouds client defaults**: Pre-configured connection settings in the controlplane values so terraform doesn't need deep-merge overrides. - **Dashboard fixes**: Authorizer Mode panel value mappings for case sensitivity, metric name mismatches. ## Migration Notes No migration required. Additive defaults only — existing deployments are not affected. ## Test Plan - [x] `helm template` renders correctly - [x] Verified on identity-testing environment ref FAB-189 ref FAB-178 🤖 Generated with [Claude Code](https://claude.com/claude-code) - `main` <!-- branch-stack --> - **Add selfhosted authorizer defaults and dashboard fixes** :point\_left: - \#349 - \#350 - \#351 - \#352 - \#353 - \#354
## Overview Adds OIDC/OAuth2 globals to support identity providers beyond Okta (Entra ID, Keycloak, Authentik). **New globals** (all empty by default — Okta behavior unchanged): - `OIDC_METADATA_URL` — metadata discovery endpoint (Okta: `.well-known/oauth-authorization-server`, Entra: `.well-known/openid-configuration`) - `OIDC_APP_SCOPE` — resource scope for CLI/SDK and task pods (Entra: `api://my-app/.default`) - `OIDC_APP_AUDIENCE` — audience parameter for PKCE flows (Entra: `api://my-app`) - `OIDC_S2S_SCOPE` — service-to-service scope (Entra: `api://my-app/.default`) - Dataplane S2S scope support via `OIDC_S2S_SCOPE` Includes inline documentation with Okta/Entra ID examples, OAuth app flow references, and test fixture for non-Okta configuration. ## Migration Notes No migration required. All new globals default to empty which preserves existing Okta behavior. Non-Okta IdPs can now be configured without terraform deep-merge overrides. ## Test Plan - [x] Added `tests/values/controlplane.custom-oidc.yaml` test fixture - [x] `helm template` renders correctly with both empty and non-empty globals - [x] Verified on identity-testing environment with Entra ID ref FAB-178 🤖 Generated with [Claude Code](https://claude.com/claude-code) - `main` <!-- branch-stack --> - **Add OIDC globals for multi-IdP support (Okta, Entra ID, Keycloak)** :point\_left: - \#350 - \#351 - \#352 - \#353 - \#354
- Add defaultIdentityToSubject config for non-Okta IdPs (FAB-189) that don't include identitytype claim natively - Provide UserClouds client connection defaults in controlplane values so terraform doesn't need to deep-merge them - Fix Authorizer Mode dashboard panel value mappings for case sensitivity - Fix dashboard metric name mismatches and query bugs No migration required. Additive defaults only — existing deployments are not affected. ref FAB-189 ref FAB-178 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
New globals with safe empty defaults: - OIDC_METADATA_URL — metadata discovery endpoint (default: RFC 8414) - OIDC_APP_SCOPE — resource scope for CLI/SDK PKCE and task pods - OIDC_APP_AUDIENCE — audience parameter for PKCE flows - OIDC_S2S_SCOPE — service-to-service client_credentials scope - Dataplane S2S scope support Includes documentation with Okta/Entra ID examples for each global, OAuth app flow references, and OIDC test fixture. ## Migration Notes No migration required. All new globals default to empty which preserves existing Okta behavior. Non-Okta IdPs (Entra ID, Keycloak) can now configure scopes and audience via these globals. ref FAB-178 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add defaultIdentityToSubject config for non-Okta IdPs (FAB-189) that don't include identitytype claim natively - Provide UserClouds client connection defaults in controlplane values so terraform doesn't need to deep-merge them - Fix Authorizer Mode dashboard panel value mappings for case sensitivity - Fix dashboard metric name mismatches and query bugs No migration required. Additive defaults only — existing deployments are not affected. ref FAB-189 ref FAB-178 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
New globals with safe empty defaults: - OIDC_METADATA_URL — metadata discovery endpoint (default: RFC 8414) - OIDC_APP_SCOPE — resource scope for CLI/SDK PKCE and task pods - OIDC_APP_AUDIENCE — audience parameter for PKCE flows - OIDC_S2S_SCOPE — service-to-service client_credentials scope - Dataplane S2S scope support Includes documentation with Okta/Entra ID examples for each global, OAuth app flow references, and OIDC test fixture. No migration required. All new globals default to empty which preserves existing Okta behavior. Non-Okta IdPs (Entra ID, Keycloak) can now configure scopes and audience via these globals. ref FAB-178 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- server-alias annotation: allows DP→CP intra-cluster traffic through nginx ingress by matching the internal service DNS name - gRPC configuration-snippet: forwards identity headers (X-User-Subject, X-User-Claim-Identitytype) from auth subrequest to gRPC backends. Required for BYOC since Oct 2024, now ported to selfhosted chart. - Organizations service: new controlplane service for settings/org APIs. Required by CreateRun since #15107. - Fix organizations connectPort in configmap ## Migration Notes **New behavior:** Intra-cluster DP→CP gRPC traffic now goes through nginx auth subrequests (via server-alias). This ensures consistent identity resolution for all callers. Previously, intra-cluster traffic could bypass auth if the :authority header didn't match the ingress host. **Action required:** None for new deployments. Existing deployments will get the server-alias annotation on next helm upgrade. ref FAB-178 ref FAB-195 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
7a40222 to
4025d7f
Compare
Current Aviator status
This PR was merged using Aviator.
See the real-time status of this PR on the
Aviator webapp.
Use the Aviator Chrome Extension
to see the status of your PR within GitHub.
|
|
/aviator merge |
|
Aviator has accepted the merge request. It will enter the queue when all of the required status checks have passed. Aviator will update the sticky status comment as the pull request moves through the queue. |
## Summary - Move all OIDC/OAuth2 auth configuration into a documented `flyte.configmap.adminServer.auth` block in base values.yaml with inline Okta/Entra ID examples - Move `adminClient.connection` to base (was duplicated in AWS/GCP overlays) - Wire `OIDC_S2S_SCOPE` for CP S2S auth (Entra requires `/.default`, Okta uses `all`) - Add `OIDC_BROWSER_SCOPE` for Entra browser auth (AADSTS90009 fix) - Deprecate scattered auth globals: `OIDC_BASE_URL`, `OIDC_CLIENT_ID`, `CLI_CLIENT_ID` - Authz templates accept both `"Union"` and `"UserClouds"` type Stacked on #350 → #349 → #348 → main ## Migration Notes **Auth config location changed:** All auth config now lives in `flyte.configmap.adminServer.auth` in base values.yaml. Cloud overlays no longer set auth values — terraform authn modules output complete `app_auth`/`user_auth` blocks. **Deprecated globals:** `OIDC_BASE_URL`, `OIDC_CLIENT_ID`, `CLI_CLIENT_ID` still work but are deprecated. New deployments should use the auth block directly. **New globals:** `OIDC_S2S_SCOPE` (default: `all`), `OIDC_BROWSER_SCOPE`, `INTERNAL_SUBJECT_ID` (default: `INTERNAL_CLIENT_ID`). **UserClouds → Union:** Authz templates accept `type: "Union"` alongside `"UserClouds"`. Both are equivalent; `"Union"` preferred for new deployments. ## Risk Assessment **Medium** — Restructures auth config. Verified by `compare_manifests.py` producing zero structural diffs against baseline. ## Test Plan - [ ] `helm template` with Okta defaults produces identical auth config - [ ] `helm template` with Entra overrides produces correct scopes/audiences - [ ] Verify deprecated globals still work for backward compatibility - [ ] `compare_manifests.py` zero diffs against baseline ref FAB-178, FAB-195 🤖 Generated with [Claude Code](https://claude.com/claude-code) - `main` <!-- branch-stack --> - **Consolidate auth config into base values.yaml** :point\_left: - \#352 - \#353 - \#354
## Summary - Move cloud-agnostic config from AWS/GCP overlays into base values.yaml - **AWS overlay**: \~400 → 102 lines - **GCP overlay**: 504 → 105 lines - Base chart is now self-contained for selfhosted deployments - Overlays reduced to only cloud-specific items: IAM, storage, region, scylla provisioner Stacked on #351 → #350 → #349 → #348 → main ### What moved to base - Namespace-derived FQDNs (`admin.endpoint`, `rootTenantURLPattern`) - Ingress configuration (server-alias, protectedIngress annotations) - ingress-nginx (enabled, ClusterIP, fullnameOverride) - Monitoring (kube\* disabled, serviceMonitor enabled) - Image repos (default: `registry.unionai.cloud/controlplane`) - Secrets (union-operator, union-secrets) - ScyllaDB generic config - envoy-gateway defaults ### What stays in overlays (\~100 lines each) - Cloud region, DB, storage bucket, IAM identifiers - `flyte.storage` (type:s3/gcs, region/projectId) - IAM annotations (IRSA / Workload Identity) - scylla storageClass provisioner - dataproxy endpoint, artifacts stow config ## Migration Notes **Base chart is now self-contained:** New selfhosted deployments only need cloud-specific overlay values (IAM, storage, DB). **Image repository default:** Base defaults to `registry.unionai.cloud/controlplane`. Internal deployments using ECR must set `IMAGE_REPOSITORY_PREFIX` via terraform. **Namespace-derived FQDNs:** Services use `{{ .Release.Namespace }}` instead of hardcoded values. Resolves identically in standard deployments. ## Risk Assessment **Higher** — Touches most of values.yaml. Verified by `compare_manifests.py` producing zero structural diffs against baseline from `mike/overlay-consolidated-backup-2026-04-20`. ## Test Plan - [ ] `helm template` with AWS overlay produces identical manifests to baseline - [ ] `helm template` with GCP overlay produces identical manifests to baseline - [ ] `compare_manifests.py` zero diffs against baseline - [ ] Deploy to identity-testing environment and verify all services healthy ref FAB-178, FAB-195, FAB-276 🤖 Generated with [Claude Code](https://claude.com/claude-code) - `main` <!-- branch-stack --> - **Move generic selfhosted config from overlays to base values.yaml** :point\_left: - \#353 - \#354
## Summary - Overhaul authorizer row in union-controlplane-overview dashboard for standardized `BackendMetrics` - Fix metric name mismatches, query bugs, and Authorizer Mode value mappings - Add `identity_type` breakdown (User/App/External/Unknown) to auth panels - Add proper units (ops, ms, percentunit), thresholds, and zero-state handling - Fix legendFormat double-escaping - Remove "(V1 + V2)" labels from row titles - Regenerate test snapshots Stacked on #352 → #351 → #350 → #349 → #348 → main ## Migration Notes **Dashboard auto-updates:** Deployed via ConfigMap sidecar — updates on next helm upgrade. **New metrics required:** Panels reference `authz_backend_*` metrics from standardized `BackendMetrics`. Emitted by cloud v2026.4.x+. Older versions show "No data" for backend panels. ## Risk Assessment **Low** — Dashboard JSON only + generated test snapshots. ## Test Plan - [ ] Dashboard loads in Grafana without errors - [ ] Panels show data with cloud v2026.4.x+ - [ ] Zero-state panels show "0" instead of "No data" - [ ] `make generate-expected` matches committed snapshots ref FAB-178 🤖 Generated with [Claude Code](https://claude.com/claude-code) - `main` <!-- branch-stack --> - **Update authorizer dashboard: standardized backend metrics** :point\_left: - \#354
Summary
Stacked on #349 → #348 → main
Migration Notes
New behavior: Intra-cluster DP→CP gRPC traffic now goes through nginx auth subrequests (via server-alias). This ensures consistent identity resolution for all callers. Previously, intra-cluster traffic could bypass auth if the
:authorityheader didn't match the ingress host.Action required: None for new deployments. Existing deployments will get the server-alias annotation on next helm upgrade.
Risk Assessment
Medium — Changes ingress behavior. Test: DP→CP traffic, browser login.
Test Plan
helm templateproduces expected manifestsref FAB-178, FAB-195
🤖 Generated with Claude Code
main