Skip to content

Add selfhosted ingress identity forwarding and organizations service#350

Merged
aviator-app[bot] merged 5 commits intomainfrom
mike/selfhosted-ingress-identity
Apr 21, 2026
Merged

Add selfhosted ingress identity forwarding and organizations service#350
aviator-app[bot] merged 5 commits intomainfrom
mike/selfhosted-ingress-identity

Conversation

@mhotan
Copy link
Copy Markdown
Contributor

@mhotan mhotan commented Apr 20, 2026

Summary

  • server-alias annotation: Allows DP→CP intra-cluster traffic through nginx ingress by matching the internal service DNS name
  • gRPC configuration-snippet: Forwards identity headers (X-User-Subject, X-User-Claim-Identitytype) from auth subrequest to gRPC backends. Required for BYOC since Oct 2024, now ported to selfhosted chart.
  • Organizations service: New controlplane service for settings/org APIs. Required by CreateRun since #15107.
  • Fix organizations connectPort in configmap

Stacked on #349#348 → main

Migration Notes

New behavior: Intra-cluster DP→CP gRPC traffic now goes through nginx auth subrequests (via server-alias). This ensures consistent identity resolution for all callers. Previously, intra-cluster traffic could bypass auth if the :authority header didn't match the ingress host.

Action required: None for new deployments. Existing deployments will get the server-alias annotation on next helm upgrade.

Risk Assessment

Medium — Changes ingress behavior. Test: DP→CP traffic, browser login.

Test Plan

  • Verify DP→CP gRPC traffic still works through ingress
  • Verify browser login flow unchanged
  • Verify organizations service endpoints respond
  • helm template produces expected manifests

ref FAB-178, FAB-195

🤖 Generated with Claude Code

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the selfhosted controlplane Helm chart to ensure intra-cluster DP→CP traffic is routed through the same ingress auth flow as external traffic, forwards identity headers to gRPC backends, and introduces a new organizations controlplane service required by newer API flows.

Changes:

  • Add nginx ingress server-alias for selfhosted intra-cluster deployments (AWS/GCP) so internal service DNS matches ingress rules.
  • Add nginx configuration-snippet on protected gRPC ingresses to forward auth subrequest identity headers to gRPC upstreams.
  • Add a new organizations service configuration (including DB + migration/init settings) and its connect port wiring.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
charts/controlplane/values.yaml Adds gRPC identity-forwarding snippet under protected gRPC ingress annotations; introduces services.organizations default config.
charts/controlplane/values.gcp.selfhosted-intracluster.yaml Adds ingress server-alias annotation for intra-cluster host matching.
charts/controlplane/values.aws.selfhosted-intracluster.yaml Adds ingress server-alias annotation and expands inline rationale around ingress auth behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +569 to +573
organizations:
fullnameOverride: "organizations"
sharedService:
connectPort: 8081
initContainers:
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes the rendered Helm output (new organizations service + new protectedIngressAnnotationsGrpc snippet). The repo has snapshot-style Helm output tests under tests/generated driven by tests/run.sh; these fixtures will need to be regenerated/updated or CI diffs will fail for the controlplane.* test cases.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude confirms this is why the CI check is failing.

Comment on lines +261 to +265
nginx.ingress.kubernetes.io/configuration-snippet: |
auth_request_set $user_id $upstream_http_x_user_subject;
proxy_set_header X-User-Subject $user_id;
grpc_set_header X-User-Subject $user_id;

Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gRPC identity-forwarding configuration-snippet currently forwards X-User-Subject and X-User-Claim-Identitytype, but it does not forward X-User-Token or X-User-Claim-Preferred-Username even though they are included in auth-response-headers. If the intent is that gRPC backends receive the same identity context as HTTP backends, add grpc_set_header/proxy_set_header entries for these headers as well (sourced from the auth subrequest response).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these headers actually needed?

Comment on lines +270 to +277
auth_request_set $user_handle $upstream_http_x_user_claim_userhandle;
proxy_set_header X-User-Claim-userhandle $user_handle;
grpc_set_header X-User-Claim-userhandle $user_handle;

auth_request_set $groups $upstream_http_x_user_claim_groups;
proxy_set_header X-User-Claim-groups $groups;
grpc_set_header X-User-Claim-groups $groups;

Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snippet reads $upstream_http_x_user_claim_userhandle and $upstream_http_x_user_claim_groups, but those headers are not included in nginx.ingress.kubernetes.io/auth-response-headers for this ingress. As written they will always be empty; either add the corresponding headers to auth-response-headers (if /me returns them) or drop these auth_request_set/proxy/grpc_set_header lines to avoid a misleading no-op.

Suggested change
auth_request_set $user_handle $upstream_http_x_user_claim_userhandle;
proxy_set_header X-User-Claim-userhandle $user_handle;
grpc_set_header X-User-Claim-userhandle $user_handle;
auth_request_set $groups $upstream_http_x_user_claim_groups;
proxy_set_header X-User-Claim-groups $groups;
grpc_set_header X-User-Claim-groups $groups;

Copilot uses AI. Check for mistakes.
@jmonty42
Copy link
Copy Markdown
Contributor

According to Claude, you just need to run make generate-expected && make test and commit to get the failing check to pass.

@jmonty42
Copy link
Copy Markdown
Contributor

How much of this will change when Laura's Envoy change fully lands?

@mhotan mhotan force-pushed the mike/selfhosted-oidc-globals branch from 5afab1b to 351da2b Compare April 20, 2026 21:24
@mhotan mhotan force-pushed the mike/selfhosted-ingress-identity branch from 8c55d5f to 7a40222 Compare April 20, 2026 21:27
aviator-app Bot pushed a commit that referenced this pull request Apr 20, 2026
## Overview

Adds authorizer configuration defaults for selfhosted deployments and fixes dashboard panel issues.

- **defaultIdentityToSubject**: Config for non-Okta IdPs (FAB-189) that don't include `identitytype` claim natively. When enabled, subjects without identitytype are treated as users.
- **UserClouds client defaults**: Pre-configured connection settings in the controlplane values so terraform doesn't need deep-merge overrides.
- **Dashboard fixes**: Authorizer Mode panel value mappings for case sensitivity, metric name mismatches.

## Migration Notes

No migration required. Additive defaults only — existing deployments are not affected.

## Test Plan

- [x] `helm template` renders correctly
- [x] Verified on identity-testing environment

ref FAB-189
ref FAB-178

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- `main` <!-- branch-stack -->
  - **Add selfhosted authorizer defaults and dashboard fixes** :point\_left:
    - \#349
      - \#350
        - \#351
          - \#352
            - \#353
              - \#354
aviator-app Bot pushed a commit that referenced this pull request Apr 20, 2026
## Overview

Adds OIDC/OAuth2 globals to support identity providers beyond Okta (Entra ID, Keycloak, Authentik).

**New globals** (all empty by default — Okta behavior unchanged):

- `OIDC_METADATA_URL` — metadata discovery endpoint (Okta: `.well-known/oauth-authorization-server`, Entra: `.well-known/openid-configuration`)
- `OIDC_APP_SCOPE` — resource scope for CLI/SDK and task pods (Entra: `api://my-app/.default`)
- `OIDC_APP_AUDIENCE` — audience parameter for PKCE flows (Entra: `api://my-app`)
- `OIDC_S2S_SCOPE` — service-to-service scope (Entra: `api://my-app/.default`)
- Dataplane S2S scope support via `OIDC_S2S_SCOPE`

Includes inline documentation with Okta/Entra ID examples, OAuth app flow references, and test fixture for non-Okta configuration.

## Migration Notes

No migration required. All new globals default to empty which preserves existing Okta behavior. Non-Okta IdPs can now be configured without terraform deep-merge overrides.

## Test Plan

- [x] Added `tests/values/controlplane.custom-oidc.yaml` test fixture
- [x] `helm template` renders correctly with both empty and non-empty globals
- [x] Verified on identity-testing environment with Entra ID

ref FAB-178

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- `main` <!-- branch-stack -->
  - **Add OIDC globals for multi-IdP support (Okta, Entra ID, Keycloak)** :point\_left:
    - \#350
      - \#351
        - \#352
          - \#353
            - \#354
Base automatically changed from mike/selfhosted-oidc-globals to main April 20, 2026 22:28
mhotan and others added 5 commits April 21, 2026 10:50
- Add defaultIdentityToSubject config for non-Okta IdPs (FAB-189)
  that don't include identitytype claim natively
- Provide UserClouds client connection defaults in controlplane values
  so terraform doesn't need to deep-merge them
- Fix Authorizer Mode dashboard panel value mappings for case sensitivity
- Fix dashboard metric name mismatches and query bugs

No migration required. Additive defaults only — existing deployments
are not affected.

ref FAB-189
ref FAB-178

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
New globals with safe empty defaults:
- OIDC_METADATA_URL — metadata discovery endpoint (default: RFC 8414)
- OIDC_APP_SCOPE — resource scope for CLI/SDK PKCE and task pods
- OIDC_APP_AUDIENCE — audience parameter for PKCE flows
- OIDC_S2S_SCOPE — service-to-service client_credentials scope
- Dataplane S2S scope support

Includes documentation with Okta/Entra ID examples for each global,
OAuth app flow references, and OIDC test fixture.

## Migration Notes

No migration required. All new globals default to empty which preserves
existing Okta behavior. Non-Okta IdPs (Entra ID, Keycloak) can now
configure scopes and audience via these globals.

ref FAB-178

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add defaultIdentityToSubject config for non-Okta IdPs (FAB-189)
  that don't include identitytype claim natively
- Provide UserClouds client connection defaults in controlplane values
  so terraform doesn't need to deep-merge them
- Fix Authorizer Mode dashboard panel value mappings for case sensitivity
- Fix dashboard metric name mismatches and query bugs

No migration required. Additive defaults only — existing deployments
are not affected.

ref FAB-189
ref FAB-178

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
New globals with safe empty defaults:
- OIDC_METADATA_URL — metadata discovery endpoint (default: RFC 8414)
- OIDC_APP_SCOPE — resource scope for CLI/SDK PKCE and task pods
- OIDC_APP_AUDIENCE — audience parameter for PKCE flows
- OIDC_S2S_SCOPE — service-to-service client_credentials scope
- Dataplane S2S scope support

Includes documentation with Okta/Entra ID examples for each global,
OAuth app flow references, and OIDC test fixture.

No migration required. All new globals default to empty which preserves
existing Okta behavior. Non-Okta IdPs (Entra ID, Keycloak) can now
configure scopes and audience via these globals.

ref FAB-178

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- server-alias annotation: allows DP→CP intra-cluster traffic through
  nginx ingress by matching the internal service DNS name
- gRPC configuration-snippet: forwards identity headers (X-User-Subject,
  X-User-Claim-Identitytype) from auth subrequest to gRPC backends.
  Required for BYOC since Oct 2024, now ported to selfhosted chart.
- Organizations service: new controlplane service for settings/org APIs.
  Required by CreateRun since #15107.
- Fix organizations connectPort in configmap

## Migration Notes

**New behavior:** Intra-cluster DP→CP gRPC traffic now goes through nginx
auth subrequests (via server-alias). This ensures consistent identity
resolution for all callers. Previously, intra-cluster traffic could
bypass auth if the :authority header didn't match the ingress host.

**Action required:** None for new deployments. Existing deployments will
get the server-alias annotation on next helm upgrade.

ref FAB-178
ref FAB-195

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@mhotan mhotan force-pushed the mike/selfhosted-ingress-identity branch from 7a40222 to 4025d7f Compare April 21, 2026 00:52
@aviator-app
Copy link
Copy Markdown
Contributor

aviator-app Bot commented Apr 21, 2026

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged using Aviator.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@mhotan
Copy link
Copy Markdown
Contributor Author

mhotan commented Apr 21, 2026

/aviator merge

@aviator-app
Copy link
Copy Markdown
Contributor

aviator-app Bot commented Apr 21, 2026

Aviator has accepted the merge request. It will enter the queue when all of the required status checks have passed. Aviator will update the sticky status comment as the pull request moves through the queue.

@aviator-app aviator-app Bot merged commit 4c27345 into main Apr 21, 2026
5 checks passed
@aviator-app aviator-app Bot deleted the mike/selfhosted-ingress-identity branch April 21, 2026 01:52
aviator-app Bot pushed a commit that referenced this pull request Apr 21, 2026
## Summary

- Move all OIDC/OAuth2 auth configuration into a documented `flyte.configmap.adminServer.auth` block in base values.yaml with inline Okta/Entra ID examples
- Move `adminClient.connection` to base (was duplicated in AWS/GCP overlays)
- Wire `OIDC_S2S_SCOPE` for CP S2S auth (Entra requires `/.default`, Okta uses `all`)
- Add `OIDC_BROWSER_SCOPE` for Entra browser auth (AADSTS90009 fix)
- Deprecate scattered auth globals: `OIDC_BASE_URL`, `OIDC_CLIENT_ID`, `CLI_CLIENT_ID`
- Authz templates accept both `"Union"` and `"UserClouds"` type

Stacked on #350#349#348 → main

## Migration Notes

**Auth config location changed:** All auth config now lives in `flyte.configmap.adminServer.auth` in base values.yaml. Cloud overlays no longer set auth values — terraform authn modules output complete `app_auth`/`user_auth` blocks.

**Deprecated globals:** `OIDC_BASE_URL`, `OIDC_CLIENT_ID`, `CLI_CLIENT_ID` still work but are deprecated. New deployments should use the auth block directly.

**New globals:** `OIDC_S2S_SCOPE` (default: `all`), `OIDC_BROWSER_SCOPE`, `INTERNAL_SUBJECT_ID` (default: `INTERNAL_CLIENT_ID`).

**UserClouds → Union:** Authz templates accept `type: "Union"` alongside `"UserClouds"`. Both are equivalent; `"Union"` preferred for new deployments.

## Risk Assessment

**Medium** — Restructures auth config. Verified by `compare_manifests.py` producing zero structural diffs against baseline.

## Test Plan

- [ ] `helm template` with Okta defaults produces identical auth config
- [ ] `helm template` with Entra overrides produces correct scopes/audiences
- [ ] Verify deprecated globals still work for backward compatibility
- [ ] `compare_manifests.py` zero diffs against baseline

ref FAB-178, FAB-195

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- `main` <!-- branch-stack -->
  - **Consolidate auth config into base values.yaml** :point\_left:
    - \#352
      - \#353
        - \#354
aviator-app Bot pushed a commit that referenced this pull request Apr 21, 2026
## Summary

- Move cloud-agnostic config from AWS/GCP overlays into base values.yaml
- **AWS overlay**: \~400 → 102 lines
- **GCP overlay**: 504 → 105 lines
- Base chart is now self-contained for selfhosted deployments
- Overlays reduced to only cloud-specific items: IAM, storage, region, scylla provisioner

Stacked on #351#350#349#348 → main

### What moved to base

- Namespace-derived FQDNs (`admin.endpoint`, `rootTenantURLPattern`)
- Ingress configuration (server-alias, protectedIngress annotations)
- ingress-nginx (enabled, ClusterIP, fullnameOverride)
- Monitoring (kube\* disabled, serviceMonitor enabled)
- Image repos (default: `registry.unionai.cloud/controlplane`)
- Secrets (union-operator, union-secrets)
- ScyllaDB generic config
- envoy-gateway defaults

### What stays in overlays (\~100 lines each)

- Cloud region, DB, storage bucket, IAM identifiers
- `flyte.storage` (type:s3/gcs, region/projectId)
- IAM annotations (IRSA / Workload Identity)
- scylla storageClass provisioner
- dataproxy endpoint, artifacts stow config

## Migration Notes

**Base chart is now self-contained:** New selfhosted deployments only need cloud-specific overlay values (IAM, storage, DB).

**Image repository default:** Base defaults to `registry.unionai.cloud/controlplane`. Internal deployments using ECR must set `IMAGE_REPOSITORY_PREFIX` via terraform.

**Namespace-derived FQDNs:** Services use `{{ .Release.Namespace }}` instead of hardcoded values. Resolves identically in standard deployments.

## Risk Assessment

**Higher** — Touches most of values.yaml. Verified by `compare_manifests.py` producing zero structural diffs against baseline from `mike/overlay-consolidated-backup-2026-04-20`.

## Test Plan

- [ ] `helm template` with AWS overlay produces identical manifests to baseline
- [ ] `helm template` with GCP overlay produces identical manifests to baseline
- [ ] `compare_manifests.py` zero diffs against baseline
- [ ] Deploy to identity-testing environment and verify all services healthy

ref FAB-178, FAB-195, FAB-276

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- `main` <!-- branch-stack -->
  - **Move generic selfhosted config from overlays to base values.yaml** :point\_left:
    - \#353
      - \#354
aviator-app Bot pushed a commit that referenced this pull request Apr 21, 2026
## Summary

- Overhaul authorizer row in union-controlplane-overview dashboard for standardized `BackendMetrics`
- Fix metric name mismatches, query bugs, and Authorizer Mode value mappings
- Add `identity_type` breakdown (User/App/External/Unknown) to auth panels
- Add proper units (ops, ms, percentunit), thresholds, and zero-state handling
- Fix legendFormat double-escaping
- Remove "(V1 + V2)" labels from row titles
- Regenerate test snapshots

Stacked on #352#351#350#349#348 → main

## Migration Notes

**Dashboard auto-updates:** Deployed via ConfigMap sidecar — updates on next helm upgrade.

**New metrics required:** Panels reference `authz_backend_*` metrics from standardized `BackendMetrics`. Emitted by cloud v2026.4.x+. Older versions show "No data" for backend panels.

## Risk Assessment

**Low** — Dashboard JSON only + generated test snapshots.

## Test Plan

- [ ] Dashboard loads in Grafana without errors
- [ ] Panels show data with cloud v2026.4.x+
- [ ] Zero-state panels show "0" instead of "No data"
- [ ] `make generate-expected` matches committed snapshots

ref FAB-178

🤖 Generated with [Claude Code](https://claude.com/claude-code)

- `main` <!-- branch-stack -->
  - **Update authorizer dashboard: standardized backend metrics** :point\_left:
    - \#354
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants