feat(runs): populate RunDetails.executed_by from forwarded auth headers#7547
feat(runs): populate RunDetails.executed_by from forwarded auth headers#7547pingsutw wants to merge 10 commits into
Conversation
Auth now terminates at the load balancer (ALB), which validates the request and forwards the caller's identity to the runs service. Capture it at CreateRun and surface it as ActionMetadata.executed_by so RunDetails records who ran a task — previously always empty. - identity.go: subjectFromHeaders reads the OIDC subject from X-Amzn-Oidc-Identity (ALB authenticate-oidc / cookie path) or, failing that, the `sub` claim of the forwarded Authorization: Bearer token (JWT-validation / SDK path). The LB already validated, so the token is decoded but not re-verified. subjectOnlyIdentity builds a minimal EnrichedIdentity (subject only), mirroring the cloud transformer fallback. - DB: new nullable actions.created_by (VARCHAR(255)) storing the subject; written by CreateAction, read back into the model via SELECT *. - CreateRun threads the subject into persistRunModel; actionMetadataFromModel maps created_by -> ActionMetadata.executed_by. Signed-off-by: Kevin Su <pingsutw@apache.org>
There was a problem hiding this comment.
Pull request overview
This PR adds run attribution in the standalone runs service by capturing an authenticated subject from load-balancer-forwarded auth headers during CreateRun, persisting it to the actions table, and surfacing it back on reads as ActionMetadata.executed_by.
Changes:
- Add header/JWT-subject extraction helpers (
subjectFromHeaders,subjectFromBearer) and a subject-onlyEnrichedIdentityconstructor. - Persist the extracted subject to DB via new nullable
actions.created_by, and map it toActionMetadata.executed_bywhen building metadata. - Add a SQL migration to introduce the
created_bycolumn.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| runs/service/run_service.go | Capture subject on create; persist CreatedBy; map DB created_by → executed_by on read. |
| runs/service/identity.go | Implement subject extraction from ALB header or Bearer JWT payload (decoded only). |
| runs/service/identity_test.go | Unit tests for header extraction precedence, trimming, and malformed/missing cases. |
| runs/repository/models/action.go | Add CreatedBy field to the Action/Run DB model. |
| runs/repository/impl/action.go | Extend INSERT INTO actions ... to write created_by. |
| runs/migrations/sql/20260618120000_add_actions_created_by.sql | Add created_by column to actions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // subjectFromHeaders extracts the authenticated subject (OIDC `sub`) from the auth | ||
| // headers the load balancer forwards. Auth is enforced upstream (e.g. ALB OIDC / | ||
| // JWT validation), so the claims are trusted and only decoded here — not | ||
| // re-verified. Returns "" when no authenticated identity is present. | ||
| func subjectFromHeaders(h http.Header) string { | ||
| // authenticate-oidc (browser/cookie) path: subject is forwarded directly. | ||
| if sub := strings.TrimSpace(h.Get(albIdentityHeader)); sub != "" { | ||
| return sub | ||
| } | ||
| // JWT (SDK/CLI) path: read the `sub` claim from the forwarded Bearer token. | ||
| return subjectFromBearer(h.Get(authorizationHeader)) | ||
| } |
There was a problem hiding this comment.
Fixed in e7dfbf10: gated behind Config.TrustForwardedIdentityHeaders (default true). When false, the forwarded Authorization / X-Amzn-Oidc-* headers are not trusted and executed_by is left unset, so a direct caller can't spoof attribution. The runs service does no authz of its own, so this flag is the explicit "prove the request came through a trusted proxy" guard — operators whose service can be reached without one set it false. See runs/config/config.go and run_service.go (s.trustHeaders).
| if action.CreatedBy.Valid { | ||
| metadata.ExecutedBy = subjectOnlyIdentity(action.CreatedBy.String) | ||
| } |
There was a problem hiding this comment.
Fixed: added TestActionMetadataFromModel_ExecutedBy (runs/service/executed_by_test.go) covering full identity from executed_by, fallback to subject-only from created_by, corrupt executed_by bytes, and the no-identity (nil) case.
…st subject
Decode the OIDC claims the load balancer forwards — sub, email, given_name,
family_name — and store the full common.EnrichedIdentity so RunDetails shows who
ran a task with their name and email, not just an opaque subject id. The standalone
runs service has no identity service to enrich a subject after the fact, so the
claims are captured at CreateRun from the token instead.
- identity.go: identityFromHeaders now decodes the signed x-amzn-oidc-data JWT
(cookie path) or the forwarded Bearer token (SDK path) into User.Id.Subject +
User.Spec{FirstName,LastName,Email}; still decoded, not re-verified. Falls back
to x-amzn-oidc-identity (subject only) when the data header is absent.
- DB: new actions.executed_by BYTEA holding the serialized EnrichedIdentity;
created_by keeps the bare subject for querying. CreateAction writes both.
- actionMetadataFromModel prefers the full executed_by, falling back to a
subject-only identity for rows written before this column existed.
Names require the IdP to return given_name/family_name (e.g. ALB auth-scope must
include `profile`); email needs the `email` scope.
Signed-off-by: Kevin Su <pingsutw@apache.org>
…arer path The SDK/CLI forwards an OAuth access token that carries only the subject — profile claims (email, given_name, family_name) live in userinfo, not the token. So on the Bearer path executed_by was subject-only. Add an identity enricher that calls the IdP userinfo endpoint with the caller's access token to fetch the profile, mirroring how cloud resolves a subject to a full user (cloud uses its identity service; the standalone runs service queries the IdP directly). - identity_enricher.go: resolves userinfo_endpoint from OIDC discovery (cached), GETs userinfo with the forwarded access token, caches subject->identity (10m TTL), and is fully best-effort — any timeout/error/missing-claim falls back to the subject-only identity and never blocks run creation. Driven by the existing runs.authMetadata.externalAuthServerBaseUrl; nil (no-op) when unset. - CreateRun enriches the parsed identity using the Bearer token or the ALB-forwarded X-Amzn-Oidc-Accesstoken. The cookie path already carries claims via x-amzn-oidc-data, so it short-circuits (hasProfile) without an extra call. - Removed the temporary AUTHDEBUG diagnostic. Signed-off-by: Kevin Su <pingsutw@apache.org>
| // accessTokenFromHeaders returns the caller's access token: the forwarded Bearer | ||
| // token (SDK/JWT path) or the ALB-provided access token (cookie path). | ||
| func accessTokenFromHeaders(h http.Header) string { | ||
| if authz := h.Get(authorizationHeader); len(authz) > len(bearerPrefix) && | ||
| strings.EqualFold(authz[:len(bearerPrefix)], bearerPrefix) { | ||
| return strings.TrimSpace(authz[len(bearerPrefix):]) | ||
| } | ||
| return strings.TrimSpace(h.Get(albAccessTokenHdr)) | ||
| } |
There was a problem hiding this comment.
By design accessTokenFromHeaders reads only the Bearer token — the cookie/ALB path isn't userinfo-enriched (its forwarded access token is already expired by request time; it uses the claims ALB injects into x-amzn-oidc-data, which is a complete profile, so enrich short-circuits on isCompleteProfile). And if a mismatched Bearer were ever present, its userinfo sub wouldn't match the caller's subject and is now rejected by the subject-mismatch guard (e7dfbf10). So the wrong profile can't be associated; I kept the single-source token read rather than adding ALB-vs-Authorization precedence.
| func (e *identityEnricher) cachedFor(subject string) *common.EnrichedIdentity { | ||
| e.mu.Lock() | ||
| defer e.mu.Unlock() | ||
| if c, ok := e.cache[subject]; ok && time.Now().Before(c.expires) { | ||
| return c.id | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Fixed in e7dfbf10: cachedFor deletes the stale entry when it detects expiry, and store sweeps expired entries — dead keys no longer accumulate.
| func TestAccessTokenFromHeaders(t *testing.T) { | ||
| h := http.Header{} | ||
| h.Set("Authorization", "Bearer abc") | ||
| assert.Equal(t, "abc", accessTokenFromHeaders(h)) | ||
|
|
||
| h = http.Header{} | ||
| h.Set(albAccessTokenHdr, "alb-tok") | ||
| assert.Equal(t, "alb-tok", accessTokenFromHeaders(h)) | ||
|
|
||
| assert.Equal(t, "", accessTokenFromHeaders(http.Header{})) | ||
| } |
There was a problem hiding this comment.
Since accessTokenFromHeaders intentionally reads only the Bearer token (no ALB-vs-Authorization precedence — see the reply on the enricher), the behavior that actually guards against a mismatched token is the subject-mismatch rejection, which is covered by TestEnrich_RejectsSubjectMismatch. So I didn't add a mixed-header precedence assertion here.
…othing Previously the enricher skipped userinfo whenever the headers already supplied any profile field, so the cookie path kept its email but never gained a name, and the Bearer path got a name but no email. Now it fetches userinfo whenever the profile is incomplete and fills only the missing fields, keeping header-provided values authoritative. Claims are cached per subject (raw), so a partial profile still costs at most one userinfo call per subject per TTL. Result: full name + email on both paths, subject to the token's granted scopes. Signed-off-by: Kevin Su <pingsutw@apache.org>
The SDK forwards the ID token as `flyte-authorization: IDToken <jwt>`. Unlike the access token, the ID token carries the user's profile claims (email, given_name, family_name), so decoding it gives full attribution with no userinfo call. Checked before the ALB/Bearer paths; falls through to them when absent. Signed-off-by: Kevin Su <pingsutw@apache.org>
| // enrich fills any profile fields (email, first/last name) missing from base with | ||
| // userinfo claims fetched using the access token. Fields already present on base | ||
| // (e.g. from x-amzn-oidc-data) are authoritative and kept. userinfo is queried only | ||
| // when the profile is incomplete and not cached. base is returned unchanged on any | ||
| // miss or error — enrichment never blocks or fails run creation. |
There was a problem hiding this comment.
Fixed: the comment now states that enrich makes a synchronous userinfo call (bounded by userinfoHTTPTimeout) on cache miss, which adds latency to run creation, and that it is best-effort — it never fails run creation (rather than "never blocks").
| func (e *identityEnricher) cachedFor(subject string) *oidcClaims { | ||
| e.mu.Lock() | ||
| defer e.mu.Unlock() | ||
| if c, ok := e.cache[subject]; ok && time.Now().Before(c.expires) { | ||
| return c.claims | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Fixed in e7dfbf10: expired entries are now deleted in cachedFor (on detection) and swept in store, so the map doesn't accumulate dead keys.
| // CreatedBy is the OIDC subject of the identity that created this run, captured | ||
| // from the auth headers the load balancer forwards. Kept for querying/filtering. | ||
| // NULL for runs created without an authenticated identity. |
There was a problem hiding this comment.
Clarified the field comment: created_by is intentionally not in ActionColumnsSet. It's persisted for internal querying/attribution only, not user-supplied list filters/sorts — left out to avoid widening the API surface. The comment now says to add it there only if API-level filtering is desired.
| if len(action.ExecutedBy) > 0 { | ||
| var id common.EnrichedIdentity | ||
| if err := proto.Unmarshal(action.ExecutedBy, &id); err == nil { | ||
| metadata.ExecutedBy = &id | ||
| } |
There was a problem hiding this comment.
Fixed: TestActionMetadataFromModel_ExecutedBy asserts (1) executed_by bytes populate metadata.executed_by, and (2) fallback to a subject-only identity from created_by when executed_by is empty, plus corrupt-bytes and no-identity cases.
…validated token The flyte-authorization ID token header is client-controlled and not validated by the load balancer, so decoding it without verifying the signature is an identity spoofing vector (flagged by security review). Revert reading it. The userinfo enricher stays the name/email source: it sends the LB-validated access token to the IdP, which validates it and returns the real claims — no spoofing possible. Also drop the cookie-path x-amzn-oidc-accesstoken fallback (expired by request time, only produced 401s); the cookie path uses the proxy-signed x-amzn-oidc-data claims. Signed-off-by: Kevin Su <pingsutw@apache.org>
AWS ALB's x-amzn-oidc-data JWT pads the base64url payload, which strict RawURLEncoding rejects — so the cookie/UI path silently dropped the user's claims (email, name) and fell back to subject-only. Decode with padding tolerance. This surfaced once `profile`/`email` scopes enlarged the payload so its length stopped being padding-free. Also removed the temporary AUTHDEBUG diagnostic. Signed-off-by: Kevin Su <pingsutw@apache.org>
| func (e *identityEnricher) cachedFor(subject string) *oidcClaims { | ||
| e.mu.Lock() | ||
| defer e.mu.Unlock() | ||
| if c, ok := e.cache[subject]; ok && time.Now().Before(c.expires) { | ||
| return c.claims | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Fixed in e7dfbf10: cachedFor deletes the expired entry on detection and store sweeps expired entries, so the map no longer grows with dead keys.
| claims, err := e.fetchUserinfo(ctx, accessToken) | ||
| if err != nil { | ||
| logger.Warnf(ctx, "identity enrichment: userinfo fetch failed for subject %q: %v", subject, err) | ||
| return base | ||
| } | ||
| e.store(subject, claims) | ||
| return mergeClaims(base, claims) |
There was a problem hiding this comment.
Fixed in e7dfbf10: enrich now rejects a userinfo response whose sub doesn't match the caller's subject — it returns the base identity unchanged and caches/merges nothing, so a misconfiguration or token-confusion can't associate (or cache) the wrong profile.
| -- Add created_by to actions: the OIDC subject of the identity that created the run. | ||
| -- Captured from the auth headers the load balancer forwards (it enforces auth), | ||
| -- and used to populate ActionMetadata.executed_by on read. | ||
| ALTER TABLE actions ADD COLUMN IF NOT EXISTS created_by VARCHAR(255); |
There was a problem hiding this comment.
Fixed: the column is now TEXT (the migration comment notes the OIDC sub length is IdP-dependent and can exceed 255).
| // CreatedBy is the OIDC subject of the identity that created this run, captured | ||
| // from the auth headers the load balancer forwards. Kept for querying/filtering. | ||
| // NULL for runs created without an authenticated identity. | ||
| CreatedBy sql.NullString `db:"created_by" json:"created_by,omitempty"` |
There was a problem hiding this comment.
Intentional: created_by is persisted for internal querying/attribution, not API-level filtering, so it's deliberately omitted from ActionColumnsSet. I clarified the field comment to say so (add it there only if API filtering is wanted).
| // identityFromHeaders builds the EnrichedIdentity of the caller from the auth headers | ||
| // the load balancer forwards. Auth is enforced upstream (e.g. ALB OIDC / JWT | ||
| // validation), so the claims are trusted and only decoded here — not re-verified. | ||
| // Returns nil when no authenticated identity is present. | ||
| func identityFromHeaders(h http.Header) *common.EnrichedIdentity { |
There was a problem hiding this comment.
Fixed in e7dfbf10: added Config.TrustForwardedIdentityHeaders (default true). When false, the decoded-but-unverified forwarded JWT claims are not trusted and executed_by is left unset. We rely on the upstream proxy/LB to validate tokens and strip client-supplied headers rather than verifying against the issuer's JWKS here.
| // Capture who created the run from the auth headers the load balancer forwards | ||
| // (it enforces auth upstream). nil when there is no authenticated identity. On the | ||
| // Bearer path the token carries only the subject, so enrich name/email via userinfo. | ||
| executedBy := identityFromHeaders(req.Header()) | ||
| executedBy = s.enricher.enrich(ctx, accessTokenFromHeaders(req.Header()), executedBy) |
There was a problem hiding this comment.
The new mapping logic is covered by TestActionMetadataFromModel_ExecutedBy (full identity, subject-only fallback, corrupt bytes, none). The CreateRun → CreateAction wiring itself is thin — CreatedBy: nullStr(identitySubject(executedBy)) and ExecutedBy: the marshaled bytes. Happy to add a RunService-level test asserting the repo call args if you'd like, but the behavioral surface (parse + fallback) is exercised by the mapping test.
…d cache - Add Config.TrustForwardedIdentityHeaders (default true). executed_by is derived from the proxy-forwarded, unverified JWTs only when set; turn off where the service may be reached without a trusted proxy. - Enricher: reject userinfo responses whose `sub` doesn't match the caller (token confusion), and evict expired cache entries (lazily + swept on store) so the map can't grow unbounded. - Fix the misleading "never blocks" doc — userinfo is a synchronous call on cache miss. - created_by column: VARCHAR(255) -> TEXT (OIDC sub length is IdP-dependent). - Clarify created_by is intentionally not in the API filter allowlist. - Tests: actionMetadataFromModel (executed_by populate / created_by fallback / corrupt bytes / none) and enricher subject-mismatch rejection. Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
| if len(action.ExecutedBy) > 0 { | ||
| var id common.EnrichedIdentity | ||
| if err := proto.Unmarshal(action.ExecutedBy, &id); err == nil { | ||
| metadata.ExecutedBy = &id | ||
| } | ||
| } | ||
| if metadata.ExecutedBy == nil && action.CreatedBy.Valid { | ||
| metadata.ExecutedBy = subjectOnlyIdentity(action.CreatedBy.String) | ||
| } |
There was a problem hiding this comment.
In practice the write path can't persist a subject-less executed_by: it's only set from identityFromHeaders, which returns nil unless a subject is present (identityFromJWT returns nil when sub == "", and subjectOnlyIdentity("") is nil), and enrich never strips the subject. So a parseable-but-subjectless executed_by isn't produced, and legacy NULL rows fall back to created_by. Glad to add an explicit id.GetUser().GetId().GetSubject() != "" check on the read side as defense-in-depth if you'd prefer.
| func (e *identityEnricher) store(subject string, claims *oidcClaims) { | ||
| e.mu.Lock() | ||
| defer e.mu.Unlock() | ||
| // Opportunistically evict any other expired entries to bound the map size. | ||
| now := time.Now() | ||
| for k, c := range e.cache { | ||
| if now.After(c.expires) { | ||
| delete(e.cache, k) | ||
| } | ||
| } | ||
| e.cache[subject] = cachedClaims{claims: claims, expires: time.Now().Add(identityCacheTTL)} | ||
| } |
There was a problem hiding this comment.
e7dfbf10 evicts expired entries (in cachedFor and store), so dead keys don't accumulate. I didn't add a hard max-size / LRU: the cache only holds distinct run-creators within the 10m TTL (bounded in practice) and entries are swept on expiry. A bounded/LRU cache is a reasonable follow-up if we see high-cardinality bursts within the TTL — can add if you think it's warranted.
| -- Add created_by to actions: the OIDC subject of the identity that created the run. | ||
| -- Captured from the auth headers the load balancer forwards (it enforces auth), | ||
| -- and used to populate ActionMetadata.executed_by on read. | ||
| -- TEXT, not VARCHAR(n): the OIDC `sub` length is IdP-dependent and can exceed 255. | ||
| ALTER TABLE actions ADD COLUMN IF NOT EXISTS created_by TEXT; |
There was a problem hiding this comment.
Done: the schema is TEXT (the right choice given the IdP-dependent sub length), and I've updated the PR description to say created_by TEXT so it matches the migration.
…ate column) created_by stored the bare subject, which is already inside the executed_by EnrichedIdentity blob, and nothing queried it (not in ActionColumnsSet). The subject-only fallback it backed was unreachable too — both columns shipped in the same change, so no row ever had executed_by empty while created_by was set. Drop the column, its migration, the model field, the INSERT param, the fallback branch, and the identitySubject helper. executed_by is now the single source of attribution. Signed-off-by: Kevin Su <pingsutw@apache.org>
Why are the changes needed?
With auth at the load balancer (it validates the request and forwards the caller's identity), the runs service can record who ran a task.
RunDetails.action.metadata.executed_bywas never populated.What changes were proposed in this pull request?
Capture the caller's identity at
CreateRunand surface it on read asActionMetadata.executed_by.identity.go—identityFromHeadersbuilds anEnrichedIdentityfrom the auth headers the LB forwards:X-Amzn-Oidc-DataJWT (sub, email, given_name, family_name); falls back toX-Amzn-Oidc-Identity(subject only).Authorizationaccess token's claims.identity_enricher.go— the SDK/CLI forwards an OAuth access token, which carries only the subject (profile claims live in userinfo, not the token). So on the Bearer path the enricher calls the IdP userinfo endpoint with the forwarded access token to fetch email/name. This is the OSS analog of how cloud resolves a subject to a full user (cloud uses its identity service; the standalone runs service queries the IdP directly). It resolvesuserinfo_endpointfrom OIDC discovery (cached), cachessubject→identity(10m TTL), and is best-effort — any timeout/error/missing-claim falls back to subject-only and never blocks run creation. Driven byruns.authMetadata.externalAuthServerBaseUrl; a no-op when unset. The cookie path already has claims viax-amzn-oidc-data, so it short-circuits without a userinfo call.actions.executed_by BYTEAholds the serializedEnrichedIdentity;created_by TEXTkeeps the bare subject for querying. Read prefers the full identity, falls back to subject-only for pre-existing rows.How was this patch tested?
Unit: header decode (cookie/bearer/precedence/malformed), DB round-trip; enricher userinfo fetch,
subject→identitycaching (no repeat calls), no-op cases, and error fallback (httptest IdP). Build +runs/service+runs/repository/implsuites pass.Live on the
flyte-developmentALB cluster (Okta auth):executed_byunion runsubject+ firstName "Kevin", lastName "Su" (enriched via userinfo; was subject-only before)subject+email kevin@union.ai(fromx-amzn-oidc-data)Labels
Check all the applicable boxes