Add security-2 documentation section by ppiegaze · Pull Request #907 · unionai/unionai-docs

ppiegaze · 2026-04-10T09:39:13Z

Summary

Adds draft security-2/ section with 45 pages covering architecture, auth, compliance, encryption, network security, operations, secrets, and reference material
Organized into 8 subsections for the restructured security documentation

Test plan

Review content for accuracy
Verify Hugo build (make dist)
Check variant frontmatter on all pages
Validate internal links
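
For the "Validate internal links" step, a minimal sketch of a relative-link check is shown here. It assumes pages use plain inline Markdown links with relative `.md` targets; the docs repo may instead rely on Hugo shortcodes or its own build-time checks, so `find_broken_links` and the in-memory `pages` mapping are hypothetical illustrations, not part of the repository.

```python
import posixpath
import re

# Matches inline Markdown links such as [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme (https:, mailto:, ...).
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:")


def find_broken_links(pages):
    """Return (page, target) pairs whose relative target is not a known page.

    `pages` maps a POSIX-style path to the page's Markdown text.
    """
    known = set(pages)
    broken = []
    for path, text in sorted(pages.items()):
        base = posixpath.dirname(path)
        for target in LINK_RE.findall(text):
            # Skip external URLs and same-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any #fragment, then resolve relative to the linking page.
            resolved = posixpath.normpath(posixpath.join(base, target.split("#")[0]))
            if resolved not in known:
                broken.append((path, target))
    return broken
```

Working on an in-memory mapping keeps the check independent of the filesystem; a wrapper that reads the content directory into `pages` would be needed to run it against a real checkout.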

🤖 Generated with Claude Code

cloudflare-workers-and-pages · 2026-04-10T09:40:27Z

Deploying docs with Cloudflare Pages

Latest commit:	`d3b3f06`
Status:	✅ Deploy successful!
Preview URL:	https://319b0bf7.docs-dog.pages.dev
Branch Preview URL:	https://peeter-security-2.docs-dog.pages.dev


Draft security documentation covering architecture, auth, compliance, encryption, network security, operations, secrets, and reference material. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Rewrite security-2 section for conciseness (~26% line reduction) while preserving all information. Replace duplicated tables/paragraphs with cross-reference links, tighten prose to match user-guide style, and add Cloudflare tunnel firewall configuration link. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Rename security-2/ to security/, replacing the original flat file layout with the new organized subsection structure (architecture, auth, compliance, keys, network, operations, reference, secrets). Co-Authored-By: Claude Opus 4.6 <[email protected]>

New documentation section organized under five top-level categories:
- Architecture: two-plane separation, control/data plane, network, deployment models
- Data: classification/residency, data flow, encryption, secrets, workflow lifecycle
- Access: authentication, RBAC, tenant isolation, human access controls
- Compliance: certifications, HIPAA, GDPR, standards, shared responsibility
- Operations: logging/audit, vulnerability mgmt, threat modeling, org security
Each page includes prose content expanding the topic tree claims plus Verification sections with reviewer focus ratings and concrete CLI commands for security reviewers to independently confirm claims. 31 files total (6 section indexes + 25 content pages).
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Moves the new topic-tree-based security content from security-2/ into security/, replacing the old flat subsection structure (auth/, keys/, network/, secrets/, reference/) with the new five-category organization (architecture/, data/, access/, compliance/, operations/). Co-Authored-By: Claude Opus 4.6 <[email protected]>

Co-Authored-By: Claude Opus 4.6 <[email protected]>

…perations/Compliance order Co-Authored-By: Claude Opus 4.6 <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Annotate security documentation with WARNING and NOTE callouts based on source code audit of unionai/cloud and flyteorg/flyte-sdk repos. Key findings: structured task I/O transits control plane memory, task definition closures contain potentially sensitive fields, log streams pass through unredacted, and tenant isolation has identified gaps. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Include task function names, workflow names, execution names, user identity, and other identifier columns alongside the closure blob contents already listed. Note encryption at rest (AES-256/KMS). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Break Data summary into three bullets distinguishing bulk data (never enters CP), inline data (transits CP transiently), and CP database metadata. Reformat all overview sections as heading + bullet list. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Replace v1 terminology (task definition closures, launch plan specs, FlyteAdmin database) with v2 equivalents (TaskSpec/RunSpec blobs, triggers, three CP databases). Note that v2 sends full TaskSpec inline on every run and stores across PostgreSQL + 2x Cassandra. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Make the current architecture the default — no version labels or contrasts with legacy behavior. Union.ai is simply Union.ai. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Replace 39 WARNING/NOTE callouts across 20 files with accurate inline prose. Distinguish bulk data (never enters CP, presigned URLs), inline data (transits CP memory, encrypted in transit, not persisted), and metadata (stored in CP databases, encrypted at rest). Be precise about encryption state at each phase. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- data-flow.md: per-hop encryption tables for all three data flow patterns, plus data flow path diagrams
- classification-and-residency.md: encryption columns in classification table (at rest, in transit, enters CP memory)
- control-plane.md: TaskSpec field enumeration table with sensitivity classifications
- secrets.md: per-phase encryption table for secret creation lifecycle
- encryption.md: comprehensive data protection summary table covering every data category across all phases
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Remove importance labels (Critical/High/Medium/Low) from all 37 verification headings across 25 files
- Fix verification steps that claimed "no customer data in CP" to reflect the three-tier model (bulk/inline/metadata)
- Update tunnel traffic verification to acknowledge structured I/O transits the tunnel (not just "metadata-sized" traffic)
- Update control-plane verification to reference TaskSpec field table
- Fix workflow-data-flow retrieval step to distinguish binary vs structured outputs
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- "State Service" → "Actions Service" (actual code name)
- "workflow execution reaches a point" → "run requires a task to execute"
- "execution graph" → "run state"
- "Task registration" section merged into "Task deployment and run creation" (tasks are sent inline with each run, not registered separately)
- "register tasks" → "deploy tasks" in RBAC table
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Replace 103 occurrences of space-dash-dash-space across 27 files with colons, parentheses, periods, or restructured sentences. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
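
As an illustration of how the space-dash-dash-space occurrences could be counted per file before and after such a cleanup, here is a minimal sketch. The function name `count_dash_dash` and the in-memory `pages` mapping are hypothetical; the commit does not say which tool was actually used.

```python
import re

# A double hyphen with a space on each side; "---" front matter fences and
# CLI flags such as "--flag" do not match because of the surrounding spaces.
DASH_RE = re.compile(r" -- ")


def count_dash_dash(pages):
    """Map each path to its number of ' -- ' occurrences, omitting clean files."""
    counts = {}
    for path, text in pages.items():
        n = len(DASH_RE.findall(text))
        if n:
            counts[path] = n
    return counts
```

Summing the returned values gives the total occurrence count, and the number of keys gives the number of affected files.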

Document the optional service where customers can grant Union.ai staff time-limited RBAC access to their view of the system for troubleshooting. Distinguish from BYOC K8s cluster management access. Available for both self-managed and BYOC deployments. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Reorganize from 5 sections to 4 sections + 1 standalone page:
- data/ → data-protection/ (renamed; gains logging-and-audit from ops)
- access/ → identity-and-access/ (renamed)
- operations/ → eliminated (content distributed)
- logging-and-audit → data-protection/
- threat-modeling → threat-model.md (top-level, high visibility)
- organizational-security → compliance/
- vulnerability-management → compliance/
- _index.md benefits table → architecture/_index.md
- compliance/ → "Compliance and governance" (gains 2 pages from ops)
New structure:
- Architecture (planes, network, deployment, tunnel)
- Data protection (classification, flow, encryption, secrets, logging)
- Identity and access (auth, RBAC, tenant isolation, human access)
- Threat model (standalone page, promoted for visibility)
- Compliance and governance (certs, HIPAA, GDPR, org security, vuln mgmt)
All cross-references updated. No broken links.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Replaces internal RPC method names, service identifiers, and protocol framework references in security docs with audience-appropriate generic terms (intentional names retained only in the Components section). Refines factual claims on user records and task metadata fields against source code, adds a verification step demonstrating customer key authority over bulk data, and tightens several long sentences. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Establishes data-protection/ as the canonical source for data classification, residency, encryption, and flow patterns. Architecture pages describe structure (planes, components, network paths, deployment models) and link to data-protection for residency facts rather than restating them.
- Trim residency restatements from architecture/control-plane.md, data-plane.md, network.md, two-plane-separation.md
- Replace specific datastore mentions in control-plane.md with generic phrasing and link to encryption page
- Standardize size limits to 10 MiB submission / 20 MiB retrieval (drop "MB" and "10-20 MiB" combined phrasings)
- Fix deployment-models.md "AES-256 for all data" claim
- Trim workflow-data-flow.md restatements with link to classification-and-residency
- Apply user's semantic-line-break edits to security/_index.md
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Rewrite architecture/_index.md: intro paragraph reflecting all six subsections, followed by a single bulleted list with one bullet per subsection
- Move "Customer authority over data" CMK-disable test from architecture/two-plane-separation.md to data-protection/encryption.md (where customer-managed keys belong topically)
- Remove the Verification section from architecture/two-plane-separation.md; the residency portion is covered (better) in data-protection/classification-and-residency.md
- Apply user's "What it does and does not store" restructuring in control-plane.md
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Replace the named internal microservice breakdown (Admin, Queue Service, Actions Service, Cluster Service, DataProxy) with a capability-level list. The internal microservice decomposition isn't directly verifiable by reviewers (control plane runs on Union.ai infrastructure) and naming each service discloses implementation detail without aiding security review. Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Add protobuf schema review step (TaskTemplate, RunSpec) to data-protection/classification-and-residency.md, where the field enumeration lives
- Remove Verification section from architecture/control-plane.md; "What it stores" claims are verified in classification-and-residency, and Infrastructure / SOC 2 verification is already covered in compliance/certifications.md
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Move Data flow cross-reference to end of Components (was orphaned mid-list)
- Collapse duplicate workload identity description in Kubernetes security to a one-liner pointing at IAM and workload identity
- Trim duplicate Image Builder paragraph at the start of Container security
- Drop "operates as a standard Kubernetes controller" redundancy in Executor
- Drop "natural" qualifier in object store layout
Co-Authored-By: Claude Opus 4.6 <[email protected]>

A reviewer noted that the data plane initiates two distinct outbound-only channels to the control plane, not one. Verified against unionai/cloud source code: a Cloudflare Tunnel (cloudflared sidecar) and a separate direct gRPC connection (operator/propeller dialing the regional cloudUrl). The tunnel handles reverse-proxy traffic from CP to DP services (DataProxy, log streaming, ingress); the gRPC channel handles orchestration RPCs (cluster registration, action lifecycle, events, catalog, admin).
network.md changes:
- Top-of-page intro mentions both channels
- New "Direct gRPC connection" section
- Tunnel traffic list corrected: "Orchestration instructions" and "State transitions" moved from tunnel to gRPC (they ride the gRPC channel, not the tunnel)
- Added "Apps & Serving ingress" to tunnel traffic list
- Communication paths table split into two cross-plane rows
- Verification reviewer focus expanded to cover both channels
data-plane.md: Tunnel Service component now notes the separate gRPC connection alongside.
deployment-models.md: self-managed "only connection is Cloudflare Tunnel" corrected to two outbound-only channels.
Also folds in earlier proofreading nits on network.md:
- Container images bypass via container registry pull, not presigned URLs
- Region table simplified (removed inconsistent placeholder Domain column)
- "VPN configuration is needed" -> "are needed"
- "simplified to permitting outbound HTTPS" rephrased
- "rotate implicitly" / "operator polling" rephrased to plain language
- Bidirectional tunnel-traffic note added before list
- Communication paths direction notation corrected (DP-initiated)
- Removed weak "request or build a tunnel audit mode" verification step
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Correct claim that orchestration traffic flows through the Cloudflare Tunnel; orchestration RPCs ride the direct gRPC channel. Defer data and orchestration details to network.md instead of restating (incorrectly) here.
- Add GCP/Azure equivalents to the AWS endpoint-listing verification command.
Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Common properties bullet now reflects both outbound channels (Cloudflare Tunnel and direct gRPC), not just the tunnel
- Resilience claim: "If the tunnel connection drops" -> "If either outbound channel drops" to match the two-channel reality
- Verification: rephrase "Simulate a control plane outage by disconnecting the tunnel" to "Simulate a connectivity disruption" since scaling down the Tunnel Service alone doesn't disable the direct gRPC channel
- Self-managed: clarify that the eliminated third-party access is to the data plane infrastructure specifically
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Across all 8 pages in content/security/data-protection/:
Factual / structural fixes:
- logging-and-audit.md weight: 1 -> 7 (was conflicting with classification-and-residency)
- Persisted logs no longer listed as living in S3 (they're in CloudWatch / Cloud Logging / Azure Monitor only)
- Generic "Data plane object store" replaces AWS-specific "Data plane S3"
- Outdated "Stackdriver" replaced with "Cloud Logging" throughout
- Stale cross-references to control-plane.md retargeted: field enumeration now lives in classification-and-residency.md
- "Two-plane separation" link replaced with "Network architecture" where the topic is network paths, and "Data flow" where the topic is log flow
- workflow-data-flow.md "two data flow patterns" -> "data flow patterns" (there are three)
Two-channel followups:
- encryption.md "Encryption in transit" lists both Cloudflare Tunnel and direct gRPC channels
- classification-and-residency.md orchestration metadata transit column: "TLS+mTLS+tunnel (events)" -> "TLS (gRPC events)" (events ride gRPC)
- encryption.md data protection summary: orchestration rows now show "TLS (gRPC)" instead of "TLS/mTLS/tunnel"
- multi-cloud.md cross-plane connectivity defers to network.md and acknowledges both channels
Tech generalization (matching control-plane.md treatment):
- All explicit "PostgreSQL" / "Cassandra" / "AWS RDS" references in control-plane storage tables removed; replaced with "control plane databases (AES-256/KMS)" or "managed cloud database service"
- "ClickHouse" replaced with "Observability metrics store (per-cluster)" in encryption-at-rest table
Other:
- _index.md long sentence split into bulleted list of three patterns
- encryption.md drops defensive "This is standard for any service that processes data"
- data-flow.md nested parens in bulk-data list flattened
- logging-and-audit.md audit verification section restructured: drops invented "union security audit" command, lists actual sources today
- logging-and-audit.md sentence-fragment "Self-service verification using existing features." replaced with standard phrasing
- secrets.md verification: competitor comparison moved out of the test step into a closing note
Co-Authored-By: Claude Opus 4.6 <[email protected]>

authentication.md:
- Drop "(Okta)" from the OIDC method row; any OIDC/SAML 2.0 provider works
- Rephrase confusing primary-IdP statement; customers configure their own identity provider
- Fix "a MFA prompt" -> "an MFA prompt"
- Standardize closing self-service note to match other pages
human-access.md:
- Disambiguate "control plane tenant" (was ambiguous between customer tenant and Union.ai-hosted infrastructure); clarify it's the Union.ai-hosted control plane
- Capitalize "Helm"
- Replace "the customer's own view of the system" with "the customer's tenant for troubleshooting"
- Scope "Access scope" section to the actual conditions under which Union.ai personnel access a customer's tenant (BYOC or optional support service); it had implied routine access
- Collapse redundant "cloud account or IAM roles, or access customer object stores..." into one list
- Self-managed verification: reflect both outbound channels (Cloudflare Tunnel and direct gRPC), not just the tunnel
rbac.md:
- Standardize "# Expect ..." comment alignment
- "a RBAC policy" -> "an RBAC policy"
- Standardize closing self-service note
tenant-isolation.md:
- Rename "## Isolation verification" -> "## Defense in depth" to avoid two H2s called "verification"
- Lowercase "protobuf" (it's the term, not a proper name)
Co-Authored-By: Claude Opus 4.6 <[email protected]>

The previous version restated impact analysis from Architecture, Encryption, and Data flow, which caused drift (size limits, single-channel framing, stale "see X for verification" links pointing at sections that had been removed or moved).
The rewrite enumerates principal threat scenarios as one-paragraph framings with links to the canonical pages where the controls and verification live. Coverage expanded from 3 scenarios to 5: control plane compromise, cross-plane network interception, presigned URL leakage, secret exfiltration, and cross-tenant data access.
The Verification section is removed; verification now lives entirely on each canonical page (no more "see referenced sections" indirection).
Page stays at top-level so it remains where security reviewers expect to find a threat-modeling artifact.
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Weight collisions and order:
- vulnerability-management.md weight 2 -> 7 (was colliding with hipaa.md)
- organizational-security.md weight 4 -> 6 (was colliding with standards.md)
- Order now matches the section index in _index.md
certifications.md:
- SOC 2 Type I trust criteria: "Integrity" -> "Processing Integrity" (the official trust service criterion name; matches Type II)
- "70+ verified controls" -> "73 verified controls" (exact)
gdpr.md:
- EU Central -> EU Central (Frankfurt) for symmetry with the other regions
- Stale verification link to two-plane-separation retargeted to data-protection/classification-and-residency (which now owns the residency verification)
- "For details" link similarly retargeted
hipaa.md:
- Drop "container images" from the bulk-PHI list (container images aren't typical PHI containers)
- Pronoun agreement: "If these contain PHI, it would be persisted" -> "If they contain PHI, they would be persisted"
- Restructured to consolidate the duplicate residency claim that previously appeared in both opening and closing paragraphs
- Stale verification link retargeted to classification-and-residency
standards.md:
- "complies with" -> "aligns with" (ISO 27001 is not a current Union.ai certification, per certifications.md)
- Corrected ISO 27001:2022 control titles:
  * A.8.20 "Network security" -> "Networks security" (official plural)
  * Replaced A.8.28 "Secure configuration" (actual title is "Secure coding") with A.8.22 "Segregation of networks" (which actually fits the described control about management plane separation)
  * A.8.21 "Cryptography" -> A.8.24 "Use of cryptography" (8.21 is actually "Security of network services"; cryptography is 8.24)
  * A.5.23 "Cloud service security" -> "Information security for use of cloud services" (official title)
- Replaced specific CIS v8 sub-control numbers (4.4, 12.11, 13.2 -- some of which I couldn't verify against the official CIS Controls v8) with alignment to top-level CIS controls 12 and 13
vulnerability-management.md:
- Cloudflare row "Tunnel connectivity" -> "Cross-plane connectivity (Tunnel and gRPC ingress)" to reflect that both outbound channels terminate at Cloudflare's edge
architecture/private-connectivity.md (consistency with standards.md):
- Same ISO/CIS control title corrections applied
Co-Authored-By: Claude Opus 4.6 <[email protected]>
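
Weight collisions like the ones fixed in this commit could be detected mechanically. The sketch below is one possible approach, assuming each page declares a `weight: N` line in YAML front matter and that pages in the same directory compete for ordering; `find_weight_collisions` and the in-memory `pages` mapping are hypothetical and not part of the docs tooling. It scans the whole page text rather than parsing the front matter block, which is a simplification.

```python
import posixpath
import re
from collections import defaultdict

# A front matter line of the form "weight: 7".
WEIGHT_RE = re.compile(r"^weight:\s*(\d+)\s*$", re.MULTILINE)


def find_weight_collisions(pages):
    """Return {(directory, weight): [paths]} for weights used by 2+ sibling pages."""
    by_slot = defaultdict(list)
    for path, text in sorted(pages.items()):
        # Assumption: a section index is ordered among its parent's children,
        # so it is not compared against the pages inside its own directory.
        if posixpath.basename(path) == "_index.md":
            continue
        match = WEIGHT_RE.search(text)
        if match:
            slot = (posixpath.dirname(path), int(match.group(1)))
            by_slot[slot].append(path)
    return {slot: paths for slot, paths in by_slot.items() if len(paths) > 1}
```

An empty result means no two sibling pages share a weight under these assumptions.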

- data-flow.md: replace the 3-pattern ASCII diagram with three prose bullet points describing each pattern's flow and encryption
- data-flow.md: convert the presigned URL phase table to bullets
- Apply user edits to deployment-models.md (BYOC ordered before Self-managed) and data-protection/_index.md
Co-Authored-By: Claude Opus 4.6 <[email protected]>

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Copilot AI review requested due to automatic review settings April 10, 2026 09:39

ppiegaze requested review from EngHabu, cosmicBboy, kumare3 and samhita-alla as code owners April 10, 2026 09:39

Copilot started reviewing on behalf of ppiegaze April 10, 2026 09:39

ppiegaze and others added 2 commits April 15, 2026 13:53

Add security-2 documentation section

a3d1bba

Draft security documentation covering architecture, auth, compliance, encryption, network security, operations, secrets, and reference material. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

ppiegaze force-pushed the peeter/security-2 branch from 7a931bd to 3bac80e April 15, 2026 11:53

ppiegaze and others added 20 commits April 17, 2026 12:28

Add missing H1 headings to security content pages

9dd77e7

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Rewrite security index: replace Core principles with Overview, swap O…

48f9b75

…perations/Compliance order Co-Authored-By: Claude Opus 4.6 <[email protected]>

Clarify Architecture and Data summaries in security index

74930c2

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Remove trailing periods from section link headings in security index

93841ff

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Remove v1/v2 contrast language from annotations

2fa85f5

Make the current architecture the default — no version labels or contrasts with legacy behavior. Union.ai is simply Union.ai. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Remove all em-dash (--) usage from security section

f65e947

Replace 103 occurrences of space-dash-dash-space across 27 files with colons, parentheses, periods, or restructured sentences. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Merge branch 'main' into peeter/security-2

ea3b9e3

ppiegaze added the labels security (Security section) and do-not-merge (PR is ready for review, but should not be merged just yet) Apr 23, 2026

ppiegaze mentioned this pull request Apr 23, 2026

Fix RBAC table inconsistencies in security docs #876

Closed

2 tasks

ppiegaze and others added 19 commits April 28, 2026 12:35

Merge remote-tracking branch 'origin/main' into peeter/security-2

9367118

Merge branch 'main' into peeter/security-2

222344b

clean up

20b0d34

typo

5678f70

Split inline proxy prose into shorter paragraphs

d3b3f06

Co-Authored-By: Claude Opus 4.6 <[email protected]>

ppiegaze wants to merge 41 commits into main from peeter/security-2