Add dual staging environment infrastructure (pathroute & subdomain)#542
Add dual staging environment infrastructure (pathroute & subdomain)#542
Conversation
This adds infrastructure for two parallel staging environments that deploy from the main branch: - staging-pathroute: Uses path-based routing (e.g., /api/automation) - staging-subdomain: Uses subdomain-based routing (production pattern) Changes: - Add GitHub Actions workflow for dual environment deployment - Supports deploying to both, pathroute only, or subdomain only - Includes SOPS secret decryption and Helm deployment - Dry-run mode for validation - Add envs/staging-pathroute/ with values.yaml and README - Add envs/staging-subdomain/ with values.yaml and README - Add .sops.yaml for GCP KMS encryption of secrets - Add helper scripts for secret management Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🔴 Needs improvement - Fundamental data structure and DRY violations
This infrastructure change solves a real problem (testing dual routing patterns) but introduces massive code duplication that will become a maintenance nightmare. The workflow and configuration files violate basic engineering principles.
|
|
||
| echo "" | ||
| echo "Current pod status:" | ||
| kubectl get pods -n ${{ env.NAMESPACE }} -l app.kubernetes.io/instance=${{ env.HELM_RELEASE }} |
There was a problem hiding this comment.
🔴 Critical - Data Structure: Jobs deploy-pathroute and deploy-subdomain are 99% identical (150+ lines duplicated). This is exactly the wrong data structure.
The right way: Use a matrix strategy:
jobs:
deploy:
strategy:
matrix:
env:
- name: pathroute
namespace: openhands-pathroute
helm_release: openhands-pathroute
env_dir: envs/staging-pathroute
- name: subdomain
namespace: openhands-subdomain
helm_release: openhands-subdomain
env_dir: envs/staging-subdomain
env:
NAMESPACE: ${{ matrix.env.namespace }}
HELM_RELEASE: ${{ matrix.env.helm_release }}
ENV_DIR: ${{ matrix.env.env_dir }}
# ... rest of steps (once)Then add conditional matrix execution:
if: ${{ inputs.environment == 'both' || inputs.environment == matrix.env.name }}This eliminates 150 lines of duplication and makes adding new environments trivial.
There was a problem hiding this comment.
✅ Addressed - Refactored to use matrix strategy. The workflow now uses:
strategy:
matrix:
env:
- name: pathroute
namespace: openhands-pathroute
...
- name: subdomain
namespace: openhands-subdomain
...
if: ${{ inputs.environment == 'both' || inputs.environment == matrix.env.name }}This eliminated 150+ lines of duplication. See commit 4dd4352.
This comment was written by an AI assistant (OpenHands).
| @@ -0,0 +1,322 @@ | |||
| # Staging (Subdomain-Based Routing) environment values for OpenHands | |||
There was a problem hiding this comment.
🔴 Critical - Code Duplication: This file is 99% identical to envs/staging-pathroute/values.yaml (only ingress.host and a few URLs differ).
Bad: 600+ lines of duplicated configuration that will drift and cause bugs.
Good: Use Helm value inheritance:
# envs/common/values.yaml (shared base)
imagePullSecrets: [...]
databaseMigrations: [...]
# ... all common config
# envs/staging-pathroute/values.yaml
ingress:
host: staging-pathroute.all-hands.dev
automation:
openhandsApiUrl: "https://staging-pathroute.all-hands.dev"
# envs/staging-subdomain/values.yaml
ingress:
host: staging-subdomain.all-hands.dev
automation:
openhandsApiUrl: "https://staging-subdomain.all-hands.dev"Then deploy with:
helm upgrade --install ... \
-f envs/common/values.yaml \
-f envs/staging-pathroute/values.yamlThis is not optional. Duplicating 600 lines of config is unacceptable.
There was a problem hiding this comment.
✅ Addressed - Created envs/common/values.yaml with 315 lines of shared configuration. Environment-specific files are now only 11 lines each containing just the differences:
envs/common/values.yaml- 315 lines (shared base config)envs/staging-pathroute/values.yaml- 11 lines (only ingress.host and URLs)envs/staging-subdomain/values.yaml- 11 lines (only ingress.host and URLs)
The workflow deploys with:
helm upgrade --install ... \
-f envs/common/values.yaml \
-f ${{ matrix.env.env_dir }}/values.yamlSee commit 4dd4352.
This comment was written by an AI assistant (OpenHands).
| integrations: | ||
| annotations: | ||
| cert-manager.io/cluster-issuer: letsencrypt | ||
| mcp: |
There was a problem hiding this comment.
🟡 Suggestion - Clarity: This comment says "disable path-based ingresses" but then immediately configures them with annotations. Either they're disabled (remove them) or they're enabled (fix the comment). Which is it?
There was a problem hiding this comment.
✅ Addressed - The confusing comment about "disable path-based ingresses" has been removed. The new envs/staging-subdomain/values.yaml is now clean and only contains the actual differences:
# Staging (Subdomain-Based Routing) environment values
# Uses subdomain-based routing: automation.staging-subdomain.all-hands.dev
# Base config: envs/common/values.yaml
ingress:
host: staging-subdomain.all-hands.dev
automation:
openhandsApiUrl: "https://staging-subdomain.all-hands.dev"
automationBaseUrl: "https://staging-subdomain.all-hands.dev"See commit 4dd4352.
This comment was written by an AI assistant (OpenHands).
| @@ -0,0 +1,20 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
🟡 Suggestion - Unnecessary Abstraction: This script is a 5-line wrapper around sops --decrypt. It adds no value - just run sops --decrypt $file directly.
Same for safe-apply-secrets.sh - it's just kubectl apply. Why wrap it?
If you must have scripts, they should solve real problems (error handling, validation, complex logic), not be trivial wrappers.
There was a problem hiding this comment.
✅ Addressed - Both scripts/deploy/decrypt.sh and scripts/deploy/safe-apply-secrets.sh have been deleted. The workflow now runs the commands directly:
- name: Decrypt and apply secrets
run: |
for file in "$SECRETS_DIR"/*.yaml; do
sops --decrypt "$file" | kubectl apply -n ${{ env.NAMESPACE }} -f -
doneSee commit 4dd4352.
This comment was written by an AI assistant (OpenHands).
…mon values - Refactor workflow to use matrix strategy (eliminates ~150 lines duplication) - Extract common staging config to envs/common/values.yaml - Slim down environment-specific values to only host/URL overrides - Remove unnecessary wrapper scripts (decrypt.sh, safe-apply-secrets.sh) - Update READMEs to document new structure Co-authored-by: openhands <openhands@all-hands.dev>
- deploy.sh: Script to deploy OpenHands Cloud to shared testbed - setup-shared-cluster.sh: One-time cluster setup script - README.md: Comprehensive guide for team members The testbed is intentionally private (uses /etc/hosts for DNS). Each developer can deploy to their own namespace. Co-authored-by: openhands <openhands@all-hands.dev>
Document the strategy for 4 staging environments: - 2 CI environments (pathroute + subdomain routing) - 2 Dev environments (pathroute + subdomain routing) Covers: - TLS/cert-manager setup - Helm chart installation complexity - Incremental deployment strategy - SAML IdP (Keycloak) setup - Integration test suite requirements - External DNS routing - Why Replicated isn't the solution for this use case Co-authored-by: openhands <openhands@all-hands.dev>
- Add 'Current Progress' section documenting completed testbed work - Update status from 'Draft' to 'In Progress' - Add Phase 0 (Developer Testbed) to Implementation Plan - Mark completed tasks in all phases - Update Appendix with testbed URLs and infrastructure details - Add links to testbed README and deploy script Co-authored-by: openhands <openhands@all-hands.dev>
This PR now uses the infrastructure created in PR #580 (SV-OHE-staging-Deploy-Infra): - GCP Project: platform-team-sandbox - GKE Cluster: ohe-staging-cluster - Domain: ohe-staging.platform-team.all-hands.dev Changes: - Update workflow to target platform-team-sandbox cluster - Use testenv-charts/helm/environments/staging/base-values.yaml as base config - Copy secrets from all-hands-system namespace (not SOPS-encrypted) - Update environment values to use new domain structure: - pathroute.ohe-staging.platform-team.all-hands.dev - subdomain.ohe-staging.platform-team.all-hands.dev - Remove obsolete envs/common/values.yaml (now using testenv-charts base) - Remove obsolete scripts/testbed/ (superseded by PR #580) - Update documentation to reflect new infrastructure Deployed URLs: - https://pathroute.ohe-staging.platform-team.all-hands.dev (path-based routing) - https://subdomain.ohe-staging.platform-team.all-hands.dev (subdomain routing)
Summary
This PR sets up continuous deployment for two staging environments that use the infrastructure from PR #580 (
SV-OHE-staging-Deploy-Infra).Infrastructure
Uses the Platform Team Sandbox infrastructure (created in PR #580):
platform-team-sandboxohe-staging-clusterus-central1ohe-staging.platform-team.all-hands.devEnvironments
pathroute.ohe-staging.platform-team.all-hands.dev/api/automation,/integration/*)openhands-pathroutesubdomain.ohe-staging.platform-team.all-hands.devopenhands-subdomainChanges
GitHub Actions Workflow (
.github/workflows/deploy-staging.yml)platform-team-sandboxGCP project andohe-staging-clustertestenv-charts/helm/environments/staging/base-values.yamlas base configall-hands-systemnamespace to deployment namespacesEnvironment Configurations (
envs/)staging-pathroute/: Values for path-based routing configurationstaging-subdomain/: Values for subdomain-based routing configurationRemoved (superseded by PR #580)
envs/common/values.yaml- now usingtestenv-charts/helm/environments/staging/base-values.yamlscripts/testbed/- superseded by PR feat: Add staging infrastructure for 2 OHE deployment environments - path and subdomain #580 infrastructureDeployment
both,pathroute, orsubdomainRelated
This PR was updated by an AI agent (OpenHands) on behalf of the user.