Skip to content

Add dual staging environment infrastructure (pathroute & subdomain)#542

Open
saurya wants to merge 6 commits intomainfrom
sv-ohe-staging-infra
Open

Add dual staging environment infrastructure (pathroute & subdomain)#542
saurya wants to merge 6 commits intomainfrom
sv-ohe-staging-infra

Conversation

@saurya
Copy link
Copy Markdown
Contributor

@saurya saurya commented Apr 13, 2026

Summary

This PR sets up continuous deployment for two staging environments that use the infrastructure from PR #580 (SV-OHE-staging-Deploy-Infra).

Infrastructure

Uses the Platform Team Sandbox infrastructure (created in PR #580):

  • GCP Project: platform-team-sandbox
  • GKE Cluster: ohe-staging-cluster
  • Region: us-central1
  • Base Domain: ohe-staging.platform-team.all-hands.dev

Environments

Environment URL Routing Pattern Namespace
pathroute pathroute.ohe-staging.platform-team.all-hands.dev Path-based (/api/automation, /integration/*) openhands-pathroute
subdomain subdomain.ohe-staging.platform-team.all-hands.dev Subdomain-based (production-like) openhands-subdomain

Changes

GitHub Actions Workflow (.github/workflows/deploy-staging.yml)

  • Targets platform-team-sandbox GCP project and ohe-staging-cluster
  • Uses testenv-charts/helm/environments/staging/base-values.yaml as base config
  • Copies secrets from all-hands-system namespace to deployment namespaces
  • Supports deploying to both, pathroute-only, or subdomain-only
  • Dry-run mode for validation before actual deployment

Environment Configurations (envs/)

  • staging-pathroute/: Values for path-based routing configuration
  • staging-subdomain/: Values for subdomain-based routing configuration
  • Each includes README.md with deployment instructions

Removed (superseded by PR #580)

Deployment

  1. Go to ActionsDeploy to Staging
  2. Click Run workflow
  3. Select environment: both, pathroute, or subdomain
  4. Enter the image tag to deploy

Related


This PR was updated by an AI agent (OpenHands) on behalf of the user.

This adds infrastructure for two parallel staging environments that deploy
from the main branch:

- staging-pathroute: Uses path-based routing (e.g., /api/automation)
- staging-subdomain: Uses subdomain-based routing (production pattern)

Changes:
- Add GitHub Actions workflow for dual environment deployment
  - Supports deploying to both, pathroute only, or subdomain only
  - Includes SOPS secret decryption and Helm deployment
  - Dry-run mode for validation
- Add envs/staging-pathroute/ with values.yaml and README
- Add envs/staging-subdomain/ with values.yaml and README
- Add .sops.yaml for GCP KMS encryption of secrets
- Add helper scripts for secret management

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Needs improvement - Fundamental data structure and DRY violations

This infrastructure change solves a real problem (testing dual routing patterns) but introduces massive code duplication that will become a maintenance nightmare. The workflow and configuration files violate basic engineering principles.

Comment thread .github/workflows/deploy-staging.yml Outdated

echo ""
echo "Current pod status:"
kubectl get pods -n ${{ env.NAMESPACE }} -l app.kubernetes.io/instance=${{ env.HELM_RELEASE }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Data Structure: Jobs deploy-pathroute and deploy-subdomain are 99% identical (150+ lines duplicated). This is exactly the wrong data structure.

The right way: Use a matrix strategy:

jobs:
  deploy:
    strategy:
      matrix:
        env:
          - name: pathroute
            namespace: openhands-pathroute
            helm_release: openhands-pathroute
            env_dir: envs/staging-pathroute
          - name: subdomain
            namespace: openhands-subdomain
            helm_release: openhands-subdomain
            env_dir: envs/staging-subdomain
    env:
      NAMESPACE: ${{ matrix.env.namespace }}
      HELM_RELEASE: ${{ matrix.env.helm_release }}
      ENV_DIR: ${{ matrix.env.env_dir }}
    # ... rest of steps (once)

Then add conditional matrix execution:

if: ${{ inputs.environment == 'both' || inputs.environment == matrix.env.name }}

This eliminates 150 lines of duplication and makes adding new environments trivial.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed - Refactored to use matrix strategy. The workflow now uses:

strategy:
  matrix:
    env:
      - name: pathroute
        namespace: openhands-pathroute
        ...
      - name: subdomain
        namespace: openhands-subdomain
        ...
if: ${{ inputs.environment == 'both' || inputs.environment == matrix.env.name }}

This eliminated 150+ lines of duplication. See commit 4dd4352.

This comment was written by an AI assistant (OpenHands).

Comment thread envs/staging-subdomain/values.yaml Outdated
@@ -0,0 +1,322 @@
# Staging (Subdomain-Based Routing) environment values for OpenHands
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical - Code Duplication: This file is 99% identical to envs/staging-pathroute/values.yaml (only ingress.host and a few URLs differ).

Bad: 600+ lines of duplicated configuration that will drift and cause bugs.

Good: Use Helm value inheritance:

# envs/common/values.yaml (shared base)
imagePullSecrets: [...]
databaseMigrations: [...]
# ... all common config

# envs/staging-pathroute/values.yaml
ingress:
  host: staging-pathroute.all-hands.dev
automation:
  openhandsApiUrl: "https://staging-pathroute.all-hands.dev"

# envs/staging-subdomain/values.yaml  
ingress:
  host: staging-subdomain.all-hands.dev
automation:
  openhandsApiUrl: "https://staging-subdomain.all-hands.dev"

Then deploy with:

helm upgrade --install ... \
  -f envs/common/values.yaml \
  -f envs/staging-pathroute/values.yaml

This is not optional. Duplicating 600 lines of config is unacceptable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed - Created envs/common/values.yaml with 315 lines of shared configuration. Environment-specific files are now only 11 lines each containing just the differences:

  • envs/common/values.yaml - 315 lines (shared base config)
  • envs/staging-pathroute/values.yaml - 11 lines (only ingress.host and URLs)
  • envs/staging-subdomain/values.yaml - 11 lines (only ingress.host and URLs)

The workflow deploys with:

helm upgrade --install ... \
  -f envs/common/values.yaml \
  -f ${{ matrix.env.env_dir }}/values.yaml

See commit 4dd4352.

This comment was written by an AI assistant (OpenHands).

Comment thread envs/staging-subdomain/values.yaml Outdated
integrations:
annotations:
cert-manager.io/cluster-issuer: letsencrypt
mcp:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Clarity: This comment says "disable path-based ingresses" but then immediately configures them with annotations. Either they're disabled (remove them) or they're enabled (fix the comment). Which is it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed - The confusing comment about "disable path-based ingresses" has been removed. The new envs/staging-subdomain/values.yaml is now clean and only contains the actual differences:

# Staging (Subdomain-Based Routing) environment values
# Uses subdomain-based routing: automation.staging-subdomain.all-hands.dev
# Base config: envs/common/values.yaml

ingress:
  host: staging-subdomain.all-hands.dev

automation:
  openhandsApiUrl: "https://staging-subdomain.all-hands.dev"
  automationBaseUrl: "https://staging-subdomain.all-hands.dev"

See commit 4dd4352.

This comment was written by an AI assistant (OpenHands).

Comment thread scripts/deploy/decrypt.sh Outdated
@@ -0,0 +1,20 @@
#!/bin/bash
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Unnecessary Abstraction: This script is a 5-line wrapper around sops --decrypt. It adds no value - just run sops --decrypt $file directly.

Same for safe-apply-secrets.sh - it's just kubectl apply. Why wrap it?

If you must have scripts, they should solve real problems (error handling, validation, complex logic), not be trivial wrappers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed - Both scripts/deploy/decrypt.sh and scripts/deploy/safe-apply-secrets.sh have been deleted. The workflow now runs the commands directly:

- name: Decrypt and apply secrets
  run: |
    for file in "$SECRETS_DIR"/*.yaml; do
      sops --decrypt "$file" | kubectl apply -n ${{ env.NAMESPACE }} -f -
    done

See commit 4dd4352.

This comment was written by an AI assistant (OpenHands).

saurya and others added 5 commits April 13, 2026 13:02
…mon values

- Refactor workflow to use matrix strategy (eliminates ~150 lines duplication)
- Extract common staging config to envs/common/values.yaml
- Slim down environment-specific values to only host/URL overrides
- Remove unnecessary wrapper scripts (decrypt.sh, safe-apply-secrets.sh)
- Update READMEs to document new structure

Co-authored-by: openhands <openhands@all-hands.dev>
- deploy.sh: Script to deploy OpenHands Cloud to shared testbed
- setup-shared-cluster.sh: One-time cluster setup script
- README.md: Comprehensive guide for team members

The testbed is intentionally private (uses /etc/hosts for DNS).
Each developer can deploy to their own namespace.

Co-authored-by: openhands <openhands@all-hands.dev>
Document the strategy for 4 staging environments:
- 2 CI environments (pathroute + subdomain routing)
- 2 Dev environments (pathroute + subdomain routing)

Covers:
- TLS/cert-manager setup
- Helm chart installation complexity
- Incremental deployment strategy
- SAML IdP (Keycloak) setup
- Integration test suite requirements
- External DNS routing
- Why Replicated isn't the solution for this use case

Co-authored-by: openhands <openhands@all-hands.dev>
- Add 'Current Progress' section documenting completed testbed work
- Update status from 'Draft' to 'In Progress'
- Add Phase 0 (Developer Testbed) to Implementation Plan
- Mark completed tasks in all phases
- Update Appendix with testbed URLs and infrastructure details
- Add links to testbed README and deploy script

Co-authored-by: openhands <openhands@all-hands.dev>
This PR now uses the infrastructure created in PR #580 (SV-OHE-staging-Deploy-Infra):
- GCP Project: platform-team-sandbox
- GKE Cluster: ohe-staging-cluster
- Domain: ohe-staging.platform-team.all-hands.dev

Changes:
- Update workflow to target platform-team-sandbox cluster
- Use testenv-charts/helm/environments/staging/base-values.yaml as base config
- Copy secrets from all-hands-system namespace (not SOPS-encrypted)
- Update environment values to use new domain structure:
  - pathroute.ohe-staging.platform-team.all-hands.dev
  - subdomain.ohe-staging.platform-team.all-hands.dev
- Remove obsolete envs/common/values.yaml (now using testenv-charts base)
- Remove obsolete scripts/testbed/ (superseded by PR #580)
- Update documentation to reflect new infrastructure

Deployed URLs:
- https://pathroute.ohe-staging.platform-team.all-hands.dev (path-based routing)
- https://subdomain.ohe-staging.platform-team.all-hands.dev (subdomain routing)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants