Refactor stage Terraform: single fullstack container -> separate frontend + backend + Postgres #749

Description

@Nickatak

Context

The CTJ rewrite's target topology is three containers in a shared ECS task:

  • Next.js - serves the frontend (App Router pages, server components, server actions)
  • Django - serves the CTJ API (/api/*) and Django admin (/admin/*)
  • Postgres - the only database in scope for the rewrite (no managed RDS)

The ALB does path-based routing: /api/* and /admin/* go to a Django target group, and everything else goes to a Next.js target group. Both target groups are backed by the same ECS task. Postgres has no public ingress; it's reachable only from inside the task, where Django connects to it via localhost:5432. The browser sees a single origin (stage.civictechjobs.org), so CORS isn't a concern.
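A minimal sketch of how the listener rules above resolve a request path, useful for sanity-checking edge cases before writing the Terraform rules. It assumes ALB path-pattern semantics: case-sensitive matching, `*` matches zero or more characters (including `/`), and `?` matches exactly one. Python's `fnmatchcase` behaves the same way for patterns that use only those two wildcards. The target group names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table mirroring the listener rules described in the issue.
# Target group names are illustrative, not the real Terraform resource names.
LISTENER_RULES = [
    (["/api/*", "/admin/*"], "django-tg"),
]
DEFAULT_TARGET = "nextjs-tg"


def route(path: str) -> str:
    """Return the target group an ALB path-pattern rule would pick for `path`.

    fnmatchcase is case-sensitive and its `*` also matches `/`, which matches
    ALB behavior for patterns built only from `*` and `?`.
    """
    for patterns, target in LISTENER_RULES:
        if any(fnmatchcase(path, p) for p in patterns):
            return target
    return DEFAULT_TARGET
```

With these rules, `/api/jobs/` and `/admin/` go to Django. `/api` (no trailing slash) and `/API/jobs` fall through to Next.js. Any other path Django is expected to serve, such as static assets if STATIC_URL sits outside /api/ or /admin/, would also land on Next.js unless it gets its own rule.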

Stage today still runs as a single fullstack container (application_type = "fs", container_port = 8000). That is one :stage image combining Django, WhiteNoise, and an SPA catch-all, backed by a separate RDS instance. Both the deployed image and the Terraform predate the rewrite.

This is the critical-path Terraform change. Until it lands, the rewrite can't deploy; the application-side cutover (auth rebuild, matching engine work, qualifier integration) has nowhere to run.

Scope

All changes are in the incubator repo under terraform/projects/civic-tech-jobs/. There are four distinct pieces of work:

1. ECR repositories

In civic-tech-jobs.tf, split the existing civic_tech_jobs_ecr_fullstack module call into two:

  • ECR repo civic-tech-jobs-frontend
  • ECR repo civic-tech-jobs-backend

Postgres uses the upstream postgres:18 image and doesn't need an ECR repo.
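A small sketch of the image references the task would end up using: two private ECR images from the new repos plus the upstream Postgres image. It uses the standard private ECR URI shape `<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>`. The account ID and region are hypothetical placeholders, and the `stage` tag is assumed from the current :stage convention.

```python
# Hypothetical values: the real account ID and region live in the incubator
# Terraform, not in this issue.
ACCOUNT_ID = "123456789012"
REGION = "us-west-2"
TAG = "stage"


def ecr_image(repo: str, tag: str = TAG) -> str:
    """Build a private ECR image URI for `repo` at `tag`."""
    return f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{repo}:{tag}"


IMAGES = {
    "frontend": ecr_image("civic-tech-jobs-frontend"),
    "backend": ecr_image("civic-tech-jobs-backend"),
    # Upstream Docker Hub image; no ECR repo is created for it.
    "postgres": "postgres:18",
}
```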

2. ECS task definition

In environment-stage.tf, replace the single civic_tech_jobs_fullstack_stage_service module call with a multi-container task spec:

  • One ECS task with three container definitions: Next.js, Django, Postgres
  • ALB with two target groups (Django, Next.js); listener rules: /api/* and /admin/* -> Django target group, default -> Next.js target group
  • Both target groups attach to the same task; Postgres is unattached (no public ingress)
  • Cross-container traffic stays on localhost (shared task network namespace)

The existing container module in incubator/terraform/modules/container was written for single-container ECS services. The application-side requirement is that the three containers share a task; how the module shape gets there is your call.
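One possible shape for the three container definitions, sketched as the Python structure that Terraform's `jsonencode(...)` would pass to `container_definitions`. This is not the module's actual output. Assumptions not in the issue:

- Next.js listens on 3000 and Django on 8000.
- Role and database names are `ctj`.
- The password comes from a single secret ARN.
- The task volume is named `pgdata`.

All image URIs and ARNs are placeholders. The sketch also shows two ECS features that fit this topology: a Postgres health check, and a `dependsOn` with condition `HEALTHY` so that Django starts only after Postgres is up.

```python
import json

# Hypothetical placeholders; the real values come from the Terraform modules.
FRONTEND_IMAGE = "<account>.dkr.ecr.<region>.amazonaws.com/civic-tech-jobs-frontend:stage"
BACKEND_IMAGE = "<account>.dkr.ecr.<region>.amazonaws.com/civic-tech-jobs-backend:stage"
DB_PASSWORD_ARN = "arn:aws:secretsmanager:<region>:<account>:secret:<name>"
DB_USER = "ctj"  # hypothetical role name
DB_NAME = "ctj"  # hypothetical database name
NEXTJS_PORT = 3000  # assumption: Next.js default port
DJANGO_PORT = 8000  # assumption: same port the fullstack container uses today
PGDATA = "/var/lib/postgresql/data"


def container_definitions() -> list[dict]:
    """Three containers in one awsvpc task, sharing one network namespace."""
    postgres = {
        "name": "postgres",
        "image": "postgres:18",
        "essential": True,
        # No portMappings: Django reaches Postgres on localhost:5432 inside the
        # task. Postgres still listens on the task ENI, so keeping 5432 closed
        # on the task's security group is what blocks outside access.
        "environment": [
            {"name": "POSTGRES_USER", "value": DB_USER},
            {"name": "POSTGRES_DB", "value": DB_NAME},
            # Pin the data directory to the mounted path instead of relying on
            # the image default (verify against the postgres:18 image docs).
            {"name": "PGDATA", "value": PGDATA},
        ],
        "secrets": [{"name": "POSTGRES_PASSWORD", "valueFrom": DB_PASSWORD_ARN}],
        # "pgdata" is a hypothetical task volume; its backing storage is a DevOps call.
        "mountPoints": [{"sourceVolume": "pgdata", "containerPath": PGDATA}],
        "healthCheck": {
            "command": ["CMD-SHELL", 'pg_isready -h localhost -U "$POSTGRES_USER" -d "$POSTGRES_DB"'],
            "interval": 10,
            "timeout": 5,
            "retries": 5,
        },
    }
    django = {
        "name": "django",
        "image": BACKEND_IMAGE,
        "essential": True,
        "portMappings": [{"containerPort": DJANGO_PORT, "protocol": "tcp"}],
        "environment": [
            {"name": "SQL_HOST", "value": "localhost"},
            {"name": "SQL_PORT", "value": "5432"},
            {"name": "SQL_USER", "value": DB_USER},
            {"name": "SQL_DATABASE", "value": DB_NAME},
        ],
        # Same secret ARN as POSTGRES_PASSWORD, so both containers read one value.
        "secrets": [{"name": "SQL_PASSWORD", "valueFrom": DB_PASSWORD_ARN}],
        # ECS starts Django only after the Postgres health check passes.
        "dependsOn": [{"containerName": "postgres", "condition": "HEALTHY"}],
    }
    nextjs = {
        "name": "nextjs",
        "image": FRONTEND_IMAGE,
        "essential": True,
        "portMappings": [{"containerPort": NEXTJS_PORT, "protocol": "tcp"}],
    }
    return [nextjs, django, postgres]


if __name__ == "__main__":
    # Roughly the JSON that jsonencode(...) would feed to container_definitions.
    print(json.dumps(container_definitions(), indent=2))
```

Three pieces of Terraform wiring are needed alongside this:

- The `aws_ecs_task_definition` would use `network_mode = "awsvpc"` (the mode Fargate requires). That mode puts all three containers in one network namespace, which is what makes localhost work between them.
- The task definition would declare a `volume` block matching `pgdata`.
- The `aws_ecs_service` accepts one `load_balancer` block per target group, so the two target groups map to the `nextjs`/3000 and `django`/8000 pairs.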

3. Frontend build-time vars

NEXT_PUBLIC_API_URL must be baked into the frontend image at build time. Next.js inlines NEXT_PUBLIC_* values into the JS bundle during npm run build, so setting them at container runtime has no effect; the bundle is already frozen. This means the value has to be supplied when the frontend image is built (in the deploy workflow), not through the ECS container's environment.

No Terraform-side change is needed for NEXT_PUBLIC_API_URL itself; this is flagged only as a coordination point so the deploy workflow can match.
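Because a runtime NEXT_PUBLIC_* value fails silently, a cheap guard is to flag any such variable that ends up in the frontend container definition. The sketch below is a hypothetical pre-deploy check, not part of the existing tooling. It assumes container definitions in the ECS `environment`/`secrets` list shape, and the URL in the usage example is a hypothetical value.

```python
# NEXT_PUBLIC_* values are inlined by `npm run build`, so setting them on the
# running container has no effect on the already-built client bundle.
PREFIX = "NEXT_PUBLIC_"


def runtime_next_public_vars(container_def: dict) -> list[str]:
    """List NEXT_PUBLIC_* names set at runtime (environment or secrets)."""
    entries = container_def.get("environment", []) + container_def.get("secrets", [])
    return sorted(e["name"] for e in entries if e["name"].startswith(PREFIX))


if __name__ == "__main__":
    frontend = {
        "name": "nextjs",
        "environment": [
            # Hypothetical value; this would be ignored by the built bundle.
            {"name": "NEXT_PUBLIC_API_URL", "value": "https://stage.civictechjobs.org"},
            {"name": "PORT", "value": "3000"},
        ],
    }
    print(runtime_next_public_vars(frontend))  # ['NEXT_PUBLIC_API_URL']
```

On the build side, the usual Docker mechanism is `docker build --build-arg NEXT_PUBLIC_API_URL=...`, which assumes the frontend Dockerfile declares `ARG NEXT_PUBLIC_API_URL` and exposes it before `npm run build`. That Dockerfile isn't described in this issue.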

4. Postgres: drop RDS, in-task container

The existing civic_tech_jobs_stage_database module call provisions an RDS instance and exports database, host, port, owner_username, owner_password_arn. Those exports are wired into the fullstack container's env list today.

The new shape, per docs/developer/deployment-infra.md:

  • The civic_tech_jobs_stage_database module call goes away (no more RDS provisioning).
  • A Postgres container (image: postgres:18) is added to the ECS task. Its init env vars are POSTGRES_USER / POSTGRES_PASSWORD / POSTGRES_DB. Its data directory (/var/lib/postgresql/data) needs persistent storage, since a container's local storage is discarded when the task is replaced (for example, on each deploy). The deployment-infra spec explicitly leaves persistence configuration to DevOps.
  • The Django container's connection env (SQL_HOST / SQL_PORT / SQL_USER / SQL_PASSWORD / SQL_DATABASE) connects to localhost:5432 with the same credentials as the Postgres init env. How those credentials are sourced (the existing random_password pattern, Secrets Manager, etc.) is a DevOps call.
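A minimal sketch of the two sides of that contract. The first function shows how a Django settings module might build its database entry from the SQL_* variables. The second checks that those variables agree with the Postgres init env. Both function names and the exact settings code are hypothetical; the real CTJ settings module may read these variables differently. The `ENGINE` string is Django's standard PostgreSQL backend.

```python
import os
from typing import Mapping, Optional

# Postgres init variable -> Django connection variable that must carry the same value.
PAIRS = {
    "POSTGRES_USER": "SQL_USER",
    "POSTGRES_PASSWORD": "SQL_PASSWORD",
    "POSTGRES_DB": "SQL_DATABASE",
}


def database_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a Django DATABASES["default"] entry from the SQL_* variables."""
    env = os.environ if env is None else env
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env["SQL_DATABASE"],
        "USER": env["SQL_USER"],
        "PASSWORD": env["SQL_PASSWORD"],
        # Defaults match the in-task Postgres container.
        "HOST": env.get("SQL_HOST", "localhost"),
        "PORT": env.get("SQL_PORT", "5432"),
    }


def credentials_match(postgres_env: Mapping[str, str], django_env: Mapping[str, str]) -> bool:
    """True if Django would connect to the in-task Postgres with its init credentials."""
    same_creds = all(
        postgres_env.get(pg) is not None and postgres_env.get(pg) == django_env.get(sql)
        for pg, sql in PAIRS.items()
    )
    local = django_env.get("SQL_HOST") in ("localhost", "127.0.0.1")
    return same_creds and local and django_env.get("SQL_PORT", "5432") == "5432"
```

One caveat affects whichever sourcing option is chosen. The postgres image applies POSTGRES_USER / POSTGRES_PASSWORD / POSTGRES_DB only when it initializes an empty data directory. Once persistent storage holds a database, changing these values (for example, regenerating a random_password) does not change the existing role's password. Django's SQL_PASSWORD and the actual database password can then drift apart.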

Coordination

  • Substantive cross-repo change, not a drive-by PR. Worth a synchronous handoff conversation rather than starting blind.
  • Blocks the deploy-stage.yml rewrite on the CTJ side (Rewrite deploy-stage.yml for two-image build (BLOCKED) #750). That workflow can't be rewritten to push two images until the ECR repos and ECS task spec are in place; landing the workflow rewrite first would push to nonexistent ECR repos. Landing both in lockstep is fine; landing the CTJ side first on its own is not.
  • The existing :stage deploy is currently broken at the build step on push to main (CTJ's deploy-stage.yml references a Dockerfile that no longer exists). That's a deliberate escape hatch; see Rewrite deploy-stage.yml for two-image build (BLOCKED) #750.

Owner

TBD - pending DevOps CoP discussion.
