Percona's CI/CD platform: a GitOps-managed EKS cluster in us-east-1 hosting
the Jenkins masters and the platform services around them (LGTM observability,
Authentik SSO, ingress with TLS, autoscaling). Everything is defined as code
and reconciled from this repo. There are no manual cluster changes.
- OpenTofu owns AWS up to "ArgoCD healthy": VPC, EKS, node groups, Pod Identity, the EC2 Jenkins masters, ARM spot fleets, S3, cleanup reapers. TF outputs reach ArgoCD as cluster-Secret annotations.
- From there ArgoCD owns everything in-cluster: a root App-of-Apps fans out ApplicationSets that reconcile one Application per `resources/addons/*` dir and one per `resources/jenkins/master/instances/*` dir. No manual `kubectl`.
- Jenkins masters serve on `*.cd.percona.com` in two modes: EKS-fronted EC2 (ALB, in-cluster NGINX, cross-region VPC peering, an EndpointSlice reconciler) or in-cluster StatefulSet. Hostnames resolve to the ALB (HTTPS only). A shell goes through SSM (runbook).
- Repo CI is lint + validate only. `ci-gate` is the single required check and `just ci` mirrors it locally.
- The repo is public: no account IDs, ARNs, or secrets in committed files.
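
The directory-to-Application fan-out described above can be sketched in a few lines. This is only an illustration of the "one dir = one Application" rule, not how the ApplicationSets are implemented: the `fan_out` helper and the `addon-` / `jenkins-` name prefixes are assumptions made up for the example, and only the two glob patterns come from this README.

```python
from pathlib import Path


def fan_out(repo_root: Path) -> list[str]:
    """Return one illustrative Application name per matching directory.

    The name prefixes are hypothetical; the real ApplicationSets may name
    their Applications differently.
    """
    patterns = {
        "addon": "resources/addons/*",
        "jenkins": "resources/jenkins/master/instances/*",
    }
    apps = []
    for prefix, pattern in patterns.items():
        for path in sorted(repo_root.glob(pattern)):
            if path.is_dir():  # plain files never become an Application
                apps.append(f"{prefix}-{path.name}")
    return apps
```

Calling `fan_out(Path("."))` from a checkout would list what the two generators would pick up under this naming assumption. Adding a directory adds an entry and deleting one removes it, which is the property that makes manual `kubectl` unnecessary.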

| Path | What lives there |
|---|---|
| `terraform/` | AWS substrate. Conventions in `terraform/CLAUDE.md` (file-naming grammar, per-team `# Owner:` banners, tags). Reusable modules carry their own READMEs (`jenkins-arm-fleet`, `jenkins-arm-standalone`, `scheduled-lambda`). Pins in `versions.tf` |
| `argocd-bootstrap/` | Root Application, ApplicationSets, AppProject |
| `resources/addons/` | One dir = one ArgoCD Application (observability, ingress, SSO, ...) |
| `resources/jenkins/` | In-cluster master chart, per-instance values, clouds catalog (rendered by `scripts/render-clouds.py`, drift-gated in CI) |
| `images/` | Container images (controller bundle and friends), built by GitHub Actions |
| `scripts/` | Verification and render tooling. Catalog in `scripts/README.md` |
| `docs/` | Architecture, ADRs, runbooks. Everything is indexed in `docs/README.md` |
| `justfile` | The single entrypoint for CI and every tofu operation |
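
"Drift-gated" for the rendered clouds catalog means CI fails when the committed file differs from a fresh render. A minimal sketch of such a gate follows. It does not reproduce `scripts/render-clouds.py` or the actual CI step: the `drift` helper and its arguments are hypothetical, and it only shows the compare-and-fail idea using the standard library.

```python
import difflib


def drift(committed: str, rendered: str, name: str = "catalog") -> list[str]:
    """Return unified-diff lines between committed and freshly rendered text.

    An empty list means the committed file is up to date.
    """
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            rendered.splitlines(),
            fromfile=f"{name} (committed)",
            tofile=f"{name} (rendered)",
            lineterm="",
        )
    )
```

A gate built on this would print the returned lines and exit non-zero whenever the list is non-empty, so a stale catalog cannot merge unnoticed.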

    just ci          # local lint + validate (mirrors the PR gate)
    just tf-plan     # TF plan (writes tfplan)
    just tf-apply    # apply the saved tfplan, never auto-approve
    just ssh         # list the running Jenkins masters (just ssh <inst> opens a shell)

`AWS_PROFILE` must be exported in your shell. AWS-touching recipes fail loudly
without it. Back up state before risky applies (`just tf-state-backup`). State
bucket bootstrap: runbook.
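
The "fail loudly" behaviour amounts to a guard that refuses to continue when `AWS_PROFILE` is missing. The sketch below shows that pattern in Python. It is an assumption about the shape of the check, not the justfile's actual implementation, and `require_aws_profile` is a hypothetical name.

```python
import os


def require_aws_profile(env=None) -> str:
    """Return the AWS profile name, or exit with a clear message if unset.

    `env` defaults to the process environment; passing a dict makes the
    guard easy to exercise in tests.
    """
    env = os.environ if env is None else env
    profile = env.get("AWS_PROFILE", "").strip()
    if not profile:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            "AWS_PROFILE is not set; export it before running AWS-touching recipes"
        )
    return profile
```

Failing before any AWS call is made keeps a plan or apply from silently running against whatever default credentials happen to be configured.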

| Tool | Used for |
|---|---|
| `just` | The single entrypoint. Every workflow below is a recipe |
| OpenTofu (`tofu`) | All terraform operations (version pin at the top of the justfile) |
| AWS CLI v2 | Every AWS-touching recipe. SSO login via `aws sso login` |
| `session-manager-plugin` | Interactive `just ssh <inst>` sessions (one-shot `just ssm-run` works without it) |
| `kubectl` | Cluster access (`just kubeconfig`), ps3 shell |
| `uv` | Python script gates and lambda tests inside `just ci` |
| Docker (buildx) | `just build-image` only |
| `trivy`, `yamllint`, `actionlint`, `zizmor`, `kubeconform` | The `just ci` lint set. Version pins sit at the top of the justfile (`helm` is fetched and sha-verified automatically) |

| Topic | Doc |
|---|---|
| System architecture and components | `docs/architecture.md` |
| Compute tiers, MNG vs Karpenter reasoning | ADR 0017 |
| Observability push pipeline | `docs/observability.md` |
| EC2 master connectivity and resilience | `docs/connectivity.md`, `docs/ec2-master-resilience.md` |
| Shell access to the masters (`just ssh`, SSM) | `docs/runbooks/master-shell-access.md` |
| Account cleanup reapers | `docs/runbooks/cleanup-reapers.md` |
| Bootstrap, recovery, upgrades | `docs/runbooks/` |
| Every past design decision | `docs/adr/` |

- `just ci` must pass before you open a PR. Pre-commit hooks only approximate it: the `terraform_validate` hook in `.pre-commit-config.yaml` shells out to the `terraform` binary, not `tofu`, so `just tf-validate` is the real gate.
- Terraform changes follow `terraform/CLAUDE.md`, gated fail-closed by `scripts/check_conventions.py` (part of `just ci`).
- Propose architecture changes in `docs/adr/` first.
- Version pins live in `terraform/versions.tf`. Run `scripts/check_versions.py` before bumping.
- Commit format: `type(scope): subject`. No AI footers.

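
The commit format can be checked mechanically. The sketch below parses a subject line of the form `type(scope): subject`. The regular expression is an assumption about what counts as a valid type and scope (lowercase type, scope without spaces or parentheses), since this README does not list the allowed values, and `parse_commit_subject` is a hypothetical helper, not part of the repo.

```python
import re

# Assumed grammar: lowercase type, non-empty scope, one space after the colon.
COMMIT_SUBJECT = re.compile(
    r"^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<subject>\S.*)$"
)


def parse_commit_subject(line: str):
    """Return the type, scope and subject as a dict, or None if malformed."""
    match = COMMIT_SUBJECT.match(line)
    return match.groupdict() if match else None
```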
GNU Affero General Public License v3.0, see LICENSE.