@@ -885,6 +885,174 @@ kubectl apply --dry-run=server -k k8s/overlays/staging
885 885  kind delete cluster --name gengine-validation
886 886  ```
887 887
888+ ## CI Smoke Tests
889+
890+ In addition to manifest validation, GEngine includes end-to-end smoke tests
891+ that deploy the services to a Kind cluster and verify they are functioning
892+ correctly. These smoke tests provide deeper validation than dry-run checks
893+ by testing actual service health, metrics endpoints, and Prometheus annotations.
894+
895+ ### Smoke Test Workflow
896+
897+ The `.github/workflows/k8s-smoke-test.yml` workflow performs end-to-end
898+ testing in six steps:
899+
900+ 1. **Creates a Kind cluster** – Spins up a temporary Kubernetes cluster
901+ 2. **Builds and loads Docker image** – Builds the GEngine image and loads it
902+    into the Kind cluster
903+ 3. **Deploys using Kustomize** – Applies the `k8s/overlays/local` overlay
904+ 4. **Waits for rollout** – Ensures all deployments are ready
905+ 5. **Runs smoke test script** – Executes `scripts/k8s_smoke_test.sh` to verify:
906+    - All pods are running and ready
907+    - Health endpoints respond with HTTP 200
908+    - Metrics endpoints are accessible
909+    - Prometheus annotations are correctly configured
910+ 6. **Captures debug logs** – On failure, collects pod logs and events for
911+    troubleshooting
912+
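As an illustration of the health-endpoint check in step 5, here is a minimal sketch. It is not taken from `scripts/k8s_smoke_test.sh`: the helper name `check_http_200`, the service name, the port, and the `/health` path are all assumptions made for the example.

```bash
# Hypothetical helper; the real smoke test script may work differently.
# Returns 0 if the given URL answers with HTTP 200, non-zero otherwise.
check_http_200() {
  local url="$1"
  local status
  # -s: no progress output, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the numeric status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || true
  [ "$status" = "200" ]
}

# Example usage (service name, port, and path are assumptions):
#   kubectl port-forward -n gengine svc/gengine 8080:8080 &
#   check_http_200 "http://localhost:8080/health" && echo "healthy"
```

Comparing only the status code keeps the check independent of the response body format.
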
913+ ### When CI Smoke Tests Run
914+
915+ The smoke test workflow runs on:
916+
917+ - **Pushes to main** – When any of these paths change:
918+   - `k8s/**/*.yaml` (Kubernetes manifests)
919+   - `scripts/k8s_smoke_test.sh` (smoke test script)
920+   - `Dockerfile` (container image)
921+   - `.github/workflows/k8s-smoke-test.yml` (workflow itself)
922+ - **Nightly schedule** – Runs at 3:00 AM UTC every day
923+ - **Manual trigger** – Can be invoked via GitHub Actions UI
924+
925+ **Note:** Smoke tests are NOT triggered by pull requests, which avoids heavy
926+ K8s resource usage. For PR validation, rely on the `k8s-validation.yml`
927+ workflow, which performs schema linting and dry-run validation.
928+
929+ ### Manually Triggering Smoke Tests
930+
931+ To manually trigger the smoke test workflow:
932+
933+ 1. Go to the repository's **Actions** tab on GitHub
934+ 2. Select **K8s Smoke Test** from the workflow list
935+ 3. Open the **Run workflow** dropdown
936+ 4. Optionally configure:
937+    - **Run load test**: Enable the `--load` flag to run basic load testing
938+    - **Debug on failure**: Capture extra debug logs if the test fails
939+ 5. Click **Run workflow** to start
940+
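The same run can probably be started from a terminal with the GitHub CLI (`gh workflow run`). The sketch below assumes that the workflow's `workflow_dispatch` inputs are named `run_load_test` and `debug_on_failure`. Both names are hypothetical, so check `.github/workflows/k8s-smoke-test.yml` for the real ones. The wrapper `trigger_smoke_test` is likewise illustrative.

```bash
# Hypothetical wrapper around the GitHub CLI.
# Input names (run_load_test, debug_on_failure) are assumptions.
trigger_smoke_test() {
  local run_load="${1:-false}"
  local debug="${2:-false}"
  # --ref selects the branch to run on; -f passes a workflow_dispatch input.
  gh workflow run k8s-smoke-test.yml \
    --ref main \
    -f "run_load_test=${run_load}" \
    -f "debug_on_failure=${debug}"
}

# Example usage: enable the load test, leave extra debug logging off.
#   trigger_smoke_test true false
```
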
941+ ### Running Smoke Tests Locally
942+
943+ You can run the same smoke tests locally to validate changes before pushing.
944+
945+ #### Prerequisites for Local Smoke Tests
946+
947+ Ensure you have:
948+
949+ - **Docker** – For building container images
950+ - **kubectl** – For interacting with Kubernetes
951+ - **Kind** – For creating local Kubernetes clusters
952+ - **curl** – For health check requests
953+
954+ Install Kind if not already installed:
955+
956+ ```bash
957+ # Install kind (Linux)
958+ curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
959+ chmod +x ./kind
960+ sudo mv ./kind /usr/local/bin/kind
961+
962+ # Verify installation
963+ kind version
964+ ```
965+
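Before starting, it can help to confirm that the four tools listed above are on your `PATH`. A minimal sketch follows; `missing_tools` is a hypothetical helper and is not part of the repository's scripts.

```bash
# Hypothetical helper: prints the name of every tool that is NOT on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Example usage:
#   missing="$(missing_tools docker kubectl kind curl)"
#   [ -z "$missing" ] || echo "Missing tools: $missing" >&2
```
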
966+ #### Local Smoke Test Procedure
967+
968+ Run these commands to replicate the CI smoke test locally:
969+
970+ ```bash
971+ # Set environment variables
972+ export GENGINE_IMAGE_TAG="latest"
973+ export GENGINE_NAMESPACE="gengine"
974+
975+ # Create a Kind cluster
976+ kind create cluster --name gengine-smoke-test --wait 120s
977+
978+ # Build the Docker image
979+ docker build -t "gengine:${GENGINE_IMAGE_TAG}" --target runtime .
980+
981+ # Load the image into Kind
982+ kind load docker-image "gengine:${GENGINE_IMAGE_TAG}" --name gengine-smoke-test
983+
984+ # Deploy GEngine (update image tag first if needed)
985+ kubectl apply -k k8s/overlays/local
986+
987+ # Wait for deployments to be ready
988+ kubectl rollout status deployment -n "${GENGINE_NAMESPACE}" --timeout=180s
989+
990+ # Run the smoke test script
991+ ./scripts/k8s_smoke_test.sh --namespace "${GENGINE_NAMESPACE}"
992+
993+ # Optional: Run with load test
994+ ./scripts/k8s_smoke_test.sh --namespace "${GENGINE_NAMESPACE}" --load
995+
996+ # Cleanup when done
997+ kubectl delete -k k8s/overlays/local
998+ kind delete cluster --name gengine-smoke-test
999+ ```
1000+
1001+ #### Quick Local Smoke Test with Minikube
1002+
1003+ If you already have a Minikube cluster running (see
1004+ [Create_Local_Kubernetes_With_Minikube.md](Create_Local_Kubernetes_With_Minikube.md)), run:
1005+
1006+ ```bash
1007+ # Ensure Minikube is running
1008+ minikube status
1009+
1010+ # Build and load the image
1011+ docker build -t "gengine:latest" --target runtime .
1012+ minikube image load "gengine:latest"
1013+
1014+ # Deploy and test
1015+ kubectl apply -k k8s/overlays/local
1016+ kubectl rollout status deployment -n gengine --timeout=120s
1017+ ./scripts/k8s_smoke_test.sh
1018+ ```
1019+
1020+ ### Smoke Test Script Options
1021+
1022+ The `scripts/k8s_smoke_test.sh` script accepts these options:
1023+
1024+ | Option        | Description                                          | Default   |
1025+ | ------------- | ---------------------------------------------------- | --------- |
1026+ | `--namespace` | Kubernetes namespace to test                         | `gengine` |
1027+ | `--load`      | Run basic load test against health/metrics endpoints | disabled  |
1028+
1029+ Exit codes:
1030+
1031+ | Code | Meaning |
1032+ | ---- | -------------------------- |
1033+ | 0 | All smoke tests passed |
1034+ | 1 | Prerequisites not met |
1035+ | 2 | Pod health check failed |
1036+ | 3 | Endpoint checks failed |
1037+
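When calling the script from another script, the exit codes in the table can be turned back into readable messages. The following sketch does that; `describe_smoke_exit` is a hypothetical helper, and only the four codes documented above are assumed to exist.

```bash
# Hypothetical helper: maps a k8s_smoke_test.sh exit code to its meaning.
describe_smoke_exit() {
  case "$1" in
    0) echo "All smoke tests passed" ;;
    1) echo "Prerequisites not met" ;;
    2) echo "Pod health check failed" ;;
    3) echo "Endpoint checks failed" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Example usage:
#   ./scripts/k8s_smoke_test.sh --namespace gengine
#   describe_smoke_exit "$?"
```
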
1038+ ### Troubleshooting Smoke Test Failures
1039+
1040+ If the smoke test fails in CI:
1041+
1042+ 1. **Check the workflow logs** – The "Capture debug logs on failure" step
1043+    includes pod descriptions, logs, and cluster events
1044+
1045+ 2. **Common issues:**
1046+    - **Pods not starting**: Check for image pull errors or resource constraints
1047+    - **Health checks failing**: Verify services are binding to the correct ports
1048+    - **Timeout errors**: Increase the rollout timeout or check for slow startup
1049+
1050+ 3. **Reproduce locally** – Use the local smoke test procedure above to
1051+    debug the issue in an interactive environment
1052+
1053+ 4. **Check recent changes** – Review commits that modified K8s manifests,
1054+    the Dockerfile, or service code
1055+
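When reproducing a failure locally, comparable debug information can be gathered with standard `kubectl` commands. The exact commands used by the CI debug step are not shown in this document, so the following is only a sketch; `collect_debug_info` is a hypothetical helper and the `gengine` default mirrors the namespace used above.

```bash
# Hypothetical helper: dumps pod state, per-pod logs, and recent events.
collect_debug_info() {
  local ns="${1:-gengine}"
  local pod
  kubectl get pods -n "$ns" -o wide
  kubectl describe pods -n "$ns"
  # "-o name" prints entries such as "pod/<name>", which kubectl logs accepts.
  for pod in $(kubectl get pods -n "$ns" -o name); do
    echo "=== logs: $pod ==="
    kubectl logs -n "$ns" "$pod" --all-containers --tail=100
  done
  # Sort events so the most recent ones appear last.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Example usage:
#   collect_debug_info gengine > smoke-debug.txt 2>&1
```
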
888 1056  ## Next Steps
889 1057
890 1058  - [Minikube Setup](Create_Local_Kubernetes_With_Minikube.md) -