Commit a9dd0e8

Merge branch 'copilot/integrate-k8s-smoke-test' (PR #54): Integrate K8s Smoke Test into CI (M8.3.x) FIXES: 51
2 parents 5763654 + 6b8d090 commit a9dd0e8

2 files changed

Lines changed: 318 additions & 0 deletions

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
# Kubernetes Smoke Test Workflow
# Runs end-to-end smoke tests against a Kind cluster to validate deployments.
#
# Triggers:
# - On push to main when K8s manifests or smoke test script change
# - Nightly at 3:00 AM UTC
# - Manual dispatch via workflow_dispatch
#
# This workflow does NOT run on every PR to avoid heavy K8s usage.
# For manifest validation on PRs, see k8s-validation.yml.

name: K8s Smoke Test

on:
  push:
    branches:
      - main
    paths:
      - "k8s/**/*.yaml"
      - "scripts/k8s_smoke_test.sh"
      - "Dockerfile"
      - ".github/workflows/k8s-smoke-test.yml"
  schedule:
    # Run nightly at 3:00 AM UTC (offset from other workflows)
    - cron: '0 3 * * *'
  workflow_dispatch:
    inputs:
      run_load_test:
        description: 'Run load test (--load flag)'
        required: false
        default: 'false'
        type: boolean
      debug_on_failure:
        description: 'Capture extra debug logs on failure'
        required: false
        default: 'true'
        type: boolean

permissions:
  contents: read

env:
  GENGINE_IMAGE_TAG: latest
  GENGINE_NAMESPACE: gengine

jobs:
  smoke-test:
    name: K8s Smoke Test
    runs-on: ubuntu-latest
    timeout-minutes: 20

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Create Kind cluster
        uses: helm/kind-action@v1
        with:
          cluster_name: gengine-smoke-test
          wait: 120s

      - name: Verify cluster is ready
        run: |
          kubectl cluster-info
          kubectl get nodes
          echo "Cluster is ready."

      - name: Build Docker image
        run: |
          echo "Building GEngine Docker image..."
          docker build -t "gengine:${GENGINE_IMAGE_TAG}" --target runtime .
          echo "Docker image built successfully."

      - name: Load image into Kind cluster
        run: |
          echo "Loading image into Kind cluster..."
          kind load docker-image "gengine:${GENGINE_IMAGE_TAG}" --name gengine-smoke-test
          echo "Image loaded successfully."

      - name: Deploy GEngine to Kind cluster
        run: |
          echo "Deploying GEngine to Kind cluster..."
          kubectl apply -k k8s/overlays/local
          echo "Deployment applied."

      - name: Wait for deployments to be ready
        run: |
          echo "Waiting for deployments to be ready..."
          kubectl rollout status deployment -n "${GENGINE_NAMESPACE}" --timeout=180s
          echo "All deployments are ready."

      - name: Verify pods are running
        run: |
          echo "Verifying pods are running..."
          kubectl get pods -n "${GENGINE_NAMESPACE}" -o wide
          kubectl get services -n "${GENGINE_NAMESPACE}"

      - name: Run smoke test script
        id: smoke_test
        run: |
          echo "Running Kubernetes smoke test..."
          # Make the script executable
          chmod +x scripts/k8s_smoke_test.sh

          # Determine if load test should be run
          LOAD_FLAG=""
          if [[ "${{ github.event.inputs.run_load_test }}" == "true" ]]; then
            LOAD_FLAG="--load"
          fi

          # Run the smoke test
          ./scripts/k8s_smoke_test.sh --namespace "${GENGINE_NAMESPACE}" ${LOAD_FLAG}

      - name: Capture debug logs on failure
        if: failure() && (github.event.inputs.debug_on_failure != 'false')
        run: |
          echo "=========================================="
          echo "SMOKE TEST FAILED - Capturing debug info"
          echo "=========================================="

          echo ""
          echo "=== Pod Status ==="
          kubectl get pods -n "${GENGINE_NAMESPACE}" -o wide || true

          echo ""
          echo "=== Pod Descriptions ==="
          kubectl describe pods -n "${GENGINE_NAMESPACE}" || true

          echo ""
          echo "=== Simulation Logs ==="
          kubectl logs -n "${GENGINE_NAMESPACE}" -l app.kubernetes.io/component=simulation --tail=100 || true

          echo ""
          echo "=== Gateway Logs ==="
          kubectl logs -n "${GENGINE_NAMESPACE}" -l app.kubernetes.io/component=gateway --tail=100 || true

          echo ""
          echo "=== LLM Logs ==="
          kubectl logs -n "${GENGINE_NAMESPACE}" -l app.kubernetes.io/component=llm --tail=100 || true

          echo ""
          echo "=== Events ==="
          kubectl get events -n "${GENGINE_NAMESPACE}" --sort-by='.lastTimestamp' || true

      - name: Cleanup
        if: always()
        run: |
          echo "Cleaning up..."
          kubectl delete -k k8s/overlays/local --ignore-not-found=true || true
          echo "Cleanup complete."

docs/gengine/Deploy_GEngine_To_Kubernetes.md

Lines changed: 168 additions & 0 deletions
@@ -885,6 +885,174 @@ kubectl apply --dry-run=server -k k8s/overlays/staging
kind delete cluster --name gengine-validation
```

## CI Smoke Tests

In addition to manifest validation, GEngine includes end-to-end smoke tests
that deploy the services to a Kind cluster and verify they are functioning
correctly. These smoke tests provide deeper validation than dry-run checks
by testing actual service health, metrics endpoints, and Prometheus annotations.

### Smoke Test Workflow

The `.github/workflows/k8s-smoke-test.yml` workflow performs comprehensive
end-to-end testing:

1. **Creates a Kind cluster** – Spins up a temporary Kubernetes cluster
2. **Builds and loads Docker image** – Builds the GEngine image and loads it
   into the Kind cluster
3. **Deploys using Kustomize** – Applies the `k8s/overlays/local` overlay
4. **Waits for rollout** – Ensures all deployments are ready
5. **Runs smoke test script** – Executes `scripts/k8s_smoke_test.sh` to verify:
   - All pods are running and ready
   - Health endpoints respond with HTTP 200
   - Metrics endpoints are accessible
   - Prometheus annotations are correctly configured
6. **Captures debug logs** – On failure, collects pod logs and events for
   troubleshooting
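
The "health endpoints respond with HTTP 200" check in step 5 can be pictured with a small helper like the one below. This is only a sketch: the function name `check_http_200` and the example URL are hypothetical, and the real `scripts/k8s_smoke_test.sh` may perform the check differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from scripts/k8s_smoke_test.sh):
# succeed only if the given URL answers with HTTP 200.
check_http_200() {
  local url="$1" status
  # -s hides progress output, -o /dev/null discards the body,
  # -w prints only the numeric status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || status="000"
  if [[ "$status" == "200" ]]; then
    echo "OK   ${url}"
  else
    echo "FAIL ${url} (HTTP ${status})"
    return 1
  fi
}
```

For example, after port-forwarding a service you might call `check_http_200 "http://localhost:8000/healthz"`; both the port and the `/healthz` path are assumptions and should be replaced with the values your services actually expose.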

### When CI Smoke Tests Run

The smoke test workflow runs on:

- **Pushes to main** – When any of these paths change:
  - `k8s/**/*.yaml` (Kubernetes manifests)
  - `scripts/k8s_smoke_test.sh` (smoke test script)
  - `Dockerfile` (container image)
  - `.github/workflows/k8s-smoke-test.yml` (workflow itself)
- **Nightly schedule** – Runs at 3:00 AM UTC every day
- **Manual trigger** – Can be invoked via GitHub Actions UI

**Note:** Smoke tests do NOT run on every PR to avoid heavy K8s resource
usage. For PR validation, rely on the `k8s-validation.yml` workflow, which
performs schema linting and dry-run validation.
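
To get a rough idea of whether a change will trigger the smoke test once it lands on main, the path filters listed above can be approximated locally. The helper name `triggers_smoke_test` is hypothetical, and the `case` patterns only approximate GitHub's path filter syntax, so treat this as a sketch and not an exact reimplementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a repo-relative path matches one of the
# push path filters of the smoke test workflow.
triggers_smoke_test() {
  case "$1" in
    # In a `case` pattern, `*` also matches `/`, so this covers YAML files
    # at any depth below k8s/ (approximating `k8s/**/*.yaml`).
    k8s/*.yaml) return 0 ;;
    scripts/k8s_smoke_test.sh|Dockerfile|.github/workflows/k8s-smoke-test.yml) return 0 ;;
    *) return 1 ;;
  esac
}
```

One way to use it is to feed it the output of `git diff --name-only origin/main...HEAD`, one path at a time, and print the paths for which it succeeds.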

### Manually Triggering Smoke Tests

To manually trigger the smoke test workflow:

1. Go to the repository's **Actions** tab on GitHub
2. Select **K8s Smoke Test** from the workflow list
3. Click the **Run workflow** dropdown
4. Optionally configure:
   - **Run load test**: Enable the `--load` flag to run basic load testing
   - **Debug on failure**: Capture extra debug logs if the test fails
5. Click the **Run workflow** button to start
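
The same dispatch can also be started from a terminal with the GitHub CLI. The sketch below assumes that `gh` is installed and authenticated for this repository and that the workflow file lives on `main`; the wrapper name `dispatch_smoke_test` is hypothetical, while the input names come from the workflow's `workflow_dispatch` section.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `gh workflow run` for the smoke test workflow.
# $1: run_load_test (default "false"), $2: debug_on_failure (default "true")
dispatch_smoke_test() {
  local load="${1:-false}" debug="${2:-true}"
  gh workflow run k8s-smoke-test.yml --ref main \
    -f "run_load_test=${load}" \
    -f "debug_on_failure=${debug}"
}
```

Calling `dispatch_smoke_test true` would request a run with the load test enabled; `gh run list --workflow k8s-smoke-test.yml` can then be used to follow its progress.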

### Running Smoke Tests Locally

You can run the same smoke tests locally to validate changes before pushing.

#### Prerequisites for Local Smoke Tests

Ensure you have:

- **Docker** – For building container images
- **kubectl** – For interacting with Kubernetes
- **Kind** – For creating local Kubernetes clusters
- **curl** – For health check requests

Install Kind if not already installed:

```bash
# Install kind (Linux)
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

# Verify installation
kind version
```

#### Local Smoke Test Procedure

Run these commands to replicate the CI smoke test locally:

```bash
# Set environment variables
export GENGINE_IMAGE_TAG="latest"
export GENGINE_NAMESPACE="gengine"

# Create a Kind cluster
kind create cluster --name gengine-smoke-test --wait 120s

# Build the Docker image
docker build -t "gengine:${GENGINE_IMAGE_TAG}" --target runtime .

# Load the image into Kind
kind load docker-image "gengine:${GENGINE_IMAGE_TAG}" --name gengine-smoke-test

# Deploy GEngine (update image tag first if needed)
kubectl apply -k k8s/overlays/local

# Wait for deployments to be ready
kubectl rollout status deployment -n "${GENGINE_NAMESPACE}" --timeout=180s

# Run the smoke test script
./scripts/k8s_smoke_test.sh --namespace "${GENGINE_NAMESPACE}"

# Optional: Run with load test
./scripts/k8s_smoke_test.sh --namespace "${GENGINE_NAMESPACE}" --load

# Cleanup when done
kubectl delete -k k8s/overlays/local
kind delete cluster --name gengine-smoke-test
```

#### Quick Local Smoke Test with Minikube

If you already have a Minikube cluster running (see
[Create_Local_Kubernetes_With_Minikube.md](Create_Local_Kubernetes_With_Minikube.md)):

```bash
# Ensure Minikube is running
minikube status

# Build and load the image
docker build -t "gengine:latest" --target runtime .
minikube image load "gengine:latest"

# Deploy and test
kubectl apply -k k8s/overlays/local
kubectl rollout status deployment -n gengine --timeout=120s
./scripts/k8s_smoke_test.sh
```

### Smoke Test Script Options

The `scripts/k8s_smoke_test.sh` script accepts these options:

| Option        | Description                                          | Default   |
| ------------- | ---------------------------------------------------- | --------- |
| `--namespace` | Kubernetes namespace to test                         | `gengine` |
| `--load`      | Run basic load test against health/metrics endpoints | disabled  |

Exit codes:

| Code | Meaning                 |
| ---- | ----------------------- |
| 0    | All smoke tests passed  |
| 1    | Prerequisites not met   |
| 2    | Pod health check failed |
| 3    | Endpoint checks failed  |
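
When wrapping the script in your own tooling, the exit codes in the table can be turned into readable messages. The function name `describe_smoke_exit` below is hypothetical; the code-to-meaning mapping is copied from the table, and any other value is reported as unknown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a k8s_smoke_test.sh exit code into the
# meaning documented in the exit code table.
describe_smoke_exit() {
  case "$1" in
    0) echo "All smoke tests passed" ;;
    1) echo "Prerequisites not met" ;;
    2) echo "Pod health check failed" ;;
    3) echo "Endpoint checks failed" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}
```

A typical use would be to run `./scripts/k8s_smoke_test.sh --namespace gengine` and then call `describe_smoke_exit "$?"` on the very next line.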

### Troubleshooting Smoke Test Failures

If the smoke test fails in CI:

1. **Check the workflow logs** – The "Capture debug logs on failure" step
   includes pod descriptions, logs, and cluster events

2. **Common issues:**
   - **Pods not starting**: Check for image pull errors or resource constraints
   - **Health checks failing**: Verify services are binding to correct ports
   - **Timeout errors**: Increase rollout timeout or check for slow startup

3. **Reproduce locally** – Use the local smoke test procedure above to
   debug the issue in an interactive environment

4. **Check recent changes** – Review commits that modified K8s manifests,
   Dockerfile, or service code
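
When reproducing a failure locally, it can help to gather the same information that the CI debug step prints. The sketch below mirrors the `kubectl` commands from that workflow step; the function name `collect_smoke_debug` is hypothetical, and the component labels (`simulation`, `gateway`, `llm`) are the ones the workflow uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the "Capture debug logs on failure" CI step.
# $1: namespace (default "gengine")
collect_smoke_debug() {
  local ns="${1:-gengine}" component
  # `|| true` keeps collecting even if one command fails.
  kubectl get pods -n "$ns" -o wide || true
  kubectl describe pods -n "$ns" || true
  for component in simulation gateway llm; do
    echo "=== ${component} logs ==="
    kubectl logs -n "$ns" -l "app.kubernetes.io/component=${component}" --tail=100 || true
  done
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' || true
}
```

Redirecting the output to a file, for example `collect_smoke_debug gengine > smoke-debug.txt 2>&1`, gives a single artifact to attach to an issue.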

## Next Steps

- [Minikube Setup](Create_Local_Kubernetes_With_Minikube.md) -
