
Commit 11f22bb

Yuriy Bezsonov committed
docs(infra): update test-optimizations documentation with deploy-only methods and improved workflow
1 parent 6dc790b commit 11f22bb

2 files changed: +193 -56 lines changed

infra/scripts/deploy/test-optimizations.md

Lines changed: 78 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ Build and test all Java container optimization methods for the Unicorn Store Spr
44

55
## Prerequisites
66

7-
1. Run `containerize.sh` - creates ECR repo, builds baseline image with `:latest` and `:02-multi-stage` tags
7+
1. Run `containerize.sh` - creates ECR repo, builds baseline image with `:latest` tag
88
2. Run `eks.sh` - deploys to EKS with deployment using `:latest` image
99

1010
## Usage
@@ -14,41 +14,47 @@ Build and test all Java container optimization methods for the Unicorn Store Spr
1414
./test-optimizations.sh --deploy # Build, push to ECR, deploy to EKS, measure startup
1515
./test-optimizations.sh --only cds # Build single method
1616
./test-optimizations.sh --only cds --deploy
17+
./test-optimizations.sh --deploy --revert # Revert to :latest on exit
1718
```
1819

1920
## Options
2021

2122
| Option | Description |
2223
|--------|-------------|
2324
| (none) | Build all images locally, output results to stdout |
24-
| `--deploy` | Push to ECR, deploy to EKS, measure startup/restart times |
25+
| `--deploy` | Push to ECR, deploy to EKS, measure startup times |
2526
| `--only <method>` | Build only specified method (partial match: `cds`, `native`, etc.) |
27+
| `--revert` | Revert deployment to `:latest` on exit (use with `--deploy`) |
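
The options in this table could be parsed roughly as follows. This is a minimal sketch, not the script's actual code: `DEPLOY_MODE` appears in the script, while `REVERT_ON_EXIT`, `ONLY_METHOD`, `parse_args`, and `method_selected` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch of option handling for --deploy, --revert and --only <method>.
# Names other than DEPLOY_MODE are assumptions, not taken from the script.
DEPLOY_MODE=false
REVERT_ON_EXIT=false
ONLY_METHOD=""

parse_args() {
    while [ $# -gt 0 ]; do
        case "$1" in
            --deploy) DEPLOY_MODE=true ;;
            --revert) REVERT_ON_EXIT=true ;;
            --only)
                if [ $# -lt 2 ]; then
                    echo "--only requires a method name" >&2
                    return 1
                fi
                ONLY_METHOD="$2"
                shift
                ;;
            *)
                echo "Unknown option: $1" >&2
                return 1
                ;;
        esac
        shift
    done
}

# Partial match: "cds" selects "06-cds", "native" selects "08-native".
# An empty ONLY_METHOD selects every method.
method_selected() {
    case "$1" in
        *"$ONLY_METHOD"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this shape, `parse_args --only cds --deploy` sets `DEPLOY_MODE=true` and leaves only `06-cds` selected.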
2628

2729
## Methods
2830

29-
| Tag | Method | Needs DB | Code Change | Special |
30-
|-----|--------|----------|-------------|---------|
31-
| 02-multi-stage | Optimized Dockerfile | No | No | Baseline |
32-
| 03-jib | Google Jib Maven plugin | No | No | No Dockerfile |
33-
| 04-custom-jre | Custom JRE with jlink | No | No | |
34-
| 05-soci | Seekable OCI (lazy loading) | No | No | SOCI index after push |
35-
| 06-cds | Class Data Sharing | Yes | No | Training run |
36-
| 07-aot | Ahead-of-Time compilation | Yes | No | Training run |
37-
| 08-native | GraalVM Native Image | No | No | Long build time |
38-
| 09-crac | Coordinated Restore at Checkpoint | Yes | Yes | UnicornPublisher swap |
31+
| Tag | Method | Build | Needs DB | Code Change | Notes |
32+
|-----|--------|-------|----------|-------------|-------|
33+
| 01-baseline-1cpu | Baseline with 1 CPU | No | No | No | Deploy-only, ~10-12s |
34+
| 01-baseline-2cpu | Baseline with 2 CPUs | No | No | No | Deploy-only, ~6s |
35+
| 01-pod-resize | In-place pod resize | No | No | No | Deploy-only, CPU boost controller |
36+
| 02-multi-stage | Optimized Dockerfile | Yes | No | No | Baseline build |
37+
| 03-jib | Google Jib Maven plugin | Yes | No | No | No Dockerfile |
38+
| 04-custom-jre | Custom JRE with jlink | Yes | No | No | Smaller image |
39+
| 05-soci | Seekable OCI (lazy loading) | Yes | No | No | SOCI index after push |
40+
| 06-cds | Class Data Sharing | Yes | Yes | No | Paketo Buildpacks |
41+
| 07-aot | Ahead-of-Time compilation | Yes | Yes | No | Java 25+ AOT cache |
42+
| 08-native | GraalVM Native Image | Yes | No | No | Long build time |
43+
| 09-crac | Coordinated Restore at Checkpoint | Yes | Yes | Yes | UnicornPublisher swap |
3944

4045
## Flow
4146

4247
### Build-only mode (default)
4348

4449
```
4550
For each method:
46-
1. Pre-build hooks (CRaC: swap UnicornPublisher.crac)
47-
2. Start PostgreSQL if needed (CDS, AOT, CRaC)
48-
3. Build image (docker build or mvn jib:dockerBuild)
49-
4. Stop PostgreSQL
50-
5. Post-build hooks (CRaC: restore UnicornPublisher.java)
51-
6. Output: Method | ✅/❌ | Size | Time
51+
1. Skip if deploy-only (01-baseline-*, 01-pod-resize)
52+
2. Pre-build hooks (CRaC: swap UnicornPublisher.crac)
53+
3. Start PostgreSQL if needed (CDS, AOT, CRaC) - AWS RDS or local Docker
54+
4. Build image (docker build, mvn jib:dockerBuild, or pack build)
55+
5. Stop PostgreSQL
56+
6. Post-build hooks (CRaC: restore UnicornPublisher.java)
57+
7. Output: Method | ✅/❌ | Size | Time
5258
```
5359

5460
### Deploy mode (`--deploy`)
@@ -60,19 +66,24 @@ For each method:
6066
│ Start watcher ──────────────────► Initialize results file │
6167
│ │
6268
│ For each method: │
63-
│ Build image
64-
│ Push to ECR (:tag)
69+
│ Build image (skip for 01-*)
70+
│ Push to ECR (skip for 01-*)
6571
│ Write to queue ───────────────► Read queue │
66-
│ Continue immediately kubectl set image :tag │
72+
│ Continue immediately │
73+
│ Handle deploy-only methods: │
74+
│ 01-baseline-1cpu: set 1 CPU│
75+
│ 01-baseline-2cpu: set 2 CPU│
76+
│ 01-pod-resize: install │
77+
│ CPU boost controller │
78+
│ Or: kubectl set image :tag │
6779
│ Wait for rollout │
6880
│ Record startup time │
69-
│ kubectl rollout restart │
70-
│ Record restart time │
81+
│ Cleanup (pod-resize) │
7182
│ Write to results file │
7283
│ │
73-
│ Write END marker ───────────────► kubectl set image :latest
74-
│ Wait for watcher (revert to baseline)
75-
│ Print final results Exit
84+
│ Write END marker ───────────────► Exit
85+
│ Wait for watcher
86+
│ Print final results
7687
└─────────────────────────────────────────────────────────────────┘
7788
```
7889

@@ -94,32 +105,63 @@ All output goes to `/tmp/test-optimizations/` (or `${SCRIPT_DIR}/.test-optimizat
94105
```
95106
Method | Build | Size Local | Time
96107
---------------|-------|------------|------
108+
01-baseline-1cpu | ✅ | N/A | 0s
97109
02-multi-stage | ✅ | 598MB | 45s
98-
06-cds | ✅ | 1.34GB | 2m15s
110+
06-cds | ✅ | 1.23GB | 2m23s
99111
08-native | ❌ | N/A | 5m30s
100112
```
101113

102114
### Deploy mode results file
103115
```
104-
Method | Size Local | Size ECR | Build Time | Startup Time
105-
---------------|------------|----------|------------|-------------
106-
02-multi-stage | 598MB | 580MB | 45s | 8.234 seconds
107-
06-cds | 1.34GB | 1.2GB | 2m15s | 2.156 seconds
108-
09-crac | 1.1GB | 1.0GB | 3m20s | 0.087 seconds
116+
Method | Size Local | Size ECR | Build Time | Startup Time
117+
-----------------|------------|----------|------------|-------------
118+
01-baseline-1cpu | N/A | N/A | 0s | 10.234 seconds
119+
01-baseline-2cpu | N/A | N/A | 0s | 6.123 seconds
120+
01-pod-resize | N/A | N/A | 0s | 5.987 seconds
121+
02-multi-stage | 598MB | 580MB | 45s | 10.234 seconds
122+
06-cds | 1.23GB | 578MB | 2m23s | 3.916 seconds
123+
07-aot | 1.34GB | 428MB | 56s | 4.0 seconds
124+
09-crac | 1.1GB | 1.0GB | 3m20s | 0.087 seconds
109125
```
110126

127+
## Database Configuration
128+
129+
The script automatically detects database configuration:
130+
131+
1. **AWS RDS** (preferred): Tries to get credentials from SSM Parameter Store and Secrets Manager
132+
- `workshop-db-connection-string` - JDBC URL
133+
- `workshop-db-secret` - username/password
134+
135+
2. **Local Docker** (fallback): Starts PostgreSQL container if AWS credentials not available
136+
- Uses `host.docker.internal:5432` for Docker build access
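
The AWS-first, local-fallback detection described above might look like the sketch below. Several details are assumptions: that the JDBC URL lives in SSM Parameter Store and the credentials in Secrets Manager, the `detect_database` and `DB_SECRET_JSON` names, and the local database name `unicorns`. Only `USE_AWS_DB` and `SPRING_DATASOURCE_URL` appear in the script.

```shell
#!/usr/bin/env bash
# Sketch only: store layout, helper names and the local database name are assumptions.
USE_AWS_DB=false
SPRING_DATASOURCE_URL=""
DB_SECRET_JSON=""

detect_database() {
    local url secret
    # Try AWS first: both lookups must succeed and return non-empty values.
    if url=$(aws ssm get-parameter --name workshop-db-connection-string \
                 --query 'Parameter.Value' --output text 2>/dev/null) \
       && secret=$(aws secretsmanager get-secret-value --secret-id workshop-db-secret \
                 --query 'SecretString' --output text 2>/dev/null) \
       && [ -n "$url" ] && [ -n "$secret" ]; then
        USE_AWS_DB=true
        SPRING_DATASOURCE_URL="$url"
        DB_SECRET_JSON="$secret"
    else
        # Fallback: a local PostgreSQL container reachable from docker build.
        USE_AWS_DB=false
        SPRING_DATASOURCE_URL="jdbc:postgresql://host.docker.internal:5432/unicorns"
    fi
}
```

Treating any lookup failure as "AWS not available" keeps the fallback path simple, at the cost of hiding the reason a lookup failed.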
137+
138+
## Special Build Methods
139+
140+
### CDS (06-cds)
141+
Uses Paketo Buildpacks instead of Dockerfile:
142+
- Installs `pack` CLI if not available
143+
- `BP_JVM_CDS_ENABLED=true` - creates CDS archive during build
144+
- `BPL_JVM_CDS_ENABLED=true` - uses CDS archive at runtime
145+
146+
### Pod Resize (01-pod-resize)
147+
Installs Kube Startup CPU Boost controller:
148+
- Creates `StartupCPUBoost` resource with 100% CPU increase
149+
- Automatically removes boost when pod becomes ready
150+
- Cleans up controller after test
151+
111152
## Environment
112153

113154
Requires workshop environment (`/etc/profile.d/workshop.sh`) with:
114155
- `ACCOUNT_ID` - AWS account ID
115156
- `AWS_REGION` - AWS region
116-
- `SPRING_DATASOURCE_URL` - Database connection string (for CDS/AOT/CRaC builds)
117-
- `SPRING_DATASOURCE_USERNAME` / `SPRING_DATASOURCE_PASSWORD`
157+
158+
Database credentials are fetched automatically from AWS or use local Docker fallback.
118159

119160
## Notes
120161

121-
- Build uses local PostgreSQL container for training (CDS, AOT, CRaC)
122-
- Deploy uses real RDS database via Secrets Manager
162+
- Deploy-only methods (01-*) don't build images, they modify deployment config
163+
- CDS uses Paketo Buildpacks (not Dockerfile) for proper CDS archive creation
123164
- Native image may fail on ARM (macOS) - works on x86-64 Linux
124165
- CRaC requires x86-64 for `-XX:CPUFeatures=generic`
125166
- SOCI requires `soci` CLI tool installed
167+
- Pod resize requires EKS 1.27+ with in-place pod resize support

infra/scripts/deploy/test-optimizations.sh

Lines changed: 115 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -62,6 +62,9 @@ WATCHER_PID_FILE="${OUTPUT_DIR}/watcher.pid"
6262

6363
# Methods in order (tag names)
6464
METHODS=(
65+
"01-baseline-1cpu"
66+
"01-baseline-2cpu"
67+
"01-pod-resize"
6568
"02-multi-stage"
6669
"03-jib"
6770
"04-custom-jre"
@@ -115,6 +118,14 @@ needs_code_change() {
115118
[[ "$1" == "09-crac" ]]
116119
}
117120

121+
# Check if method is a special deploy-only method (no build)
122+
is_deploy_only() {
123+
case "$1" in
124+
01-baseline-1cpu|01-baseline-2cpu|01-pod-resize) return 0 ;;
125+
*) return 1 ;;
126+
esac
127+
}
128+
118129
# Database configuration - try AWS first, fallback to local Docker
119130
USE_AWS_DB=false
120131
SPRING_DATASOURCE_URL=""
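
A small sketch of how `is_deploy_only` splits the method list into the two paths taken by the main loop. `is_deploy_only` is copied from the hunk above; `plan_for` and its messages are hypothetical and exist only to show the branching.

```shell
#!/usr/bin/env bash
# Copied from the diff: returns 0 for methods that only change deployment config.
is_deploy_only() {
    case "$1" in
        01-baseline-1cpu|01-baseline-2cpu|01-pod-resize) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: describe what the main loop would do for a tag.
plan_for() {
    if is_deploy_only "$1"; then
        echo "$1: patch deployment, no build or push"
    else
        echo "$1: build, push, set image"
    fi
}

for tag in 01-baseline-1cpu 01-pod-resize 02-multi-stage 06-cds; do
    plan_for "$tag"
done
```

Listing the three tags explicitly, rather than matching `01-*`, means a future `01-` method that does need a build would not be skipped by accident.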
@@ -210,11 +221,16 @@ format_time() {
210221
build_image() {
211222
local tag="$1"
212223
local log_file="$2"
213-
local dockerfile="${DOCKERFILES_DIR}/Dockerfile.${tag}"
214224
local build_args=""
215225

216226
log_info "Building ${tag}..."
217227

228+
# Deploy-only methods (baseline, pod-resize) - no build needed
229+
if is_deploy_only "$tag"; then
230+
log_info "Deploy-only method, skipping build..."
231+
return 0
232+
fi
233+
218234
# Special case: jib uses maven
219235
if [[ "$tag" == "03-jib" ]]; then
220236
log_info "Using Maven Jib plugin..."
@@ -249,6 +265,9 @@ build_image() {
249265
return $result
250266
fi
251267

268+
# Dockerfile name matches tag (e.g., 02-multi-stage -> Dockerfile.02-multi-stage)
269+
local dockerfile="${DOCKERFILES_DIR}/Dockerfile.${tag}"
270+
252271
# Check Dockerfile exists
253272
if [[ ! -f "${dockerfile}" ]]; then
254273
log_error "Dockerfile not found: ${dockerfile}"
@@ -270,7 +289,7 @@ build_image() {
270289
fi
271290

272291
# Build with --progress=plain for cleaner logs
273-
# Use --no-cache for methods that need fresh training (CDS, AOT, CRaC)
292+
# Use --no-cache for methods that need fresh training (AOT, CRaC)
274293
local no_cache=""
275294
if needs_db "$tag"; then
276295
no_cache="--no-cache"
@@ -359,11 +378,6 @@ deploy_watcher() {
359378

360379
# Check for END marker
361380
if [[ "$status" == "END" ]]; then
362-
# Revert to baseline (disabled for testing)
363-
# log_info "Reverting to baseline (:latest)..."
364-
# kubectl set image deployment/unicorn-store-spring \
365-
# unicorn-store-spring="${ecr_uri}:latest" -n unicorn-store-spring 2>/dev/null
366-
# kubectl rollout status deployment unicorn-store-spring -n unicorn-store-spring --timeout=180s 2>/dev/null
367381
return 0
368382
fi
369383

@@ -379,11 +393,79 @@ deploy_watcher() {
379393
continue
380394
fi
381395

382-
# Deploy with new image
383-
log_info "Deploying ${tag}..."
384-
echo "--- kubectl set image ---" >> "${deploy_log}"
385-
kubectl set image deployment/unicorn-store-spring \
386-
unicorn-store-spring="${ecr_uri}:${tag}" -n unicorn-store-spring >> "${deploy_log}" 2>&1
396+
# Handle special deploy-only methods
397+
case "$tag" in
398+
01-baseline-1cpu)
399+
log_info "Deploying baseline with 1 CPU..."
400+
# Ensure 1 CPU and restart
401+
kubectl patch deployment unicorn-store-spring -n unicorn-store-spring \
402+
--type='json' -p='[
403+
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "1"},
404+
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "1"}
405+
]' >> "${deploy_log}" 2>&1
406+
kubectl rollout restart deployment unicorn-store-spring -n unicorn-store-spring >> "${deploy_log}" 2>&1
407+
;;
408+
01-baseline-2cpu)
409+
log_info "Deploying baseline with 2 CPUs..."
410+
# Increase to 2 CPU and restart
411+
kubectl patch deployment unicorn-store-spring -n unicorn-store-spring \
412+
--type='json' -p='[
413+
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "2"},
414+
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "2"}
415+
]' >> "${deploy_log}" 2>&1
416+
kubectl rollout restart deployment unicorn-store-spring -n unicorn-store-spring >> "${deploy_log}" 2>&1
417+
;;
418+
01-pod-resize)
419+
log_info "Deploying with in-place pod resize (CPU boost)..."
420+
# Revert to 1 CPU first
421+
kubectl patch deployment unicorn-store-spring -n unicorn-store-spring \
422+
--type='json' -p='[
423+
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "1"},
424+
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "1"}
425+
]' >> "${deploy_log}" 2>&1
426+
427+
# Install Kube Startup CPU Boost if not present
428+
if ! kubectl get crd startupcpuboosts.autoscaling.x-k8s.io &>/dev/null; then
429+
log_info "Installing Kube Startup CPU Boost..."
430+
kubectl apply -f https://github.com/google/kube-startup-cpu-boost/releases/download/v0.17.1/manifests.yaml >> "${deploy_log}" 2>&1
431+
kubectl wait --for=condition=ready pod -l control-plane=controller-manager \
432+
-n kube-startup-cpu-boost-system --timeout=120s >> "${deploy_log}" 2>&1
433+
fi
434+
435+
# Create StartupCPUBoost resource
436+
cat <<BOOST_EOF | kubectl apply -f - >> "${deploy_log}" 2>&1
437+
apiVersion: autoscaling.x-k8s.io/v1alpha1
438+
kind: StartupCPUBoost
439+
metadata:
440+
name: unicorn-store-spring
441+
namespace: unicorn-store-spring
442+
selector:
443+
matchExpressions:
444+
- key: app
445+
operator: In
446+
values: ["unicorn-store-spring"]
447+
spec:
448+
resourcePolicy:
449+
containerPolicies:
450+
- containerName: unicorn-store-spring
451+
percentageIncrease:
452+
value: 100
453+
durationPolicy:
454+
podCondition:
455+
type: Ready
456+
status: "True"
457+
BOOST_EOF
458+
kubectl rollout restart deployment unicorn-store-spring -n unicorn-store-spring >> "${deploy_log}" 2>&1
459+
;;
460+
*)
461+
# Normal image deployment
462+
log_info "Deploying ${tag}..."
463+
echo "--- kubectl set image ---" >> "${deploy_log}"
464+
kubectl set image deployment/unicorn-store-spring \
465+
unicorn-store-spring="${ecr_uri}:${tag}" -n unicorn-store-spring >> "${deploy_log}" 2>&1
466+
;;
467+
esac
468+
387469
echo "--- kubectl rollout status ---" >> "${deploy_log}"
388470
if ! kubectl rollout status deployment unicorn-store-spring -n unicorn-store-spring --timeout=180s >> "${deploy_log}" 2>&1; then
389471
echo "--- kubectl describe deployment ---" >> "${deploy_log}"
@@ -397,6 +479,13 @@ deploy_watcher() {
397479
local startup_time=$(get_startup_time "$tag")
398480
echo "Startup time: ${startup_time}" >> "${deploy_log}"
399481

482+
# Cleanup after pod-resize test
483+
if [[ "$tag" == "01-pod-resize" ]]; then
484+
log_info "Cleaning up CPU boost..."
485+
kubectl delete startupcpuboost unicorn-store-spring -n unicorn-store-spring >> "${deploy_log}" 2>&1 || true
486+
kubectl delete -f https://github.com/google/kube-startup-cpu-boost/releases/download/v0.17.1/manifests.yaml >> "${deploy_log}" 2>&1 || true
487+
fi
488+
400489
echo "${tag} | ${size_local} | ${size_ecr} | ${build_time} | ${startup_time}" >> "${RESULTS_FILE}"
401490
log_info "${tag}: startup=${startup_time}"
402491
done
@@ -494,16 +583,22 @@ for tag in "${METHODS[@]}"; do
494583

495584
if build_image "$tag" "$build_log"; then
496585
build_status=""
497-
size_local=$(docker images "${IMAGE_NAME}:${tag}" --format "{{.Size}}" 2>/dev/null || echo "N/A")
498586

499-
# Deploy mode: push and queue for deployment
587+
# Deploy-only methods don't have local images
588+
if ! is_deploy_only "$tag"; then
589+
size_local=$(docker images "${IMAGE_NAME}:${tag}" --format "{{.Size}}" 2>/dev/null || echo "N/A")
590+
fi
591+
592+
# Deploy mode: push and queue for deployment (skip push for deploy-only methods)
500593
if [[ "$DEPLOY_MODE" == true ]]; then
501-
size_ecr=$(push_image "$tag" "$build_log")
502-
# Check if push failed
503-
if [[ "$size_ecr" == PUSH_FAILED:* ]]; then
504-
error_msg="${size_ecr#PUSH_FAILED:}"
505-
size_ecr="N/A"
506-
push_status="FAILED"
594+
if ! is_deploy_only "$tag"; then
595+
size_ecr=$(push_image "$tag" "$build_log")
596+
# Check if push failed
597+
if [[ "$size_ecr" == PUSH_FAILED:* ]]; then
598+
error_msg="${size_ecr#PUSH_FAILED:}"
599+
size_ecr="N/A"
600+
push_status="FAILED"
601+
fi
507602
fi
508603
fi
509604
else
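
The three deploy-only branches in `deploy_watcher` repeat the same JSON patch with a different CPU value. One possible way to factor that out is sketched below. `cpu_patch_json` and `set_deployment_cpu` are hypothetical helpers, not part of the commit, and `set_deployment_cpu` is defined but not called here because it needs a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON patch that sets CPU requests and limits.
cpu_patch_json() {
    local cpu="$1"
    printf '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"%s"},{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/cpu","value":"%s"}]\n' "$cpu" "$cpu"
}

# Hypothetical wrapper around the kubectl patch call used in the script.
set_deployment_cpu() {
    kubectl patch deployment unicorn-store-spring -n unicorn-store-spring \
        --type=json -p "$(cpu_patch_json "$1")"
}

# Print the patch that the 2 CPU baseline would apply.
cpu_patch_json 2
```

Each branch would then reduce to `set_deployment_cpu 1` or `set_deployment_cpu 2` followed by the existing `kubectl rollout restart`.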
