-
Notifications
You must be signed in to change notification settings - Fork 113
add liveness, readiness, and startup probes to noobaa-operator #1684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,6 +42,9 @@ spec: | |
| - name: noobaa-operator | ||
| image: NOOBAA_OPERATOR_IMAGE | ||
| terminationMessagePolicy: FallbackToLogsOnError | ||
| ports: | ||
| - name: healthz | ||
| containerPort: 8081 | ||
| volumeMounts: | ||
| - name: bound-sa-token | ||
| mountPath: /var/run/secrets/openshift/serviceaccount | ||
|
|
@@ -51,6 +54,29 @@ spec: | |
| # SHOULD BE RETURNED ONCE COSI IS BACK | ||
| # - name: socket | ||
| # mountPath: /var/lib/cosi | ||
| readinessProbe: | ||
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| initialDelaySeconds: 5 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| livenessProbe: | ||
| httpGet: | ||
| path: /readyz | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should all probes be the same?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we only perform ping, I suppose it's okay for them to be set to the same path. In the future, we might want to use more advanced probes, but for now, it's enough. |
||
| port: healthz | ||
| initialDelaySeconds: 15 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| startupProbe: | ||
|
Comment on lines
+57
to
+73
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainUse /healthz for liveness; /readyz for readiness (current YAML uses /readyz for liveness). readinessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3🏁 Script executed: #!/usr/bin/env bash
set -e
# Locate the operator.yaml file
OP_FILE=$(fd --type f operator.yaml || true)
echo "Found file: $OP_FILE"
[ -n "$OP_FILE" ] || { echo "operator.yaml not found"; exit 1; }
# Print lines 60–76 for verification
sed -n '60,76p' "$OP_FILE"Length of output: 698 Use /healthz for livenessProbe 🤖 Prompt for AI Agents |
||
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 30 | ||
| resources: | ||
| limits: | ||
| cpu: "250m" | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -6231,6 +6231,9 @@ spec: | |||||||||||||||||||||||||||||||||
| - name: noobaa-operator | ||||||||||||||||||||||||||||||||||
| image: NOOBAA_OPERATOR_IMAGE | ||||||||||||||||||||||||||||||||||
| terminationMessagePolicy: FallbackToLogsOnError | ||||||||||||||||||||||||||||||||||
| ports: | ||||||||||||||||||||||||||||||||||
| - name: healthz | ||||||||||||||||||||||||||||||||||
| containerPort: 8081 | ||||||||||||||||||||||||||||||||||
| volumeMounts: | ||||||||||||||||||||||||||||||||||
| - name: bound-sa-token | ||||||||||||||||||||||||||||||||||
| mountPath: /var/run/secrets/openshift/serviceaccount | ||||||||||||||||||||||||||||||||||
|
|
@@ -6240,6 +6243,29 @@ spec: | |||||||||||||||||||||||||||||||||
| # SHOULD BE RETURNED ONCE COSI IS BACK | ||||||||||||||||||||||||||||||||||
| # - name: socket | ||||||||||||||||||||||||||||||||||
| # mountPath: /var/lib/cosi | ||||||||||||||||||||||||||||||||||
| readinessProbe: | ||||||||||||||||||||||||||||||||||
| httpGet: | ||||||||||||||||||||||||||||||||||
| path: /readyz | ||||||||||||||||||||||||||||||||||
| port: healthz | ||||||||||||||||||||||||||||||||||
| initialDelaySeconds: 5 | ||||||||||||||||||||||||||||||||||
| periodSeconds: 10 | ||||||||||||||||||||||||||||||||||
| timeoutSeconds: 5 | ||||||||||||||||||||||||||||||||||
| failureThreshold: 3 | ||||||||||||||||||||||||||||||||||
| livenessProbe: | ||||||||||||||||||||||||||||||||||
| httpGet: | ||||||||||||||||||||||||||||||||||
| path: /readyz | ||||||||||||||||||||||||||||||||||
| port: healthz | ||||||||||||||||||||||||||||||||||
| initialDelaySeconds: 15 | ||||||||||||||||||||||||||||||||||
| periodSeconds: 10 | ||||||||||||||||||||||||||||||||||
| timeoutSeconds: 5 | ||||||||||||||||||||||||||||||||||
| failureThreshold: 3 | ||||||||||||||||||||||||||||||||||
| startupProbe: | ||||||||||||||||||||||||||||||||||
|
Comment on lines
+6255
to
+6262
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Liveness probe should hit /healthz, not /readyz. controller-runtime exposes both /healthz and /readyz; liveness is expected to target /healthz. Pointing liveness to /readyz can mask unhealthy-but-ready states. Switch the path to /healthz. livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @OdedViner Thanks! Confirming the liveness probe now points to /healthz. As a quick sanity check, you can verify the live settings on a running pod with: If there are multiple definitions of the operator Deployment (e.g., embedded YAML and any generated manifests), make sure they’re all aligned: liveness=/healthz and readiness/startup=/readyz. Nice work. ✏️ Learnings added
If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
||||||||||||||||||||||||||||||||||
| httpGet: | ||||||||||||||||||||||||||||||||||
| path: /readyz | ||||||||||||||||||||||||||||||||||
| port: healthz | ||||||||||||||||||||||||||||||||||
| periodSeconds: 10 | ||||||||||||||||||||||||||||||||||
| timeoutSeconds: 5 | ||||||||||||||||||||||||||||||||||
| failureThreshold: 30 | ||||||||||||||||||||||||||||||||||
| resources: | ||||||||||||||||||||||||||||||||||
| limits: | ||||||||||||||||||||||||||||||||||
| cpu: "250m" | ||||||||||||||||||||||||||||||||||
|
|
@@ -6923,4 +6949,3 @@ metadata: | |||||||||||||||||||||||||||||||||
| app: prometheus-adapter | ||||||||||||||||||||||||||||||||||
| name: custom-metrics-prometheus-adapter | ||||||||||||||||||||||||||||||||||
| ` | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,6 +24,7 @@ import ( | |
|
|
||
| "github.com/operator-framework/operator-lib/leader" | ||
| "sigs.k8s.io/controller-runtime/pkg/cache" | ||
| "sigs.k8s.io/controller-runtime/pkg/healthz" | ||
| "sigs.k8s.io/controller-runtime/pkg/manager" | ||
| "sigs.k8s.io/controller-runtime/pkg/manager/signals" | ||
| metricsServer "sigs.k8s.io/controller-runtime/pkg/metrics/server" | ||
|
|
@@ -44,6 +45,11 @@ func RunOperator(cmd *cobra.Command, args []string) { | |
| util.InitLogger(logrus.DebugLevel) | ||
| } | ||
| version.RunVersion(cmd, args) | ||
| // Probe address from CLI flag (defaults to :8081) | ||
| probeAddr := os.Getenv("HEALTH_PROBE_BIND_ADDRESS") | ||
| if probeAddr == "" { | ||
| probeAddr = ":8081" | ||
| } | ||
|
|
||
| config := util.KubeConfig() | ||
|
|
||
|
|
@@ -66,6 +72,7 @@ func RunOperator(cmd *cobra.Command, args []string) { | |
| Metrics: metricsServer.Options{ | ||
| BindAddress: fmt.Sprintf("%s:%d", metricsHost, metricsPort), | ||
| }, | ||
| HealthProbeBindAddress: probeAddr, // Serve /healthz and /readyz here | ||
| }) | ||
| if err != nil { | ||
| log.Fatalf("Failed to create manager: %s", err) | ||
|
|
@@ -97,6 +104,15 @@ func RunOperator(cmd *cobra.Command, args []string) { | |
| log.Fatalf("Failed AddToClusterScopedManager: %s", err) | ||
| } | ||
|
|
||
| // Register health and readiness endpoints on mgr | ||
| if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { | ||
| log.Fatalf("Failed to add health check: %s", err) | ||
| } | ||
|
|
||
| if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { | ||
| log.Fatalf("Failed to add readiness check: %s", err) | ||
| } | ||
|
|
||
|
Comment on lines
+112
to
+115
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainReady check uses Ping; OK, but ensure probes hit /readyz Using healthz.Ping is fine. Since healthz was removed per discussion, verify liveness and startup probes in manifests target /readyz too. To verify across the repo: 🏁 Script executed: #!/bin/bash
# Confirm flag presence and probe paths
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 '(livenessProbe|readinessProbe|startupProbe).*path' deploy pkg/bundle 2>/dev/null || trueLength of output: 272 🏁 Script executed: #!/bin/bash
# Search for all probe definitions (without path restriction)
rg -n 'livenessProbe|readinessProbe|startupProbe' --type=go --type=yaml --type=json 2>/dev/null || trueLength of output: 881 🏁 Script executed: #!/bin/bash
# Extract probe definitions with paths from key files
echo "=== deploy/operator.yaml probes ==="
rg -n -A5 'readinessProbe:|livenessProbe:|startupProbe:' deploy/operator.yaml | head -40
echo ""
echo "=== pkg/bundle/deploy.go manager probes (lines 6250-6270) ==="
sed -n '6248,6275p' pkg/bundle/deploy.goLength of output: 1443 🏁 Script executed: #!/bin/bash
# Check for other health check additions in manager.go
echo "=== Health check registrations in manager.go ==="
rg -n 'Add.*Check' pkg/operator/manager.go -A2
echo ""
echo "=== Startup probe paths across all files ==="
rg -n -B2 -A2 'startupProbe:' deploy/ pkg/bundle/deploy.go 2>/dev/null | grep -A4 'path:'Length of output: 503 Fix livenessProbe paths and resolve startup probe inconsistency The probe configurations violate controller-runtime best practices and create misalignment:
Correct per controller-runtime pattern: livenessProbe → 🤖 Prompt for AI Agents |
||
| util.Panic(mgr.Add(manager.RunnableFunc(func(ctx context.Context) error { | ||
| system.RunOperatorCreate(cmd, args) | ||
| return nil | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.