-
Notifications
You must be signed in to change notification settings - Fork 113
add liveness, readiness, and startup probes to noobaa-operator #1684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,6 +42,13 @@ spec: | |
| - name: noobaa-operator | ||
| image: NOOBAA_OPERATOR_IMAGE | ||
| terminationMessagePolicy: FallbackToLogsOnError | ||
| args: | ||
| - operator | ||
| - run | ||
| - --health-probe-bind-address=:8081 | ||
| ports: | ||
| - name: healthz | ||
| containerPort: 8081 | ||
coderabbitai[bot] marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| volumeMounts: | ||
| - name: bound-sa-token | ||
| mountPath: /var/run/secrets/openshift/serviceaccount | ||
|
|
@@ -51,6 +58,29 @@ spec: | |
| # SHOULD BE RETURNED ONCE COSI IS BACK | ||
| # - name: socket | ||
| # mountPath: /var/lib/cosi | ||
| readinessProbe: | ||
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| initialDelaySeconds: 5 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| livenessProbe: | ||
| httpGet: | ||
| path: /readyz | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should all probes be the same?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we only perform ping, I suppose it's okay for them to be set to the same path. In the future, we might want to use more advanced probes, but for now, it's enough. |
||
| port: healthz | ||
| initialDelaySeconds: 15 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| startupProbe: | ||
|
Comment on lines
+57
to
+73
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainUse /healthz for liveness; /readyz for readiness (current YAML uses /readyz for liveness). readinessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3🏁 Script executed: #!/usr/bin/env bash
set -e
# Locate the operator.yaml file
OP_FILE=$(fd --type f operator.yaml || true)
echo "Found file: $OP_FILE"
[ -n "$OP_FILE" ] || { echo "operator.yaml not found"; exit 1; }
# Print lines 60–76 for verification
sed -n '60,76p' "$OP_FILE"Length of output: 698 Use /healthz for livenessProbe 🤖 Prompt for AI Agents |
||
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 30 | ||
| resources: | ||
| limits: | ||
| cpu: "250m" | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -6231,6 +6231,13 @@ spec: | |||||||||||||||||||||||||||||||||
| - name: noobaa-operator | ||||||||||||||||||||||||||||||||||
| image: NOOBAA_OPERATOR_IMAGE | ||||||||||||||||||||||||||||||||||
| terminationMessagePolicy: FallbackToLogsOnError | ||||||||||||||||||||||||||||||||||
| args: | ||||||||||||||||||||||||||||||||||
| - operator | ||||||||||||||||||||||||||||||||||
| - run | ||||||||||||||||||||||||||||||||||
| - --health-probe-bind-address=:8081 | ||||||||||||||||||||||||||||||||||
| ports: | ||||||||||||||||||||||||||||||||||
| - name: healthz | ||||||||||||||||||||||||||||||||||
| containerPort: 8081 | ||||||||||||||||||||||||||||||||||
| volumeMounts: | ||||||||||||||||||||||||||||||||||
| - name: bound-sa-token | ||||||||||||||||||||||||||||||||||
| mountPath: /var/run/secrets/openshift/serviceaccount | ||||||||||||||||||||||||||||||||||
|
|
@@ -6240,6 +6247,29 @@ spec: | |||||||||||||||||||||||||||||||||
| # SHOULD BE RETURNED ONCE COSI IS BACK | ||||||||||||||||||||||||||||||||||
| # - name: socket | ||||||||||||||||||||||||||||||||||
| # mountPath: /var/lib/cosi | ||||||||||||||||||||||||||||||||||
| readinessProbe: | ||||||||||||||||||||||||||||||||||
| httpGet: | ||||||||||||||||||||||||||||||||||
| path: /readyz | ||||||||||||||||||||||||||||||||||
| port: healthz | ||||||||||||||||||||||||||||||||||
| initialDelaySeconds: 5 | ||||||||||||||||||||||||||||||||||
| periodSeconds: 10 | ||||||||||||||||||||||||||||||||||
| timeoutSeconds: 5 | ||||||||||||||||||||||||||||||||||
| failureThreshold: 3 | ||||||||||||||||||||||||||||||||||
| livenessProbe: | ||||||||||||||||||||||||||||||||||
| httpGet: | ||||||||||||||||||||||||||||||||||
| path: /readyz | ||||||||||||||||||||||||||||||||||
| port: healthz | ||||||||||||||||||||||||||||||||||
| initialDelaySeconds: 15 | ||||||||||||||||||||||||||||||||||
| periodSeconds: 10 | ||||||||||||||||||||||||||||||||||
| timeoutSeconds: 5 | ||||||||||||||||||||||||||||||||||
| failureThreshold: 3 | ||||||||||||||||||||||||||||||||||
| startupProbe: | ||||||||||||||||||||||||||||||||||
|
Comment on lines
+6255
to
+6262
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Liveness probe should hit /healthz, not /readyz. controller-runtime exposes both /healthz and /readyz; liveness is expected to target /healthz. Pointing liveness to /readyz can mask unhealthy-but-ready states. Switch the path to /healthz. livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @OdedViner Thanks! Confirming the liveness probe now points to /healthz. As a quick sanity check, you can verify the live settings on a running pod with: If there are multiple definitions of the operator Deployment (e.g., embedded YAML and any generated manifests), make sure they’re all aligned: liveness=/healthz and readiness/startup=/readyz. Nice work. ✏️ Learnings added
If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
||||||||||||||||||||||||||||||||||
| httpGet: | ||||||||||||||||||||||||||||||||||
| path: /readyz | ||||||||||||||||||||||||||||||||||
| port: healthz | ||||||||||||||||||||||||||||||||||
| periodSeconds: 10 | ||||||||||||||||||||||||||||||||||
| timeoutSeconds: 5 | ||||||||||||||||||||||||||||||||||
| failureThreshold: 30 | ||||||||||||||||||||||||||||||||||
| resources: | ||||||||||||||||||||||||||||||||||
| limits: | ||||||||||||||||||||||||||||||||||
| cpu: "250m" | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -24,6 +24,7 @@ import ( | |||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| "github.com/operator-framework/operator-lib/leader" | ||||||||||||||||||||||||||
| "sigs.k8s.io/controller-runtime/pkg/cache" | ||||||||||||||||||||||||||
| "sigs.k8s.io/controller-runtime/pkg/healthz" | ||||||||||||||||||||||||||
| "sigs.k8s.io/controller-runtime/pkg/manager" | ||||||||||||||||||||||||||
| "sigs.k8s.io/controller-runtime/pkg/manager/signals" | ||||||||||||||||||||||||||
| metricsServer "sigs.k8s.io/controller-runtime/pkg/metrics/server" | ||||||||||||||||||||||||||
|
|
@@ -44,6 +45,8 @@ func RunOperator(cmd *cobra.Command, args []string) { | |||||||||||||||||||||||||
| util.InitLogger(logrus.DebugLevel) | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
| version.RunVersion(cmd, args) | ||||||||||||||||||||||||||
| // Probe address from CLI flag (defaults to :8081) | ||||||||||||||||||||||||||
| probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
Comment on lines
48
to
53
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainDon’t ignore flag parsing errors; add a safe default for probe address
Apply this minimal diff to handle the error and ensure a sane default: - // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag (defaults to :8081)
+ probeAddr, err := cmd.Flags().GetString("health-probe-bind-address")
+ if err != nil {
+ log.Warnf("Failed reading --health-probe-bind-address flag: %v (defaulting to :8081)", err)
+ probeAddr = ":8081"
+ }
+ if probeAddr == "" {
+ log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+ probeAddr = ":8081"
+ }To double-check the flag wiring across the repo, run: 🏁 Script executed: #!/bin/bash
# Verify the flag is defined and used in manifests
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 'health-probe-bind-address' -- 'deploy/**' 'deploy/*.yaml' 'pkg/bundle/**' 2>/dev/null || trueLength of output: 1109 Handle health-probe-bind-address flag parsing errors in pkg/operator/manager.go
Apply this diff: --- a/pkg/operator/manager.go
+++ b/pkg/operator/manager.go
@@ -46,4 +46,12 @@ func NewManagerCommand(...) {
- // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag (defaults to :8081)
+ probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+ if err != nil {
+ log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err)
+ probeAddr = ":8081"
+ }
+ if probeAddr == "" {
+ log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+ probeAddr = ":8081"
+ }📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 6721 🏁 Script executed: Length of output: 20129 @OdedViner, the root cause is clear from your error message: the The deployment manifest passes Fix: Add the flag definition to // CmdRun returns a CLI command
func CmdRun() *cobra.Command {
cmd := &cobra.Command{
Use: "run",
Short: "Runs the noobaa-operator",
Run: RunOperator,
Args: cobra.NoArgs,
}
cmd.Flags().String("health-probe-bind-address", ":8081", "The address the health probe endpoint binds to")
return cmd
}This registers the flag with a default of 🧬 Code Graph Analysis Results
|
||||||||||||||||||||||||||
| config := util.KubeConfig() | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
@@ -66,6 +69,7 @@ func RunOperator(cmd *cobra.Command, args []string) { | |||||||||||||||||||||||||
| Metrics: metricsServer.Options{ | ||||||||||||||||||||||||||
| BindAddress: fmt.Sprintf("%s:%d", metricsHost, metricsPort), | ||||||||||||||||||||||||||
| }, | ||||||||||||||||||||||||||
| HealthProbeBindAddress: probeAddr, // Serve /healthz and /readyz here | ||||||||||||||||||||||||||
| }) | ||||||||||||||||||||||||||
| if err != nil { | ||||||||||||||||||||||||||
| log.Fatalf("Failed to create manager: %s", err) | ||||||||||||||||||||||||||
|
|
@@ -97,6 +101,16 @@ func RunOperator(cmd *cobra.Command, args []string) { | |||||||||||||||||||||||||
| log.Fatalf("Failed AddToClusterScopedManager: %s", err) | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| // Register health and readiness endpoints on mgr | ||||||||||||||||||||||||||
| if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { | ||||||||||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we use
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||||||||||||||||||||||||||
| log.Fatalf("Failed to add health check: %s", err) | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { | ||||||||||||||||||||||||||
| log.Fatalf("Failed to add readiness check: %s", err) | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
Comment on lines
+112
to
+115
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainReady check uses Ping; OK, but ensure probes hit /readyz Using healthz.Ping is fine. Since healthz was removed per discussion, verify liveness and startup probes in manifests target /readyz too. To verify across the repo: 🏁 Script executed: #!/bin/bash
# Confirm flag presence and probe paths
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 '(livenessProbe|readinessProbe|startupProbe).*path' deploy pkg/bundle 2>/dev/null || trueLength of output: 272 🏁 Script executed: #!/bin/bash
# Search for all probe definitions (without path restriction)
rg -n 'livenessProbe|readinessProbe|startupProbe' --type=go --type=yaml --type=json 2>/dev/null || trueLength of output: 881 🏁 Script executed: #!/bin/bash
# Extract probe definitions with paths from key files
echo "=== deploy/operator.yaml probes ==="
rg -n -A5 'readinessProbe:|livenessProbe:|startupProbe:' deploy/operator.yaml | head -40
echo ""
echo "=== pkg/bundle/deploy.go manager probes (lines 6250-6270) ==="
sed -n '6248,6275p' pkg/bundle/deploy.goLength of output: 1443 🏁 Script executed: #!/bin/bash
# Check for other health check additions in manager.go
echo "=== Health check registrations in manager.go ==="
rg -n 'Add.*Check' pkg/operator/manager.go -A2
echo ""
echo "=== Startup probe paths across all files ==="
rg -n -B2 -A2 'startupProbe:' deploy/ pkg/bundle/deploy.go 2>/dev/null | grep -A4 'path:'Length of output: 503 Fix livenessProbe paths and resolve startup probe inconsistency The probe configurations violate controller-runtime best practices and create misalignment:
Correct per controller-runtime pattern: livenessProbe → 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| util.Panic(mgr.Add(manager.RunnableFunc(func(ctx context.Context) error { | ||||||||||||||||||||||||||
| system.RunOperatorCreate(cmd, args) | ||||||||||||||||||||||||||
| return nil | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -33,6 +33,8 @@ func Cmd() *cobra.Command { | |
| Use: "operator", | ||
| Short: "Deployment using operator", | ||
| } | ||
| // health-probe flag available on operator and inherited by subcommands | ||
| cmd.PersistentFlags().String("health-probe-bind-address", ":8081", "HTTP address for health/readiness probes (e.g., :8081)") | ||
|
||
| cmd.AddCommand( | ||
| CmdInstall(), | ||
| CmdUninstall(), | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Trim trailing spaces and normalize args indentation.
Lines 45–46 have trailing spaces; fix to keep CI/yamllint green and avoid noisy diffs.
To verify no trailing spaces remain:
🏁 Script executed:
Length of output: 93
Trim trailing spaces in deploy/operator.yaml args
Lines 45–46 contain trailing spaces; remove them to keep CI/yamllint green.
🧰 Tools
🪛 YAMLlint (1.37.1)
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
🤖 Prompt for AI Agents