-
Notifications
You must be signed in to change notification settings - Fork 113
add liveness, readiness, and startup probes to noobaa-operator #1684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughWires controller-runtime health probe address into the manager, registers healthz and readyz checks, and updates the operator Deployment and embedded bundle YAML to expose a healthz port, add startup/readiness/liveness probes, and set memory requests/limits. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant KS as Kubernetes
participant Pod as Operator Pod
participant Mgr as controller-runtime Manager
participant HP as Health Endpoint (:8081 /readyz)
participant Kubelet as Kubelet Probes
KS->>Pod: Start operator container (probe port exposed)
Pod->>Mgr: Start manager with HealthProbeBindAddress=":8081"
Mgr->>HP: Serve /healthz and /readyz endpoints
Mgr->>Mgr: Register healthz and readyz checks (healthz.Ping)
rect rgba(230,245,255,0.5)
Note over Kubelet,HP: Startup probe (periodSeconds:10, failureThreshold:30)
Kubelet->>HP: GET /readyz
HP-->>Kubelet: 200 / non-200
end
rect rgba(240,255,240,0.5)
Note over Kubelet,HP: Readiness probe (initialDelay:5, periodSeconds:10)
loop every 10s
Kubelet->>HP: GET /readyz
HP-->>Kubelet: 200 / non-200
end
end
rect rgba(255,245,230,0.5)
Note over Kubelet,HP: Liveness probe (initialDelay:15, periodSeconds:10)
loop every 10s
Kubelet->>HP: GET /readyz
HP-->>Kubelet: 200 / non-200 (restart on failure threshold)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used🧠 Learnings (4)📓 Common learnings📚 Learning: 2025-10-26T14:17:07.989ZApplied to files:
📚 Learning: 2025-09-16T11:55:50.673ZApplied to files:
📚 Learning: 2025-07-14T10:59:21.959ZApplied to files:
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
pkg/operator/manager.go (3)
48-50: Harden flag parsing: fall back to a sane default and log.If the flag isn’t present (e.g., invoked via the wrong subcommand) or empty, you’ll end up setting an empty HealthProbeBindAddress, which disables the probe server. Add a defensive fallback.
- // Probe address from CLI flag (defaults to :8081) - probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") + // Probe address from CLI flag; fall back to default if missing + probeAddr, err := cmd.Flags().GetString("health-probe-bind-address") + if err != nil || probeAddr == "" { + probeAddr = ":8081" + log.Infof("health-probe-bind-address not set; defaulting to %s", probeAddr) + }
72-73: Health probe server bound only on the namespace-scoped manager. Confirm this is intentional.Only mgr has HealthProbeBindAddress; cmgr does not. If the cluster-scoped manager wedges, probes will still report healthy. If that’s acceptable, ignore. Otherwise consider proxying a simple cmgr health indicator into mgr’s checks.
104-112: Consider a stronger readiness check (cache sync) instead of Ping-only.healthz.Ping always returns success. For readiness, it’s common to gate on informer cache sync to avoid advertising Ready before controllers are usable.
Example:
// after mgr is created if err := mgr.AddReadyzCheck("cache-sync", func(_ *http.Request) error { if mgr.GetCache().WaitForCacheSync(util.Context()) { return nil } return fmt.Errorf("caches not yet synced") }); err != nil { log.Fatalf("Failed to add readiness check: %s", err) }If blocking per request is a concern, you can set a one-time “synced” atomic flag from a Runnable that waits on cache sync and have the checker read that flag.
deploy/operator.yaml (2)
48-49: Fix indentation for ports block to satisfy yamllint and avoid accidental structural errors.Yamllint warns: expected indent 10 but found 12. Align ports with args/volumeMounts under the container spec.
- ports: - - name: healthz + ports: + - name: healthz containerPort: 8081You can validate with:
- yamllint deploy/operator.yaml
- kubectl apply --dry-run=client -f deploy/operator.yaml
59-67: Readiness probe looks good./readyz on the named port is correct. Timings are reasonable for controller-runtime defaults. If you adopt cache-sync readiness, /readyz will naturally gate until caches are synced.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/operator.yaml(2 hunks)pkg/operator/manager.go(4 hunks)pkg/operator/operator.go(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[error] 45-45: trailing spaces
(trailing-spaces)
[warning] 48-48: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
pkg/operator/manager.go (1)
27-27: Import of controller-runtime healthz is appropriate.This is the correct package for registering /healthz and /readyz checks.
deploy/operator.yaml (2)
68-75: Liveness probe is sensible./healthz with a modest initialDelaySeconds is a standard setup for controller-runtime. No concerns.
76-81: Startup probe configuration is conservative and safe.failureThreshold 30 with periodSeconds 10 gives up to 5 minutes for cold starts and image pulls. This helps prevent premature restarts.
e07011f to
c6acf02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/operator.yaml(2 hunks)pkg/operator/manager.go(4 hunks)pkg/operator/operator.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/operator.go
- pkg/operator/manager.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (2)
deploy/operator.yaml (2)
60-82: LGTM! Health probe configuration looks good.The probe configuration is well-structured:
- Readiness probe uses
/readyzendpoint with a short initial delay (5s)- Liveness probe uses
/healthzendpoint with appropriate initial delay (15s)- Startup probe provides sufficient time for initialization (300s total with 30 failures × 10s)
- All probes use consistent timeout and period settings
The probe timings follow Kubernetes best practices for operator workloads.
52-54: Approved: bound-sa-token mount can safely remain readOnlyI searched the codebase for any write operations targeting
/var/run/secrets/openshift/serviceaccount(including Go standard library file writes) and found only references for reading the token file—no code writes to this path. Therefore, mounting the service account token asreadOnly: trueis safe and aligns with security best practices.
c6acf02 to
63cf177
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
deploy/operator.yaml (3)
44-47: Good: explicit "run" subcommand included.This addresses the earlier issue where the manager wouldn't start without "run". With the whitespace fix above, the CLI will parse as intended.
48-50: Fix ports block indentation to conform with lint rules.Yamllint expects the dash items to align directly under the key. Apply:
- ports: - - name: healthz - containerPort: 8081 + ports: + - name: healthz + containerPort: 8081
44-47: Trailing spaces in args will break subcommand parsing; fix and align list indentation to satisfy yamllint.The values "operator␠␠" and "run␠␠␠␠␠␠␠" include trailing spaces, which become part of the argv and will not match Cobra subcommands. Also, yamllint flags the sequence indentation under args. Apply:
- args: - - operator - - run - - --health-probe-bind-address=:8081 + args: + - operator + - run + - --health-probe-bind-address=:8081
🧹 Nitpick comments (1)
deploy/operator.yaml (1)
68-75: When using startupProbe, drop liveness initialDelaySeconds to avoid redundant delay.Kubelet suppresses liveness until startup passes, so initialDelaySeconds on liveness is unnecessary and can delay recovery after restarts. Suggest:
livenessProbe: httpGet: path: /healthz port: healthz - initialDelaySeconds: 15 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/operator.yaml(2 hunks)pkg/operator/manager.go(4 hunks)pkg/operator/operator.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/operator.go
- pkg/operator/manager.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
deploy/operator.yaml (3)
60-67: Readiness probe looks correct and matches controller-runtime defaults./readyz on the named port is standard; timings are reasonable.
76-82: Startup probe parameters are sane./healthz with failureThreshold: 30 and period: 10s gives up to ~5 minutes for the operator to come up, which is acceptable for CRD installs/migrations.
44-82: Health/Readiness Probes Configuration VerifiedAll checks confirm that the operator binary and manifest are correctly configured:
- The
--health-probe-bind-addressflag is defined with default:8081inpkg/operator/operator.go:37and consumed inpkg/operator/manager.go:49.- The health and readiness endpoints are registered via
mgr.AddHealthzCheck("healthz", healthz.Ping)andmgr.AddReadyzCheck("readyz", healthz.Ping)inpkg/operator/manager.go:105,109.- No other container in the manifest binds port
8081, and the probe indeploy/operator.yaml:47–50matches these settings.No changes are required.
63cf177 to
3eb84c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
deploy/operator.yaml (2)
44-47: Trim trailing spaces and fix args indentation (yamllint errors).YAMLlint flags trailing spaces on Lines 45–46 and wrong indentation on Line 45. Reduce the list item indent by 2 and remove trailing spaces so CI passes and the manifest parses consistently.
Apply:
args: - - operator - - run - - --health-probe-bind-address=:8081 + - operator + - run + - --health-probe-bind-address=:8081Run to verify:
#!/bin/bash pip install -q yamllint yamllint -d "{extends: default, rules: {line-length: disable}}" deploy/operator.yaml
48-50: Align ports list indentation with container-level keys.List item under ports is over-indented (yamllint: expected 10, found 12). Align the dash with two spaces more than the ports key.
Apply:
ports: - - name: healthz - containerPort: 8081 + - name: healthz + containerPort: 8081Re-run yamllint (same command as above) to confirm no indentation warnings.
🧹 Nitpick comments (5)
pkg/operator/manager.go (2)
72-72: Health probes attached only to the namespace-scoped managerYou bind the probe server only on
mgr(namespace-scoped). If the cluster-scoped manager (cmgr) fails to start or crashes, the pod’s readiness/liveness may still report healthy because the probe targets onlymgr. If that’s intentional, ignore; otherwise consider one of:
- Also enabling a probe server for
cmgr(e.g., on a sibling port like:8082) and wiring probes for it.- Or keep a single probe server (on
mgr) but add a readiness check that reflectscmgrhealth (e.g., flip a flag oncecmgrcaches sync or controllers start).If you prefer the second approach, I can sketch a checker that reports NotReady until
cmgrsignals it’s up.
104-111: Readiness uses Ping only; consider gating on cache sync to avoid false positivesRegistering
healthz.Pingfor both health and readiness works, but readiness will go green as soon as the HTTP server is up, not when informers/controllers are actually ready. Consider adding a cache-sync readiness check so/readyzturns 200 only after caches synchronize.Example change (adds a cache-sync runnable and checker):
// Register health and readiness endpoints on mgr if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { log.Fatalf("Failed to add health check: %s", err) } - if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { - log.Fatalf("Failed to add readiness check: %s", err) - } + // Ready when caches have synced + if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { + log.Fatalf("Failed to add readiness ping: %s", err) + } + if err := mgr.AddReadyzCheck("caches", cacheReadyChecker()); err != nil { + log.Fatalf("Failed to add cache readiness check: %s", err) + }Add this helper (outside the shown range), which uses a one-time sync and a fast check per request:
// at file top-level: var cachesSynced atomic.Bool // near other runnables, before starting managers: util.Panic(mgr.Add(manager.RunnableFunc(func(ctx context.Context) error { if ok := mgr.GetCache().WaitForCacheSync(ctx); !ok { return fmt.Errorf("cache sync failed") } cachesSynced.Store(true) <-ctx.Done() return nil }))) // helper: func cacheReadyChecker() healthz.Checker { return func(_ *http.Request) error { if !cachesSynced.Load() { return fmt.Errorf("caches not yet synced") } return nil } }Note: you’ll need
import "net/http"andimport "sync/atomic"for the helper.pkg/bundle/deploy.go (2)
6251-6273: Probe set is sensible; consider strengthening readyz semantics.
- The readiness, liveness, and startup probes use /readyz and /healthz on the dedicated probe server. Timing defaults (readiness=5s initial delay, liveness=15s, startup=30×10s) look balanced for controller-runtime operators.
- One potential gap: if your manager registers
readyzwithhealthz.Ping, readiness will always succeed once the probe server is up (it doesn’t wait for caches/leader election). If that’s your current implementation, consider a readiness check that flips true only after caches are synced (or after leader election if that’s your intended signal).Example (Go) of a non-blocking cache-sync readiness gate:
// after mgr is created: var cachesSynced atomic.Bool go func() { // Wait until the manager starts and caches sync once <-mgr.Elected() // optional: if leader election enabled if ok := mgr.GetCache().WaitForCacheSync(context.Background()); ok { cachesSynced.Store(true) } }() // register probes _ = mgr.AddHealthzCheck("healthz", healthz.Ping) _ = mgr.AddReadyzCheck("cache-sync", func(_ *http.Request) error { if cachesSynced.Load() { return nil } return fmt.Errorf("caches not yet synced") })This avoids reporting Pods as Ready before controllers can actually reconcile.
If you’ve already wired a stronger readyz check in manager.go, feel free to ignore; otherwise I can open a follow-up PR with this change.
6251-6273: Nit: explicitly set probe scheme for forward-compatibility (optional).Kubelet defaults HTTP for
httpGet, but being explicit can prevent surprises if defaults change or if TLS is enabled later:readinessProbe: httpGet: path: /readyz - port: healthz + port: healthz + scheme: HTTP ... livenessProbe: httpGet: path: /healthz - port: healthz + port: healthz + scheme: HTTP ... startupProbe: httpGet: path: /healthz - port: healthz + port: healthz + scheme: HTTPdeploy/operator.yaml (1)
60-82: Probes wiring looks correct; optional tuning with startupProbe present.Endpoints and named port resolve properly. With startupProbe enabled, initialDelaySeconds on liveness/readiness is redundant; kubelet suppresses them until startup succeeds. Consider removing the initial delays to simplify timing semantics.
Optional diff:
readinessProbe: httpGet: path: /readyz port: healthz - initialDelaySeconds: 5 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3 livenessProbe: httpGet: path: /healthz port: healthz - initialDelaySeconds: 15 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3If you keep the delays, that’s fine too—behavior is still correct.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deploy/operator.yaml(2 hunks)pkg/bundle/deploy.go(3 hunks)pkg/operator/manager.go(4 hunks)pkg/operator/operator.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/operator.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
pkg/operator/manager.go (1)
27-27: LGTM: correct import for controller-runtime health checksImporting
sigs.k8s.io/controller-runtime/pkg/healthzis appropriate for registering/healthzand/readyzendpoints.pkg/bundle/deploy.go (2)
6235-6238: Args look correct; verify entrypoint supports subcommand and flag.
- The container now invokes:
operator run --health-probe-bind-address=:8081. That matches controller-runtime convention, but only if your binary’s root command exposes therunsubcommand and recognizes the probe flag.- Please sanity-check the operator image you ship in all deployment channels accepts these args; otherwise the container may exit immediately.
Would you like a quick script to pull the image used in CI and validate the help/flags exposed by the binary? I can supply one.
6240-6241: Port naming and probe port reference are consistent.Defining a named container port
healthz: 8081and referencing it from the probes is a good practice that makes refactors safer. No issues here.
| // Probe address from CLI flag (defaults to :8081) | ||
| probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Don’t ignore flag parsing errors; add a safe default for probe address
cmd.Flags().GetString(...) error is discarded. While it should succeed if the flag is defined, ignoring the error can mask regressions and leaves probeAddr empty (which disables the probe server) if the flag is missing or mistyped.
Apply this minimal diff to handle the error and ensure a sane default:
- // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag (defaults to :8081)
+ probeAddr, err := cmd.Flags().GetString("health-probe-bind-address")
+ if err != nil {
+ log.Warnf("Failed reading --health-probe-bind-address flag: %v (defaulting to :8081)", err)
+ probeAddr = ":8081"
+ }
+ if probeAddr == "" {
+ log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+ probeAddr = ":8081"
+ }To double-check the flag wiring across the repo, run:
🏁 Script executed:
#!/bin/bash
# Verify the flag is defined and used in manifests
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 'health-probe-bind-address' -- 'deploy/**' 'deploy/*.yaml' 'pkg/bundle/**' 2>/dev/null || trueLength of output: 1109
Handle health-probe-bind-address flag parsing errors in pkg/operator/manager.go
- Verified that
--health-probe-bind-addressis defined as a persistent flag inpkg/operator/operator.gowith a default of:8081, and is referenced in the bundled manifests. - In
pkg/operator/manager.go(around line 49), callingcmd.Flags().GetString(...)- Discards any lookup error (e.g., if the flag name is mistyped or not registered on this command)
- May never pick up the persistent flag, since it lives in the parent command’s flag set
- Leaves
probeAddrempty on error, inadvertently disabling health probes
- To ensure the flag is read correctly and to guard against missing or empty values, switch to the persistent flag set, check the error, and fall back to the known default.
Apply this diff:
--- a/pkg/operator/manager.go
+++ b/pkg/operator/manager.go
@@ -46,4 +46,12 @@ func NewManagerCommand(...) {
- // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag (defaults to :8081)
+ probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+ if err != nil {
+ log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err)
+ probeAddr = ":8081"
+ }
+ if probeAddr == "" {
+ log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+ probeAddr = ":8081"
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Probe address from CLI flag (defaults to :8081) | |
| probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") | |
| // Probe address from CLI flag (defaults to :8081) | |
| probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address") | |
| if err != nil { | |
| log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err) | |
| probeAddr = ":8081" | |
| } | |
| if probeAddr == "" { | |
| log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081") | |
| probeAddr = ":8081" | |
| } |
🤖 Prompt for AI Agents
In pkg/operator/manager.go around lines 48 to 50, the call to
cmd.Flags().GetString("health-probe-bind-address") ignores errors and reads the
local flagset (so it can miss the persistent flag), which can leave probeAddr
empty and disable probes; update to use
cmd.PersistentFlags().GetString("health-probe-bind-address"), check and handle
the returned error (log/return or fallback), and if the value is empty or an
error occurred, set probeAddr to the known default ":8081" so health probes
remain enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ kubectl describe pod noobaa-operator-6fc879d987-wlkx9
Args:
operator
run
--health-probe-bind-address=:8081
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: Error: unknown flag: --health-probe-bind-address
See 'noobaa operator run --help' for usage.
Exit Code: 1
Started: Sun, 26 Oct 2025 16:25:56 +0200
Finished: Sun, 26 Oct 2025 16:25:56 +0200
Ready: False
Restart Count: 2
Limits:
cpu: 250m
memory: 512Mi
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://:healthz/readyz delay=15s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=5s timeout=5s period=10s #success=1 #failure=3
Startup: http-get http://:healthz/healthz delay=0s timeout=5s period=10s #success=1 #failure=30
Environment:
OPERATOR_NAME: noobaa-operator
POD_NAME: noobaa-operator-6fc879d987-wlkx9 (v1:metadata.name)
WATCH_NAMESPACE: default (v1:metadata.namespace)
DEV_ENV: true
NOOBAA_CLI_DEPLOYMENT: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find where the Cobra commands are defined, especially the "run" command
rg -n --type=go -C5 'cobra\.Command.*run'
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for flag definitions in the operator package
rg -n --type=go -C3 'PersistentFlags\(\)|Flags\(\)' pkg/operator/
Length of output: 6721
🏁 Script executed:
#!/bin/bash
# Find the operator.go file and examine how commands are structured
fd -t f 'operator.go' pkg/operator/ --exec cat {}
Length of output: 20129
@OdedViner, the root cause is clear from your error message: the --health-probe-bind-address flag is never defined in the Cobra command structure.
The deployment manifest passes --health-probe-bind-address=:8081, and pkg/operator/manager.go tries to read it, but CmdRun() in pkg/operator/operator.go never registers the flag, so the CLI rejects it as unknown.
Fix: Add the flag definition to CmdRun() in pkg/operator/operator.go:
// CmdRun returns a CLI command
func CmdRun() *cobra.Command {
cmd := &cobra.Command{
Use: "run",
Short: "Runs the noobaa-operator",
Run: RunOperator,
Args: cobra.NoArgs,
}
cmd.Flags().String("health-probe-bind-address", ":8081", "The address the health probe endpoint binds to")
return cmd
}This registers the flag with a default of :8081, allowing the manifest's --health-probe-bind-address=:8081 argument to be parsed successfully and the pod to start.
🧬 Code Graph Analysis Results
- File: pkg/operator/manager.go
- Summary (re health probe flag and address):
- Description: The operator's RunOperator constructs a manager with a HealthProbeBindAddress set to healthProbeAddr. The variable healthProbeAddr is defined as ":8081", which means the operator exposes health probes on port 8081. If the deployment or binary version does not recognize the --health-probe-bind-address flag, this mapping indicates the code expects the health probe address to be provided via HealthProbeBindAddress on the controller-runtime manager, not as a separate CLI flag.
- Relevant details:
- healthProbeAddr is defined as ":8081" (default for the health probe endpoint).
- The manager is created with HealthProbeBindAddress: healthProbeAddr, which wires /healthz and /readyz to that address.
- Location notes:
- Health probe address default and usage are in the RunOperator setup where the manager is instantiated, and the healthProbeAddr variable is defined at the top of the file.
- Summary (re health probe flag and address):
|
@dannyzaken @tangledbytes please take a look |
4e1b572 to
6095110
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
deploy/operator.yaml (2)
44-47: Trim trailing spaces in args to keep YAML clean and avoid linter failures.
Matches prior feedback; still present on "operator" and "run".Apply:
args: - - operator - - run + - operator + - run - --health-probe-bind-address=:8081
48-50: Fix ports block indentation.
Keep it aligned with other container-level keys.- ports: - - name: healthz - containerPort: 8081 + ports: + - name: healthz + containerPort: 8081pkg/operator/manager.go (1)
48-49: Don’t ignore flag errors; read from PersistentFlags and default safely.
Ignoring the error can disable probes if the flag isn’t found.- // Probe address from CLI flag (defaults to :8081) - probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") + // Probe address from CLI flag (defaults to :8081) + probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address") + if err != nil { + log.Warnf("Failed reading --health-probe-bind-address: %v (defaulting to :8081)", err) + probeAddr = ":8081" + } + if probeAddr == "" { + log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081") + probeAddr = ":8081" + }Run to confirm the flag is defined and referenced:
#!/bin/bash rg -n --type=go -C2 'health-probe-bind-address'
🧹 Nitpick comments (1)
pkg/operator/manager.go (1)
104-111: Checks registration LGTM; consider clearer names and optional cache-ready check.
Minor: use names like "ping" instead of repeating endpoint names; optionally add a cache sync readiness check if needed.- if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { + if err := mgr.AddHealthzCheck("ping", healthz.Ping); err != nil { log.Fatalf("Failed to add health check: %s", err) } - if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { + if err := mgr.AddReadyzCheck("ping", healthz.Ping); err != nil { log.Fatalf("Failed to add readiness check: %s", err) }Optional cache-ready example (if you want stricter readiness):
ready := atomic.Bool{} _ = mgr.Add(manager.RunnableFunc(func(ctx context.Context) error { // mark ready after caches start ready.Store(true) <-ctx.Done() return nil })) _ = mgr.AddReadyzCheck("started", func(_ *http.Request) error { if !ready.Load() { return fmt.Errorf("not started") } return nil })
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deploy/operator.yaml(2 hunks)pkg/bundle/deploy.go(3 hunks)pkg/operator/manager.go(4 hunks)pkg/operator/operator.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/operator.go
- pkg/bundle/deploy.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
deploy/operator.yaml (1)
76-82: Startup probe endpoint choice is fine.
Using /readyz here is acceptable; no change needed.pkg/operator/manager.go (2)
27-27: Import healthz looks good.
Required for adding checks.
72-72: Wiring HealthProbeBindAddress into mgr is correct.
This enables /healthz and /readyz.
| readinessProbe: | ||
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| initialDelaySeconds: 5 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| livenessProbe: | ||
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| initialDelaySeconds: 15 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| startupProbe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Use /healthz for liveness; /readyz for readiness (current YAML uses /readyz for liveness).
This avoids killing a live process just because it isn’t ready yet and matches the PR description.
readinessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3🏁 Script executed:
#!/usr/bin/env bash
set -e
# Locate the operator.yaml file
OP_FILE=$(fd --type f operator.yaml || true)
echo "Found file: $OP_FILE"
[ -n "$OP_FILE" ] || { echo "operator.yaml not found"; exit 1; }
# Print lines 60–76 for verification
sed -n '60,76p' "$OP_FILE"Length of output: 698
Use /healthz for livenessProbe
Swap the livenessProbe HTTP path to /healthz (keep readinessProbe on /readyz) to avoid killing a live process before it’s ready.
🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 60 to 76, the livenessProbe is using the
/readyz path which can cause a live process to be killed before it is fully
ready; change the livenessProbe httpGet.path value from /readyz to /healthz
while leaving the readinessProbe on /readyz (no other probe changes required).
| failureThreshold: 3 | ||
| livenessProbe: | ||
| httpGet: | ||
| path: /readyz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all probes be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only perform ping, I suppose it's okay for them to be set to the same path. In the future, we might want to use more advanced probes, but for now, it's enough.
| } | ||
|
|
||
| // Register health and readiness endpoints on mgr | ||
| if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use readyz for both probes, I think we can remove healthz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/operator/operator.go
Outdated
| Short: "Deployment using operator", | ||
| } | ||
| // health-probe flag available on operator and inherited by subcommands | ||
| cmd.PersistentFlags().String("health-probe-bind-address", ":8081", "HTTP address for health/readiness probes (e.g., :8081)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to have a flag for the probe address? If there isn't a need to change it for some cases, we can have it defined in manager.go (e.g.: similar to the metrics address\port here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I moved the probe address to manager.go, the operator failed to start because the flag is still passed in the Deployment arguments. The pod crashes with the following error:
$ kubectl describe pod noobaa-operator-6fc879d987-wlkx9
Args:
operator
run
--health-probe-bind-address=:8081
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: Error: unknown flag: --health-probe-bind-address
See 'noobaa operator run --help' for usage.
Exit Code: 1
Started: Sun, 26 Oct 2025 16:25:56 +0200
Finished: Sun, 26 Oct 2025 16:25:56 +0200
Ready: False
Restart Count: 2
Limits:
cpu: 250m
memory: 512Mi
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://:healthz/readyz delay=15s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=5s timeout=5s period=10s #success=1 #failure=3
Startup: http-get http://:healthz/healthz delay=0s timeout=5s period=10s #success=1 #failure=30
Environment:
OPERATOR_NAME: noobaa-operator
POD_NAME: noobaa-operator-6fc879d987-wlkx9 (v1:metadata.name)
WATCH_NAMESPACE: default (v1:metadata.namespace)
DEV_ENV: true
NOOBAA_CLI_DEPLOYMENT: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove it from operator.yaml
6095110 to
f87a9a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
deploy/operator.yaml (2)
48-50: Align ports list indentation under 'ports:'.Re-align the sequence to satisfy yamllint’s expected indentation.
- ports: - - name: healthz - containerPort: 8081 + ports: + - name: healthz + containerPort: 8081
68-75: Use /healthz for livenessProbe (keep /readyz for readiness/startup).Best practice is liveness→/healthz, readiness/startup→/readyz. With current ping checks it’s mostly equivalent, but /healthz avoids coupling liveness to readiness state.
livenessProbe: httpGet: - path: /readyz + path: /healthz port: healthz
🧹 Nitpick comments (1)
pkg/bundle/deploy.go (1)
6270-6275: Optional: consider /healthz for startupProbe for consistency.Startup can also target /healthz; keeping readiness on /readyz and liveness/startup on /healthz is a common pattern.
startupProbe: httpGet: - path: /readyz + path: /healthz port: healthz periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 30
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deploy/operator.yaml(2 hunks)pkg/bundle/deploy.go(3 hunks)pkg/operator/manager.go(4 hunks)pkg/operator/operator.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/manager.go
- pkg/operator/operator.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
pkg/bundle/deploy.go (3)
6237-6241: Add health probe bind address CLI flag (good).Flag looks correct for controller-runtime; confirms the probe server will listen on :8081. Ensure the operator binary exposes this flag and uses it to start the health server.
6242-6244: Expose named health port (good).Named port "healthz" matches the httpGet probe references; no issues.
6253-6261: Readiness probe config looks sensible./readyz path, timing, and thresholds are reasonable defaults.
deploy/operator.yaml
Outdated
| args: | ||
| - operator | ||
| - run | ||
| - --health-probe-bind-address=:8081 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Trim trailing spaces and normalize args indentation.
Lines 45–46 have trailing spaces; fix to keep CI/yamllint green and avoid noisy diffs.
- args:
- - operator
- - run
- - --health-probe-bind-address=:8081
+ args:
+ - operator
+ - run
+ - --health-probe-bind-address=:8081To verify no trailing spaces remain:
🏁 Script executed:
#!/usr/bin/env bash
grep -nE ' +$' deploy/operator.yaml || echo "No trailing spaces"Length of output: 93
Trim trailing spaces in deploy/operator.yaml args
Lines 45–46 contain trailing spaces; remove them to keep CI/yamllint green.
🧰 Tools
🪛 YAMLlint (1.37.1)
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 44 to 47, the args entries ("operator" and
"run") include trailing spaces on lines 45–46; remove the trailing whitespace
characters from those two args so each arg is trimmed to exact tokens (e.g.,
"operator" and "run") to satisfy yamllint/CI and keep YAML clean.
| httpGet: | ||
| path: /readyz | ||
| port: healthz | ||
| initialDelaySeconds: 15 | ||
| periodSeconds: 10 | ||
| timeoutSeconds: 5 | ||
| failureThreshold: 3 | ||
| startupProbe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Liveness probe should hit /healthz, not /readyz.
controller-runtime exposes both /healthz and /readyz; liveness is expected to target /healthz. Pointing liveness to /readyz can mask unhealthy-but-ready states. Switch the path to /healthz.
livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| httpGet: | |
| path: /readyz | |
| port: healthz | |
| initialDelaySeconds: 15 | |
| periodSeconds: 10 | |
| timeoutSeconds: 5 | |
| failureThreshold: 3 | |
| startupProbe: | |
| httpGet: | |
| path: /healthz | |
| port: healthz | |
| initialDelaySeconds: 15 | |
| periodSeconds: 10 | |
| timeoutSeconds: 5 | |
| failureThreshold: 3 | |
| startupProbe: |
🤖 Prompt for AI Agents
In pkg/bundle/deploy.go around lines 6262 to 6269 the livenessProbe httpGet path
is set to /readyz but should target /healthz; change the livenessProbe httpGet
path value from "/readyz" to "/healthz" so the liveness check uses the
controller-runtime health endpoint and does not mask unhealthy-but-ready states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OdedViner Thanks! Confirming the liveness probe now points to /healthz. As a quick sanity check, you can verify the live settings on a running pod with:
kubectl -n noobaa get deploy noobaa-operator -o jsonpath='{.spec.template.spec.containers[?(@.name=="noobaa-operator")].livenessProbe.httpGet.path}{"\n"}'
If there are multiple definitions of the operator Deployment (e.g., embedded YAML and any generated manifests), make sure they’re all aligned: liveness=/healthz and readiness/startup=/readyz. Nice work.
✏️ Learnings added
Learnt from: OdedViner
PR: noobaa/noobaa-operator#1684
File: pkg/bundle/deploy.go:6259-6266
Timestamp: 2025-10-26T14:17:07.989Z
Learning: For controller-runtime operators, use livenessProbe path /healthz and readiness/startup path /readyz (port :8081) to avoid masking unhealthy-but-ready states.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
- integrate controller-runtime healthz and readyz endpoints into the namespace-scoped manager - expose probe server on :8081 and register default checks - update operator Deployment with liveness, readiness, and startup probes Signed-off-by: Oded Viner <[email protected]>
3fa2e24 to
17b70f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
deploy/operator.yaml (2)
69-76: Consider using /healthz for livenessProbe.Best practice is to use
/healthzfor liveness (to check if process is alive) and/readyzfor readiness (to check if ready to serve traffic). Using/readyzfor liveness can kill a live process that isn't ready yet.However, as noted by dannyzaken in previous comments, since the current implementation only performs a ping check, using the same path for all probes is acceptable for now.
45-48: Remove trailing spaces from args entries.Lines 46-47 still contain trailing spaces after "operator" and "run", which will cause yamllint failures in CI.
Apply this diff:
args: - - operator - - run + - operator + - run - --health-probe-bind-address=:8081
🧹 Nitpick comments (1)
pkg/operator/manager.go (1)
35-38: Health probe address is hardcoded; consider reading the CLI flag.The deployment manifest (deploy/operator.yaml) passes
--health-probe-bind-address=:8081, but the code hardcodeshealthProbeAddr = ":8081"and never reads the flag value. This means the CLI flag has no effect.If the flag is intended to be configurable, read it from the command:
probeAddr := ":8081" // default if cmd != nil { if val, err := cmd.Flags().GetString("health-probe-bind-address"); err == nil && val != "" { probeAddr = val } } healthProbeAddr = probeAddrOtherwise, remove the unused flag from the deployment args.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
deploy/operator.yaml(2 hunks)pkg/bundle/deploy.go(2 hunks)pkg/operator/manager.go(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/bundle/deploy.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/operator/manager.go (1)
pkg/util/util.go (1)
Logger(903-905)
🔇 Additional comments (4)
deploy/operator.yaml (1)
49-51: LGTM!The healthz port is correctly configured and properly indented.
pkg/operator/manager.go (3)
60-72: LGTM!Health probe bind address is correctly wired into the namespace-scoped manager, enabling /healthz and /readyz endpoints.
78-86: Verify that cluster-scoped manager intentionally omits health probes.The cluster-scoped manager (cmgr) is created without
HealthProbeBindAddress, so it won't expose /healthz or /readyz endpoints. This is likely intentional since only one manager needs to serve probes for the pod, but please confirm this design choice is correct.
103-105: LGTM!The readiness check is properly registered on the namespace-scoped manager using the standard
healthz.Pinghandler, with appropriate error handling.
17b70f5 to
c804893
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
pkg/operator/manager.go (1)
35-39: Probe address ignores CLI flag; wire to PersistentFlags with safe defaultCurrently a hardcoded healthProbeAddr is used; the --health-probe-bind-address flag is not read, so user/manifest config is ignored. Please read the persistent flag, handle errors, and fall back to the default.
Apply this minimal diff:
@@ var ( metricsHost = "0.0.0.0" metricsPort int32 = 8383 healthProbeAddr = ":8081" log = util.Logger() ) @@ - // mgr => namespace scoped manager - mgr, err := manager.New(config, manager.Options{ + // Resolve probe address from flag with safe default + probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address") + if err != nil { + log.Warnf("Failed to read --health-probe-bind-address: %v; defaulting to %s", err, healthProbeAddr) + probeAddr = healthProbeAddr + } + if probeAddr == "" { + log.Warnf("Empty --health-probe-bind-address disables probes; defaulting to %s", healthProbeAddr) + probeAddr = healthProbeAddr + } + + // mgr => namespace scoped manager + mgr, err := manager.New(config, manager.Options{ @@ - HealthProbeBindAddress: healthProbeAddr, // Serve /healthz and /readyz here + HealthProbeBindAddress: probeAddr, // Serve /healthz and /readyz here })pkg/bundle/deploy.go (1)
6259-6265: Fix probe endpoints: liveness => /healthz, startup => /readyz.
- Liveness is incorrectly checking /readyz; should hit /healthz.
- Startup should gate on readiness (/readyz), not /healthz.
This prevents masking unhealthy-but-ready states. Based on learnings.Apply this diff:
livenessProbe: httpGet: - path: /readyz + path: /healthz port: healthz initialDelaySeconds: 15 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3 startupProbe: httpGet: - path: /healthz + path: /readyz port: healthz periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 30Also applies to: 6267-6272
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pkg/bundle/deploy.go(2 hunks)pkg/operator/manager.go(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-26T14:17:07.989Z
Learnt from: OdedViner
PR: noobaa/noobaa-operator#1684
File: pkg/bundle/deploy.go:6259-6266
Timestamp: 2025-10-26T14:17:07.989Z
Learning: For controller-runtime operators, use livenessProbe path /healthz and readiness/startup path /readyz (port :8081) to avoid masking unhealthy-but-ready states.
Applied to files:
pkg/bundle/deploy.go
🧬 Code graph analysis (1)
pkg/operator/manager.go (1)
pkg/util/util.go (1)
Logger(903-905)
🔇 Additional comments (3)
pkg/operator/manager.go (2)
27-27: Import looks goodhealthz import is correct for readiness/liveness wiring.
71-71: LGTM; will honor flags after wiring aboveUsing HealthProbeBindAddress on the namespace-scoped manager is correct. Once the flag wiring above is in, this will respect manifests/CLI.
pkg/bundle/deploy.go (1)
6234-6241: Health server wiring looks good.--health-probe-bind-address=:8081 and the named port healthz are correctly configured.
| if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { | ||
| log.Fatalf("Failed to add readiness check: %s", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Ready check uses Ping; OK, but ensure probes hit /readyz
Using healthz.Ping is fine. Since healthz was removed per discussion, verify liveness and startup probes in manifests target /readyz too.
To verify across the repo:
🏁 Script executed:
#!/bin/bash
# Confirm flag presence and probe paths
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 '(livenessProbe|readinessProbe|startupProbe).*path' deploy pkg/bundle 2>/dev/null || trueLength of output: 272
🏁 Script executed:
#!/bin/bash
# Search for all probe definitions (without path restriction)
rg -n 'livenessProbe|readinessProbe|startupProbe' --type=go --type=yaml --type=json 2>/dev/null || trueLength of output: 881
🏁 Script executed:
#!/bin/bash
# Extract probe definitions with paths from key files
echo "=== deploy/operator.yaml probes ==="
rg -n -A5 'readinessProbe:|livenessProbe:|startupProbe:' deploy/operator.yaml | head -40
echo ""
echo "=== pkg/bundle/deploy.go manager probes (lines 6250-6270) ==="
sed -n '6248,6275p' pkg/bundle/deploy.goLength of output: 1443
🏁 Script executed:
#!/bin/bash
# Check for other health check additions in manager.go
echo "=== Health check registrations in manager.go ==="
rg -n 'Add.*Check' pkg/operator/manager.go -A2
echo ""
echo "=== Startup probe paths across all files ==="
rg -n -B2 -A2 'startupProbe:' deploy/ pkg/bundle/deploy.go 2>/dev/null | grep -A4 'path:'Length of output: 503
Fix livenessProbe paths and resolve startup probe inconsistency
The probe configurations violate controller-runtime best practices and create misalignment:
- livenessProbe paths are wrong: Both
deploy/operator.yaml(line 71) andpkg/bundle/deploy.go(line 6263) incorrectly use/readyzinstead of/healthz. This masks unhealthy-but-ready states. - startupProbe inconsistency:
deploy/operator.yaml(line 79) uses/readyz, butpkg/bundle/deploy.go(line 6268) uses/healthz. - Missing check registrations: Only
AddReadyzCheckis registered in manager.go (line 103). Liveness and startup checks are not registered in code despite probes expecting them.
Correct per controller-runtime pattern: livenessProbe → /healthz, readinessProbe/startupProbe → /readyz. Fix probe paths in both manifest files, and register liveness/startup checks in manager.go.
🤖 Prompt for AI Agents
In pkg/operator/manager.go around lines 103 to 106, only AddReadyzCheck is
registered; register the missing liveness and startup checks and use the
controller-runtime health endpoints: add mgr.AddHealthzCheck("healthz",
healthz.Ping) for the liveness check and mgr.AddStartupCheck("startup",
healthz.Ping) (or AddStartUpCheck if using that naming in your
controller-runtime version) for the startup check alongside the existing
AddReadyzCheck("readyz", healthz.Ping). Also update probe paths in the manifests
so livenessProbe -> /healthz and readinessProbe/startupProbe -> /readyz in both
deploy/operator.yaml (lines ~71 and ~79) and pkg/bundle/deploy.go (around lines
~6263 and ~6268) to match the registered checks.
c804893 to
98afab0
Compare
Signed-off-by: Oded Viner <[email protected]>
Tested locally on my machine:
Explain the changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
New Features
Chores