Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions deploy/operator.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,9 @@ spec:
- name: noobaa-operator
image: NOOBAA_OPERATOR_IMAGE
terminationMessagePolicy: FallbackToLogsOnError
ports:
- name: healthz
containerPort: 8081
volumeMounts:
- name: bound-sa-token
mountPath: /var/run/secrets/openshift/serviceaccount
Expand All @@ -51,6 +54,29 @@ spec:
# SHOULD BE RETURNED ONCE COSI IS BACK
# - name: socket
# mountPath: /var/lib/cosi
readinessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
path: /readyz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all probes be the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only perform ping, I suppose it's okay for them to be set to the same path. In the future, we might want to use more advanced probes, but for now, it's enough.

port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
startupProbe:
Comment on lines +57 to +73
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Use /healthz for liveness; /readyz for readiness (current YAML uses /readyz for liveness).
This avoids killing a live process just because it isn’t ready yet and matches the PR description.

           readinessProbe:
             httpGet:
               path: /readyz
               port: healthz
             initialDelaySeconds: 5
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
           livenessProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3

🏁 Script executed:

#!/usr/bin/env bash
set -e
# Locate the operator.yaml file
OP_FILE=$(fd --type f operator.yaml || true)
echo "Found file: $OP_FILE"
[ -n "$OP_FILE" ] || { echo "operator.yaml not found"; exit 1; }
# Print lines 60–76 for verification
sed -n '60,76p' "$OP_FILE"

Length of output: 698


Use /healthz for livenessProbe
Swap the livenessProbe HTTP path to /healthz (keep readinessProbe on /readyz) to avoid killing a live process before it’s ready.

🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 60 to 76, the livenessProbe is using the
/readyz path which can cause a live process to be killed before it is fully
ready; change the livenessProbe httpGet.path value from /readyz to /healthz
while leaving the readinessProbe on /readyz (no other probe changes required).

httpGet:
path: /readyz
port: healthz
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 30
resources:
limits:
cpu: "250m"
Expand Down
27 changes: 26 additions & 1 deletion pkg/bundle/deploy.go
Original file line number Diff line number Diff line change
Expand Up @@ -6231,6 +6231,9 @@ spec:
- name: noobaa-operator
image: NOOBAA_OPERATOR_IMAGE
terminationMessagePolicy: FallbackToLogsOnError
ports:
- name: healthz
containerPort: 8081
volumeMounts:
- name: bound-sa-token
mountPath: /var/run/secrets/openshift/serviceaccount
Expand All @@ -6240,6 +6243,29 @@ spec:
# SHOULD BE RETURNED ONCE COSI IS BACK
# - name: socket
# mountPath: /var/lib/cosi
readinessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
startupProbe:
Comment on lines +6255 to +6262
Copy link

@coderabbitai coderabbitai bot Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Liveness probe should hit /healthz, not /readyz.

controller-runtime exposes both /healthz and /readyz; liveness is expected to target /healthz. Pointing liveness to /readyz can mask unhealthy-but-ready states. Switch the path to /healthz.

           livenessProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
startupProbe:
httpGet:
path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
startupProbe:
🤖 Prompt for AI Agents
In pkg/bundle/deploy.go around lines 6262 to 6269 the livenessProbe httpGet path
is set to /readyz but should target /healthz; change the livenessProbe httpGet
path value from "/readyz" to "/healthz" so the liveness check uses the
controller-runtime health endpoint and does not mask unhealthy-but-ready states.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OdedViner Thanks! Confirming the liveness probe now points to /healthz. As a quick sanity check, you can verify the live settings on a running pod with:
kubectl -n noobaa get deploy noobaa-operator -o jsonpath='{.spec.template.spec.containers[?(@.name=="noobaa-operator")].livenessProbe.httpGet.path}{"\n"}'

If there are multiple definitions of the operator Deployment (e.g., embedded YAML and any generated manifests), make sure they’re all aligned: liveness=/healthz and readiness/startup=/readyz. Nice work.


✏️ Learnings added
Learnt from: OdedViner
PR: noobaa/noobaa-operator#1684
File: pkg/bundle/deploy.go:6259-6266
Timestamp: 2025-10-26T14:17:07.989Z
Learning: For controller-runtime operators, use livenessProbe path /healthz and readiness/startup path /readyz (port :8081) to avoid masking unhealthy-but-ready states.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

httpGet:
path: /readyz
port: healthz
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 30
resources:
limits:
cpu: "250m"
Expand Down Expand Up @@ -6923,4 +6949,3 @@ metadata:
app: prometheus-adapter
name: custom-metrics-prometheus-adapter
`

16 changes: 16 additions & 0 deletions pkg/operator/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import (

"github.com/operator-framework/operator-lib/leader"
"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/manager"
"sigs.k8s.io/controller-runtime/pkg/manager/signals"
metricsServer "sigs.k8s.io/controller-runtime/pkg/metrics/server"
Expand All @@ -44,6 +45,11 @@ func RunOperator(cmd *cobra.Command, args []string) {
util.InitLogger(logrus.DebugLevel)
}
version.RunVersion(cmd, args)
// Probe address from CLI flag (defaults to :8081)
probeAddr := os.Getenv("HEALTH_PROBE_BIND_ADDRESS")
if probeAddr == "" {
probeAddr = ":8081"
}

config := util.KubeConfig()

Expand All @@ -66,6 +72,7 @@ func RunOperator(cmd *cobra.Command, args []string) {
Metrics: metricsServer.Options{
BindAddress: fmt.Sprintf("%s:%d", metricsHost, metricsPort),
},
HealthProbeBindAddress: probeAddr, // Serve /healthz and /readyz here
})
if err != nil {
log.Fatalf("Failed to create manager: %s", err)
Expand Down Expand Up @@ -97,6 +104,15 @@ func RunOperator(cmd *cobra.Command, args []string) {
log.Fatalf("Failed AddToClusterScopedManager: %s", err)
}

// Register health and readiness endpoints on mgr
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
log.Fatalf("Failed to add health check: %s", err)
}

if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
log.Fatalf("Failed to add readiness check: %s", err)
}

Comment on lines +112 to +115
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Ready check uses Ping; OK, but ensure probes hit /readyz

Using healthz.Ping is fine. Since healthz was removed per discussion, verify liveness and startup probes in manifests target /readyz too.

To verify across the repo:


🏁 Script executed:

#!/bin/bash
# Confirm flag presence and probe paths
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 '(livenessProbe|readinessProbe|startupProbe).*path' deploy pkg/bundle 2>/dev/null || true

Length of output: 272


🏁 Script executed:

#!/bin/bash
# Search for all probe definitions (without path restriction)
rg -n 'livenessProbe|readinessProbe|startupProbe' --type=go --type=yaml --type=json 2>/dev/null || true

Length of output: 881


🏁 Script executed:

#!/bin/bash
# Extract probe definitions with paths from key files
echo "=== deploy/operator.yaml probes ==="
rg -n -A5 'readinessProbe:|livenessProbe:|startupProbe:' deploy/operator.yaml | head -40

echo ""
echo "=== pkg/bundle/deploy.go manager probes (lines 6250-6270) ==="
sed -n '6248,6275p' pkg/bundle/deploy.go

Length of output: 1443


🏁 Script executed:

#!/bin/bash
# Check for other health check additions in manager.go
echo "=== Health check registrations in manager.go ==="
rg -n 'Add.*Check' pkg/operator/manager.go -A2

echo ""
echo "=== Startup probe paths across all files ==="
rg -n -B2 -A2 'startupProbe:' deploy/ pkg/bundle/deploy.go 2>/dev/null | grep -A4 'path:'

Length of output: 503


Fix livenessProbe paths and resolve startup probe inconsistency

The probe configurations violate controller-runtime best practices and create misalignment:

  • livenessProbe paths are wrong: Both deploy/operator.yaml (line 71) and pkg/bundle/deploy.go (line 6263) incorrectly use /readyz instead of /healthz. This masks unhealthy-but-ready states.
  • startupProbe inconsistency: deploy/operator.yaml (line 79) uses /readyz, but pkg/bundle/deploy.go (line 6268) uses /healthz.
  • Missing check registrations: Only AddReadyzCheck is registered in manager.go (line 103). Liveness and startup checks are not registered in code despite probes expecting them.

Correct per controller-runtime pattern: livenessProbe → /healthz, readinessProbe/startupProbe → /readyz. Fix probe paths in both manifest files, and register liveness/startup checks in manager.go.

🤖 Prompt for AI Agents
In pkg/operator/manager.go around lines 103 to 106, only AddReadyzCheck is
registered; register the missing liveness and startup checks and use the
controller-runtime health endpoints: add mgr.AddHealthzCheck("healthz",
healthz.Ping) for the liveness check and mgr.AddStartupCheck("startup",
healthz.Ping) (or AddStartUpCheck if using that naming in your
controller-runtime version) for the startup check alongside the existing
AddReadyzCheck("readyz", healthz.Ping). Also update probe paths in the manifests
so livenessProbe -> /healthz and readinessProbe/startupProbe -> /readyz in both
deploy/operator.yaml (lines ~71 and ~79) and pkg/bundle/deploy.go (around lines
~6263 and ~6268) to match the registered checks.

util.Panic(mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
system.RunOperatorCreate(cmd, args)
return nil
Expand Down