@OdedViner OdedViner commented Aug 21, 2025

  • integrate controller-runtime healthz and readyz endpoints into the namespace-scoped manager
  • expose probe server on :8081 and register default checks
  • update operator Deployment with liveness, readiness, and startup probes

Tested locally on my machine:

1. Create a private image and push it to quay.io:
$ make all
$ docker tag noobaa/noobaa-operator:5.20.0 quay.io/oviner/noobaa-operator:liveness1
$ docker push quay.io/oviner/noobaa-operator:liveness1

2. Create a minikube cluster:
$ minikube delete --all
$ minikube config set memory 8000
$ minikube config set cpus 5
$ minikube start

3. Create noobaa-cli:
$ make cli
oviner~/DEV_REPOS/noobaa-operator/build/_output/bin(liveness_probe)$ ./noobaa-operator-local version
INFO[0000] CLI version: 5.20.0                          
INFO[0000] noobaa-image: noobaa/noobaa-core:master-20250521 
INFO[0000] operator-image: noobaa/noobaa-operator:5.20.0 

4. Deploy noobaa-operator:
oviner~/DEV_REPOS/noobaa-operator/build/_output/bin(liveness_probe)$ ./noobaa-operator-local install --dev --operator-image='quay.io/oviner/noobaa-operator:liveness1'
$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cnpg-controller-manager-bc6cc5889-wl8n8            1/1     Running   0          5m37s
noobaa-core-0                                      2/2     Running   0          3m51s
noobaa-db-pg-cluster-1                             1/1     Running   0          4m43s
noobaa-db-pg-cluster-2                             1/1     Running   0          4m3s
noobaa-default-backing-store-noobaa-pod-9c730ee1   1/1     Running   0          2m38s
noobaa-endpoint-66499bd857-25tjn                   1/1     Running   0          2m51s
noobaa-operator-7fdbdbdfb6-6t5pq                   1/1     Running   0          5m37s

$ kubectl get pods noobaa-operator-7fdbdbdfb6-6t5pq -o yaml| grep oviner
    image: quay.io/oviner/noobaa-operator:liveness1
    image: quay.io/oviner/noobaa-operator:liveness1
    imageID: docker-pullable://quay.io/oviner/noobaa-operator@sha256:e46e35f954f8722db16283ad36d7b742dec76875e2f2d1fcf8c39490373c29b2

$ kubectl get pods noobaa-operator-7fdbdbdfb6-6t5pq -o yaml| grep -i liveness -C 4
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: healthz


5. Force the Deployment to a single pod (no overlapping rollouts):
$ kubectl patch deploy noobaa-operator --type='json' -p='[
  {"op":"remove","path":"/spec/strategy/rollingUpdate"},
  {"op":"replace","path":"/spec/strategy/type","value":"Recreate"}
]'
deployment.apps/noobaa-operator patched

$ kubectl scale deploy noobaa-operator --replicas=0
deployment.apps/noobaa-operator scaled

$ kubectl wait --for=delete pod -l noobaa-operator=deployment --timeout=90s || true

$ kubectl scale deploy noobaa-operator --replicas=1
deployment.apps/noobaa-operator scaled

$ kubectl rollout status deploy/noobaa-operator
Waiting for deployment "noobaa-operator" rollout to finish: 0 of 1 updated replicas are available...
deployment "noobaa-operator" successfully rolled out

$ kubectl get pods noobaa-operator-7fdbdbdfb6-2wkss
NAME                               READY   STATUS    RESTARTS   AGE
noobaa-operator-7fdbdbdfb6-2wkss   1/1     Running   0          2m2s

6. Break the liveness probe:
$ kubectl patch deploy noobaa-operator --type='json' -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/httpGet/path","value":"/bad"},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds","value":5},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/periodSeconds","value":5},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/failureThreshold","value":1}
]'
deployment.apps/noobaa-operator patched

7. Observe pod restarts:
$ kubectl get pods noobaa-operator-7dff47889d-9b2kt 
NAME                               READY   STATUS             RESTARTS     AGE
noobaa-operator-7dff47889d-9b2kt   0/1     CrashLoopBackOff   4 (7s ago)   2m2s
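
The restart loop above follows from how consecutive probe failures are counted. The sketch below is a simplified model, not kubelet code: `probeSucceeded` applies the HTTP probe rule that status codes from 200 to 399 count as success, and `wouldRestart` (a hypothetical helper) reports whether a sequence of probe results reaches failureThreshold consecutive failures. It assumes the probe server answers the unknown /bad path with a 404.

```go
package main

import "fmt"

// probeSucceeded applies the HTTP probe rule: 2xx and 3xx responses count as success.
func probeSucceeded(status int) bool {
	return status >= 200 && status < 400
}

// wouldRestart (hypothetical) reports whether the observed statuses contain
// failureThreshold consecutive failures, which is when a liveness probe
// causes the container to be restarted.
func wouldRestart(statuses []int, failureThreshold int) bool {
	consecutive := 0
	for _, s := range statuses {
		if probeSucceeded(s) {
			consecutive = 0 // any success resets the failure streak
			continue
		}
		consecutive++
		if consecutive >= failureThreshold {
			return true
		}
	}
	return false
}

func main() {
	// Broken probe from step 6: /bad is assumed to return 404, failureThreshold is 1.
	fmt.Println(wouldRestart([]int{404}, 1)) // true
	// Restored settings from step 8: isolated failures never reach a streak of 3.
	fmt.Println(wouldRestart([]int{200, 500, 200, 500}, 3)) // false
}
```

With periodSeconds set to 5 and failureThreshold set to 1, the first failed probe is enough to restart the container, which is consistent with the CrashLoopBackOff status shown above.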

8. Restore the healthy liveness probe:
$ kubectl patch deploy noobaa-operator --type='json' -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/httpGet/path","value":"/healthz"},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/httpGet/port","value":"healthz"},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds","value":15},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/periodSeconds","value":10},
  {"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/failureThreshold","value":3}
]'
deployment.apps/noobaa-operator patched

$ kubectl get pods noobaa-operator-7fdbdbdfb6-75mcw
NAME                               READY   STATUS    RESTARTS   AGE
noobaa-operator-7fdbdbdfb6-75mcw   1/1     Running   0          2m7s

Explain the changes

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Added readiness, liveness, and startup health endpoints and exposed a dedicated health port for external monitoring and faster issue detection.
    • Operator now binds a configurable health-probe address to serve health checks.
  • Chores

    • Updated operator memory defaults to requests: 256Mi and limits: 512Mi for improved stability.

@coderabbitai

coderabbitai bot commented Aug 21, 2025

Walkthrough

Wires controller-runtime health probe address into the manager, registers healthz and readyz checks, and updates the operator Deployment and embedded bundle YAML to expose a healthz port, add startup/readiness/liveness probes, and set memory requests/limits.

Changes

Cohort / File(s) Summary
Operator Deployment YAML & embedded bundle
deploy/operator.yaml, pkg/bundle/deploy.go
Adds container port healthz: 8081, adds readinessProbe and startupProbe targeting /readyz and livenessProbe targeting /healthz on port healthz with specified timings/thresholds, and adds container memory requests: "256Mi" and limits: "512Mi".
Manager health probe wiring
pkg/operator/manager.go
Adds binding of health probe address (from env/default :8081) to manager.Options, and registers health checks via mgr.AddHealthzCheck("healthz", healthz.Ping) and mgr.AddReadyzCheck("readyz", healthz.Ping) with error handling.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant KS as Kubernetes
    participant Pod as Operator Pod
    participant Mgr as controller-runtime Manager
    participant HP as Health Endpoint (:8081 /readyz)
    participant Kubelet as Kubelet Probes

    KS->>Pod: Start operator container (probe port exposed)
    Pod->>Mgr: Start manager with HealthProbeBindAddress=":8081"
    Mgr->>HP: Serve /healthz and /readyz endpoints
    Mgr->>Mgr: Register healthz and readyz checks (healthz.Ping)

    rect rgba(230,245,255,0.5)
      Note over Kubelet,HP: Startup probe (periodSeconds:10, failureThreshold:30)
      Kubelet->>HP: GET /readyz
      HP-->>Kubelet: 200 / non-200
    end

    rect rgba(240,255,240,0.5)
      Note over Kubelet,HP: Readiness probe (initialDelay:5, periodSeconds:10)
      loop every 10s
        Kubelet->>HP: GET /readyz
        HP-->>Kubelet: 200 / non-200
      end
    end

    rect rgba(255,245,230,0.5)
      Note over Kubelet,HP: Liveness probe (initialDelay:15, periodSeconds:10)
      loop every 10s
        Kubelet->>HP: GET /healthz
        HP-->>Kubelet: 200 / non-200 (restart on failure threshold)
      end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify embedded YAML in pkg/bundle/deploy.go exactly matches deploy/operator.yaml (quoting/indentation).
  • Confirm manager uses the correct probe bind address source and that env/default logic is correct.
  • Ensure probe paths/ports and timings align between manager endpoints and Deployment probes.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description does not follow the required template structure properly. While the author provided valuable content—including a bulleted explanation of the changes and comprehensive local testing steps with output—the three main template sections are largely empty placeholders showing only "1." The "Explain the changes" and "Testing Instructions" sections should contain substantive content, and the "Issues" section is empty. Although the information needed is present in the PR (explanation via bullet points and testing evidence), it is not positioned within the required template sections, which means the description does not meet the expected format and completeness criteria.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "add liveness, readiness, and startup probes to noobaa-operator" is concise, specific, and directly reflects the main changes in the changeset. The title clearly summarizes what was added—three types of Kubernetes health probes for the noobaa-operator container—which aligns with the actual modifications across deploy/operator.yaml, pkg/operator/manager.go, and pkg/bundle/deploy.go. The title is neither vague nor misleading, and a teammate reviewing the git history would immediately understand the primary purpose of this change.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c804893 and befe1de.

📒 Files selected for processing (3)
  • deploy/operator.yaml (2 hunks)
  • pkg/bundle/deploy.go (2 hunks)
  • pkg/operator/manager.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/manager.go
  • pkg/bundle/deploy.go
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-26T14:17:07.989Z
Learnt from: OdedViner
PR: noobaa/noobaa-operator#1684
File: pkg/bundle/deploy.go:6259-6266
Timestamp: 2025-10-26T14:17:07.989Z
Learning: For controller-runtime operators, use livenessProbe path /healthz and readiness/startup path /readyz (port :8081) to avoid masking unhealthy-but-ready states.

Applied to files:

  • deploy/operator.yaml
📚 Learning: 2025-09-16T11:55:50.673Z
Learnt from: tangledbytes
PR: noobaa/noobaa-operator#1702
File: deploy/warp/warp-job.yaml:9-16
Timestamp: 2025-09-16T11:55:50.673Z
Learning: In the noobaa-operator warp benchmark feature, the warp-job.yaml template serves as a base that gets modified by the CLI code in pkg/bench/warp.go. The CLI dynamically overwrites the container image, environment variables, and args fields, but does not modify security context, resource limits, or TTL settings.

Applied to files:

  • deploy/operator.yaml
📚 Learning: 2025-07-14T10:59:21.959Z
Learnt from: Neon-White
PR: noobaa/noobaa-operator#1587
File: pkg/system/reconciler.go:287-291
Timestamp: 2025-07-14T10:59:21.959Z
Learning: NooBaa operator design constraint: Multiple NooBaa instances are not supported in the same namespace, only across different namespaces. This means hard-coded resource names like ConfigMaps are acceptable within a namespace scope.

Applied to files:

  • deploy/operator.yaml
🔇 Additional comments (2)
deploy/operator.yaml (2)

45-47: Port configuration looks good.

The healthz port is properly exposed on 8081 and correctly named for probe references.


57-64: Readiness and startup probes are well-configured.

The /readyz path, timing defaults, and failure thresholds are appropriate for detecting operator readiness and startup completion.

Also applies to: 73-79



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
pkg/operator/manager.go (3)

48-50: Harden flag parsing: fall back to a sane default and log.

If the flag isn’t present (e.g., invoked via the wrong subcommand) or empty, you’ll end up setting an empty HealthProbeBindAddress, which disables the probe server. Add a defensive fallback.

- // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag; fall back to default if missing
+ probeAddr, err := cmd.Flags().GetString("health-probe-bind-address")
+ if err != nil || probeAddr == "" {
+   probeAddr = ":8081"
+   log.Infof("health-probe-bind-address not set; defaulting to %s", probeAddr)
+ }

72-73: Health probe server bound only on the namespace-scoped manager. Confirm this is intentional.

Only mgr has HealthProbeBindAddress; cmgr does not. If the cluster-scoped manager wedges, probes will still report healthy. If that’s acceptable, ignore. Otherwise consider proxying a simple cmgr health indicator into mgr’s checks.


104-112: Consider a stronger readiness check (cache sync) instead of Ping-only.

healthz.Ping always returns success. For readiness, it’s common to gate on informer cache sync to avoid advertising Ready before controllers are usable.

Example:

// after mgr is created
if err := mgr.AddReadyzCheck("cache-sync", func(_ *http.Request) error {
    if mgr.GetCache().WaitForCacheSync(util.Context()) {
        return nil
    }
    return fmt.Errorf("caches not yet synced")
}); err != nil {
    log.Fatalf("Failed to add readiness check: %s", err)
}

If blocking per request is a concern, you can set a one-time “synced” atomic flag from a Runnable that waits on cache sync and have the checker read that flag.

deploy/operator.yaml (2)

48-49: Fix indentation for ports block to satisfy yamllint and avoid accidental structural errors.

Yamllint warns: expected indent 10 but found 12. Align ports with args/volumeMounts under the container spec.

-          ports:
-            - name: healthz
+          ports:
+          - name: healthz
             containerPort: 8081

You can validate with:

  • yamllint deploy/operator.yaml
  • kubectl apply --dry-run=client -f deploy/operator.yaml

59-67: Readiness probe looks good.

/readyz on the named port is correct. Timings are reasonable for controller-runtime defaults. If you adopt cache-sync readiness, /readyz will naturally gate until caches are synced.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e0a1a8d and e07011f.

📒 Files selected for processing (3)
  • deploy/operator.yaml (2 hunks)
  • pkg/operator/manager.go (4 hunks)
  • pkg/operator/operator.go (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml

[error] 45-45: trailing spaces

(trailing-spaces)


[warning] 48-48: wrong indentation: expected 10 but found 12

(indentation)

🔇 Additional comments (3)
pkg/operator/manager.go (1)

27-27: Import of controller-runtime healthz is appropriate.

This is the correct package for registering /healthz and /readyz checks.

deploy/operator.yaml (2)

68-75: Liveness probe is sensible.

/healthz with a modest initialDelaySeconds is a standard setup for controller-runtime. No concerns.


76-81: Startup probe configuration is conservative and safe.

failureThreshold 30 with periodSeconds 10 gives up to 5 minutes for cold starts (image pulls happen before the container starts, so they do not count against this budget). This helps prevent premature restarts.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e07011f and c6acf02.

📒 Files selected for processing (3)
  • deploy/operator.yaml (2 hunks)
  • pkg/operator/manager.go (4 hunks)
  • pkg/operator/operator.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/operator.go
  • pkg/operator/manager.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml

[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[warning] 49-49: wrong indentation: expected 10 but found 12

(indentation)

🔇 Additional comments (2)
deploy/operator.yaml (2)

60-82: LGTM! Health probe configuration looks good.

The probe configuration is well-structured:

  • Readiness probe uses /readyz endpoint with a short initial delay (5s)
  • Liveness probe uses /healthz endpoint with appropriate initial delay (15s)
  • Startup probe provides sufficient time for initialization (300s total with 30 failures × 10s)
  • All probes use consistent timeout and period settings

The probe timings follow Kubernetes best practices for operator workloads.


52-54: Approved: bound-sa-token mount can safely remain readOnly

I searched the codebase for any write operations targeting /var/run/secrets/openshift/serviceaccount (including Go standard library file writes) and found only references for reading the token file—no code writes to this path. Therefore, mounting the service account token as readOnly: true is safe and aligns with security best practices.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
deploy/operator.yaml (3)

44-47: Good: explicit "run" subcommand included.

This addresses the earlier issue where the manager wouldn't start without "run". With the whitespace fix above, the CLI will parse as intended.


48-50: Fix ports block indentation to conform with lint rules.

Yamllint expects the dash items to align directly under the key. Apply:

-          ports:
-            - name: healthz
-              containerPort: 8081
+          ports:
+          - name: healthz
+            containerPort: 8081

44-47: Remove trailing spaces in args and align list indentation to satisfy yamllint.

The entries "operator␠␠" and "run␠␠␠␠␠␠␠" carry trailing spaces in the manifest. YAML strips trailing whitespace from unquoted (plain) scalars, so the parsed argv should still be "operator" and "run", but yamllint reports the spaces as errors and they are easy to mistake for significant content. yamllint also flags the sequence indentation under args. Apply:

-          args:
-            - operator  
-            - run       
-            - --health-probe-bind-address=:8081
+          args:
+          - operator
+          - run
+          - --health-probe-bind-address=:8081
🧹 Nitpick comments (1)
deploy/operator.yaml (1)

68-75: When using startupProbe, drop liveness initialDelaySeconds to avoid redundant delay.

Kubelet suppresses liveness until startup passes, so initialDelaySeconds on liveness is unnecessary and can delay recovery after restarts. Suggest:

           livenessProbe:
             httpGet:
               path: /healthz
               port: healthz
-            initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6acf02 and 63cf177.

📒 Files selected for processing (3)
  • deploy/operator.yaml (2 hunks)
  • pkg/operator/manager.go (4 hunks)
  • pkg/operator/operator.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/operator.go
  • pkg/operator/manager.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml

[warning] 45-45: wrong indentation: expected 10 but found 12

(indentation)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[warning] 49-49: wrong indentation: expected 10 but found 12

(indentation)

🔇 Additional comments (3)
deploy/operator.yaml (3)

60-67: Readiness probe looks correct and matches controller-runtime defaults.

/readyz on the named port is standard; timings are reasonable.


76-82: Startup probe parameters are sane.

/healthz with failureThreshold: 30 and period: 10s gives up to ~5 minutes for the operator to come up, which is acceptable for CRD installs/migrations.


44-82: Health/Readiness Probes Configuration Verified

All checks confirm that the operator binary and manifest are correctly configured:

  • The --health-probe-bind-address flag is defined with default :8081 in pkg/operator/operator.go:37 and consumed in pkg/operator/manager.go:49.
  • The health and readiness endpoints are registered via mgr.AddHealthzCheck("healthz", healthz.Ping) and mgr.AddReadyzCheck("readyz", healthz.Ping) in pkg/operator/manager.go:105,109.
  • No other container in the manifest binds port 8081, and the probe in deploy/operator.yaml:47–50 matches these settings.

No changes are required.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
deploy/operator.yaml (2)

44-47: Trim trailing spaces and fix args indentation (yamllint errors).

YAMLlint flags trailing spaces on Lines 45–46 and wrong indentation on Line 45. Reduce the list item indent by 2 and remove trailing spaces so CI passes and the manifest parses consistently.

Apply:

           args:
-            - operator  
-            - run       
-            - --health-probe-bind-address=:8081
+          - operator
+          - run
+          - --health-probe-bind-address=:8081

Run to verify:

#!/bin/bash
pip install -q yamllint
yamllint -d "{extends: default, rules: {line-length: disable}}" deploy/operator.yaml

48-50: Align ports list indentation with container-level keys.

List item under ports is over-indented (yamllint: expected 10, found 12). Align the dash directly under the ports key, at the same indentation.

Apply:

           ports:
-            - name: healthz
-              containerPort: 8081
+          - name: healthz
+            containerPort: 8081

Re-run yamllint (same command as above) to confirm no indentation warnings.

🧹 Nitpick comments (5)
pkg/operator/manager.go (2)

72-72: Health probes attached only to the namespace-scoped manager

You bind the probe server only on mgr (namespace-scoped). If the cluster-scoped manager (cmgr) fails to start or crashes, the pod’s readiness/liveness may still report healthy because the probe targets only mgr. If that’s intentional, ignore; otherwise consider one of:

  • Also enabling a probe server for cmgr (e.g., on a sibling port like :8082) and wiring probes for it.
  • Or keep a single probe server (on mgr) but add a readiness check that reflects cmgr health (e.g., flip a flag once cmgr caches sync or controllers start).

If you prefer the second approach, I can sketch a checker that reports NotReady until cmgr signals it’s up.
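
A checker along the lines of the second option might look like the sketch below. It is an assumption-laden illustration using only the standard library: `clusterManagerUp`, `clusterManagerChecker`, `readyzHandler`, and `readyzStatus` are hypothetical names, and the 503 status is a choice made for this stand-in handler rather than a statement about controller-runtime's behavior.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// clusterManagerUp would be flipped to true once the cluster-scoped manager
// (cmgr) signals that it is running.
var clusterManagerUp atomic.Bool

// clusterManagerChecker has the func(*http.Request) error shape used by
// readiness checks: it fails until cmgr has signalled that it is up.
func clusterManagerChecker(_ *http.Request) error {
	if !clusterManagerUp.Load() {
		return errors.New("cluster-scoped manager not started")
	}
	return nil
}

// readyzHandler (stand-in for the probe server) maps the check result to a status code.
func readyzHandler(check func(*http.Request) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := check(r); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

// readyzStatus performs an in-memory GET /readyz and returns the status code.
func readyzStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/readyz", nil)
	readyzHandler(clusterManagerChecker)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(readyzStatus()) // 503 while cmgr has not started
	clusterManagerUp.Store(true)
	fmt.Println(readyzStatus()) // 200 once cmgr reported in
}
```

In the operator, the same checker function could presumably be registered on mgr through mgr.AddReadyzCheck under a name such as "cluster-manager", with the flag set from wherever cmgr is started.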


104-111: Readiness uses Ping only; consider gating on cache sync to avoid false positives

Registering healthz.Ping for both health and readiness works, but readiness will go green as soon as the HTTP server is up, not when informers/controllers are actually ready. Consider adding a cache-sync readiness check so /readyz turns 200 only after caches synchronize.

Example change (adds a cache-sync runnable and checker):

  // Register health and readiness endpoints on mgr
  if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
    log.Fatalf("Failed to add health check: %s", err)
  }

- if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
-   log.Fatalf("Failed to add readiness check: %s", err)
- }
+ // Ready when caches have synced
+ if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
+   log.Fatalf("Failed to add readiness ping: %s", err)
+ }
+ if err := mgr.AddReadyzCheck("caches", cacheReadyChecker()); err != nil {
+   log.Fatalf("Failed to add cache readiness check: %s", err)
+ }

Add this helper (outside the shown range), which uses a one-time sync and a fast check per request:

// at file top-level:
var cachesSynced atomic.Bool

// near other runnables, before starting managers:
util.Panic(mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
  if ok := mgr.GetCache().WaitForCacheSync(ctx); !ok {
    return fmt.Errorf("cache sync failed")
  }
  cachesSynced.Store(true)
  <-ctx.Done()
  return nil
})))

// helper:
func cacheReadyChecker() healthz.Checker {
  return func(_ *http.Request) error {
    if !cachesSynced.Load() {
      return fmt.Errorf("caches not yet synced")
    }
    return nil
  }
}

Note: you’ll need import "net/http" and import "sync/atomic" for the helper.

pkg/bundle/deploy.go (2)

6251-6273: Probe set is sensible; consider strengthening readyz semantics.

  • The readiness, liveness, and startup probes use /readyz and /healthz on the dedicated probe server. Timing defaults (readiness=5s initial delay, liveness=15s, startup=30×10s) look balanced for controller-runtime operators.
  • One potential gap: if your manager registers readyz with healthz.Ping, readiness will always succeed once the probe server is up (it doesn’t wait for caches/leader election). If that’s your current implementation, consider a readiness check that flips true only after caches are synced (or after leader election if that’s your intended signal).

Example (Go) of a non-blocking cache-sync readiness gate:

// after mgr is created:
var cachesSynced atomic.Bool
go func() {
    // Wait until the manager starts and caches sync once
    <-mgr.Elected() // optional: if leader election enabled
    if ok := mgr.GetCache().WaitForCacheSync(context.Background()); ok {
        cachesSynced.Store(true)
    }
}()

// register probes
_ = mgr.AddHealthzCheck("healthz", healthz.Ping)
_ = mgr.AddReadyzCheck("cache-sync", func(_ *http.Request) error {
    if cachesSynced.Load() {
        return nil
    }
    return fmt.Errorf("caches not yet synced")
})

This avoids reporting Pods as Ready before controllers can actually reconcile.

If you’ve already wired a stronger readyz check in manager.go, feel free to ignore; otherwise I can open a follow-up PR with this change.


6251-6273: Nit: explicitly set probe scheme for forward-compatibility (optional).

Kubelet defaults to HTTP for httpGet, but being explicit can prevent surprises if defaults change or if TLS is enabled later:

 readinessProbe:
   httpGet:
     path: /readyz
-    port: healthz
+    port: healthz
+    scheme: HTTP
 ...
 livenessProbe:
   httpGet:
     path: /healthz
-    port: healthz
+    port: healthz
+    scheme: HTTP
 ...
 startupProbe:
   httpGet:
     path: /healthz
-    port: healthz
+    port: healthz
+    scheme: HTTP
deploy/operator.yaml (1)

60-82: Probes wiring looks correct; optional tuning with startupProbe present.

Endpoints and named port resolve properly. With startupProbe enabled, initialDelaySeconds on liveness/readiness is redundant; kubelet suppresses them until startup succeeds. Consider removing the initial delays to simplify timing semantics.

Optional diff:

           readinessProbe:
             httpGet:
               path: /readyz
               port: healthz
-            initialDelaySeconds: 5
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
           livenessProbe:
             httpGet:
               path: /healthz
               port: healthz
-            initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3

If you keep the delays, that’s fine too—behavior is still correct.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63cf177 and 3eb84c9.

📒 Files selected for processing (4)
  • deploy/operator.yaml (2 hunks)
  • pkg/bundle/deploy.go (3 hunks)
  • pkg/operator/manager.go (4 hunks)
  • pkg/operator/operator.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/operator/operator.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml

[warning] 45-45: wrong indentation: expected 10 but found 12

(indentation)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[warning] 49-49: wrong indentation: expected 10 but found 12

(indentation)

🔇 Additional comments (3)
pkg/operator/manager.go (1)

27-27: LGTM: correct import for controller-runtime health checks

Importing sigs.k8s.io/controller-runtime/pkg/healthz is appropriate for registering /healthz and /readyz endpoints.

pkg/bundle/deploy.go (2)

6235-6238: Args look correct; verify entrypoint supports subcommand and flag.

  • The container now invokes: operator run --health-probe-bind-address=:8081. That matches controller-runtime convention, but only if your binary’s root command exposes the run subcommand and recognizes the probe flag.
  • Please sanity-check that the operator image you ship in all deployment channels accepts these args; otherwise the container may exit immediately.

Would you like a quick script to pull the image used in CI and validate the help/flags exposed by the binary? I can supply one.


6240-6241: Port naming and probe port reference are consistent.

Defining a named container port healthz: 8081 and referencing it from the probes is a good practice that makes refactors safer. No issues here.

Comment on lines 48 to 50
// Probe address from CLI flag (defaults to :8081)
probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")

@coderabbitai coderabbitai bot Aug 25, 2025

💡 Verification agent

🧩 Analysis chain

Don’t ignore flag parsing errors; add a safe default for probe address

cmd.Flags().GetString(...) error is discarded. While it should succeed if the flag is defined, ignoring the error can mask regressions and leaves probeAddr empty (which disables the probe server) if the flag is missing or mistyped.

Apply this minimal diff to handle the error and ensure a sane default:

-	// Probe address from CLI flag (defaults to :8081)
-	probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+	// Probe address from CLI flag (defaults to :8081)
+	probeAddr, err := cmd.Flags().GetString("health-probe-bind-address")
+	if err != nil {
+		log.Warnf("Failed reading --health-probe-bind-address flag: %v (defaulting to :8081)", err)
+		probeAddr = ":8081"
+	}
+	if probeAddr == "" {
+		log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+		probeAddr = ":8081"
+	}

To double-check the flag wiring across the repo, run:


🏁 Script executed:

#!/bin/bash
# Verify the flag is defined and used in manifests
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 'health-probe-bind-address' -- 'deploy/**' 'deploy/*.yaml' 'pkg/bundle/**' 2>/dev/null || true

Length of output: 1109


Handle health-probe-bind-address flag parsing errors in pkg/operator/manager.go

  • Verified that --health-probe-bind-address is defined as a persistent flag in pkg/operator/operator.go with a default of :8081, and is referenced in the bundled manifests.
  • In pkg/operator/manager.go (around line 49), calling cmd.Flags().GetString(...)
    1. Discards any lookup error (e.g., if the flag name is mistyped or not registered on this command)
    2. May never pick up the persistent flag, since it lives in the parent command’s flag set
    3. Leaves probeAddr empty on error, inadvertently disabling health probes
  • To ensure the flag is read correctly and to guard against missing or empty values, switch to the persistent flag set, check the error, and fall back to the known default.

Apply this diff:

--- a/pkg/operator/manager.go
+++ b/pkg/operator/manager.go
@@ -46,4 +46,12 @@ func NewManagerCommand(...) {
-   // Probe address from CLI flag (defaults to :8081)
-   probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+   // Probe address from CLI flag (defaults to :8081)
+   probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+   if err != nil {
+       log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err)
+       probeAddr = ":8081"
+   }
+   if probeAddr == "" {
+       log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+       probeAddr = ":8081"
+   }
📝 Committable suggestion

Suggested change
-	// Probe address from CLI flag (defaults to :8081)
-	probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+	// Probe address from CLI flag (defaults to :8081)
+	probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+	if err != nil {
+		log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err)
+		probeAddr = ":8081"
+	}
+	if probeAddr == "" {
+		log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+		probeAddr = ":8081"
+	}
🤖 Prompt for AI Agents
In pkg/operator/manager.go around lines 48 to 50, the call to
cmd.Flags().GetString("health-probe-bind-address") ignores errors and reads the
local flagset (so it can miss the persistent flag), which can leave probeAddr
empty and disable probes; update to use
cmd.PersistentFlags().GetString("health-probe-bind-address"), check and handle
the returned error (log/return or fallback), and if the value is empty or an
error occurred, set probeAddr to the known default ":8081" so health probes
remain enabled.

Author

$ kubectl describe pod noobaa-operator-6fc879d987-wlkx9
    Args:
      operator
      run
      --health-probe-bind-address=:8081
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   Error: unknown flag: --health-probe-bind-address
See 'noobaa operator run --help' for usage.

      Exit Code:    1
      Started:      Sun, 26 Oct 2025 16:25:56 +0200
      Finished:     Sun, 26 Oct 2025 16:25:56 +0200
    Ready:          False
    Restart Count:  2
    Limits:
      cpu:     250m
      memory:  512Mi
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://:healthz/readyz delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:healthz/readyz delay=5s timeout=5s period=10s #success=1 #failure=3
    Startup:    http-get http://:healthz/healthz delay=0s timeout=5s period=10s #success=1 #failure=30
    Environment:
      OPERATOR_NAME:          noobaa-operator
      POD_NAME:               noobaa-operator-6fc879d987-wlkx9 (v1:metadata.name)
      WATCH_NAMESPACE:        default (v1:metadata.namespace)
      DEV_ENV:                true
      NOOBAA_CLI_DEPLOYMENT:  true

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find where the Cobra commands are defined, especially the "run" command
rg -n --type=go -C5 'cobra\.Command.*run'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for flag definitions in the operator package
rg -n --type=go -C3 'PersistentFlags\(\)|Flags\(\)' pkg/operator/

Length of output: 6721


🏁 Script executed:

#!/bin/bash
# Find the operator.go file and examine how commands are structured
fd -t f 'operator.go' pkg/operator/ --exec cat {}

Length of output: 20129


@OdedViner, the root cause is clear from your error message: the --health-probe-bind-address flag is never defined in the Cobra command structure.

The deployment manifest passes --health-probe-bind-address=:8081, and pkg/operator/manager.go tries to read it, but CmdRun() in pkg/operator/operator.go never registers the flag, so the CLI rejects it as unknown.

Fix: Add the flag definition to CmdRun() in pkg/operator/operator.go:

// CmdRun returns a CLI command
func CmdRun() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "run",
		Short: "Runs the noobaa-operator",
		Run:   RunOperator,
		Args:  cobra.NoArgs,
	}
	cmd.Flags().String("health-probe-bind-address", ":8081", "The address the health probe endpoint binds to")
	return cmd
}

This registers the flag with a default of :8081, allowing the manifest's --health-probe-bind-address=:8081 argument to be parsed successfully and the pod to start.
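
The failure mode described above (an argument is passed but the flag was never registered) can be reproduced without Cobra. This sketch uses Go's standard `flag` package as a stand-in, so the exact error text differs from Cobra's `unknown flag` message. `parseRunArgs` is a hypothetical helper written only for this illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseRunArgs parses args with a fresh FlagSet. When registerProbeFlag is
// false the probe flag is left undefined, imitating a binary that does not
// know about it.
func parseRunArgs(args []string, registerProbeFlag bool) (string, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output

	var addr *string
	if registerProbeFlag {
		addr = fs.String("health-probe-bind-address", ":8081",
			"address the health probe endpoint binds to")
	}
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if addr == nil {
		return "", nil
	}
	return *addr, nil
}

func main() {
	args := []string{"--health-probe-bind-address=:8081"}

	_, err := parseRunArgs(args, false)
	fmt.Println("flag not registered:", err)

	addr, err := parseRunArgs(args, true)
	fmt.Println("flag registered:", addr, err)
}
```

The manifest and the binary have to agree on the flag: either the flag is registered, or the argument is dropped from the Deployment.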


🧬 Code Graph Analysis Results
  • File: pkg/operator/manager.go
    • Summary (re health probe flag and address):
      • Description: RunOperator constructs the manager with HealthProbeBindAddress set to healthProbeAddr, which is defined as ":8081", so the operator exposes health probes on port 8081. The code supplies the address through the controller-runtime manager options and does not read it from a CLI flag, which is consistent with the binary rejecting --health-probe-bind-address as unknown.
      • Relevant details:
        • healthProbeAddr is defined as ":8081" (default for the health probe endpoint).
        • The manager is created with HealthProbeBindAddress: healthProbeAddr, which wires /healthz and /readyz to that address.
      • Location notes:
        • Health probe address default and usage are in the RunOperator setup where the manager is instantiated, and the healthProbeAddr variable is defined at the top of the file.

@nimrod-becker
Contributor

@dannyzaken @tangledbytes please take a look

@OdedViner OdedViner force-pushed the liveness_probe branch 2 times, most recently from 4e1b572 to 6095110 Compare August 31, 2025 14:09
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
deploy/operator.yaml (2)

44-47: Trim trailing spaces in args to keep YAML clean and avoid linter failures.
Matches prior feedback; still present on "operator" and "run".

Apply:

           args:
-            - operator  
-            - run       
+            - operator
+            - run
            - --health-probe-bind-address=:8081

48-50: Fix ports block indentation.
Keep it aligned with other container-level keys.

-          ports:
-            - name: healthz
-              containerPort: 8081
+          ports:
+          - name: healthz
+            containerPort: 8081
pkg/operator/manager.go (1)

48-49: Don’t ignore flag errors; read from PersistentFlags and default safely.
Ignoring the error can disable probes if the flag isn’t found.

-	// Probe address from CLI flag (defaults to :8081)
-	probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+	// Probe address from CLI flag (defaults to :8081)
+	probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+	if err != nil {
+		log.Warnf("Failed reading --health-probe-bind-address: %v (defaulting to :8081)", err)
+		probeAddr = ":8081"
+	}
+	if probeAddr == "" {
+		log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+		probeAddr = ":8081"
+	}

Run to confirm the flag is defined and referenced:

#!/bin/bash
rg -n --type=go -C2 'health-probe-bind-address'
🧹 Nitpick comments (1)
pkg/operator/manager.go (1)

104-111: Checks registration LGTM; consider clearer names and optional cache-ready check.
Minor: use names like "ping" instead of repeating endpoint names; optionally add a cache sync readiness check if needed.

-	if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
+	if err := mgr.AddHealthzCheck("ping", healthz.Ping); err != nil {
 		log.Fatalf("Failed to add health check: %s", err)
 	}
-	if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
+	if err := mgr.AddReadyzCheck("ping", healthz.Ping); err != nil {
 		log.Fatalf("Failed to add readiness check: %s", err)
 	}

Optional cache-ready example (if you want stricter readiness):

// Requires "context", "fmt", "net/http", and "sync/atomic" (Go 1.19+).
var ready atomic.Bool
_ = mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
    // Mark ready once the manager starts this runnable.
    ready.Store(true)
    <-ctx.Done()
    return nil
}))
_ = mgr.AddReadyzCheck("started", func(_ *http.Request) error {
    if !ready.Load() {
        return fmt.Errorf("not started")
    }
    return nil
})
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e1b572 and 6095110.

📒 Files selected for processing (4)
  • deploy/operator.yaml (2 hunks)
  • pkg/bundle/deploy.go (3 hunks)
  • pkg/operator/manager.go (4 hunks)
  • pkg/operator/operator.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/operator.go
  • pkg/bundle/deploy.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml

[warning] 45-45: wrong indentation: expected 10 but found 12

(indentation)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[warning] 49-49: wrong indentation: expected 10 but found 12

(indentation)

🔇 Additional comments (3)
deploy/operator.yaml (1)

76-82: Startup probe endpoint choice is fine.
Using /readyz here is acceptable; no change needed.

pkg/operator/manager.go (2)

27-27: Import healthz looks good.
Required for adding checks.


72-72: Wiring HealthProbeBindAddress into mgr is correct.
This enables /healthz and /readyz.

Comment on lines +60 to +77
          readinessProbe:
            httpGet:
              path: /readyz
              port: healthz
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /readyz
              port: healthz
            initialDelaySeconds: 15
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          startupProbe:

💡 Verification agent

🧩 Analysis chain

Use /healthz for liveness; /readyz for readiness (current YAML uses /readyz for liveness).
This avoids killing a live process just because it isn’t ready yet and matches the PR description.

           readinessProbe:
             httpGet:
               path: /readyz
               port: healthz
             initialDelaySeconds: 5
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
           livenessProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3

🏁 Script executed:

#!/usr/bin/env bash
set -e
# Locate the operator.yaml file
OP_FILE=$(fd --type f operator.yaml || true)
echo "Found file: $OP_FILE"
[ -n "$OP_FILE" ] || { echo "operator.yaml not found"; exit 1; }
# Print lines 60–76 for verification
sed -n '60,76p' "$OP_FILE"

Length of output: 698


Use /healthz for livenessProbe
Swap the livenessProbe HTTP path to /healthz (keep readinessProbe on /readyz) to avoid killing a live process before it’s ready.

🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 60 to 76, the livenessProbe is using the
/readyz path which can cause a live process to be killed before it is fully
ready; change the livenessProbe httpGet.path value from /readyz to /healthz
while leaving the readinessProbe on /readyz (no other probe changes required).

failureThreshold: 3
livenessProbe:
httpGet:
path: /readyz
Member

Should all probes be the same?

Member

Since we only perform ping, I suppose it's okay for them to be set to the same path. In the future, we might want to use more advanced probes, but for now, it's enough.

}

// Register health and readiness endpoints on mgr
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
Member

If we use readyz for both probes, I think we can remove healthz

Author

done

Short: "Deployment using operator",
}
// health-probe flag available on operator and inherited by subcommands
cmd.PersistentFlags().String("health-probe-bind-address", ":8081", "HTTP address for health/readiness probes (e.g., :8081)")
Member

Is it necessary to have a flag for the probe address? If there isn't a need to change it for some cases, we can have it defined in manager.go (e.g.: similar to the metrics address\port here)

Author

When I moved the probe address to manager.go, the operator failed to start because the flag is still passed in the Deployment arguments. The pod crashes with the following error:

$ kubectl describe pod noobaa-operator-6fc879d987-wlkx9
    Args:
      operator
      run
      --health-probe-bind-address=:8081
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   Error: unknown flag: --health-probe-bind-address
See 'noobaa operator run --help' for usage.

      Exit Code:    1
      Started:      Sun, 26 Oct 2025 16:25:56 +0200
      Finished:     Sun, 26 Oct 2025 16:25:56 +0200
    Ready:          False
    Restart Count:  2
    Limits:
      cpu:     250m
      memory:  512Mi
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://:healthz/readyz delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:healthz/readyz delay=5s timeout=5s period=10s #success=1 #failure=3
    Startup:    http-get http://:healthz/healthz delay=0s timeout=5s period=10s #success=1 #failure=30
    Environment:
      OPERATOR_NAME:          noobaa-operator
      POD_NAME:               noobaa-operator-6fc879d987-wlkx9 (v1:metadata.name)
      WATCH_NAMESPACE:        default (v1:metadata.namespace)
      DEV_ENV:                true
      NOOBAA_CLI_DEPLOYMENT:  true

Member

@dannyzaken dannyzaken Oct 27, 2025

you can remove it from operator.yaml

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
deploy/operator.yaml (2)

48-50: Align ports list indentation under 'ports:'.

Re-align the sequence to satisfy yamllint’s expected indentation.

-          ports:
-            - name: healthz
-              containerPort: 8081
+          ports:
+          - name: healthz
+            containerPort: 8081

68-75: Use /healthz for livenessProbe (keep /readyz for readiness/startup).

Best practice is liveness→/healthz, readiness/startup→/readyz. With current ping checks it’s mostly equivalent, but /healthz avoids coupling liveness to readiness state.

           livenessProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
🧹 Nitpick comments (1)
pkg/bundle/deploy.go (1)

6270-6275: Optional: consider /healthz for startupProbe for consistency.

Startup can also target /healthz; keeping readiness on /readyz and liveness/startup on /healthz is a common pattern.

           startupProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 30
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6095110 and f87a9a7.

📒 Files selected for processing (4)
  • deploy/operator.yaml (2 hunks)
  • pkg/bundle/deploy.go (3 hunks)
  • pkg/operator/manager.go (4 hunks)
  • pkg/operator/operator.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/manager.go
  • pkg/operator/operator.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml

[warning] 45-45: wrong indentation: expected 10 but found 12

(indentation)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[warning] 49-49: wrong indentation: expected 10 but found 12

(indentation)

🔇 Additional comments (3)
pkg/bundle/deploy.go (3)

6237-6241: Add health probe bind address CLI flag (good).

Flag looks correct for controller-runtime; confirms the probe server will listen on :8081. Ensure the operator binary exposes this flag and uses it to start the health server.


6242-6244: Expose named health port (good).

Named port "healthz" matches the httpGet probe references; no issues.


6253-6261: Readiness probe config looks sensible.

/readyz path, timing, and thresholds are reasonable defaults.

Comment on lines 44 to 48
args:
- operator
- run
- --health-probe-bind-address=:8081

💡 Verification agent

🧩 Analysis chain

Trim trailing spaces and normalize args indentation.

Lines 45–46 have trailing spaces; fix to keep CI/yamllint green and avoid noisy diffs.

-          args:
-            - operator  
-            - run       
-            - --health-probe-bind-address=:8081
+          args:
+            - operator
+            - run
+            - --health-probe-bind-address=:8081

To verify no trailing spaces remain:


🏁 Script executed:

#!/usr/bin/env bash
grep -nE ' +$' deploy/operator.yaml || echo "No trailing spaces"

Length of output: 93


Trim trailing spaces in deploy/operator.yaml args
Lines 45–46 contain trailing spaces; remove them to keep CI/yamllint green.

🧰 Tools
🪛 YAMLlint (1.37.1)

[warning] 45-45: wrong indentation: expected 10 but found 12

(indentation)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)

🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 44 to 47, the args entries ("operator" and
"run") include trailing spaces on lines 45–46; remove the trailing whitespace
characters from those two args so each arg is trimmed to exact tokens (e.g.,
"operator" and "run") to satisfy yamllint/CI and keep YAML clean.

Comment on lines +6262 to +6266
            httpGet:
              path: /readyz
              port: healthz
            initialDelaySeconds: 15
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          startupProbe:
@coderabbitai coderabbitai bot Sep 2, 2025

⚠️ Potential issue

Liveness probe should hit /healthz, not /readyz.

controller-runtime exposes both /healthz and /readyz; liveness is expected to target /healthz. Pointing liveness to /readyz can mask unhealthy-but-ready states. Switch the path to /healthz.

           livenessProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
📝 Committable suggestion

Suggested change
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
           startupProbe:
🤖 Prompt for AI Agents
In pkg/bundle/deploy.go around lines 6262 to 6269 the livenessProbe httpGet path
is set to /readyz but should target /healthz; change the livenessProbe httpGet
path value from "/readyz" to "/healthz" so the liveness check uses the
controller-runtime health endpoint and does not mask unhealthy-but-ready states.

Author

done

@OdedViner Thanks! Confirming the liveness probe now points to /healthz. As a quick sanity check, you can verify the live settings on a running pod with:
kubectl -n noobaa get deploy noobaa-operator -o jsonpath='{.spec.template.spec.containers[?(@.name=="noobaa-operator")].livenessProbe.httpGet.path}{"\n"}'

If there are multiple definitions of the operator Deployment (e.g., embedded YAML and any generated manifests), make sure they’re all aligned: liveness=/healthz and readiness/startup=/readyz. Nice work.


✏️ Learnings added
Learnt from: OdedViner
PR: noobaa/noobaa-operator#1684
File: pkg/bundle/deploy.go:6259-6266
Timestamp: 2025-10-26T14:17:07.989Z
Learning: For controller-runtime operators, use livenessProbe path /healthz and readiness/startup path /readyz (port :8081) to avoid masking unhealthy-but-ready states.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


- integrate controller-runtime healthz and readyz endpoints into the namespace-scoped manager
- expose probe server on :8081 and register default checks
- update operator Deployment with liveness, readiness, and startup probes

Signed-off-by: Oded Viner <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
deploy/operator.yaml (2)

69-76: Consider using /healthz for livenessProbe.

Best practice is to use /healthz for liveness (to check if process is alive) and /readyz for readiness (to check if ready to serve traffic). Using /readyz for liveness can kill a live process that isn't ready yet.

However, as noted by dannyzaken in previous comments, since the current implementation only performs a ping check, using the same path for all probes is acceptable for now.


45-48: Remove trailing spaces from args entries.

Lines 46-47 still contain trailing spaces after "operator" and "run", which will cause yamllint failures in CI.

Apply this diff:

           args:
-            - operator  
-            - run       
+            - operator
+            - run
             - --health-probe-bind-address=:8081
🧹 Nitpick comments (1)
pkg/operator/manager.go (1)

35-38: Health probe address is hardcoded; consider reading the CLI flag.

The deployment manifest (deploy/operator.yaml) passes --health-probe-bind-address=:8081, but the code hardcodes healthProbeAddr = ":8081" and never reads the flag value. This means the CLI flag has no effect.

If the flag is intended to be configurable, read it from the command:

probeAddr := ":8081" // default
if cmd != nil {
    if val, err := cmd.Flags().GetString("health-probe-bind-address"); err == nil && val != "" {
        probeAddr = val
    }
}
healthProbeAddr = probeAddr

Otherwise, remove the unused flag from the deployment args.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa2e24 and 17b70f5.

📒 Files selected for processing (3)
  • deploy/operator.yaml (2 hunks)
  • pkg/bundle/deploy.go (2 hunks)
  • pkg/operator/manager.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/bundle/deploy.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/operator/manager.go (1)
pkg/util/util.go (1)
  • Logger (903-905)
🔇 Additional comments (4)
deploy/operator.yaml (1)

49-51: LGTM!

The healthz port is correctly configured and properly indented.

pkg/operator/manager.go (3)

60-72: LGTM!

Health probe bind address is correctly wired into the namespace-scoped manager, enabling /healthz and /readyz endpoints.


78-86: Verify that cluster-scoped manager intentionally omits health probes.

The cluster-scoped manager (cmgr) is created without HealthProbeBindAddress, so it won't expose /healthz or /readyz endpoints. This is likely intentional since only one manager needs to serve probes for the pod, but please confirm this design choice is correct.


103-105: LGTM!

The readiness check is properly registered on the namespace-scoped manager using the standard healthz.Ping handler, with appropriate error handling.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/operator/manager.go (1)

35-39: Probe address ignores CLI flag; wire to PersistentFlags with safe default

Currently a hardcoded healthProbeAddr is used; the --health-probe-bind-address flag is not read, so user/manifest config is ignored. Please read the persistent flag, handle errors, and fall back to the default.

Apply this minimal diff:

@@
 var (
   metricsHost           = "0.0.0.0"
   metricsPort     int32 = 8383
   healthProbeAddr       = ":8081"
   log                   = util.Logger()
 )
@@
-  // mgr => namespace scoped manager
-  mgr, err := manager.New(config, manager.Options{
+  // Resolve probe address from flag with safe default
+  probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+  if err != nil {
+    log.Warnf("Failed to read --health-probe-bind-address: %v; defaulting to %s", err, healthProbeAddr)
+    probeAddr = healthProbeAddr
+  }
+  if probeAddr == "" {
+    log.Warnf("Empty --health-probe-bind-address disables probes; defaulting to %s", healthProbeAddr)
+    probeAddr = healthProbeAddr
+  }
+
+  // mgr => namespace scoped manager
+  mgr, err := manager.New(config, manager.Options{
@@
-    HealthProbeBindAddress: healthProbeAddr, // Serve /healthz and /readyz here
+    HealthProbeBindAddress: probeAddr, // Serve /healthz and /readyz here
   })
pkg/bundle/deploy.go (1)

6259-6265: Fix probe endpoints: liveness => /healthz, startup => /readyz.

  • Liveness is incorrectly checking /readyz; should hit /healthz.
  • Startup should gate on readiness (/readyz), not /healthz.
    This prevents masking unhealthy-but-ready states. Based on learnings.

Apply this diff:

           livenessProbe:
             httpGet:
-              path: /readyz
+              path: /healthz
               port: healthz
             initialDelaySeconds: 15
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 3
           startupProbe:
             httpGet:
-              path: /healthz
+              path: /readyz
               port: healthz
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 30

Also applies to: 6267-6272

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17b70f5 and c804893.

📒 Files selected for processing (2)
  • pkg/bundle/deploy.go (2 hunks)
  • pkg/operator/manager.go (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-26T14:17:07.989Z
Learnt from: OdedViner
PR: noobaa/noobaa-operator#1684
File: pkg/bundle/deploy.go:6259-6266
Timestamp: 2025-10-26T14:17:07.989Z
Learning: For controller-runtime operators, use livenessProbe path /healthz and readiness/startup path /readyz (port :8081) to avoid masking unhealthy-but-ready states.

Applied to files:

  • pkg/bundle/deploy.go
🧬 Code graph analysis (1)
pkg/operator/manager.go (1)
pkg/util/util.go (1)
  • Logger (903-905)
🔇 Additional comments (3)
pkg/operator/manager.go (2)

27-27: Import looks good

healthz import is correct for readiness/liveness wiring.


71-71: LGTM; will honor flags after wiring above

Using HealthProbeBindAddress on the namespace-scoped manager is correct. Once the flag wiring above is in, this will respect manifests/CLI.

pkg/bundle/deploy.go (1)

6234-6241: Health server wiring looks good.

--health-probe-bind-address=:8081 and the named port healthz are correctly configured.
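
The flag wiring could look roughly like the sketch below. It assumes the standard library `flag` package, which may not be what the operator uses; `probeOptions` and `parseProbeFlags` are hypothetical names introduced for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// probeOptions is a hypothetical holder for the parsed bind address.
type probeOptions struct {
	HealthProbeBindAddress string
}

// parseProbeFlags parses args and returns the probe bind address,
// falling back to ":8081" when the flag is not given.
func parseProbeFlags(args []string) (probeOptions, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	addr := fs.String("health-probe-bind-address", ":8081",
		"address the /healthz and /readyz endpoints bind to")
	if err := fs.Parse(args); err != nil {
		return probeOptions{}, err
	}
	return probeOptions{HealthProbeBindAddress: *addr}, nil
}

func main() {
	opts, err := parseProbeFlags([]string{"--health-probe-bind-address=:9090"})
	if err != nil {
		panic(err)
	}
	fmt.Println(opts.HealthProbeBindAddress) // :9090
}
```

The parsed value would then be passed as `HealthProbeBindAddress` in the manager options, so that the address in the Deployment args and the address the probe server listens on stay in sync.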

Comment on lines +103 to +112
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
	log.Fatalf("Failed to add readiness check: %s", err)
}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Ready check uses Ping; OK, but ensure probes hit /readyz

Using healthz.Ping is fine. Since healthz was removed per discussion, verify liveness and startup probes in manifests target /readyz too.

To verify across the repo:


🏁 Script executed:

#!/bin/bash
# Confirm flag presence and probe paths
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 '(livenessProbe|readinessProbe|startupProbe).*path' deploy pkg/bundle 2>/dev/null || true

Length of output: 272


🏁 Script executed:

#!/bin/bash
# Search for all probe definitions (without path restriction)
rg -n 'livenessProbe|readinessProbe|startupProbe' --type=go --type=yaml --type=json 2>/dev/null || true

Length of output: 881


🏁 Script executed:

#!/bin/bash
# Extract probe definitions with paths from key files
echo "=== deploy/operator.yaml probes ==="
rg -n -A5 'readinessProbe:|livenessProbe:|startupProbe:' deploy/operator.yaml | head -40

echo ""
echo "=== pkg/bundle/deploy.go manager probes (lines 6250-6270) ==="
sed -n '6248,6275p' pkg/bundle/deploy.go

Length of output: 1443


🏁 Script executed:

#!/bin/bash
# Check for other health check additions in manager.go
echo "=== Health check registrations in manager.go ==="
rg -n 'Add.*Check' pkg/operator/manager.go -A2

echo ""
echo "=== Startup probe paths across all files ==="
rg -n -B2 -A2 'startupProbe:' deploy/ pkg/bundle/deploy.go 2>/dev/null | grep -A4 'path:'

Length of output: 503


Fix livenessProbe paths and resolve startup probe inconsistency

The probe configurations violate controller-runtime best practices and create misalignment:

  • livenessProbe paths are wrong: Both deploy/operator.yaml (line 71) and pkg/bundle/deploy.go (line 6263) incorrectly use /readyz instead of /healthz. This masks unhealthy-but-ready states.
  • startupProbe inconsistency: deploy/operator.yaml (line 79) uses /readyz, but pkg/bundle/deploy.go (line 6268) uses /healthz.
  • Missing liveness check registration: Only AddReadyzCheck is registered in manager.go (line 103). No AddHealthzCheck backs the /healthz path that the liveness probe should target. The startup probe needs no registration of its own because it reuses /readyz.

Correct per controller-runtime pattern: livenessProbe → /healthz, readinessProbe/startupProbe → /readyz. Fix probe paths in both manifest files, and register the liveness check with AddHealthzCheck in manager.go.

🤖 Prompt for AI Agents
In pkg/operator/manager.go around lines 103 to 106, only AddReadyzCheck is
registered; register the missing liveness check with
mgr.AddHealthzCheck("healthz", healthz.Ping) alongside the existing
AddReadyzCheck("readyz", healthz.Ping). The controller-runtime manager has no
separate startup-check registration, so the startupProbe reuses /readyz. Also
update probe paths in the manifests so livenessProbe -> /healthz and
readinessProbe/startupProbe -> /readyz in both deploy/operator.yaml (lines ~71
and ~79) and pkg/bundle/deploy.go (around lines ~6263 and ~6268) to match the
registered checks.
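
The path convention above can also be checked mechanically. The following is a rough sketch in the spirit of the `rg` scripts, not part of the PR: `wantPath` and `checkProbePaths` are hypothetical names, and the function is a plain line scan rather than a YAML parser, so it assumes each `path:` line follows its probe key.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// wantPath maps each probe kind to the endpoint recommended in the review.
var wantPath = map[string]string{
	"livenessProbe":  "/healthz",
	"readinessProbe": "/readyz",
	"startupProbe":   "/readyz",
}

// checkProbePaths scans manifest text and reports every probe whose
// httpGet path differs from the recommended one.
func checkProbePaths(manifest string) []string {
	var problems []string
	current := "" // probe key whose path line we are waiting for
	sc := bufio.NewScanner(strings.NewReader(manifest))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		key := strings.TrimSuffix(line, ":")
		// A line such as "livenessProbe:" opens a new probe section.
		if _, ok := wantPath[key]; ok && key != line {
			current = key
			continue
		}
		if current != "" && strings.HasPrefix(line, "path:") {
			got := strings.TrimSpace(strings.TrimPrefix(line, "path:"))
			if got != wantPath[current] {
				problems = append(problems,
					fmt.Sprintf("%s: got %s, want %s", current, got, wantPath[current]))
			}
			current = ""
		}
	}
	return problems
}

func main() {
	// Probe section as it looked before the suggested diff.
	manifest := `
livenessProbe:
  httpGet:
    path: /readyz
    port: healthz
startupProbe:
  httpGet:
    path: /healthz
    port: healthz
`
	for _, p := range checkProbePaths(manifest) {
		fmt.Println(p)
	}
}
```

Run against the pre-fix probe section, this reports both the liveness and the startup path; after the suggested diff is applied it reports nothing.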
