Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and is generated by Changie.

v26.2.1-beta.2 - 2026-05-28

Added

  • Added support for importing existing Redpanda users into operator management, with ongoing credential sync via spec.authentication.syncCredentials for external secret rotation (e.g. ESO).
  • V2 (cluster.redpanda.com/v1alpha2 Redpanda) Prometheus metrics — redpandas, redpanda_desired_nodes, redpanda_ready_nodes, and redpanda_misconfigured_clusters — mirroring the existing v1 ClusterMetricController so observability is at parity between the two operator modes.
  • Added rpk-k8s multicluster bundle for collecting cross-cluster operator diagnostics (per-peer pod, deployment, TLS, raft status, logs, multi-sample /metrics) into a single zip. Discovers peers from labelled kubeconfig cache Secrets given any one peer's kubeconfig. Includes a chart-level binding of the operator ServiceAccount to the metrics-reader ClusterRole so the bundle (and the existing ServiceMonitor) can scrape /metrics without 403.
  • Added end-to-end operator observability for the multicluster operator:
    • Multicluster raft metrics under operator_multicluster_raft_*: leader changes, sent/received message rates by type and peer, send latency, errors, and drops, peer reachability, and snapshot bookkeeping. Collector-emitted gauges cover the current term, state, and per-peer send-queue length; they are read on scrape from the transport's atomics so the Ready loop hot path stays write-free.
    • Reconcile-health metrics under operator_controller_*: reconcile_steady_state_total (counter) and reconcile_last_success_timestamp_seconds (gauge). The observability.Wrap(reconciler, controller, defaultRequeueTimeout) wrapper emits them automatically for every controller. Both (Result{}, nil) and (Result{RequeueAfter: defaultRequeueTimeout}, nil) count as steady state, so controllers using the periodic-requeue pattern (e.g. MulticlusterReconciler's defer-set RequeueAfter) register as healthy.
    • StretchCluster member-status metrics under operator_stretchcluster_*: member_reachable, brokers / brokers_ready, replication_health, and spec_drift. The MulticlusterReconciler records them from data it already computes (background reachability probe, NodePool fetch, checkSpecConsistency, admin API health check), so no new RPCs are needed.
    • A new monitoring.rulesEnabled chart value ships a starter PrometheusRule with recording rules and alerts. The alerts cover errored reconciles, runaway reconcile rate, stalled controllers, worker pool saturation, member unreachability, broker count skew, spec drift, and unhealthy replication.
    • A comprehensive single-view Grafana dashboard is available at docs/operator-grafana-dashboard.json, and the canonical metric inventory is at docs/operator-metrics.md.
  • Registered the Redpanda Console controller in multicluster operator mode and taught Console.spec.cluster.clusterRef to resolve kind: StretchCluster, so operator-managed Console deployments work in StretchCluster setups. Previously, the Console controller did not run when the operator was started via multicluster. As a result, any cluster.redpanda.com/v1alpha2 Console CR applied alongside a StretchCluster was silently dropped.
  • Added scrapeInterval and labels fields to the operator Helm chart's monitoring configuration. scrapeInterval sets the scrape interval on the ServiceMonitor endpoint (defaults to empty, preserving the Prometheus global default). labels adds custom labels to the ServiceMonitor metadata.
  • Added the RedpandaBrokerPool CRD to manage pools in a Redpanda StretchCluster.
  • Added persistentVolumeClaimRetentionPolicy to the Redpanda CRD (spec.clusterSpec.statefulset.persistentVolumeClaimRetentionPolicy) and to the NodePool CRD (spec.persistentVolumeClaimRetentionPolicy) so PVCs can be deleted automatically when brokers are decommissioned via scale-down or when the cluster is torn down. The Redpanda CRD value applies as the default for every NodePool (default and named); a NodePool's own value, if set, overrides the cluster-level default.

Changed

  • Moved per-K8s-cluster fields off StretchCluster.spec onto RedpandaBrokerPool.spec to enable heterogeneous broker pools (different TLS, listeners, external access, RBAC, ServiceAccount, monitoring, rack awareness, cluster domain, and logging per pool). Removed the service field entirely; use the new internalServiceAnnotations field on StretchCluster.spec to annotate the headless ClusterIP Service. The auditLogging field type changed from *AuditLogging to *StretchAuditLogging. storage, resources, and imagePullSecrets remain legal on both specs and are merged at render time: the pool's non-nil subfield wins and the cluster fills in the rest (per key for maps). imagePullSecrets uses a pool-wins-if-non-empty override. The StatefulSet's volumeClaimTemplates ObjectMeta now propagates pv.Annotations and pv.Labels, and persistentVolumeClaimRetentionPolicy is sourced from the pool spec. Existing manifests with the moved fields on StretchCluster.spec must relocate them to each RedpandaBrokerPool spec; field shapes are unchanged.
  • Bumped the default Redpanda version to 26.1.9.
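The render-time merge of storage, resources, and imagePullSecrets described above can be sketched in Go. Storage and its fields are hypothetical, pared-down stand-ins rather than the real CRD types. The sketch only demonstrates the stated precedence rules: a non-nil pool field wins, maps merge per key, and a non-empty pool pull-secret list replaces the cluster list.

```go
package main

// Storage is a hypothetical, pared-down stand-in for a shared storage
// subfield; the real CRD types have many more fields.
type Storage struct {
	StorageClass *string
	Size         *string
	Annotations  map[string]string
}

// mergeStorage overlays pool on cluster: a non-nil pool field wins, the
// cluster value fills the rest, and maps are merged key by key with pool
// keys taking precedence.
func mergeStorage(cluster, pool Storage) Storage {
	out := cluster
	if pool.StorageClass != nil {
		out.StorageClass = pool.StorageClass
	}
	if pool.Size != nil {
		out.Size = pool.Size
	}
	out.Annotations = mergeMaps(cluster.Annotations, pool.Annotations)
	return out
}

// mergeMaps returns a new map containing base overlaid with override.
func mergeMaps(base, override map[string]string) map[string]string {
	if base == nil && override == nil {
		return nil
	}
	out := make(map[string]string, len(base)+len(override))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

// mergePullSecrets applies the pool-wins-if-non-empty rule: a non-empty pool
// list replaces the cluster list wholesale instead of being appended to it.
func mergePullSecrets(cluster, pool []string) []string {
	if len(pool) > 0 {
		return pool
	}
	return cluster
}
```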

Fixed

  • The migration job now removes the "helm" field manager, clearing out field managers that may have been set when migrating from Helm to the operator.

  • Fixed an issue with post-upgrade processing of cluster configuration that resulted in configuration errors due to removed properties across Redpanda versions.

  • Fixed an issue with ballast file tuning where the tuning container didn't mount the directory needed to propagate the ballast file to the main container.

  • Removed more historic field managers that have caused problems with changes to StatefulSets.

  • Fixed StretchCluster partial deletion causing cross-cluster resource destruction. When a StretchCluster was deleted from a subset of clusters, the reconciler would destroy resources on surviving clusters and leave the cluster in an unrecoverable state. Added a partial deletion guard that blocks cleanup when the StretchCluster is still active on other clusters, and scoped cross-cluster sync operations (SyncAll, syncBootstrapUser, syncCA) to skip clusters where the StretchCluster is being deleted.

  • Fixed a hot reconcile loop on healthy Redpanda clusters where the operator would re-enqueue itself every ~2 seconds instead of settling into the 3-minute periodic requeue interval. The setStatusCondition helper treated a zero rate limit as "force an update" (because time.Since(anything) > 0 is always true), which bumped lastTransitionTime on every pass and triggered a status write for conditions that had not actually changed. Rate-limited heartbeats for License and Configuration conditions are preserved.

  • Fixed an issue where the operator failed to authenticate to the Redpanda cluster with SCRAM when bootstrapUser.secretKeyRef pointed at an externally-managed secret.

  • Fixed an issue where in-use features from an enterprise cluster did not have a deterministic sort order, which could cause reconciliation storms due to status changes.

  • StretchCluster admin API client now honours the Factory.WithAdminClientTimeout setting on all code paths (previously silently dropped) and the multicluster binary exposes a --cluster-connection-timeout flag to configure it.

  • Fixed a deadlock where an unhealthy cluster could not recover by adding more broker pods. Scale-up operations in node pools are no longer gated on cluster health; only scale-down (decommission) operations remain gated.

  • Fixed a misleading Cluster CR status condition when the operator could not resolve an external secret referenced from cluster config (e.g. an iceberg REST catalog OAuth2 client secret). Previously, the failure was silently downgraded to a warning, the unexpanded ${secrets.X} placeholder was pushed to the Redpanda admin API, and Redpanda's downstream validation error ("Must set both of iceberg_rest_catalog_client_id ...") surfaced as the condition message — obscuring the real cause (e.g. an AccessDeniedException from the cloud secret store). The reconcile now fails before pushing config, with the unresolved-secret warning as the actionable condition message. Shared code path between Operator v1 and v2.

v26.2.1-beta.1 - 2026-04-24

Added

  • Beta support for StretchCluster, which allows deploying the operator across multiple Kubernetes clusters and creating Redpanda clusters out of NodePools from different clusters. Requires a valid Redpanda enterprise license.
  • rpk k8s multicluster plugin for bootstrapping and interacting with StretchClusters.
  • Multicluster-aware leader election so multiple operator replicas in the same Kubernetes cluster can take part in the global raft quorum while only the local lease holder votes.
  • Multicluster PVC unbinder controller that extends stuck-Pod / volume-affinity remediation across all provider clusters.
  • Kubeconfig cache on every raft member so a newly elected leader can engage peer clusters without needing live gRPC round-trips to every other operator at promotion time.
  • NodePool CRD is installed automatically when the operator is deployed in multicluster mode.
  • Per-peer operator Services to support Kubernetes cluster mesh (MCS / Cilium global service) setups for the multicluster raft transport.
  • Opt-in pandaproxy_client.use_localhost / schema_registry_client.use_localhost settings on v1 Cluster CRs to use localhost for internal Schema Registry and Pandaproxy clients.
  • rpk k8s multicluster bootstrap --loadbalancer flag that provisions a dedicated per-cluster peer LoadBalancer Service, waits for the provider to publish an address, and bakes that address into each peer's cert SANs — removing the previous redeploy cycle needed to bootstrap operators over public load balancers. On completion prints a ready-to-paste multicluster.peers helm-values block.

Changed

  • Attempts to unset redpanda.storage.mode on a Topic are now a no-op to avoid warn-level log spam in the broker.
  • Bumped Go to 1.26.1, controller-runtime to the matching release, and updated a range of dependencies (OpenTelemetry v1.43.0, grpc, containerd, golang.org/x/net, go-chi, buger/jsonparser) to address Snyk and govulncheck findings.

Fixed

  • The migration job now removes the "helm" field manager, clearing out field managers that may have been set when migrating from Helm to the operator.
  • Fixed an issue with post-upgrade processing of cluster configuration that resulted in configuration errors due to removed properties across Redpanda versions.
  • Fixed an issue with ballast file tuning where the tuning container didn't mount the directory needed to propagate the ballast file to the main container.
  • Removed more historic field managers that have caused problems with changes to StatefulSets.
  • Fixed hot reconcile loops on healthy Redpanda clusters driven by two independent root causes: the setStatusCondition helper treated a zero rate limit as "force an update" (because time.Since(anything) > 0 is always true), and non-deterministic map iteration over features.Features flipped the stored InUseFeatures value on every reconcile. InUseFeatures is now sorted before comparison, the zero-rate-limit case is handled correctly, and the condition heartbeat rate limit was stretched from 1 minute to 5 minutes. Rate-limited heartbeats for License and Configuration conditions are preserved.
  • Fixed an issue where the operator failed to authenticate to the Redpanda cluster with SCRAM when bootstrapUser.secretKeyRef pointed at an externally-managed secret.
  • Fixed Schema Registry ACL sync failures on v1alpha1 Cluster resources that configured SR via the legacy single schemaRegistry field instead of the schemaRegistryApi slice — SchemaRegistryInternalListener now checks both fields and the factory routes the no-SR case to the graceful skip path.
  • Fixed namespace-scoped operator deployments logging reconcile errors for resources in other namespaces; non-matching namespaces are now filtered before the initial fetch instead of entering exponential backoff.

v26.1.1 - 2026-03-31

Added

  • Added Internal boolean field to RedpandaRole spec to enable managing Redpanda internal roles (prefixed with "__") using standard Kubernetes resource names.
  • Added group-based access control (GBAC) support to the Kubernetes operator, both as a standalone Group CRD and through Role-based principal binding.
  • Added Schema Registry ACL awareness to all CRDs which support setting ACLs.
  • Support for Kubernetes versions 1.32.x through 1.35.x. Per-PR tests validate against the minimum supported version (1.32.x) and nightly tests validate against the maximum supported version (1.35.x).
  • Added allowPrivilegeEscalation: false and runAsNonRoot: true to redpanda, redpanda-configurator, and sidecar container security contexts to address GKE security findings.
  • Namespace-scoped filtering — Controllers can now run in namespace-scoped mode with noise filtering.
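One plausible reading of the RedpandaRole Internal field is sketched in Go. Kubernetes object names must start with an alphanumeric character, so a name like __audit cannot be a RedpandaRole's metadata.name. The sketch therefore assumes the operator prepends "__" when Internal is true. That mapping, the redpandaRoleName function, and its error handling are assumptions, not documented operator behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the Kubernetes object-name rule: lowercase
// alphanumerics, '-' and '.', each segment starting and ending with an
// alphanumeric character (max 253 characters, checked separately).
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// internalPrefix marks Redpanda internal roles.
const internalPrefix = "__"

// redpandaRoleName maps a RedpandaRole's metadata.name to a Redpanda role
// name. ASSUMPTION: internal roles are addressed by prepending "__".
func redpandaRoleName(objectName string, internal bool) (string, error) {
	if len(objectName) > 253 || !dns1123Subdomain.MatchString(objectName) {
		return "", fmt.Errorf("%q is not a valid Kubernetes object name", objectName)
	}
	if internal {
		return internalPrefix + objectName, nil
	}
	return objectName, nil
}
```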

Changed

  • Added two +kubebuilder:printcolumn annotations to the Topic struct, matching the pattern used by the Redpanda CRD.

Fixed

  • Fixed Schema Registry ACL sync failures on v1alpha1 clusters.
  • Fixed issue with elevated reconciliation rates for ShadowLinks in large cluster deployments.
  • Removed helm-controller field ownership so that old clusters deployed before the Flux removal don't run into errors from accidentally merging fields that should otherwise be removed or overwritten.

v25.3.1 - 2025-12-10

Added

  • ShadowLink CRD for controlling 25.3 shadow link settings. See documentation for details.

Changed

  • Use the kube-system namespace by default for leader election when the operator is running in cluster-scoped mode.

v25.2.1 - 2025-12-02

Added

  • Roles can be declaratively managed using the RedpandaRole CRD.
  • Added experimental support for node pools. To enable node pool support, you must install the experimental NodePool CRDs and run the controller with the --enable-v2-nodepools flag.
  • Added a new Console CRD for managing Redpanda Console deployments. For examples, see acceptance/features/console.feature.
  • Added status.managedPrincipals field to RedpandaRole CRD to track whether the operator is managing role membership. The operator now properly reconciles membership changes when spec.principals is updated, including adding, removing, or clearing all principals.

Changed

  • By default, the operator now reconciles resources (Redpanda, Topic, etc) across all namespaces.

    The --namespace flag may be used to scope the operator's watches to a single namespace.

  • Client certificates are now named $FULLNAME-$CERT-client-cert.

Deprecated

  • The Redpanda console stanza (.spec.clusterSpec.console) is now deprecated in favor of the stand-alone Console CRD.
  • Deprecated various kafka, adminAPI, and schemaRegistry fields under the static configuration of clusterSource in multiple CRDs. Values that could previously be specified only via an in-cluster secret can now be pulled from an external secret provider, an in-cluster secret, a config map, or an inlined value.
  • The entirety of the spec.clusterSpec.console block in the Redpanda CR is now deprecated and will be removed in the future. Any Redpanda CR that contains one will automatically be migrated to a standalone Console CR with a back reference to the parent Redpanda CR. Note that these will not be automatically deleted when the console stanza is removed or when the parent Redpanda CR is deleted.

Fixed

  • Fixed a bug in the way the config-watcher sidecar syncs users. Kubernetes writes out a changed secret by re-creating a symlink in the secrets directory that points to the mounted secret. Previously, the config-watcher only detected changes to the entire directory and could miss syncs; it now resyncs everything whenever the symlink is recreated.
  • mTLS client certificates are now generated per certificate, as required, instead of using a single and potentially invalid certificate.

v25.2.1-beta1 - 2025-08-18

Changed

  • By default, the operator now reconciles resources (Redpanda, Topic, etc) across all namespaces.

    The --namespace flag may be used to scope the operator's watches to a single namespace.

v25.1.1-beta3 - 2025-05-07

Added

  • Added scheduled sync of ghost broker decommissioner to ensure it's running, even if no watches trigger the reconciler.
  • v1 operator: ExternalSecretRefSelector is now provided for referring to external secrets in clusterConfiguration. It has an optional flag which, if present, is honoured: it turns errors into warnings when the secret can't be looked up.

Changed

  • [Chart] Moved all template rendering into entry-point.yaml to match the redpanda and console charts.

  • values.schema.json is now "closed" (additionalProperties: false)

    Any unexpected values will result in a validation error; previously, they would have been ignored.

  • The redpanda operator's helm chart has been merged into the operator itself.

    Going forward the chart's version and appVersion will always be equal.

  • rbac.createRPKBundleCRs now defaults to true.

  • The operator will now populate .Statefulset.SideCars.Image, if unspecified, with its own image.

    The image and tag may be controlled with pre-existing --configurator-base-image and --configurator-tag flags, respectively.

    The previous behavior was to defer to the default of the redpanda chart which could result in out of sync RBAC requirements or regressions of sidecar/initcontainer behavior, if using an older redpanda chart.

Deprecated

  • v1 operator: the clusterConfiguration field ExternalSecretRef is deprecated in favour of ExternalSecretRefSelector. Since this field was extremely new, it will be removed in the very near future.

Removed

  • Removed bundled FluxCD controllers, bundled FluxCD CRDs, and support for delegating control to FluxCD.

    Previously reconciled FluxCD resources (HelmRepository, HelmRelease) will NOT be garbage collected upon upgrading. If the operator is coexisting with a FluxCD installation, please take care to manually remove the left over resources.

    chartRef.useFlux: true and chartRef.chartVersion are no longer supported. The controller will log errors and abort reconciliation until the fields are unset. Ensure that both have been removed from all Redpanda resources before upgrading.

    All other chartRef fields are deprecated and are no longer referenced.

    helmRelease, helmReleaseReady, helmRepository, helmRepositoryReady, and upgradeFailures are no longer set on RedpandaStatus, similar to their behavior when useFlux: false was set.

  • gcr.io/kubebuilder/kube-rbac-proxy container is deprecated and has been removed from the Redpanda operator helm chart. The same ports will continue to serve metrics using kubebuilder's built in RBAC.

    Any existing prometheus rules don't need to be adjusted.

    For more details see: kubernetes-sigs/kubebuilder#3907

  • The V1 operator now requires a minimum Redpanda version of 23.2; all feature-gated behaviour that supported older versions is now enabled unconditionally.

  • The kube-prometheus-stack subchart has been removed.

    This integration was not being kept up to date, and most users will be better served by deploying the kube-prometheus-stack chart themselves.

Fixed

  • Certificate reloading for webhook and metrics endpoints should now behave correctly.
  • The operator will restart the redpanda cluster on any change to the cluster configuration
  • Expanded the set of rules in both Roles and ClusterRoles to be appropriately in sync with the redpanda helm chart.
  • DeprecatedFullNameOverride was interpreted differently when rendering resources than when creating the Kafka, Admin API, and Schema Registry clients. The deprecated fullNameOverride is now used only if FullNameOverride is not provided, and it is handled the same way for both client creation and resource rendering.
  • The operator did not set the Redpanda license. It is now set during the first reconciliation. After the initial setup, subsequent license changes are reconciled after the client-go cache resync timeout (default 10h).
  • The operator now unconditionally produces statefulsets that have environment variables available to the initContainer that are used for CEL-based config patching.

    Previously, it attempted to leave existing sts resources unpatched if they appeared to have been bootstrapped already. With the adoption of CEL patching for node configuration, that left sts pods unable to restart.

  • The operator now unconditionally produces an environment for the initContainer that supports CEL-based patching.

    This is required to ensure that a pre-existing sts can roll over to new configuration correctly.

v25.1.1-beta2 - 2025-04-24

Added

  • Added scheduled sync of ghost broker decommissioner to ensure it's running, even if no watches trigger the reconciler.

Changed

  • [Chart] Moved all template rendering into entry-point.yaml to match the redpanda and console charts.

  • values.schema.json is now "closed" (additionalProperties: false)

    Any unexpected values will result in a validation error; previously, they would have been ignored.

  • The redpanda operator's helm chart has been merged into the operator itself.

    Going forward the chart's version and appVersion will always be equal.

  • rbac.createRPKBundleCRs now defaults to true.

Removed

  • Removed bundled FluxCD controllers, bundled FluxCD CRDs, and support for delegating control to FluxCD.

    Previously reconciled FluxCD resources (HelmRepository, HelmRelease) will NOT be garbage collected upon upgrading. If the operator is coexisting with a FluxCD installation, please take care to manually remove the left over resources.

    chartRef.useFlux: true and chartRef.chartVersion are no longer supported. The controller will log errors and abort reconciliation until the fields are unset. Ensure that both have been removed from all Redpanda resources before upgrading.

    All other chartRef fields are deprecated and are no longer referenced.

    helmRelease, helmReleaseReady, helmRepository, helmRepositoryReady, and upgradeFailures are no longer set on RedpandaStatus, similar to their behavior when useFlux: false was set.

  • gcr.io/kubebuilder/kube-rbac-proxy container is deprecated and has been removed from the Redpanda operator helm chart. The same ports will continue to serve metrics using kubebuilder's built in RBAC.

    Any existing prometheus rules don't need to be adjusted.

    For more details see: kubernetes-sigs/kubebuilder#3907

  • The V1 operator now requires a minimum Redpanda version of 23.2; all feature-gated behaviour that supported older versions is now enabled unconditionally.

  • The kube-prometheus-stack subchart has been removed.

    This integration was not being kept up to date, and most users will be better served by deploying the kube-prometheus-stack chart themselves.

Fixed

  • Certificate reloading for webhook and metrics endpoints should now behave correctly.
  • The operator will restart the redpanda cluster on any change to the cluster configuration
  • Expanded the set of rules in both Roles and ClusterRoles to be appropriately in sync with the redpanda helm chart.
  • DeprecatedFullNameOverride was interpreted differently when rendering resources than when creating the Kafka, Admin API, and Schema Registry clients. The deprecated fullNameOverride is now used only if FullNameOverride is not provided, and it is handled the same way for both client creation and resource rendering.
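The fullNameOverride precedence in the entry above can be sketched as a single Go helper. fullname and its parameters are hypothetical. The idea is that rendering and client construction both call the same function, so they cannot disagree about the name.

```go
package main

// fullname resolves the name prefix shared by resource rendering and by the
// Kafka, Admin API, and Schema Registry client factories. FullNameOverride
// wins; the deprecated override is only a fallback; otherwise the default
// name is used.
func fullname(fullNameOverride, deprecatedFullNameOverride, defaultName string) string {
	if fullNameOverride != "" {
		return fullNameOverride
	}
	if deprecatedFullNameOverride != "" {
		return deprecatedFullNameOverride
	}
	return defaultName
}
```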

v25.1.1-beta1 - 2025-04-10

Added

  • Added scheduled sync of ghost broker decommissioner to ensure it's running, even if no watches trigger the reconciler.

Changed

  • Bumped internal redpanda chart to v5.9.19. chartRef now defaults to v5.9.19. When useFlux is false, the equivalent of chart v5.9.19 will be deployed.

  • Bumped the internal chart version to v5.9.20.

  • [Chart] Moved all template rendering into entry-point.yaml to match the redpanda and console charts.

  • The redpanda operator's helm chart has been merged into the operator itself.

    Going forward the chart's version and appVersion will always be equal.

Removed

  • Removed bundled FluxCD controllers, bundled FluxCD CRDs, and support for delegating control to FluxCD.

    Previously reconciled FluxCD resources (HelmRepository, HelmRelease) will NOT be garbage collected upon upgrading. If the operator is coexisting with a FluxCD installation, please take care to manually remove the left over resources.

    chartRef.useFlux: true and chartRef.chartVersion are no longer supported. The controller will log errors and abort reconciliation until the fields are unset. Ensure that both have been removed from all Redpanda resources before upgrading.

    All other chartRef fields are deprecated and are no longer referenced.

    helmRelease, helmReleaseReady, helmRepository, helmRepositoryReady, and upgradeFailures are no longer set on RedpandaStatus, similar to their behavior when useFlux: false was set.

  • gcr.io/kubebuilder/kube-rbac-proxy container is deprecated and has been removed from the Redpanda operator helm chart. The same ports will continue to serve metrics using kubebuilder's built in RBAC.

    Any existing prometheus rules don't need to be adjusted.

    For more details see: kubernetes-sigs/kubebuilder#3907

  • The V1 operator now requires a minimum Redpanda version of 23.2; all feature-gated behaviour that supported older versions is now enabled unconditionally.

Fixed

  • Usage of tpl and include now function as expected when useFlux: false is set.

    {{ (get (fromJson (include "redpanda.Fullname" (dict "a" (list .)))) "r") }} would previously fail with fairly arcane errors.

    Now, the above example will correctly render to a string value. However, syntax errors and the like are still reported in an arcane fashion.

  • Toggling useFlux, in either direction, no longer causes the bootstrap user's password to be regenerated.

    Manual mitigation steps are available here.

  • Certificate reloading for webhook and metrics endpoints should now behave correctly.

  • Expanded the set of rules in both Roles and ClusterRoles to be appropriately in sync with the redpanda helm chart.

v2.3.8-24.3.6 - 2025-03-05

Fixed

  • Fixed the way that paths are handled for the config watcher routine in the sidecar process.

v2.3.6-24.3.3 - 2025-01-17

Added

  • Users in air-gapped environments that cannot access the official Redpanda Helm Chart repository (https://charts.redpanda.com/) can now specify an alternative Helm chart repository using the helm-repository-url flag. In the Redpanda Operator Helm chart, this flag is not exposed as an option in the Helm values. Instead, it must be set as an input in the additionalCmdFlags array.

    The given repository must include the following charts:

    • Redpanda
    • Console
    • Connectors
  • Added resources.limits and resources.requests as an alternative method of managing the redpanda container's resources.

    When both resources.limits and resources.requests are specified, the redpanda container's resources will be set to the provided values and all other keys of resources will be ignored. Instead, all other values will be inferred from the limits and requests.

    This allows fine-grained control of resources; for example, it is now possible to set CPU requests without setting limits:

    resources:
      limits: {} # Specified but no cpu or memory values provided
      requests:
        cpu: 5 # Only CPU requests

Changed

  • Users who mirror the configurator image (e.g. in an air-gapped environment) and change its entrypoint or wrap the configurator in an additional script must meet the following constraints:
    • set the following flags:
      • to change the container repository, set the --configurator-base-image=my.repo.com/configurator flag
      • to change the container tag, set the --configurator-tag=XYZ flag
    • the image must support the redpanda-operator configure entrypoint, as it is the default one

Fixed

  • Values merging no longer writes files to disk, which prevents the operator from consuming disk space when the reconciliation loop runs in rapid succession.
  • Fixed slice out of bounds panics when using the fs-validator and useFlux: false