Conversation

fmount
Contributor

@fmount fmount commented Sep 5, 2025

This patch introduces the notificationsBusInstance (optional) parameter at the API level.
When no override is defined at the service level, the top-level value is propagated to the underlying storage components where this field is implemented.

Based on: #1402
Jira: https://issues.redhat.com/browse/OSPRH-15389

@fmount fmount requested review from abays and stuggi September 5, 2025 09:09
@openshift-ci openshift-ci bot requested review from fultonj and rabi September 5, 2025 09:09
Contributor

openshift-ci bot commented Sep 5, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fmount
Once this PR has been reviewed and has the lgtm label, please assign rabi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fmount
Contributor Author

fmount commented Sep 5, 2025

Note that I found #1424 that can be considered a follow up of this patch, where a dedicated rabbitmq instance is introduced.

Contributor

@gibizer gibizer left a comment


Would it be possible to include the propagation to the nova level field too? I hope it is not too big of a complexity increase to take.

// Bus Service instance used by all services that produce or consume notifications.
// Avoid colocating it with the RabbitMQ instance used for RPC.
// This instance is propagated to the services, unless overridden in their templates.
NotificationsBusInstance *string `json:"notificationsBusInstance,omitempty"`
Contributor

OK so the usual considerations:

  • if this is nil then it means nothing to propagate to the service level values. If the service level value is also nil then it means notifications are disabled. And this is our default.
  • if this is set to the name of a rabbitmqcluster CR and the service level value is nil, then the rabbitmqcluster name is propagated to that service. This should be our normal way of enabling notifications for ceilometer across the whole cluster.
  • if this is set to the name of a rabbitmqcluster CR and the service level value is also set to a different rabbitmqcluster CR name, then the service level value is kept for that service. This is the complicated case where ceilometer is not needed or only partially needed, and for some reason this specific service needs to use an independent rabbitmqcluster.
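The three cases above can be sketched as a small resolution helper (a hedged sketch; the function and variable names are illustrative, not the operator's actual code):

```go
package main

import "fmt"

// propagateBusInstance applies the three cases discussed above:
//   - top nil            -> service value unchanged (nil keeps notifications off)
//   - top set, svc nil   -> the top-level name is pushed down to the service
//   - top set, svc set   -> the service-level override wins
func propagateBusInstance(top, svc *string) *string {
	if svc == nil {
		return top
	}
	return svc
}

func main() {
	top := "rabbitmq-notifications"
	override := "rabbitmq-cinder"
	fmt.Println(propagateBusInstance(nil, nil) == nil)  // true: notifications stay disabled
	fmt.Println(*propagateBusInstance(&top, nil))       // rabbitmq-notifications
	fmt.Println(*propagateBusInstance(&top, &override)) // rabbitmq-cinder
}
```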

Contributor

We need to check if we need both "" and nil values here to mean different things. If not, then we can either opt to use "" only, or define that "" and nil mean the same for this field.

Contributor

@bogdando bogdando Sep 5, 2025

I believe the linked top-scope implementation patch follows the above (except that we reached no agreement on the meaning of empty values).

Contributor Author

OK so the usual considerations:

* if this is nil then it means nothing to propagate to the service level values. If the service level value is also nil then it means notifications are disabled. And this is our default.

Correct, this is consistent with the current implementation for storage operators.

* if this is set to the name of a rabbitmqcluster CR and the service level value is nil, then the rabbitmqcluster name is propagated to that service. This should be our normal way of enabling notifications for ceilometer across the whole cluster.

Correct, this is an assumption consistent with the top-down propagation model.

* if this is set to the name of a rabbitmqcluster CR and the service level value is also set to a different rabbitmqcluster CR name, then the service level value is kept for that service. This is the complicated case where ceilometer is not needed or only partially needed, and for some reason this specific service needs to use an independent rabbitmqcluster.

Correct, from an API perspective this is how we usually handle overrides at the service level. I think the "complication" is something that should be handled by the service operators, unless you envision some logic at the openstack-operator level. I think we want to keep things simple here and defer any further logic (based on the input we receive) to the service operator level.

Contributor Author

We need to check if we need both "" and nil values here to mean different things. If not, then we can either opt to use "" only, or define that "" and nil mean the same for this field.

I think checking that what the user sets is not an empty string is something we can catch in a webhook, so we can cover updates of the field (while kubebuilder validation would only help us at CR creation time), and we could decide to default it to nil. However, I think that raising an error (the usual webhook flow) is better, as it lets a human operator fix the input or remove the parameter entirely.
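A minimal sketch of the check described above, assuming the webhook simply rejects an explicit empty string rather than defaulting it (function and error names are hypothetical, not the actual webhook implementation):

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyBusInstance mirrors the behavior discussed above: an explicit ""
// is rejected with an error instead of being silently treated like nil.
var errEmptyBusInstance = errors.New(
	"notificationsBusInstance: must be a RabbitmqCluster name or be omitted")

// validateNotificationsBus accepts an unset field (nil) or a non-empty
// name, and returns an error for an explicit empty string.
func validateNotificationsBus(busInstance *string) error {
	if busInstance != nil && *busInstance == "" {
		return errEmptyBusInstance
	}
	return nil
}

func main() {
	name := "rabbitmq-notifications"
	empty := ""
	fmt.Println(validateNotificationsBus(nil))    // <nil>: unset is valid
	fmt.Println(validateNotificationsBus(&name))  // <nil>: a named instance is valid
	fmt.Println(validateNotificationsBus(&empty)) // error: "" is rejected
}
```

In the real operator this check would run from both the create and update paths of the validating webhook, which is what covers field updates that kubebuilder markers alone cannot.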

// When no NotificationsBusInstance is referenced in the subCR (override),
// try to inject the top-level one if defined
if instance.Spec.Cinder.Template.NotificationsBusInstance == nil {
	instance.Spec.Cinder.Template.NotificationsBusInstance = instance.Spec.NotificationsBusInstance
}
Contributor

this inherits an "" value down to the service level. We need to see if that is what we want.

Contributor

@bogdando bogdando Sep 5, 2025

Yes, how to handle empty values remained a gray area in the design. See the #1402 description:

Note about a special handling expected for an empty value by the services
that will be supporting this interface. It should provide backwards
compatibility during oscp and services CRDs upgrades.

There is no empty-value handling at the top scope (notifications cannot be disabled cluster-wide there), however. It may only take a default value of 'rabbitmq'. Use the service templates to override it with an empty value, if needed.

Contributor Author

If you look at the cinder/glance/manila implementation, for example [1], the parameter is the same, and the openstack-operator only performs parameter inheritance, if required, so I don't see any problem with set/update/delete for storage CRs (it is basically the same thing we did for topologies).

[1] https://github.com/openstack-k8s-operators/manila-operator/blob/main/api/v1beta1/manila_types.go#L135

Contributor

@bogdando bogdando Sep 19, 2025

I think this situation is no longer possible, as empty strings are now rejected by the webhook.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/95c8ffa28bb84df6841fd4a84713376c

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 25m 15s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 41m 46s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 31m 11s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 3h 06m 33s
openstack-operator-tempest-multinode RETRY_LIMIT in 13m 16s

@fmount fmount force-pushed the notif branch 3 times, most recently from a593d78 to 71ce410 on September 7, 2025 12:28
@fmount
Contributor Author

fmount commented Sep 7, 2025

Would it be possible to include the propagation to the nova level field too? I hope it is not too big of a complexity increase to take.

Of course, done. Added nova propagation as well.

@fmount
Contributor Author

fmount commented Sep 17, 2025

/retest

@amoralej
Contributor

The Watcher section also supports NotificationsBusInstance. If merging this, it should also be covered for consistency.

Also, neutron supports it, although I'm not sure if there is any reason to skip it in this patch.

@fmount
Contributor Author

fmount commented Sep 17, 2025

The Watcher section also supports NotificationsBusInstance. If merging this, it should also be covered for consistency.

Also, neutron supports it, although I'm not sure if there is any reason to skip it in this patch.

Interesting, thanks @amoralej, I wasn't aware of the current status. I started with storage, but I was going to extend this patch as needed. Let me check the current status and add both neutron and watcher!

@auniyal61
Contributor

How the existing flow works for nova and neutron: taking nova as an example, right now the user creates a new rabbit and wires it into the nova spec via notificationsBusInstance. The nova operator then recognizes the new rabbit, creates a new transporturl, and updates the confs so nova is aware of where to send notifications.

A consumer can then listen/read on the new rabbit.

Can you please explain how this will be used from the user's perspective?

My understanding is that this will work as a single top-level switch to enable/disable external notifications for the respective service operators: the user creates a new rabbit and wires it to the top-level notificationsBusInstance, and for all supported services/operators an individual transporturl is created, so they start publishing external notifications to the single dedicated new rabbit.

Is the above correct?

@fmount
Contributor Author

fmount commented Sep 19, 2025

How the existing flow works for nova and neutron: taking nova as an example, right now the user creates a new rabbit and wires it into the nova spec via notificationsBusInstance. The nova operator then recognizes the new rabbit, creates a new transporturl, and updates the confs so nova is aware of where to send notifications.

A consumer can then listen/read on the new rabbit.

Can you please explain how this will be used from the user's perspective?

My understanding is that this will work as a single top-level switch to enable/disable external notifications for the respective service operators: the user creates a new rabbit and wires it to the top-level notificationsBusInstance, and for all supported services/operators an individual transporturl is created, so they start publishing external notifications to the single dedicated new rabbit.

Is the above correct?

Correct. It is perfectly fine to create a dedicated rabbitmq instance and make this parameter point to the new rabbitmq instance name. By doing this, all the underlying services will point to it and emit notifications. From a service perspective this mechanism is no different from the regular rabbitmq/transportURL request that is already implemented as a pattern across the board. This part is handled in each service (with a dedicated NotificationBusInstanceReady condition [1] that is evaluated by each service).
What this patch does is:

  • prevent adding a rabbitmq name that refers to an instance that doesn't exist
  • propagate the top-level field to the underlying components if it is not present as an override (which is yet another pattern we follow for most of the service fields)

The processing logic already exists in the service operators to properly reconcile and configure the services.

[1] https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/common/condition/conditions.go#L72
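From the user's perspective, the flow described above might look like the following sketch of an OpenStackControlPlane CR (the instance name `rabbitmq-notifications` and the exact template layout are illustrative assumptions, not taken from this PR):

```yaml
# Illustrative sketch only: field paths other than notificationsBusInstance
# may differ from the actual OpenStackControlPlane schema.
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack
spec:
  # Top-level notifications bus; propagated to every service that
  # implements the field and does not override it in its template.
  notificationsBusInstance: rabbitmq-notifications
  cinder:
    template: {}  # inherits rabbitmq-notifications
  glance:
    template:
      # a service-level override wins over the top-level value
      notificationsBusInstance: rabbitmq-glance
```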


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/885e5285a3054f3a83223613d66fb5c2

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 01m 03s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 24m 05s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 45m 42s
adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 20m 25s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 39m 36s

@fmount
Contributor Author

fmount commented Sep 20, 2025

/test openstack-operator-build-deploy-kuttl-4-18

@fmount
Contributor Author

fmount commented Sep 20, 2025

recheck

@stuggi
Contributor

stuggi commented Sep 20, 2025

FYI, there is an issue with the adoption test, which makes it fail.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0551c0c5f8b54a11aa8cfb52af05c29f

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 07m 40s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 05s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 26m 40s
adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 15m 04s
openstack-operator-tempest-multinode FAILURE in 1h 53m 16s

@fmount
Contributor Author

fmount commented Sep 22, 2025

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4840aa8f77ec4372b20e30509a9ccaf5

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 48m 57s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 16m 11s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 33m 42s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 25m 44s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 35m 31s

@fmount
Contributor Author

fmount commented Sep 23, 2025

rebased and good to go.

@fmount
Contributor Author

fmount commented Sep 23, 2025

/test openstack-operator-build-deploy-kuttl

1 similar comment
@fmount
Contributor Author

fmount commented Sep 24, 2025

/test openstack-operator-build-deploy-kuttl

@fmount
Contributor Author

fmount commented Sep 24, 2025

/retest

@fmount
Contributor Author

fmount commented Sep 25, 2025

/test openstack-operator-build-deploy-kuttl

1 similar comment
@fmount
Contributor Author

fmount commented Sep 25, 2025

/test openstack-operator-build-deploy-kuttl

@fmount
Contributor Author

fmount commented Sep 25, 2025

still can't get a cluster claim to run kuttl tests in Prow, retrying in a bit

This patch introduces the notificationsBusInstance parameter at the API
level. When an override is not defined, it is propagated to the
underlying storage components where this field is implemented.

Signed-off-by: Francesco Pantano <[email protected]>
Because we can't control user input, this patch introduces a webhook
function to validate what we expect users to set as the
notificationsBusInstance top-level parameter.
It also adds the associated webhook envTest to validate the use cases
covered by the new function.

Signed-off-by: Francesco Pantano <[email protected]>
Add comprehensive envTests covering the "notificationsBusInstance"
parameter lifecycle and its propagation to implementing services.

The tests validate three key scenarios:

- Base use case: parameter is properly referenced and used
- Override use case: service overrides with a custom/dedicated rabbitmq
  instance
- Removal use case: services disable notifications when parameter is
  removed

This demonstrates the flexibility of referencing non-default rabbit
instances and ensures consistent interface implementation across
different services.

Signed-off-by: Francesco Pantano <[email protected]>
@fmount
Contributor Author

fmount commented Sep 26, 2025

OK, finally green CI, I assume this is good to go. Can we land this patch at this point? (/cc @abays @stuggi @amoralej @bogdando @gibizer)

Contributor

@bogdando bogdando left a comment

LGTM

@amoralej
Contributor

lgtm
