Reduce network interruption of quorum migration #1406
base: stackhpc/2023.1
We can reduce the potential network connectivity interruption caused by quorum migration by stopping Keystone and Neutron last and starting them first. The cost is longer API downtime for the other services, because each kayobe invocation first generates configuration, and splitting the services into separate stop and start steps adds invocations.
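The ordering can be sketched as grouping services into batches, where each batch stands in for one kayobe invocation. This is a minimal illustration, not the PR's actual script: the function `plan_batches`, the `CRITICAL_SERVICES` tuple and the `split` flag (which would let OVN deployments keep a single batch) are all hypothetical names.

```python
from typing import Dict, List, Sequence

# Hypothetical: services whose downtime interrupts network connectivity.
CRITICAL_SERVICES = ("keystone", "neutron")


def plan_batches(
    services: Sequence[str],
    critical: Sequence[str] = CRITICAL_SERVICES,
    split: bool = True,
) -> Dict[str, List[List[str]]]:
    """Group services into stop/start batches (one batch = one kayobe run).

    With split=True, critical services are stopped in the last batch and
    started in the first batch, so they stay down for a shorter window
    than the other services. With split=False, everything is handled in
    a single batch (fewer invocations, so shorter API downtime overall).
    """
    if not split:
        everything = list(services)
        return {"stop": [everything], "start": [everything]}
    others = [s for s in services if s not in critical]
    crit = [s for s in services if s in critical]
    return {
        # Non-critical first, critical last; drop empty batches.
        "stop": [b for b in (others, crit) if b],
        # Critical first, non-critical last.
        "start": [b for b in (crit, others) if b],
    }


if __name__ == "__main__":
    plan = plan_batches(["nova", "keystone", "glance", "neutron"])
    for phase in ("stop", "start"):
        for i, batch in enumerate(plan[phase], 1):
            print(f"{phase} batch {i}: {', '.join(batch)}")
```

The trade-off discussed in this PR shows up directly: the split plan needs two invocations per phase instead of one, which is where the extra API downtime comes from.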
Any reason this is still a draft @priteau? It looks very helpful for OVS-based systems; it would be good to get it in :)
I set it to draft because I wanted to hear thoughts on the approach. The code change itself is ready.
I think this approach is fine, although it does mean longer downtime for the non-critical services.
@MoteHue didn't you try this recently? Can we merge it now?
I haven't tried this change personally. I'd prefer that this split be controlled by a flag, or enabled only when OVS is in use. Otherwise we're just extending the outage window on OVN systems with no benefit.
@grzegorzkoper You have an upcoming OVS upgrade, right? Would you mind testing this change when you do that?
@Alex-Welsh I can discuss it with our client; if they don't mind, I can run it like that.
They prefer shorter network downtime over shorter API downtime. I can test it next week.
Potentially unrelated, but this is the approach we used for the quorum migration for OVS at Cambridge. https://gitlab.developers.cam.ac.uk/rcs/platforms/cloud-services/arcus-kayobe-config/-/merge_requests/537/diffs#699dcf399709f4a7d55ccaeed5717da26b459bb8
Tested, worked like a charm. 👍