hub online synchronization #82

Open
wants to merge 16 commits into
base: master
from

Conversation

rjmateus
Member

@rjmateus rjmateus commented Nov 28, 2023

This is an RFC to improve content synchronization for HUB scenarios. It also relates to Inter Server Synchronization.

Rendered version

@rjmateus rjmateus marked this pull request as draft November 28, 2023 12:08
@rjmateus rjmateus force-pushed the hub_online_sync branch 6 times, most recently from b440755 to d4594dd Compare November 28, 2023 21:33
@rjmateus rjmateus marked this pull request as ready for review November 29, 2023 12:27
@mcalmer
Contributor

mcalmer commented Nov 30, 2023

@rjmateus Would it make sense to also add all the other ISSv2 features to this RFC and plan possible replacements? This would give a full picture, and we could still do the implementation in steps. It would not prevent us from starting with channels first.
But if the channels need a special feature that has to be implemented anyway for the full replacement, it would be better to know this beforehand.

@rjmateus
Member Author

@mcalmer Good point Michael.
I will add that to the next-steps section, but with some detail about the solution.
This HUB online synchronization will not be able to fully replace ISSv2 because of disconnected environments. However, we could think about a solution that uses RMT to sync data and then exports it to the disconnected environment (but I'm not sure I like it).

Contributor

@admd admd left a comment

This is lovely, and I appreciate it. I only have a few questions, but aside from that, everything appears satisfactory from my perspective.

Signed-off-by: Ricardo Mateus <[email protected]>
Comment on lines 109 to 110
We can follow a similar approach to what exists on ISSv1. On the hub side we can define multiple peripheral servers to connect to by providing the FQDN and an authentication token.
On the peripheral side we also need to define the Hub server FQDN and an Authentication token.
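To make the pairing described in these two lines concrete, here is a minimal sketch, assuming each server simply stores one record per remote peer. The names `Peer`, `PeerRegistry` and `new_token`, as well as the example FQDNs, are hypothetical and not part of the RFC; how tokens are really generated and exchanged is not defined here.

```python
from dataclasses import dataclass
import hmac
import secrets


@dataclass(frozen=True)
class Peer:
    """One server's record of a remote server (hypothetical structure)."""
    fqdn: str
    token: str


def new_token() -> str:
    # Random URL-safe token; the real token format is an open design question.
    return secrets.token_urlsafe(32)


class PeerRegistry:
    """Remote servers this server accepts calls from, keyed by FQDN."""

    def __init__(self) -> None:
        self._peers: dict[str, Peer] = {}

    def register(self, fqdn: str, token: str) -> Peer:
        peer = Peer(fqdn=fqdn, token=token)
        self._peers[fqdn] = peer
        return peer

    def is_authorized(self, fqdn: str, presented_token: str) -> bool:
        peer = self._peers.get(fqdn)
        if peer is None:
            return False
        # Constant-time comparison avoids leaking token contents via timing.
        return hmac.compare_digest(peer.token.encode(), presented_token.encode())


if __name__ == "__main__":
    # Hub side: one entry per peripheral. Peripheral side: one entry for the hub.
    hub_side = PeerRegistry()
    peripheral_side = PeerRegistry()
    hub_side.register("peripheral1.example.com", new_token())
    peripheral_side.register("hub.example.com", new_token())
```

Keeping a registry on both sides mirrors the two-way setup in the quoted text: each server can authenticate calls coming from the other.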
Contributor

The Hub will have the peripheral defined and generate the associated auth token, right?

Why do we need to provide the peripheral FQDN? Wouldn't a generic peripheral name (which may be the FQDN) and a generated token be enough? Or do we approach this as a username/password scenario?

I am assuming that the connection will always be from the peripheral to the Hub, right?

Member Author

It should be used for authentication from the HUB to the peripheral.
Communication will be bi-directional. In some cases the peripheral calls the HUB (for example, to synchronize software channels and call SCC endpoints); in other cases the HUB calls the peripheral API (for example, to create channels or push configuration channels).


## Peripheral software channels creation

We need a mechanism to create the channels in the peripheral servers (vendor and CLM's) in the desired organization. The peripheral channel creation must be done automatically from the HUB server through an API. Since we are making special channel creation (defined next), those API methods should be available to server-to-server communication only.
Contributor

I think it would be good to define in a bit more detail what this server-to-server API should look like.
It should also say something about the existing API namespaces sync.master and sync.slave.

  • What namespace should be used for it?
  • one namespace or multiple?
  • design it with an outlook to the future and what else needs to be added in future to this API. E.g. activation keys, config channels, images, formulas, etc.
  • how should the authentication work?

Member Author

Hey Michael. I added some clarification about the API namespace and use cases. I didn't add any details about the exact API methods to develop because that looks like an implementation detail to me.
Could you check whether it's clearer now? Thank you.

Signed-off-by: Ricardo Mateus <[email protected]>
Contributor

@aaannz aaannz left a comment

I am missing one important section and that is failure scenarios:

  • what happens when a peripheral is due to be synced but is unavailable?

This scenario seems particularly likely in channel creation and CLM, where the sync should be done automatically, since in this case the connection direction is expected to be:

other cases will be HUB calling the peripheral API like creating channels, pushing configuration channels, etc)

So if a peripheral is unavailable, do we keep track of what was updated and what was not? And how?

  • what happens when the peripheral or the hub crashes during the sync?

With ISSv2 everything was just one transaction, so inconsistencies should not happen.
We should, however, check the ACIDity of our APIs: not only individual API calls, but also the sequences of calls we will use for the sync, and define the expected failure modes.
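One way to picture the "keep track of what was updated" question is a per-peripheral queue of undelivered changes on the hub. The sketch below is only an illustration under that assumption: `SyncJournal`, the change identifiers and the `send` callback are hypothetical, and a real implementation would persist the queue in the database rather than in memory.

```python
from collections import defaultdict
from typing import Callable


class SyncJournal:
    """Tracks, per peripheral, which changes still have to be delivered."""

    def __init__(self) -> None:
        self._pending: dict[str, list[str]] = defaultdict(list)

    def record(self, change_id: str, peripherals: list[str]) -> None:
        # Every change is queued for each peripheral before any call is made.
        for fqdn in peripherals:
            self._pending[fqdn].append(change_id)

    def pending(self, fqdn: str) -> list[str]:
        return list(self._pending[fqdn])

    def flush(self, fqdn: str, send: Callable[[str, str], None]) -> int:
        """Deliver queued changes in order and stop at the first failure.

        Returns the number of changes delivered in this attempt.
        """
        queue = self._pending[fqdn]
        delivered = 0
        while queue:
            try:
                send(fqdn, queue[0])
            except ConnectionError:
                break  # peripheral unavailable: keep the rest for the next run
            queue.pop(0)
            delivered += 1
        return delivered
```

A change is removed from the queue only after `send` returns, so a crash between the two steps causes a repeated delivery rather than a lost one. That only works if the receiving API calls are idempotent, which is part of the ACIDity check asked for above.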


An implementation example is available from the community [link](https://github.com/uyuni-project/contrib/blob/main/os-image-tools/osimage-import-export.py).

Communication can be done from the HUB to the peripheral server to synchronize all the necessary data.
Contributor

I would like to point out that not everyone wants to sync from HUB to peripheral.

One example is a user of ours who runs different SUMAs for different environments: prod, qa, dev.
They build their images using the SUMA in the dev environment. Once an image passes basic validation, it is exported and imported into the qa SUMA using the above-mentioned script. There the image goes through more thorough testing, and once it passes and a maintenance window opens, it is moved on to prod. This process ensures that no further changes are made to the image, as the import/export does not modify it in any way.

A centrally managed hub network would help them keep the configuration of those SUMAs identical; however, they would certainly need the ability to either:

  • sync images from the peripheral to another peripheral and to HUB (this can be done outside HUB arch by existing APIs)
  • prevent auto-syncing images from peripheral to HUB and/or overwriting peripheral images from HUB

Member Author

Good point, Ondrej.
Let's take this step by step, and I will start from the end.

The idea is for users to be able to define in the HUB whether they want to synchronize all data or only a selected set. This way they can control when an image lands on each peripheral server.

The HUB server could also have the ability to build images. It will have all the channels, so users can create a build host assigned to the HUB server.

Given those two assumptions, it would make sense to build the image on the HUB server, transfer it to the dev peripheral server, and run all the necessary tests there. Once all dev tests have been run, we transfer the new image version to the other environments (qa, prod) and make it available to all.

If this is not the case, then we can always use the script you mentioned or ISSv2, since it will stay around.

The goal for this solution is scalability only, but other use cases will stay around and may need different implementation and components.
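The "all data or only a select set" idea from this reply could look roughly like the following. This is a sketch under stated assumptions, not a proposed data model: `SyncSelection`, `images_to_sync` and the image names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SyncSelection:
    """Per-peripheral choice made on the hub: everything, or a named subset."""
    sync_all: bool = False
    images: set[str] = field(default_factory=set)

    def includes(self, image: str) -> bool:
        return self.sync_all or image in self.images


def images_to_sync(available: list[str], selection: SyncSelection) -> list[str]:
    # Preserve the hub's ordering; only drop images that were not selected.
    return [image for image in available if selection.includes(image)]


if __name__ == "__main__":
    available = ["pos-image-1.0", "pos-image-1.1"]
    dev = SyncSelection(sync_all=True)
    prod = SyncSelection(images={"pos-image-1.0"})
    print(images_to_sync(available, dev))
    print(images_to_sync(available, prod))
```

With a selection stored per peripheral, promoting an image from dev to qa to prod amounts to adding its name to the next peripheral's set once testing has passed.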

@rjmateus
Member Author

@aaannz I added a section about failure scenarios. Do you think it is clear enough, or should it be more complete?

@srbarrios srbarrios self-requested a review July 5, 2024 10:28
@cbosdo

cbosdo commented Jul 25, 2024

Remember that a lazy repo-sync has been started. This may have an impact on the design.

@rjmateus
Member Author

Remember that a lazy repo-sync has been started. This may have an impact on the design.

It should not have any impact on the design of this RFC. I would say it is the other way around: the existence of this RFC, and the possibility of having a chained SUSE Manager server that may not have the packages synchronized, is something that will impact the new reposync.

[alternatives]: #alternatives

- Create a new UI and use ISSv2 to synchronize data
- Solve the existing and known problems of ISSv2
Member

I would not discard this point; we could do it in addition to implementing ISSv3.
As ISSv2 is going to be used for disconnected environments, we can still bring a better user experience to that use case. For example, can we consider some improvements around running SQL queries in parallel?

Member Author

In terms of performance, the main issues are in the import process, and we cannot change that to run in parallel.
We can, however, change the way we handle the transaction and have one transaction per channel, instead of one transaction per export as we have now.
Another possibility is to not have a transaction at all, but that can be risky if users start to use the channel during the import process, or if an error occurs during the import.
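To illustrate the "one transaction per channel" option, here is a small self-contained sketch. It uses Python's `sqlite3` as a stand-in for the real database, and the two-table schema, `make_schema` and `import_per_channel` are assumptions made up for the example, not the ISSv2 import code.

```python
import sqlite3


def make_schema(conn: sqlite3.Connection) -> None:
    # Toy schema: just enough to show the transaction boundaries.
    conn.execute("CREATE TABLE channel (label TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE package (channel TEXT NOT NULL, name TEXT NOT NULL)")


def import_per_channel(conn: sqlite3.Connection, export: dict) -> tuple:
    """Import each channel in its own transaction.

    `export` maps a channel label to its list of package names.
    Returns (imported_labels, failed_labels).
    """
    imported, failed = [], []
    for label, packages in export.items():
        try:
            # The connection context manager commits on success and rolls
            # back on an exception, so a failure only undoes this channel.
            with conn:
                conn.execute("INSERT INTO channel (label) VALUES (?)", (label,))
                conn.executemany(
                    "INSERT INTO package (channel, name) VALUES (?, ?)",
                    [(label, name) for name in packages],
                )
        except sqlite3.Error:
            failed.append(label)
        else:
            imported.append(label)
    return imported, failed
```

Compared with a single transaction per export, a broken channel no longer rolls back the channels that were already imported, and each channel still becomes visible atomically, which avoids the risk described for the no-transaction variant.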

- All peripherals can start synchronizing at the same time
- Can be problematic if we have several peripherals performing a full synchronization at the same time. However, we can configure the peripherals to run repo-sync at different hours and spread the load
- This is only a problem in the first sync, since subsequent syncs only transfer differences
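Spreading the repo-sync start times could be as simple as distributing the peripherals evenly over a time window. The helper below is a sketch under that assumption; `stagger_hours` and its parameters are hypothetical, and how the resulting hour would be written into each peripheral's schedule is left open.

```python
def stagger_hours(peripherals, window_start=0, window_hours=24):
    """Assign each peripheral a start hour, spread evenly over a window.

    `window_start` is the first allowed hour (0-23) and `window_hours` the
    window length; the window may wrap past midnight.
    """
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")
    ordered = sorted(peripherals)  # stable result regardless of input order
    count = len(ordered)
    return {
        fqdn: (window_start + index * window_hours // count) % 24
        for index, fqdn in enumerate(ordered)
    }


if __name__ == "__main__":
    schedule = stagger_hours(
        ["p1.example.com", "p2.example.com", "p3.example.com", "p4.example.com"],
        window_start=22,
        window_hours=6,
    )
    for fqdn, hour in schedule.items():
        print(f"{fqdn}: start repo-sync at {hour:02d}:00")
```

As the last bullet notes, this mainly matters for the first full synchronization; later runs transfer only differences.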

Member

@srbarrios srbarrios Sep 19, 2024

When synchronizing the channels from the SCC service, we rely, in theory, on a service with HA.
In this proposal, the peripherals will rely on a single Hub instance, through custom and vendor channels pointing to that machine. What happens if it goes down, or the Hub disk burns? I would also consider how we can recover in these cases.

Member Author

We rely on the HUB, but the repo-sync tool already has a retry mechanism. It will try to download/sync the content in the next iteration of the mgr-sync taskomatic task.

Member Author

@srbarrios is this still a question for you?

@rjmateus rjmateus requested a review from cbbayburt October 10, 2024 15:52
Member Author

@rjmateus rjmateus left a comment

First review iteration of alternative 2


On the peripheral side we call the API on the Hub and fill out the required configuration data needed locally on the peripheral server (e.g. auto generated mirror credential password).
We will also change the `scc_url` configuration on the Peripheral to point to the Hub Server.
We need to establish a secure connection between Hub and Peripheral Server.
Contributor

@rjmateus @mcalmer are we going to drop the existing way Hub and peripheral servers work? What would be the role of hub_xmlrpc_api; will that be gone too?

What path are we providing for users who are working with the current approach?

Contributor

hub_xmlrpc_api is for ISSv2. In the worst case the user can stay on it.
I cannot say whether it is compatible with ISSv3. I do not really know what it does.
@rjmateus might be able to answer this.

Member Author

hub_xmlrpc_api will stay as is, since it is not dependent on the ISS version in use. This means it can work with any version of ISS, even v3.
However, we may need to change the one API that provides the hub servers FQDN to the XML-RPC API, so that instead of looking at the system entitlement it returns a list of configured peripherals.

Contributor

And what about all the different authentication methods https://documentation.suse.com/suma/5.0/en/suse-manager/specialized-guides/large-deployments/hub-auth.html ?
To support them, will we leave it up to the user to create the appropriate API users, so that those users can later be used to make the API calls correctly?

Member Author

That is correct. Users must exist on the peripheral servers to be able to call the API.
In 4.3 this configuration was possible with a Salt formula, which in 5.0 was turned into an API call: https://github.com/SUSE/spacewalk/issues/22498

However, I tried to find the documentation for this and could not. I pinged Cedric and Vladimir to check whether it's missing or I'm blind.

@mcalmer mcalmer mentioned this pull request Oct 16, 2024
12 tasks

6 participants