Could you use Resalloc to avoid the initial waiting for machines? #288
Resalloc implements machine pool pre-allocation: https://github.com/praiskup/resalloc
It is used e.g. by Fedora Copr and OSH.

Comments
Resalloc might be missing some important features for multi-platform-controller; let us know if so (I'm the author of the project and one of the current maintainers).
Probably not going to happen; we are not going to integrate a Python project into MPC. Also, the existing dynamic-pool capability we already have is probably good enough. If we do, at some point, incorporate something external, it would probably be based on DRA or on Kata Containers with out-of-cluster hypervisors.
This runs as a separate small service; it is containerized in OSH, for example.
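For context on what that service does, here is a minimal sketch of resalloc's ticket-based workflow from a client's point of view. It shells out to the `resalloc` CLI; the subcommand names (`ticket`, `ticket-wait`, `ticket-close`) follow the resalloc project, but the exact flags and output format should be verified against its documentation, and the `aarch64` tag is just an illustrative example:

```python
import subprocess

def take_machine(tag: str) -> tuple[str, str]:
    """Open a resalloc ticket and wait for a machine matching `tag`."""
    # The server keeps the pool pre-populated, so ticket-wait usually
    # returns almost immediately instead of waiting for a VM to boot.
    ticket_id = subprocess.check_output(
        ["resalloc", "ticket", "--tag", tag], text=True).strip()
    machine = subprocess.check_output(
        ["resalloc", "ticket-wait", ticket_id], text=True).strip()
    return ticket_id, machine

def release_machine(ticket_id: str) -> None:
    # Closing the ticket lets the server recycle/delete the VM and
    # pre-allocate a replacement in the background.
    subprocess.check_call(["resalloc", "ticket-close", ticket_id])

ticket_id, machine = take_machine("aarch64")
try:
    print(f"building on {machine}")  # e.g. connection info for the VM
finally:
    release_machine(ticket_id)
```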
Has any of those tools resolved the "pre-allocation" of resources?
Those tools have huge communities and flexible external provider support - some of those providers may support pre-allocated resource pools. But my point is that we don't want to use our own tools and would rather adopt emerging K8s standards.
So one more service we would need to run, scale, monitor, write SOPs for, debug, etc. I suppose it's also stateful, so we would also need to maintain some kind of database for it?
I appreciate this perspective! But there's a tool that solves the pre-allocation problem already.
Yes, sure. For the existing deployments (hundreds of maintained VMs in Copr and OSH) this has been manageable. The scale of Konflux may be different, though, and the particular service requirements as well.
It depends on the use-case; the database may be local (and created when the pod starts). But then, if…
#283 seems related
That is simply augmenting our IBM cloud driver to support the pooling logic we already have (which already works very well on AWS). As I already mentioned, for now it seems good enough for what we need.
/me is coming from the RPM build system world...
Allocation of the worker currently takes >= 2 minutes, in my experience with Konflux instances. The numbers vary depending on which RPM package is being built, but most packages build in Mock in less than a minute. If we want to make the builds SLSA-isolated, we'll need to run Mock twice (the second run would be faster), so let me claim 2 minutes in total. Allocating machines on demand thus adds a 2x2-minutes-plus penalty to every single build (the majority of the task time is spent on VM allocation). What this ticket proposes is to reduce that to a little-to-zero penalty.
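To make the arithmetic concrete, here is a back-of-the-envelope calculation using the numbers claimed above (all values are this comment's rough estimates, not measurements):

```python
# Rough numbers from the comment above.
alloc_min = 2.0   # >= 2 minutes to allocate one VM on demand
mock_runs = 2     # an SLSA-isolated build runs Mock twice
build_min = 2.0   # ~2 minutes of actual Mock build time in total

on_demand_total = mock_runs * alloc_min + build_min   # 6.0 minutes
pre_allocated_total = build_min                       # ~2.0 minutes

waiting_share = (mock_runs * alloc_min) / on_demand_total
print(f"on-demand: {on_demand_total} min total, "
      f"{waiting_share:.0%} of it waiting for VMs")   # ~67%
```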
Well, in most cases I know of, the fully dynamic on-demand configuration is used - and that would indeed incur the penalty you mention. It's possible to configure things differently to use the so-called "dynamic pool" - but that kind of conversation probably belongs in the support channels of the team maintaining your cluster, not in an issue for the open-source controller project. BTW, if you are using some isolation like Mock, it may also be possible to run multiple pipelines on the same periodically-reallocated host to make things even faster.
I forgot to react: Mock doesn't really isolate on its own, but we do mock-in-podman for this.
Don't you have documentation for this? The builds are often resource-intensive, and I'm still afraid that we cannot let multiple users onto the same machine at the same time for security reasons, but I'd like to understand how this works. It could be an option.
IIRC Mock can run its root environment inside a namespace, not just a plain chroot - though depending on what you do, sometimes a chroot is enough. But all of that is beside the point.
See the "Dynamic Pool" section of the MPC architecture document:
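For readers who don't follow the link: the core idea of a dynamic pool is to keep a few hosts warm and reallocate them periodically, so task assignment doesn't wait on VM provisioning. Below is a minimal sketch of that loop - not MPC's actual implementation, and every name and number in it is hypothetical:

```python
import queue
import threading
import time

POOL_SIZE = 4        # hypothetical warm-host target
MAX_AGE_SEC = 3600   # reallocate hosts periodically after this age

warm_hosts = queue.Queue()  # holds ready-to-use host records

def provision_vm() -> dict:
    """Stand-in for a cloud-driver call (AWS, IBM Cloud, ...)."""
    time.sleep(2)  # pretend provisioning takes a while
    return {"address": "198.51.100.7", "born": time.time()}

def pool_keeper() -> None:
    # Keep the pool topped up in the background so that callers of
    # take_host() almost never pay the provisioning latency.
    while True:
        if warm_hosts.qsize() < POOL_SIZE:
            warm_hosts.put(provision_vm())
        else:
            time.sleep(1)

def take_host() -> dict:
    host = warm_hosts.get()  # usually returns immediately
    if time.time() - host["born"] > MAX_AGE_SEC:
        # Host is due for reallocation; a real controller would tear
        # down the old VM here before handing out a fresh one.
        host = provision_vm()
    return host

threading.Thread(target=pool_keeper, daemon=True).start()
```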
#197 is also strongly related here. When (hopefully not "if") Copr switches to scheduling pods in Kubernetes to build RPMs, all the logic that already exists for both vertical and horizontal pod autoscaling applies. resalloc has a high overlap with those today.
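To illustrate the overlap: in autoscaling/v1 terms, a pre-allocated pool roughly corresponds to an HPA's minReplicas, i.e. warm capacity that is always up. Here is a sketch using the official Kubernetes Python client; the `builds` namespace, `builder` Deployment, and the replica numbers are invented for illustration:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

hpa = client.V1HorizontalPodAutoscaler(
    api_version="autoscaling/v1",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="rpm-builders"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="builder"),
        # minReplicas acts like a pre-allocated pool: this many builder
        # pods stay warm even when no builds are queued.
        min_replicas=4,
        max_replicas=50,
        target_cpu_utilization_percentage=70,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="builds", body=hpa)
```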
@cgwalters there is some discussion going on about the possibility of completely replacing MPC for the Red Hat clusters; if you're interested, reach out to me on the internal Slack so I can loop you in.
Hmm. Last time I checked, running K8s on different arches (x86_64, aarch64, s390x, ppc64le) in several regions from two different cloud providers was non-trivial. Is it production-ready now?
There are several levels of this. But to start, there's: https://docs.openshift.com/container-platform/4.17/post_installation_configuration/configuring-multi-arch-compute-machines/multi-architecture-configuration.html That said, if you dig in, a lot of the discussion in #197 is about different techniques for doing Kubernetes-like things for builds (and general runtime) without necessarily running a persistent cluster - which this project heavily overlaps with (and Copr/Koji also overlap).