Converting image present in snapshotter fails #3764
Comments
Thanks @ChrisBr. Looking.
Ok, @ChrisBr. Here is my understanding. Of course I may be wrong or missing something (this part of things is not exactly trivial), so I am tagging @lingdie and @AkihiroSuda to keep me in check.
Because containerd retrieves images lazily, those images may be incomplete (missing layers) on the host machine, even if they have run. This is normally fine, but not in some circumstances (commit, save, convert), because these operations (obviously) require all of the content to be present.
Since containerd does not currently have a solution for that problem, nerdctl has implemented logic (in #3435 and others) that verifies that images are complete before these operations and, if layers are missing, retrieves them from their origin (note that we have to extend that logic to …).
This approach is obviously not bulletproof. If there are indeed missing layers, and the remote image has vanished or cannot be reached, then we (must) fail.
I believe what is happening for you here is:
Unfortunately, there might not be anything that can be done inside nerdctl per se. Please note, though, that my understanding of the GKE snapshotter is very superficial. So, here are two things we could try to confirm the above:
^ I would expect this to fail.
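The completeness check described in the comment above can be sketched roughly as follows. This is a minimal, hypothetical model, not nerdctl's actual code: the local content store and the remote registry are plain dicts keyed by digest, and ensure_complete, fetch_remote and MissingContentError are made-up names. It captures the two outcomes described: missing layers are fetched from their origin, and if the origin cannot provide them, the operation fails.

```python
import hashlib
from typing import Callable, Dict, List, Optional


class MissingContentError(Exception):
    """Hypothetical error: a blob is absent locally and its origin cannot provide it."""


def sha256_digest(blob: bytes) -> str:
    # Content stores key blobs by digest, in the "sha256:<hex>" form.
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def ensure_complete(
    layer_digests: List[str],
    local_store: Dict[str, bytes],
    fetch_remote: Callable[[str], Optional[bytes]],
) -> List[str]:
    """Ensure every layer a manifest references is in the local store.

    Returns the digests that had to be fetched from the origin. Raises
    MissingContentError if a missing blob cannot be retrieved or does not
    match its expected digest.
    """
    fetched = []
    for digest in layer_digests:
        if digest in local_store:
            continue  # already present locally, nothing to do
        # Lazily pulled layer: the image may have run, but the blob is not here.
        blob = fetch_remote(digest)
        if blob is None:
            # The remote image vanished or is unreachable: we must fail.
            raise MissingContentError(f"{digest} is missing locally and remotely")
        if sha256_digest(blob) != digest:
            raise MissingContentError(f"{digest}: fetched blob does not match")
        local_store[digest] = blob
        fetched.append(digest)
    return fetched


if __name__ == "__main__":
    base, top = b"base layer", b"top layer"
    manifest = [sha256_digest(base), sha256_digest(top)]
    registry = {sha256_digest(base): base, sha256_digest(top): top}
    local = {manifest[0]: base}  # the top layer was never pulled locally
    print(ensure_complete(manifest, local, registry.get))  # fetches only the top layer
```

In this model, a commit followed by convert on a never-pushed image is exactly the failing case: the blob is absent locally, and fetch_remote has nothing to return.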
Thanks for the detailed write-up @apostasie, really appreciated. This makes sense but is of course unfortunate for our use case 😿
I will try this tomorrow morning and will report back. Could you maybe clarify why it would work with nerdctl (if nerdctl were to use the GKE snapshotter as well) and not with docker?
If there were a way with the GKE snapshotter to force-download the base image, would you expect this to work? Pseudo code:
It would work with nerdctl, because nerdctl will force content download when you …
Mmmm... I am not 100% comfortable with the interactions between containerd and the GKE snapshotter, but I would cautiously say "yes, it should work". You should be able to check image status using:
That should tell you which images are fully available locally, and which ones have missing layers.
Ok, did some testing this morning. Using nerdctl like this works:
Using docker save works as well, even though the image seems incomplete 🤔
Do you think we could pull the images with ctr / nerdctl to force them to be available (e.g. using overlay as the snapshotter)?
Something like that should work:
Then do your normal stuff with docker etc. afterwards. Tagging with nerdctl will force-fetch the missing layers. I appreciate none of this is cute... we do have a serious problem here, but so far there is no good solution...
Concerning.
Any easy way to confirm?
Yeah, appreciate your help, difficult corner case here... 😿
I think so: save it with docker like you just did (possibly incomplete), then remove everything and use only nerdctl (pull alpine, save alpine), then extract both archives and diff them.
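The "extract both archives and diff" step could be approximated without extracting to disk, by comparing the members of the two tarballs by name and content hash. This is a minimal sketch, assuming both archives are plain or compressed tar files passed as command-line arguments (e.g. one produced by docker save and one by nerdctl save); member_digests and diff_archives are hypothetical helpers. The two tools may lay out their archives differently, so the output needs some interpretation: differing metadata files are expected, while missing or differing layer data is what would be worth looking at.

```python
import hashlib
import sys
import tarfile
from typing import BinaryIO, Dict, List, Tuple


def member_digests(fileobj: BinaryIO) -> Dict[str, str]:
    """Map each regular file in a tar archive to the sha256 of its content."""
    digests = {}
    # mode="r" transparently handles uncompressed and compressed tarballs.
    with tarfile.open(fileobj=fileobj, mode="r") as tf:
        for member in tf.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            data = tf.extractfile(member).read()
            digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def diff_archives(a: BinaryIO, b: BinaryIO) -> Tuple[List[str], List[str], List[str]]:
    """Return (only_in_a, only_in_b, differing) member names, each sorted."""
    da, db = member_digests(a), member_digests(b)
    only_a = sorted(da.keys() - db.keys())
    only_b = sorted(db.keys() - da.keys())
    differing = sorted(n for n in da.keys() & db.keys() if da[n] != db[n])
    return only_a, only_b, differing


if __name__ == "__main__":
    # Usage (hypothetical file names): python diff_archives.py docker.tar nerdctl.tar
    with open(sys.argv[1], "rb") as a, open(sys.argv[2], "rb") as b:
        only_a, only_b, differing = diff_archives(a, b)
    print("only in first:", only_a)
    print("only in second:", only_b)
    print("same name, different content:", differing)
```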
Happy new year!
This seems to work and would unblock us. Thanks for the help!
@apostasie I appreciate this might only be tangentially related, but we're seeing issues when a zstd-converted image is not fully imported into the containerd content store. Here is an example of two manifests:
Image 2 builds on top of image 1, so the layers should all be shared. If I pull image 1 first, I end up with:
Image 1 is fully available but image 2 is incomplete? I can run the image regardless, so my gut feeling is that the content store is not able to make the connection that the layer is available, just in a different compression format. If I only pull image 2, it is fully available. Is there anything you can think of that could cause this?
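A small illustration that is consistent with this hypothesis (it does not prove it is what is happening here): a content store keys a layer blob by the digest of its compressed bytes, while the digest of the uncompressed content (the diffID) stays the same across compressions. Python's standard library has no zstd module, so two gzip levels stand in for "gzip vs zstd" in this sketch, and layer_ids is a hypothetical helper. The same unpacked layer ends up under two different blob digests, so a lookup by image 2's blob digest would not find image 1's copy, even though the unpacked content is identical.

```python
import gzip
import hashlib
from typing import Tuple


def digest(data: bytes) -> str:
    # OCI-style digest string, as used to key blobs in a content store.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def layer_ids(uncompressed: bytes, level: int) -> Tuple[str, str]:
    """Return (blob digest, diffID) for a layer gzip-compressed at the given level.

    The blob digest is what a manifest references and what the content store
    is keyed by; the diffID is the digest of the uncompressed content.
    """
    # mtime=0 keeps the gzip header deterministic for a given level.
    blob = gzip.compress(uncompressed, compresslevel=level, mtime=0)
    return digest(blob), digest(gzip.decompress(blob))


if __name__ == "__main__":
    layer = b"identical layer content\n" * 1000
    fast_blob, fast_diff = layer_ids(layer, 1)
    best_blob, best_diff = layer_ids(layer, 9)
    print("blob digests equal:", fast_blob == best_blob)  # False: different bytes on disk
    print("diffIDs equal:", fast_diff == best_diff)  # True: same unpacked content
```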
Hey @ChrisBr, sorry about the delay here. I took a break for a few weeks.
@ChrisBr would you be able to provide step-by-step commands to reproduce the issue?
Hey @ChrisBr, gentle ping on this ^
@apostasie sorry, I was a bit swamped recently. I tried to come up with a simple example but couldn't. I will try to spend a bit more time on this soon. Thanks for your help, really appreciated 🙇
I don't have reproducible steps yet, but we're doing what you suggested earlier and always tag the image before converting; even so, this fails quite frequently.
The image does exist in the registry. Furthermore, the image also exists locally because, on fallback, I do a … Could you recommend some steps to debug this further? I'm really scratching my head here. I can't reproduce it, but it does happen roughly every 50 builds, so I can add debug code to our builder and should be able to get more information quite quickly.
@ChrisBr even without a proper reproducer that is guaranteed to trigger the bug, can you share the exact steps you are doing so that I can torture-test it? (Is that pull, tag, convert, push, then pull?)
Yeah, so basically it's this:
Could it be that using the same tag trips it?
Thanks. So, let's start with this. Is the following … If yes, can you …
@ChrisBr
They are all related to the overall issue of certain layers (likely) being garbage-collected, which breaks operations like … I believe your latest issue here might have been fixed by one of the above (most likely #4121). If you get a chance, I would love to see you try the latest nerdctl.
Actually, @AkihiroSuda, I suggest we close this as resolved. This ticket has gone in a range of different directions, covering different problems and questions, and it is no longer clear what it is about or what issue it describes, except of course @ChrisBr's last mention of failed … @ChrisBr, if it turns out this is not fixed, I would owe you an apology (and a beer on me the next time I am in Cambridge :-)) and would of course look at any new details on this. Cheers!
Amazing, thank you! Any rough idea when you will do the next patch release?
@ChrisBr: about this, I want to change my statement from "yeah, it's fixed!" to "cautiously, this should be fixed". The good news is that we have a much better grasp of what is happening when …
Description
Most likely related to #3435.
When using the GKE image streaming snapshotter, if we commit an image locally from a running container (e.g. with docker) and then convert it, the converter tries to pull the image from the remote, which then fails (because we haven't pushed it).
The image is present in the snapshotter (as we can run it) but apparently not in the content store.
Steps to reproduce the issue
Describe the results you received and expected
I expect to be able to convert the image (and likewise to tag, commit, and save it).
What version of nerdctl are you using?
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
No response