Converting image present in snapshotter fails #3764

Closed

ChrisBr opened this issue Dec 16, 2024 · 25 comments
Labels
bug Something isn't working

Comments

@ChrisBr

ChrisBr commented Dec 16, 2024

Description

Most likely related to #3435.

When using the GKE image streaming snapshotter and converting an image after committing it locally from a running container (e.g. with docker), the converter tries to pull the image from the remote registry, which then fails (because we haven't pushed it).

The image is present in the snapshotter (as we can run it) but apparently not in the content store.

Steps to reproduce the issue

> docker run --name foo alpine
> docker commit foo gcr.io/$repo/cbruckmayer:foo

# I can use the image with nerdctl
> nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs run -it --network none gcr.io/$repo/cbruckmayer:foo

# However, when trying to convert the image, it will try to pull it from remote which fails with not found

> nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs image convert --zstd --oci gcr.io/$repo/cbruckmayer:foo gcr.io/$repo/cbruckmayer:foo-zstd
gcr.io/$repo/cbruckmayer:foo: resolving      |--------------------------------------|
elapsed: 0.1 s                                total:   0.0 B (0.0 B/s)
INFO[0000] fetch failed after status: 404 Not Found      host=gcr.io
FATA[0000] failed to resolve reference "gcr.io/$repo/cbruckmayer:foo": gcr.io/$repo/cbruckmayer:foo: not found
root@docker-daemon-rzhtc:/app# nerdctl --debug-full --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs image convert --zstd --oci gcr.io/$repo/cbruckmayer:foo gcr.io/$repo/cbruckmayer:foo-zstd
DEBU[0000] fetching                                      image="gcr.io/$repo/cbruckmayer:foo"
DEBU[0000] resolving                                     host=gcr.io
DEBU[0000] do request                                    host=gcr.io request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/2.0.0+unknown request.method=HEAD url="https://gcr.io/v2/$repo/cbruckmayer/manifests/foo"
DEBU[0000] fetch response received                       host=gcr.io response.header.accept-ranges=none response.header.content-type=application/json response.header.date="Fri, 13 Dec 2024 13:10:25 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.server="Docker Registry" response.header.vary=Accept-Encoding response.header.www-authenticate="Bearer realm=\"https://gcr.io/v2/token\",service=\"gcr.io\",scope=\"repository:$repo/cbruckmayer:pull\"" response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="401 Unauthorized" url="https://gcr.io/v2/$repo/cbruckmayer/manifests/foo"
DEBU[0000] Unauthorized                                  header="Bearer realm=\"https://gcr.io/v2/token\",service=\"gcr.io\",scope=\"repository:$repo/cbruckmayer:pull\"" host=gcr.io
DEBU[0000] do request                                    host=gcr.io request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/2.0.0+unknown request.method=HEAD url="https://gcr.io/v2/$repo/cbruckmayer/manifests/foo"
DEBU[0000] fetch response received                       host=gcr.io response.header.accept-ranges=none response.header.content-type=application/json response.header.date="Fri, 13 Dec 2024 13:10:25 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.server="Docker Registry" response.header.vary=Accept-Encoding response.header.x-frame-options=SAMEORIGIN response.header.x-xss-protection=0 response.status="404 Not Found" url="https://gcr.io/v2/$repo/cbruckmayer/manifests/foo"
INFO[0000] fetch failed after status: 404 Not Found      host=gcr.io
FATA[0000] failed to resolve reference "gcr.io/$repo/cbruckmayer:foo": gcr.io/shopify-docker-images/cbruckmayer:foo: not found

Describe the results you received and expected

I expect to be able to convert (as well as tag, commit, and save) the image.

What version of nerdctl are you using?

> nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby version
Client:
 Version:	v2.0.2
 OS/Arch:	linux/amd64
 Git commit:	1220ce7ec2701d485a9b1beeea63dae3da134fb5
 buildctl:
  Version:

Server:
 containerd:
  Version:	1.7.24
  GitCommit:	88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version:	1.2.2
  GitCommit:	v1.2.2-0-g7cb3632

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

@apostasie
Contributor

Thanks @ChrisBr

Looking.

@apostasie
Contributor

apostasie commented Dec 16, 2024

Ok, @ChrisBr. Here is my understanding.

Of course I may be wrong or missing something - this part of things is not exactly trivial... So, tagging @lingdie and @AkihiroSuda to keep me in check...

So:

Because containerd retrieves images lazily, said images may be incomplete / missing layers on the host machine, even if they have run.

This is normally fine, though not in some circumstances (commit, save, convert), because these operations (obviously) require all of the content to be present locally.

Since containerd does not currently have a solution for that problem, nerdctl has implemented logic (in #3435 and others) that will verify that images are complete before said operations, and if layers are missing, retrieve them from their origin (note that we have to extend that logic to tag as well).

This approach is obviously not bullet-proof. If there are indeed missing layers, and the remote image has vanished, or cannot be reached, then we (must) fail.

I believe what is happening for you here is:

  • docker (with containerd lazy pull) only retrieves part of the alpine image
  • docker commits that, resulting in an image referencing missing layers
  • nerdctl convert:
    • verifies if the image is complete in the store
      func EnsureAllContent(ctx context.Context, client *containerd.Client, srcName string, options types.GlobalCommandOptions) error {
          // Get the image from the srcName
          imageService := client.ImageService()
          img, err := imageService.Get(ctx, srcName)
          if err != nil {
              return err
          }
          provider := containerdutil.NewProvider(client)
          snapshotter := containerdutil.SnapshotService(client, options.Snapshotter)
          // Read the image
          imagesList, _ := read(ctx, provider, snapshotter, img.Target)
          // Iterate through the list
          for _, i := range imagesList {
              err = ensureOne(ctx, client, srcName, img.Target, i.platform, options)
              if err != nil {
                  return err
              }
          }
          return nil
      }

      func ensureOne(ctx context.Context, client *containerd.Client, rawRef string, target ocispec.Descriptor, platform ocispec.Platform, options types.GlobalCommandOptions) error {
          parsedReference, err := referenceutil.Parse(rawRef)
          if err != nil {
              return err
          }
          pltf := []ocispec.Platform{platform}
          platformComparer := platformutil.NewMatchComparerFromOCISpecPlatformSlice(pltf)
          _, _, _, missing, err := images.Check(ctx, client.ContentStore(), target, platformComparer)
          if err != nil {
              return err
          }
          if len(missing) > 0 {
    • the image is NOT complete and has missing layers - nerdctl tries to retrieve the content from the remote (using the image name) - there is no such remote image
    • nerdctl refuses to try to convert, and exits

Unfortunately, there might not be anything that can be done inside nerdctl per se.
We could "soft-error", but then the convert operation WILL fail right after, as layers are missing.
This is a situation where content is not there, and we have no idea / information on how to retrieve it.

Please note though that my understanding of GKE snapshotter is very superficial.

So, here are two things we could try to confirm above:

  1. redo your scenario, but using ONLY nerdctl all the way (for run and commit) - I would expect this to work
  2. using docker this time:
docker run --name foo alpine
docker commit foo gcr.io/$repo/cbruckmayer:foo
docker save gcr.io/$repo/cbruckmayer:foo -o /tmp/whatever.tar

^ I would expect this to fail

@ChrisBr
Author

ChrisBr commented Dec 16, 2024

Thanks for the detailed write-up @apostasie, really appreciated. This makes sense but is of course unfortunate for our use case 😿

redo your scenario, but using ONLY nerdctl all the way (for run and commit) - I would expect this to work
using docker this time:

I will try this tomorrow morning and will report back. Could you maybe clarify why it would work with nerdctl (if nerdctl were to use GKE snapshotter as well) and not with docker?

@ChrisBr
Author

ChrisBr commented Dec 16, 2024

If there were a way with the GKE snapshotter to force-download the base image, would you expect this to work?

Pseudo code

> docker pull alpine --force
> docker run -it alpine --name foo
> docker commit foo gcr.io/cbruckmayer/foo:gzip
> nerdctl image convert --zstd --oci gcr.io/cbruckmayer/foo:gzip gcr.io/cbruckmayer/foo:zstd

@apostasie
Contributor

apostasie commented Dec 16, 2024

Thanks for the detailed write-up @apostasie, really appreciated. This makes sense but is of course unfortunate for our use case 😿

redo your scenario, but using ONLY nerdctl all the way (for run and commit) - I would expect this to work
using docker this time:

I will try this tomorrow morning and will report back. Could you maybe clarify why it would work with nerdctl (if nerdctl were to use GKE snapshotter as well) and not with docker?

It would work with nerdctl, because nerdctl will force content download when you commit (at that time, we do know the image is alpine, and retrieving the missing layers will work). Further operations after that will then see the image as complete, and should succeed without trying to download anything.
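
For illustration, a minimal sketch of that nerdctl-only path (assuming the same socket / namespace / snapshotter flags used above; image and tag names are just placeholders):

nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs run -d --name foo alpine
nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs commit foo gcr.io/$repo/cbruckmayer:foo
nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs image convert --zstd --oci gcr.io/$repo/cbruckmayer:foo gcr.io/$repo/cbruckmayer:foo-zstd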

@apostasie
Contributor

If there were a way with the GKE snapshotter to force-download the base image, would you expect this to work?

Pseudo code

> docker pull alpine --force
> docker run -it alpine --name foo
> docker commit foo gcr.io/cbruckmayer/foo:gzip
> nerdctl image convert --zstd --oci gcr.io/cbruckmayer/foo:gzip gcr.io/cbruckmayer/foo:zstd

Mmmm...

I am not 100% comfortable with the interactions between containerd and the GKE snapshotter, but I would cautiously say "yes, it should work".

You should be able to check image status using ctr:

ctr --address /run/user/501/containerd/containerd.sock images check

This should tell you which images are fully available locally, and which have missing layers.

@ChrisBr
Author

ChrisBr commented Dec 17, 2024

Ok did some testing this morning

Using nerdctl like this works

> ctr --address /var/run/docker/containerd/containerd.sock --namespace moby images ^C
> ctr --address /var/run/docker/containerd/containerd.sock --namespace moby images check
> nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs run -it --network none alpine
docker.io/library/alpine:latest:                                                  resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:2c43f33bd1502ec7818bce9eea60e062d04eeadc4aa31cad9dabecb1e48b647b: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:4048db5d36726e313ab8f7ffccf2362a34cba69e4cdd49119713483a68641fce:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s                                                                    total:  10.0 K (11.1 KiB/s)
/ # exit
> ctr --address /var/run/docker/containerd/containerd.sock --namespace moby images check
REF                             TYPE                                    DIGEST                                                                  STATUS           SIZE            UNPACKED
docker.io/library/alpine:latest application/vnd.oci.image.index.v1+json sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45 incomplete (1/2) 581.0 B/3.5 MiB false
> nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs ps -a
CONTAINER ID    IMAGE                              COMMAND      CREATED           STATUS                       PORTS    NAMES
34b3acce6716    docker.io/library/alpine:latest    "/bin/sh"    25 seconds ago    Exited (0) 23 seconds ago             alpine-34b3a
03dd44670226    docker.io/library/alpine:latest    "/bin/sh"    5 minutes ago     Exited (0) 5 minutes ago              alpine-03dd4
baebcdac8bb8    docker.io/library/alpine:latest    "/bin/sh"    14 minutes ago    Exited (0) 14 minutes ago             alpine-baebc
> nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs commit 34b3acce6716 gcr.io/cbruckmayer/foo:gzip
docker.io/library/alpine:latest:                                                  resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45:    exists         |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:2c43f33bd1502ec7818bce9eea60e062d04eeadc4aa31cad9dabecb1e48b647b: exists         |++++++++++++++++++++++++++++++++++++++|
layer-sha256:38a8310d387e375e0ec6fabe047a9149e8eb214073db9f461fee6251fd936a75:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:4048db5d36726e313ab8f7ffccf2362a34cba69e4cdd49119713483a68641fce:   exists         |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.8 s                                                                    total:  2.0 Mi (2.5 MiB/s)
sha256:a1dea7e5ed24658076f7539e931c45c159f12708bc0e6f4265bbf28c76e6523c
> ctr --address /var/run/docker/containerd/containerd.sock --namespace moby images check
REF                             TYPE                                                 DIGEST                                                                  STATUS         SIZE            UNPACKED
docker.io/library/alpine:latest application/vnd.oci.image.index.v1+json              sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45 complete (2/2) 3.5 MiB/3.5 MiB false
gcr.io/cbruckmayer/foo:gzip     application/vnd.docker.distribution.manifest.v2+json sha256:a783672836dc8bb9904e38608bc5a853cceba614c436322d2244c866f8be2117 complete (3/3) 3.5 MiB/3.5 MiB false

Using docker save works as well, even though the image seems incomplete 🤔

ctr  --address /var/run/docker/containerd/containerd.sock --namespace moby images check
> docker run -it --name foo alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
Digest: sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45
Status: Downloaded newer image for alpine:latest
/ # exit
> ctr  --address /var/run/docker/containerd/containerd.sock --namespace moby images check
REF                             TYPE                                    DIGEST                                                                  STATUS           SIZE            UNPACKED
docker.io/library/alpine:latest application/vnd.oci.image.index.v1+json sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45 incomplete (1/2) 581.0 B/3.5 MiB false
> docker commit foo gcr.io/images/foo:gzip
sha256:708ca513ed0e0ca52ab1a78a1b2c9e46da70787fc858b847035c7ec67149dd39
> ctr  --address /var/run/docker/containerd/containerd.sock --namespace moby images check
REF                             TYPE                                       DIGEST                                                                  STATUS           SIZE            UNPACKED
docker.io/library/alpine:latest application/vnd.oci.image.index.v1+json    sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45 incomplete (1/2) 581.0 B/3.5 MiB false
gcr.io/images/foo:gzip          application/vnd.oci.image.manifest.v1+json sha256:708ca513ed0e0ca52ab1a78a1b2c9e46da70787fc858b847035c7ec67149dd39 incomplete (2/3) 875.0 B/3.5 MiB false
> ctr  --address /var/run/docker/containerd/containerd.sock --namespace moby images check
REF                             TYPE                                       DIGEST                                                                  STATUS           SIZE            UNPACKED
docker.io/library/alpine:latest application/vnd.oci.image.index.v1+json    sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45 incomplete (1/2) 581.0 B/3.5 MiB false
gcr.io/images/foo:gzip          application/vnd.oci.image.manifest.v1+json sha256:708ca513ed0e0ca52ab1a78a1b2c9e46da70787fc858b847035c7ec67149dd39 incomplete (2/3) 875.0 B/3.5 MiB false
> docker save gcr.io/images/foo:gzip -o /tmp/whatever.tar
> echo $?
0
> ls -lah /tmp/whatever.tar
-rw------- 1 root root 6.5K Dec 17 11:04 /tmp/whatever.tar
> ctr  --address /var/run/docker/containerd/containerd.sock --namespace moby images check
REF                             TYPE                                       DIGEST                                                                  STATUS           SIZE            UNPACKED
docker.io/library/alpine:latest application/vnd.oci.image.index.v1+json    sha256:21dc6063fd678b478f57c0e13f47560d0ea4eeba26dfc947b2a4f81f686b9f45 incomplete (1/2) 581.0 B/3.5 MiB false
gcr.io/images/foo:gzip          application/vnd.oci.image.manifest.v1+json sha256:708ca513ed0e0ca52ab1a78a1b2c9e46da70787fc858b847035c7ec67149dd39 incomplete (2/3) 875.0 B/3.5 MiB false

Do you think we could pull the images with ctr / nerdctl to force them to be available (e.g. use overlay as the snapshotter)?

@apostasie
Contributor

Do you think we could pull the images with ctr / nerdctl to force them to be available (e.g. use overlay as the snapshotter)?

Something like that should work:

nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs pull alpine
nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs tag alpine this_is_a_cheat
nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby --snapshotter gcfs rmi this_is_a_cheat

Then you do your normal stuff after that with docker / etc.

Tagging with nerdctl will force-fetch the missing layers.
Check with ctr that you get the expected results ^.
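
For example, with the same socket and namespace used in this thread, something like this (after the tag) should report the image as complete:

ctr --address /var/run/docker/containerd/containerd.sock --namespace moby images check | grep alpine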

I appreciate none of this is cute... we do have a serious problem here. But so far, there is no good solution...

@apostasie
Contributor

Using docker save works as well, even though the image seems incomplete 🤔

Concerning.
Does the saved tarball contain everything or just bits and pieces?

@ChrisBr
Author

ChrisBr commented Dec 17, 2024

Does the saved tarball contain everything or just bits and pieces?

Any easy way to confirm?

I appreciate none of this is cute... we do have a serious problem here. But so far, there is no good solution...

Yeah appreciate your help, difficult corner case here ... 😿

@apostasie
Contributor

Does the saved tarball contain everything or just bits and pieces?

Any easy way to confirm?

I think so: save it with docker like you just did (possibly incomplete) - then remove everything and just use nerdctl (pull alpine, save alpine), then extract both archives and diff.
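
Roughly something like this (a sketch; image names and paths are placeholders based on your transcript above):

# save the (possibly incomplete) committed image with docker
docker save gcr.io/images/foo:gzip -o /tmp/docker-save.tar
# clean slate, then pull + save the base image with nerdctl only
docker rmi -f gcr.io/images/foo:gzip alpine
nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby pull alpine
nerdctl --address /var/run/docker/containerd/containerd.sock --namespace moby save -o /tmp/nerdctl-save.tar alpine
# extract both archives and compare
mkdir -p /tmp/a /tmp/b
tar -xf /tmp/docker-save.tar -C /tmp/a
tar -xf /tmp/nerdctl-save.tar -C /tmp/b
diff -qr /tmp/a /tmp/b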

@ChrisBr
Author

ChrisBr commented Jan 2, 2025

Happy new year!

Tagging with nerdctl will force fetch the missing layers.
Check with ctr that you get the expected results ^.

This seems to work and would unblock us. Thanks for the help!

@ChrisBr
Author

ChrisBr commented Jan 17, 2025

@apostasie I appreciate this might only be tangentially related, but we're seeing issues when a zstd-converted image is not fully imported into the containerd content store.

Here is an example of two manifests

{
   "mediaType": "application/vnd.oci.image.manifest.v1+json",
   "schemaVersion": 2,
   "config": {
      "mediaType": "application/vnd.oci.image.config.v1+json",
      "digest": "sha256:a653ebdf175e3c0846e28189eae335027969d84f94c27c7efc87db6af809a07f",
      "size": 4958
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:afad30e59d72d5c8df4023014c983e457f21818971775c4224163595ec20b69f",
         "size": 29751784
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:a39dfc0474809ba71b2a67763320551d9480bd6c64c1906cf12d53f8f32be43e",
         "size": 5333933
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:b83a7fd18e860b5333afb74b83ff8ae3fcc7f59f606ad67817c38f19be3178cd",
         "size": 161
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:900ebfb9ed48286d011a7610f01ce6c4a46c3f472fa66340a22b52958847df34",
         "size": 57549
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:5c9382e782b277ad5c8d6c0d02e6933fbdfa9a919f4e0324b7dcc4a3e70d850f",
         "size": 139607
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:0ad64049285a6de70a72ca2bb826476bd1707cae3728f2eae23a1e58480b2cc4",
         "size": 2526
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:4a1aefe53c353341b321e3e9cc834deba50aed15ae69ae00a9aae6e8e3a36b80",
         "size": 295659621
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:054615aa6bdf523fb56ae1cfd95ac2e1868cae2e2c3a06d842fe14ad084cad7f",
         "size": 3525
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1",
         "size": 32
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:fb0bf78a130cf9d4e1719b52e49124ce803c2dbe06212b92995956b024b0b2a4",
         "size": 43362773
      }
   ]
}
{
   "mediaType": "application/vnd.oci.image.manifest.v1+json",
   "schemaVersion": 2,
   "config": {
      "mediaType": "application/vnd.oci.image.config.v1+json",
      "digest": "sha256:ed5f9346c781bda382a338768c7cb62de81b6b692f9aacea1ae7e3f7acfd4455",
      "size": 5098
   },
   "layers": [
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:32fb2971d7a739f15281e47993b8260f63d6da29d3b34a05832574ee264a2391",
         "size": 27212784
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:53d218ebab3a431688352a77cb535c091ef8d4cb3caa006a842750566632ce0f",
         "size": 5339416
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:bbe2c8c2081503e5bad22bf2f781963424b2686801cccc365d08a485170346f9",
         "size": 145
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:8d320c0135416ce62210304ee3ca90d5b9cdaf44acb3e9fb8775aa965c52bc00",
         "size": 37297
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:c5f6bf11ea26e9c53c3e52d19f82c35d41e28c22c79ca617fe2709a07331719e",
         "size": 120349
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:2529e3d3f8d67e42e1ddd0cae92a5295fab8d1d01571eba6efbf15e426a76e4d",
         "size": 2660
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:7727c34b09768ca269bec406094c909fd033bdce23619e0e3ac2e699923fc382",
         "size": 262405003
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:8fd3570bd61256aefca8007c49335582c25bb2b101df6b7d0a216aa3a837bde7",
         "size": 3676
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:2c1ce468d9f3d941396801f6e3afc8921466650dd05430fe644cd3537713d27f",
         "size": 16
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:86ef2d1442d99381ddf191f4fa9f64fa5858cf59f51cf01be9aed8d804d1fcb7",
         "size": 41961837
      },
      {
         "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:be149482a2dba495770592a4597cc54a0b6fbc94e274190955d495e28516b5d6",
         "size": 46881257
      }
   ]
}

Image 2 builds on top of image 1, so the layers should all be shared.

If I pull image 1 first, I end up with:

> ctr --namespace moby --address /var/run/docker/containerd/containerd.sock images check 
$image_1  application/vnd.oci.image.manifest.v1+json sha256:b002bf56ca1f71382351ca7a1ccec74a36311375e39fc7715afa9264f3905c40 complete (11/11) 357.0 MiB/357.0 MiB true
$image_2 application/vnd.oci.image.manifest.v1+json sha256:c42dbdd58dc8c376bd060643559080d623c8891ed6f73a5f32feb186e30633cd incomplete (3/12) 84.7 MiB/366.2 MiB  true

Image 1 is fully available but image 2 is incomplete? I can run the image regardless, so my gut feeling is that the content store is not able to make the connection that the layer content is available, just in a different compression format?

If I only pull image 2, it will be fully available.

Is there anything you can think of that could be causing this?
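
For reference, one way to narrow this down might be to ask the content store directly whether an individual zstd layer blob is present (a sketch, using the first zstd layer digest from the manifest above):

ctr --address /var/run/docker/containerd/containerd.sock --namespace moby content ls | grep 32fb2971d7a739f15281e47993b8260f63d6da29d3b34a05832574ee264a2391
ctr --address /var/run/docker/containerd/containerd.sock --namespace moby content get sha256:32fb2971d7a739f15281e47993b8260f63d6da29d3b34a05832574ee264a2391 > /dev/null && echo present || echo missing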

@AkihiroSuda AkihiroSuda added bug Something isn't working and removed kind/unconfirmed-bug-claim Unconfirmed bug claim labels Jan 17, 2025
@apostasie
Contributor

Hey @ChrisBr
Happy new year!

Sorry about the delay here. I took a break for a few weeks.
Will look into this ^.
Ping me again if I drop it.

@apostasie
Contributor

@ChrisBr would you be able to provide step-by-step commands to reproduce the issue?

@apostasie
Contributor

Hey @ChrisBr - gentle ping on this ^

@ChrisBr
Author

ChrisBr commented Mar 12, 2025

@apostasie sorry was a bit swamped recently. I tried to come up with a simple example but couldn't. I will try to spend a bit more time on this soon.

Thanks for your help, really appreciated 🙇

@ChrisBr
Author

ChrisBr commented Mar 12, 2025

I don't have reproducible steps yet, but we're doing what you suggested earlier and always tag the image before converting; even this fails quite frequently.


GOOGLE_ARTIFACT_URL: resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:30d9283be4b09e815ae6f7a40c3b51323362633952caf8b0aa768ee30d003d3d: exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:29792852efad9f3bfb4c16b75686c36c5b4ba1745f14eae2b4708599240e9758:    done   |++++++++++++++++++++++++++++++++++++++|
layer-sha256:19977aed7e43227124a9e88632b0fff9046b62ddb967f6b9a2f8d1642ab28951:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:bbe2c8c2081503e5bad22bf2f781963424b2686801cccc365d08a485170346f9:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:56c3e70771d34bee01c7c246e1b5764a7e49ead2cf3fd03456315ac17024e207:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1672007553af37c2c778673c97e3addbc3992b717a0074b3b91099153965af5b:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8d320c0135416ce62210304ee3ca90d5b9cdaf44acb3e9fb8775aa965c52bc00:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:7d85047d7f6208370eb866fc1dbe09ef10606b44d27552ac83a817ad943f4bd2:    done   |++++++++++++++++++++++++++++++++++++++|
layer-sha256:fb31efd7851ef6ddad4818c02b31694fce4764f55186456890935dd76b354aa4:    exists |++++++++++++++++++++++++++++++++++++++|
config-sha256:f8b016d4153b28c1ee378dc03dd227d4ec6d8740062745faa0acfae9df431b5d:   exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:53c35f972586a6bdc9628fa651b90812638aaa94d39f7d53d28694495f9c7ab2:    done   |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8da06d370d07c679136deb19ad53534b489ea16ec6b2557b30480b32779efc01:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c5f6bf11ea26e9c53c3e52d19f82c35d41e28c22c79ca617fe2709a07331719e:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:32fb2971d7a739f15281e47993b8260f63d6da29d3b34a05832574ee264a2391:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:50a8f944d1ae9dcafde7241b18c1e652f4559e8fa49c9a032d83d392b28bfad0:    done   |++++++++++++++++++++++++++++++++++++++|
layer-sha256:2c1ce468d9f3d941396801f6e3afc8921466650dd05430fe644cd3537713d27f:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4ef72c58b2a1227fe3389efe18188c66324545c62ab671bca3feaa4b9c63cde0:    exists |++++++++++++++++++++++++++++++++++++++|
layer-sha256:53d218ebab3a431688352a77cb535c091ef8d4cb3caa006a842750566632ce0f:    exists |++++++++++++++++++++++++++++++++++++++|
elapsed: 4.3 s                                                                    total:  254.4  (59.2 MiB/s)
FATA[0004] image "GOOGLE_ARTIFACT_URL": not found

The image does exist in the registry. Furthermore, the image also exists locally: on fallback, I do a docker inspect, which finds the image locally.

Could you recommend some steps to debug this further? I'm really scratching my head here.

I can't reproduce it, but it does happen every ~50 builds, so I can add debug code to our builder and should be able to get more information quite quickly.

@apostasie
Contributor

@ChrisBr even without a proper reproducer that would guarantee to trigger the bug, can you share the exact steps you are doing so that I can torture test it? (is that pull, tag, convert, push, then pull?)

@ChrisBr
Author

ChrisBr commented Mar 17, 2025

Yeah, so basically it's this:

docker pull $BASE_IMAGE
id=$(docker run $BASE_IMAGE)

nerdctl --namespace moby --address /var/run/docker/containerd/containerd.sock tag $BASE_IMAGE $BASE_IMAGE # tag the base so we ensure all layers exist
docker commit $id $NEW_IMAGE_URL
nerdctl --namespace moby --address /var/run/docker/containerd/containerd.sock image convert --zstd --oci $NEW_IMAGE_URL $NEW_IMAGE_URL

Could it be that using the same tag trips it?

@apostasie
Contributor

apostasie commented Mar 17, 2025

Thanks.

So, let's start with this.

Is the ./thing.sh below faithful to what you are doing?

If yes, can you run while true; do sudo ./thing.sh; done and confirm that it does break for you after some time?

#!/usr/bin/env bash
set -o errexit -o errtrace -o functrace -o nounset -o pipefail

################################################
# Adjust these to match your setup
client="nerdctl"
ctr="ctr"
docker="docker"
address=/run/containerd/containerd.sock
image=ghcr.io/stargz-containers/alpine:3.13-org
################################################

namespace=moby
client_com=("$client" "--namespace" "$namespace" "--address" "$address")
ctr_com=("$ctr" "--namespace" "$namespace" "--address" "$address")


# Pre-cleanup
"$docker" rm -f $(docker ps -aq) >/dev/null 2>&1 || true
"$docker" rmi -f $(docker images -q) >/dev/null 2>&1 || true

# Repro
"$docker" pull --quiet "$image"
"$docker" run --net none --quiet --name containerized "$image"

"$docker" commit "containerized" "committed"
"${client_com[@]}" image convert --zstd --oci "committed" "converted" && echo SUCCESS || {
  echo FAIL
  exit 42
}

@apostasie
Contributor

apostasie commented Apr 18, 2025

@ChrisBr
We recently merged:

They are all related to the overall issue of (likely) certain layers being garbage-collected, breaking operations like convert.

I believe your latest issue here might have been fixed by one of the above (most likely #4121).

If you do get a chance, would love to see you try the latest nerdctl main (or if you prefer to wait, the next patch release).
Also, if you ever see digest content not found again, I would love to hear about it.
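
If it helps, a rough sketch of trying main (assuming a Go toolchain and the standard Makefile targets in the nerdctl repo):

git clone https://github.com/containerd/nerdctl.git
cd nerdctl
make && sudo make install
nerdctl --version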

@apostasie
Contributor

apostasie commented Apr 19, 2025

Actually - @AkihiroSuda is suggesting we close this as resolved.

This ticket has wandered across a range of different problems and questions, and it is no longer clear which single issue it describes - except of course @ChrisBr's last mention of the failed convert, which I believe is fixed now.

@ChrisBr if it turns out this is not fixed, I would owe you an apology (and a beer on me, the next time I am in Cambridge :-)) and would of course look at any new details on this. Cheers!

@ChrisBr
Author

ChrisBr commented Apr 22, 2025

If you do get a chance, would love to see you try the latest nerdctl main (or if you prefer to wait, the next patch release).

Amazing, thank you! Any rough idea when you'll do the next patch release?

@apostasie
Contributor

@ChrisBr:
@AkihiroSuda just opened #4149 - so, let's conservatively say a couple of weeks?

About this here, I want to change my statement from "yeah, it's fixed!", to "cautiously, this should be fixed".

The good news is that we have a much better grasp of what is happening when convert screws the pooch - the bad news is that there may still be (the same) lingering issues in dependencies (stargz/zstd), and we might have to do multiple careful reviews, deep in there. So, if it happens again (when you run the new version), open a new ticket with as much detail as you can and I'll have a look.
Cheers!
