
Issues with configuring mirror registries - response was http.StatusNotFound #5478

killergoalie opened this issue Jan 25, 2025 · 13 comments

@killergoalie

Hi, I'm running into a not-so-fun issue trying to configure containerd to use a mirror registry (well, several of them, plus the _default config).

Currently running k0s 1.30.5 in an airgapped/network-restricted setup; using the tarball works without an issue. The OS is RHEL 8.10 with kernel 4.18.0-553.16.1.el8_10.x86_64.

But when trying to set up the registry mirror using either v1 or v2/v3 configs for containerd, I just keep getting the same message in the logs:

trying next host - response was http.StatusNotFound

It's also ignoring the skip_secure for some reason.

Full log (trying to deploy from registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1):

Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="time=\"2025-01-25T19:29:34.128143339Z\" level=info msg=\"PullImage \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\"\"" component=containerd stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="time=\"2025-01-25T19:29:34.185558658Z\" level=info msg=\"trying next host - response was http.StatusNotFound\" host=registry.home.lab" component=containerd stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="time=\"2025-01-25T19:29:34.195516406Z\" level=info msg=\"trying next host\" error=\"failed to do request: Head \\\"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab\" host=registry.home.lab" component=containerd stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="time=\"2025-01-25T19:29:34.197849125Z\" level=error msg=\"PullImage \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\" failed\" error=\"failed to pull and unpack image \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to resolve reference \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to do request: Head \\\"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab\"" component=containerd stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="time=\"2025-01-25T19:29:34.197931317Z\" level=info msg=\"stop pulling image registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1: active requests=0, bytes read=0\"" component=containerd stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="E0125 19:29:34.198030  819011 remote_image.go:180] \"PullImage from image service failed\" err=\"rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to resolve reference \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to do request: Head \\\"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab\" image=\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\"" component=kubelet stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="E0125 19:29:34.198084  819011 kuberuntime_image.go:55] \"Failed to pull image\" err=\"failed to pull and unpack image \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to resolve reference \\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to do request: Head \\\"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab\" image=\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\"" component=kubelet stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="E0125 19:29:34.198151  819011 kuberuntime_manager.go:1258] container &Container{Name:test2,Image:registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-ss9l9,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod test2_default(72c75fa2-cf8f-4f7c-8477-0655278a45e7): ErrImagePull: failed to pull and unpack image \"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\": failed to resolve reference \"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\": failed to do request: Head \"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab" component=kubelet stream=stderr
Jan 25 19:29:34 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:29:34" level=info msg="E0125 19:29:34.198176  819011 pod_workers.go:1298] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"test2\\\" with ErrImagePull: \\\"failed to pull and unpack image \\\\\\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\\\\\": failed to resolve reference \\\\\\\"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\\\\\\\": failed to do request: Head \\\\\\\"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\\\\\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab\\\"\" pod=\"default/test2\" podUID=\"72c75fa2-cf8f-4f7c-8477-0655278a45e7\"" component=kubelet stream=stderr

Full log (trying to deploy from nginxinc/nginx-unprivileged:v1.25.1, the idea being to let the registry config pull from the mirror):

Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="time=\"2025-01-25T19:34:57.357676533Z\" level=info msg=\"PullImage \\\"nginxinc/nginx-unprivileged:v1.25.1\\\"\"" component=containerd stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="time=\"2025-01-25T19:34:57.413257882Z\" level=info msg=\"trying next host - response was http.StatusNotFound\" host=registry.home.lab" component=containerd stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="time=\"2025-01-25T19:34:57.415825726Z\" level=info msg=\"trying next host\" error=\"failed to do request: Head \\\"https://docker.io/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": dial tcp: lookup docker.io on 172.16.0.4:53: server misbehaving\" host=docker.io" component=containerd stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="time=\"2025-01-25T19:34:57.416672534Z\" level=error msg=\"PullImage \\\"nginxinc/nginx-unprivileged:v1.25.1\\\" failed\" error=\"failed to pull and unpack image \\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to resolve reference \\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to do request: Head \\\"https://docker.io/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": dial tcp: lookup docker.io on 172.16.0.4:53: server misbehaving\"" component=containerd stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="time=\"2025-01-25T19:34:57.416739187Z\" level=info msg=\"stop pulling image docker.io/nginxinc/nginx-unprivileged:v1.25.1: active requests=0, bytes read=0\"" component=containerd stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="E0125 19:34:57.416915  819011 remote_image.go:180] \"PullImage from image service failed\" err=\"rpc error: code = Unknown desc = failed to pull and unpack image \\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to resolve reference \\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to do request: Head \\\"https://docker.io/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": dial tcp: lookup docker.io on 172.16.0.4:53: server misbehaving\" image=\"nginxinc/nginx-unprivileged:v1.25.1\"" component=kubelet stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="E0125 19:34:57.416981  819011 kuberuntime_image.go:55] \"Failed to pull image\" err=\"failed to pull and unpack image \\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to resolve reference \\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\": failed to do request: Head \\\"https://docker.io/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": dial tcp: lookup docker.io on 172.16.0.4:53: server misbehaving\" image=\"nginxinc/nginx-unprivileged:v1.25.1\"" component=kubelet stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="E0125 19:34:57.417082  819011 kuberuntime_manager.go:1258] container &Container{Name:test2,Image:nginxinc/nginx-unprivileged:v1.25.1,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-g2tqx,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod test2_default(690a0395-fc8e-40ff-a94f-c87748d05e1e): ErrImagePull: failed to pull and unpack image \"docker.io/nginxinc/nginx-unprivileged:v1.25.1\": failed to resolve reference \"docker.io/nginxinc/nginx-unprivileged:v1.25.1\": failed to do request: Head \"https://docker.io/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\": dial tcp: lookup docker.io on 172.16.0.4:53: server misbehaving" component=kubelet stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="E0125 19:34:57.417122  819011 pod_workers.go:1298] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"test2\\\" with ErrImagePull: \\\"failed to pull and unpack image \\\\\\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\\\\\": failed to resolve reference \\\\\\\"docker.io/nginxinc/nginx-unprivileged:v1.25.1\\\\\\\": failed to do request: Head \\\\\\\"https://docker.io/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\\\\\": dial tcp: lookup docker.io on 172.16.0.4:53: server misbehaving\\\"\" pod=\"default/test2\" podUID=\"690a0395-fc8e-40ff-a94f-c87748d05e1e\"" component=kubelet stream=stderr
Jan 25 19:34:57 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[818854]: time="2025-01-25 19:34:57" level=info msg="E0125 19:34:57.930998  819011 pod_workers.go:1298] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"test2\\\" with ImagePullBackOff: \\\"Back-off pulling image \\\\\\\"nginxinc/nginx-unprivileged:v1.25.1\\\\\\\"\\\"\" pod=\"default/test2\" podUID=\"690a0395-fc8e-40ff-a94f-c87748d05e1e\"" component=kubelet stream=stderr

Current config:

/etc/k0s/containerd.d/registry.toml

version = 2

[plugins]

  [plugins."io.containerd.grpc.v1.cri"]

    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/k0s/certs.d"

Tree for /etc/k0s/certs.d

[root@worker0 certs.d]# tree /etc/k0s/certs.d/
/etc/k0s/certs.d/
├── registry.home.lab
│   └── hosts.toml
├── _default
│   └── hosts.toml
├── docker.io
│   └── hosts.toml
├── quay.io
│   └── hosts.toml
└── registry.k8s.io
    └── hosts.toml

5 directories, 5 files

registry.home.lab hosts.toml config

server = "https://registry.home.lab"

[host."https://registry.home.lab"]
  capabilities = ["pull", "resolve"]
  skip_verify = true
  override_path = true
  [host."https://registry.home.lab".header]
    authorization = "Basic XXXXXXXXXYYYY"

docker.io hosts.toml config

server = "https://docker.io"

[host."https://registry.home.lab"]
  capabilities = ["pull", "resolve"]
  skip_verify = true
  override_path = true
  [host."https://registry.home.lab".header]
    authorization = "Basic XXXXXXXXYYYY"

_default hosts.toml config

server = "https://registry.home.lab"

[host."https://registry.home.lab"]
  capabilities = ["pull","resolve"]
  skip_verify = true
  override_path = true
  [host."https://registry.home.lab".header]
    authorization = "Basic XXXXXXXXYYYY"

This same setup is working with RKE2 and the mirror registry config in its deployment.

This is really throwing me for a loop. I'm sure I'm missing something in the config, just not sure what. I did try adding :5000 at the end of the mirror registry but was getting the same response.

@twz123
Member

twz123 commented Jan 25, 2025

Just to get the complete config, can you maybe also share the contents of /etc/k0s/containerd.toml and /run/k0s/containerd-cri.toml?

@killergoalie
Author

/etc/k0s/containerd.toml

# k0s_managed=true
# This is a placeholder configuration for k0s managed containerd.
# If you wish to override the config, remove the first line and replace this file with your custom configuration.
# For reference see https://github.com/containerd/containerd/blob/main/docs/man/containerd-config.toml.5.md
version = 2
imports = [
	"/run/k0s/containerd-cri.toml",
]

/run/k0s/containerd-cri.toml

version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
    device_ownership_from_security_context = false
    disable_apparmor = false
    disable_cgroup = false
    disable_hugetlb_controller = true
    disable_proc_mount = false
    disable_tcp_service = true
    drain_exec_sync_io_timeout = "0s"
    enable_cdi = false
    enable_selinux = false
    enable_tls_streaming = false
    enable_unprivileged_icmp = false
    enable_unprivileged_ports = false
    ignore_deprecation_warnings = []
    ignore_image_defined_volumes = false
    image_pull_progress_timeout = "5m0s"
    image_pull_with_sync_fs = false
    max_concurrent_downloads = 3
    max_container_log_line_size = 16384
    netns_mounts_under_state_dir = false
    restrict_oom_score_adj = false
    sandbox_image = "registry.k8s.io/pause:3.9"
    selinux_category_range = 1024
    stats_collect_period = 10
    stream_idle_timeout = "4h0m0s"
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    systemd_cgroup = false
    tolerate_missing_hugetlb_controller = true
    unset_seccomp_profile = ""
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      conf_template = ""
      ip_pref = ""
      max_conf_num = 1
      setup_serially = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      ignore_blockio_not_enabled_errors = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "overlayfs"
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        privileged_without_host_devices_all_devices_allowed = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""
        sandbox_mode = ""
        snapshotter = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          privileged_without_host_devices_all_devices_allowed = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          sandbox_mode = "podsandbox"
          snapshotter = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = false
      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        privileged_without_host_devices_all_devices_allowed = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""
        sandbox_mode = ""
        snapshotter = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = "node"
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/k0s/certs.d"
      [plugins."io.containerd.grpc.v1.cri".registry.auths]
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
      [plugins."io.containerd.grpc.v1.cri".registry.headers]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""

@jnummelin
Member

Hmm, I think containerd somehow gets tripped up by this:

tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab

I see your config does set skip_verify = true, which AFAIK should bypass this check. One way to test things out would be to provide a CA cert in the configs for the registry; that way there's no need for skip_verify, as long as the CA matches what the server is offering. 😄
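
For example, a minimal hosts.toml sketch along those lines (the ca path here is an assumption; point it at whatever file holds the CA that signed the registry cert):

server = "https://registry.home.lab"

[host."https://registry.home.lab"]
  capabilities = ["pull", "resolve"]
  # assumed location; any path readable by containerd works
  ca = "/etc/k0s/certs.d/registry.home.lab/ca.crt"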

I did try having :5000 at the end of the mirror registry
So is the registry only listening on port 5000? If so, I believe you should include the port number in the configs for the host.
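
With the port included, that would look something like this (a sketch only, assuming the registry really listens on 5000):

server = "https://registry.home.lab:5000"

[host."https://registry.home.lab:5000"]
  capabilities = ["pull", "resolve"]
  skip_verify = true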

@killergoalie
Author

So the root CA is in the OS-level trust store; does k0s not use the OS-level CA root chain?

I'm going to burn it down and rebuild just to make sure I don't have anything else going on, but it doesn't make a whole lot of sense to me; it looks like it's pulling in just part of the config.

@jnummelin
Member

So the root CA is in the OS-level trust store; does k0s not use the OS-level CA root chain?

Ah, didn't realize that was the case, my bad. Containerd should use the OS-level CAs. You could of course test by adding the CA into the config directly, just to rule things out.

One thing I'm not sure about is that you use both docker.io AND _default configs. So I'm wondering whether containerd gets confused about which config to use. 🤔

You could also increase the containerd log level to see if that reveals anything about how it handles the config. Edit the systemd unit to include something like:

k0s <worker/controller> ... --logging containerd=debug

@twz123
Member

twz123 commented Jan 28, 2025

Weird. If I understand the containerd logs correctly, containerd is trying to pull twice from registry.home.lab:

level=info msg="PullImage \"registry.home.lab/nginxinc/nginx-unprivileged:v1.25.1\"""
level=info msg="trying next host - response was http.StatusNotFound" host=registry.home.lab"

This was the first attempt. Apparently all went well, but the registry didn't find the tag.

level=info msg="trying next host" error="failed to do request: Head \"https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1\\\": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab" host=registry.home.lab"

This was the second attempt, in which TLS verification failed. Why would it try two hosts? Maybe one for the exact match on registry.home.lab, and the other for the _default host? Even then, why would the first succeed and the second one fail? As far as I can see, the two config snippets are identical. One remarkable thing is the URL that has been tried: https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1. Note the /v2/ in there. According to the config, this shouldn't be there in the first place, as you've specified override_path = true in both places.

Could you try to curl (maybe add -k and some -v) that URL manually, both with and without the /v2/ path segment? I bet the server responses will be different. Maybe this will give further insights. We're missing something here ... 🤔
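
Something along these lines (a sketch; add credentials with -u if the registry needs them):

curl -kv https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/v1.25.1
curl -kv https://registry.home.lab/nginxinc/nginx-unprivileged/manifests/v1.25.1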

NB: I noticed that in the skip_verify example from the official docs, they're using http:// when defining the host. You said that the above config is working on RKE2, so this might just be a glitch in the containerd docs ...

It's also ignoring the skip_secure for some reason.

You mean skip_verify, right?

@killergoalie
Author

killergoalie commented Jan 28, 2025

@jnummelin thanks for the debug command, I'll get that going with my next test.

The docker.io and _default are from throwing the kitchen sink at this issue, I can roll them back if you think they are conflicting.

@twz123 yes, skip_verify, sorry for the typo.

NB: I noticed that in the skip_verify example from the official docs, they're using http:// when defining the host. You said that the above config is working on RKE2, so this might just be a glitch in the containerd docs ...

Yeah, I noticed that also; not sure if this is a shortcoming of the doc example or a possible sign that skip_verify expects http. I'll try swapping https for http as well, I guess.

Could you try to curl (maybe add -k and some -v) that URL manually, both with and without the /v2/ path segment? I bet the server responses will be different. Maybe this will give further insights. We're missing something here ... 🤔

So I tested this:

  • without /v2: 404 page not found
  • with /v2: led me to realize I was pulling the wrong tag
  • with /v2 and the correct tag (1.25.1 instead of v1.25.1): I get the following error, but I think this is expected with curl (see the note after this list):
    {"errors":[{"code":"MANIFEST_UNKNOWN","message":"OCI manifest found, but accept header does not support OCI manifests"}]}

With this knowledge I tried my kubectl run again, and now I'm getting the following errors:

with registry.home.lab:

  Warning  Failed     7m6s (x6 over 8m29s)    kubelet            Error: ImagePullBackOff
  Normal   Pulling    6m55s (x4 over 8m29s)   kubelet            Pulling image "registry.home.lab/nginxinc/nginx-unprivileged:1.25.1"
  Warning  Failed     6m55s (x4 over 8m29s)   kubelet            Failed to pull image "registry.home.lab/nginxinc/nginx-unprivileged:1.25.1": failed to pull and unpack image "registry.home.lab/nginxinc/nginx-unprivileged:1.25.1": failed to resolve reference "registry.home.lab/nginxinc/nginx-unprivileged:1.25.1": failed to do request: Head "https://registry.home.lab/v2/nginxinc/nginx-unprivileged/manifests/1.25.1": tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match registry.home.lab

with registry.home.lab:5000:

Normal   Scheduled  4m28s                  default-scheduler  Successfully assigned default/test2 to worker0.home.lab
  Warning  Failed     3m8s (x6 over 4m27s)   kubelet            Error: ImagePullBackOff
  Normal   Pulling    2m53s (x4 over 4m28s)  kubelet            Pulling image "registry.home.lab:5000/nginxinc/nginx-unprivileged:1.25.1"
  Warning  Failed     2m53s (x4 over 4m27s)  kubelet            Failed to pull image "registry.home.lab:5000/nginxinc/nginx-unprivileged:1.25.1": failed to pull and unpack image "registry.home.lab:5000/nginxinc/nginx-unprivileged:1.25.1": failed to resolve reference "registry.home.lab:5000/nginxinc/nginx-unprivileged:1.25.1": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials

I think I did find one issue with the TLS cert for my registry. It appears my automation for handling this has snuck the port into the SAN, so I'll get that fixed. Next is to redeploy with the updated debugging.

EDIT: to me it looks like it's using part of the hosts.toml but not all of it, as it seems to parse it and then try a different host. Doing another burn-down and going to test with just the docker.io and mirror registry configs.

@killergoalie
Author

Here are some more logs with containerd=debug:

Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.942110500Z\" level=info msg=\"RunPodSandbox for &PodSandboxMetadata{Name:kube-proxy-m266j,Uid:f542d0e9-ecdc-459c-8f13-db6a616bb09e,Namespace:kube-system,Attempt:0,}\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.942210636Z\" level=debug msg=\"Sandbox config &PodSandboxConfig{Metadata:&PodSandboxMetadata{Name:kube-proxy-m266j,Uid:f542d0e9-ecdc-459c-8f13-db6a616bb09e,Namespace:kube-system,Attempt:0,},Hostname:,LogDirectory:/var/log/pods/kube-system_kube-proxy-m266j_f542d0e9-ecdc-459c-8f13-db6a616bb09e,DnsConfig:&DNSConfig{Servers:[172.16.0.15],Searches:[home.lab],Options:[],},PortMappings:[]*PortMapping{},Labels:map[string]string{controller-revision-hash: 779b5d8575,io.kubernetes.pod.name: kube-proxy-m266j,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: f542d0e9-ecdc-459c-8f13-db6a616bb09e,k8s-app: kube-proxy,pod-template-generation: 1,},Annotations:map[string]string{kubernetes.io/config.seen: 2025-01-29T17:33:51.861529126Z,kubernetes.io/config.source: api,prometheus.io/port: 10249,prometheus.io/scrape: true,},Linux:&LinuxPodSandboxConfig{CgroupParent:/kubepods/besteffort/podf542d0e9-ecdc-459c-8f13-db6a616bb09e,SecurityContext:&LinuxSandboxSecurityContext{NamespaceOptions:&NamespaceOption{Network:NODE,Pid:CONTAINER,Ipc:POD,TargetId:,UsernsOptions:nil,},SelinuxOptions:nil,RunAsUser:nil,ReadonlyRootfs:false,SupplementalGroups:[],Privileged:true,SeccompProfilePath:,RunAsGroup:nil,Seccomp:&SecurityProfile{ProfileType:RuntimeDefault,LocalhostRef:,},Apparmor:nil,},Sysctls:map[string]string{},Overhead:&LinuxContainerResources{CpuPeriod:0,CpuQuota:0,CpuShares:0,MemoryLimitInBytes:0,OomScoreAdj:0,CpusetCpus:,CpusetMems:,HugepageLimits:[]*HugepageLimit{},Unified:map[string]string{},MemorySwapLimitInBytes:0,},Resources:&LinuxContainerResources{CpuPeriod:100000,CpuQuota:0,CpuShares:2,MemoryLimitInBytes:0,OomScoreAdj:0,CpusetCpus:,CpusetMems:,HugepageLimits:[]*HugepageLimit{},Unified:map[string]string{},MemorySwapLimitInBytes:0,},},Windows:nil,}\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.942257824Z\" level=debug msg=\"generated id for sandbox name \\\"kube-proxy-m266j_kube-system_f542d0e9-ecdc-459c-8f13-db6a616bb09e_0\\\"\" podsandboxid=cec8eef522bc90b0c36b3b86ca41d997cc285d6444cbf302d987aa432dbc7d8d" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.942310031Z\" level=debug msg=\"PullImage \\\"registry.k8s.io/pause:3.9\\\" with snapshotter overlayfs\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.943680427Z\" level=debug msg=\"loading host directory\" dir=/etc/k0s/certs.d/_default" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.943824384Z\" level=debug msg=resolving host=\"registry.home.lab:5000\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.943851434Z\" level=debug msg=\"do request\" host=\"registry.home.lab:5000\" request.header.accept=\"application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*\" request.header.user-agent=containerd/1.7.22 request.method=HEAD url=\"https://registry.home.lab:5000/pause/manifests/3.9?ns=registry.k8s.io\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.949386476Z\" level=debug msg=\"fetch response received\" host=\"registry.home.lab:5000\" response.header.content-length=19 response.header.content-type=\"text/plain; charset=utf-8\" response.header.date=\"Wed, 29 Jan 2025 18:36:32 GMT\" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.status=\"404 Not Found\" url=\"https://registry.home.lab:5000/pause/manifests/3.9?ns=registry.k8s.io\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.949423134Z\" level=info msg=\"trying next host - response was http.StatusNotFound\" host=\"registry.home.lab:5000\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.949436319Z\" level=debug msg=resolving host=\"registry.home.lab:5000\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.949449423Z\" level=debug msg=\"do request\" host=\"registry.home.lab:5000\" request.header.accept=\"application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*\" request.header.user-agent=containerd/1.7.22 request.method=HEAD url=\"https://registry.home.lab:5000/v2/pause/manifests/3.9?ns=registry.k8s.io\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.956826497Z\" level=debug msg=\"fetch response received\" host=\"registry.home.lab:5000\" response.header.content-length=148 response.header.content-type=\"application/json; charset=utf-8\" response.header.date=\"Wed, 29 Jan 2025 18:36:32 GMT\" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate=\"Basic realm=\\\"Realm\\\"\" response.header.x-content-type-options=nosniff response.status=\"401 Unauthorized\" url=\"https://registry.home.lab:5000/v2/pause/manifests/3.9?ns=registry.k8s.io\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.956859639Z\" level=debug msg=Unauthorized header=\"Basic realm=\\\"Realm\\\"\" host=\"registry.home.lab:5000\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.956874927Z\" level=info msg=\"trying next host\" error=\"pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials\" host=\"registry.home.lab:5000\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.958005207Z\" level=error msg=\"RunPodSandbox for &PodSandboxMetadata{Name:kube-proxy-m266j,Uid:f542d0e9-ecdc-459c-8f13-db6a616bb09e,Namespace:kube-system,Attempt:0,} failed, error\" error=\"failed to get sandbox image \\\"registry.k8s.io/pause:3.9\\\": failed to pull image \\\"registry.k8s.io/pause:3.9\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.9\\\": failed to resolve reference \\\"registry.k8s.io/pause:3.9\\\": pull access denied, repository does not exist or may require authorization: authorization failed: no basic auth credentials\"" component=containerd stream=stderr
Jan 29 18:36:32 worker0.home.lab k0s-v1.30.5+k0s.0-amd64[2151]: time="2025-01-29 18:36:32" level=info msg="time=\"2025-01-29T18:36:32.958024272Z\" level=info msg=\"stop pulling image registry.k8s.io/pause:3.9: active requests=0, bytes read=0\"" component=containerd stream=stderr

@twz123 Before you ask, I did try curling my mirror registry for https://registry.home.lab:5000/v2/pause/manifests/3.9?ns=registry.k8s.io and got the following:

{
   "schemaVersion": 1,
   "name": "pause",
   "tag": "3.9",
   "architecture": "amd64",
   "fsLayers": [
      {
         "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
      },
      {
         "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
      },
      {
         "blobSum": "sha256:61fec91190a0bab34406027bbec43d562218df6e80d22d4735029756f23c7007"
      },
      {
         "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
      }
   ],
   "history": [
      {
         "v1Compatibility": "{\"architecture\":\"amd64\",\"config\":{\"User\":\"65535:65535\",\"Env\":[\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"],\"Entrypoint\":[\"/pause\"],\"WorkingDir\":\"/\",\"OnBuild\":null},\"created\":\"2022-10-13T18:35:41.370102426Z\",\"id\":\"37f70ae13cf73226bdc34226eb3c7f9b1cde72e58e88a05b89997fbaa61b5fc5\",\"moby.buildkit.buildinfo.v1\":\"eyJmcm9udGVuZCI6ImRvY2tlcmZpbGUudjAifQ==\",\"os\":\"linux\",\"parent\":\"66e30703831961d19d74bf917d26569243ff2f7b9989d9b9ea4008884bb6aa95\",\"throwaway\":true}"
      },
      {
         "v1Compatibility": "{\"id\":\"66e30703831961d19d74bf917d26569243ff2f7b9989d9b9ea4008884bb6aa95\",\"parent\":\"62b9fe89cf236aa9dc92fefa53c1d8df10b160a7a2bb9f428c529ca479f98a1f\",\"comment\":\"buildkit.dockerfile.v0\",\"created\":\"2022-10-13T18:35:41.370102426Z\",\"container_config\":{\"Cmd\":[\"USER 65535:65535\"]},\"throwaway\":true}"
      },
      {
         "v1Compatibility": "{\"id\":\"62b9fe89cf236aa9dc92fefa53c1d8df10b160a7a2bb9f428c529ca479f98a1f\",\"parent\":\"3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880\",\"comment\":\"buildkit.dockerfile.v0\",\"created\":\"2022-10-13T18:35:41.370102426Z\",\"container_config\":{\"Cmd\":[\"ADD bin/pause-linux-amd64 /pause # buildkit\"]}}"
      },
      {
         "v1Compatibility": "{\"id\":\"3690474eb5b4b26fdfbd89c6e159e8cc376ca76ef48032a30fa6aafd56337880\",\"comment\":\"buildkit.dockerfile.v0\",\"created\":\"2022-10-13T18:35:41.370102426Z\",\"container_config\":{\"Cmd\":[\"ARG ARCH\"]},\"throwaway\":true}"
      }
   ],
   "signatures": [
      {
         "header": {
            "jwk": {
               "crv": "P-256",
               "kid": "MT7S:M3WM:HD4Y:VAYZ:FXTM:U6XX:X5T2:QHOE:WLKV:CRON:2EDK:XP2L",
               "kty": "EC",
               "x": "cUPBC64H1FzLS-leYq73rx0C21-DSDIiBoKI65A4JTM",
               "y": "7Ja3xVHw0A0ykErWg4EKgAx1YbHlY9gBdvUvXGswCZU"
            },
            "alg": "ES256"
         },
         "signature": "b-45tyfZTHXfgLOO4FDTp6smwobJqxKIugtakZx-illcxic2IMq4JoDRcKKvK0Ecql-y2UALteyd4uB2VfuTWg",
         "protected": "eyJmb3JtYXRMZW5ndGgiOjIxNjIsImZvcm1hdFRhaWwiOiJDbjAiLCJ0aW1lIjoiMjAyNS0wMS0yOVQxOTo1MjozM1oifQ"
      }
   ]
}

@twz123
Member

twz123 commented Jan 30, 2025

Given that you're now including the port, did you rename the respective config directory as well? IIUC it needs to be named registry.home.lab_5000_

https://github.com/containerd/containerd/blob/v1.7.25/remotes/docker/config/hosts.go#L291-L298
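
If so, the tree would end up looking something like this (a sketch, assuming the underscore-escaped naming from the linked code applies):

/etc/k0s/certs.d/
├── registry.home.lab_5000_
│   └── hosts.toml
└── ...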

Did you remove override_path = true from the hosts.toml file? I don't think it's required, as your registry seems to respond quite okay with the /v2/ path segment.

msg="do request" host="registry.home.lab:5000" [...] request.method=HEAD url="https://registry.home.lab:5000/pause/manifests/3.9?ns=registry.k8s.io\""
msg="fetch response received" host="registry.home.lab:5000" [...] response.status="404 Not Found" url="https://registry.home.lab:5000/pause/manifests/3.9?ns=registry.k8s.io\""
msg="trying next host - response was http.StatusNotFound" host="registry.home.lab:5000""

That's the URL without /v2/ (supposedly because override_path = true), which is wrong, but which gets authorized via the config specified in the _default directory.

msg="do request" host="registry.home.lab:5000" [...] request.method=HEAD url="https://registry.home.lab:5000/v2/pause/manifests/3.9?ns=registry.k8s.io\""
level=debug msg="fetch response received" host="registry.home.lab:5000" [...] response.status="401 Unauthorized" url="https://registry.home.lab:5000/v2/pause/manifests/3.9?ns=registry.k8s.io\""
msg=Unauthorized header="Basic realm=\"Realm\"" host="registry.home.lab:5000""

That's the right URL, but the special config didn't kick in (there's no corresponding "loading host directory" log entry), and hence it gets the unauthorized response.
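
For what it's worth, putting both points together, the mirror's hosts.toml might end up looking something like this (a sketch only, reusing the placeholder credentials from above; it assumes the registry listens on 5000, the config directory is renamed to match, and override_path is dropped):

server = "https://registry.home.lab:5000"

[host."https://registry.home.lab:5000"]
  capabilities = ["pull", "resolve"]
  [host."https://registry.home.lab:5000".header]
    authorization = "Basic XXXXXXXXYYYY"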

@killergoalie
Author

So I think I got this sorted out. I'll have a follow-up with notes once I'm done testing.

@jnummelin
Member

So I think I got this sorted out.

Nice to hear 🎉, I hope I'm not celebrating too early... 😂

I'll have a follow up with notes once i'm done testing.

I think a good example in the docs would be fantastic to have; care to submit a PR once you have all the things sorted out?

@killergoalie
Author

killergoalie commented Feb 4, 2025

@jnummelin I'll try to get a PR submitted this week with some doc updates. Can I use this issue for that update, or does the project prefer a specific issue to track commits and PRs?

@twz123
Member

twz123 commented Feb 5, 2025

Can I use this issue for that update or does the project like a specific issue to track commits and PRs?

@killergoalie You can go ahead and use this issue 👍
