
rke2 server eats up all the memory #6370

Closed
harridu opened this issue Jul 18, 2024 · 37 comments

Comments

@harridu

harridu commented Jul 18, 2024

Environmental Info:
RKE2 Version:

root@kube005c00:~# /usr/local/bin/rke2 -v
rke2 version v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
go version go1.21.9 X:boringcrypto

Node(s) CPU architecture, OS, and Version:
Debian 12 running inside KVM, 4 cores, 32 GByte memory, no swap

root@kube005c00:~# uname -a
Linux kube005c00.ac.aixigo.de 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux

Cluster Configuration:
3 controller nodes, 32 GByte RAM and 4 cores each, kvm
6 "real" worker nodes, 512 GByte RAM and 64 cores each
All Debian 12, RKE2 1.28.10, managed in Rancher 2.8.5

Describe the bug:
On the control plane nodes, rke2 uses up quite a big chunk of memory. On the first control plane node I get:

top - 07:34:13 up 2 days, 21:32,  1 user,  load average: 0.49, 0.62, 0.66
Tasks: 223 total,   1 running, 221 sleeping,   0 stopped,   1 zombie
%Cpu(s):  5.2 us,  2.3 sy,  0.0 ni, 91.2 id,  0.4 wa,  0.0 hi,  0.8 si,  0.1 st 
MiB Mem :  32094.8 total,   2515.0 free,  24040.7 used,   6008.8 buff/cache     
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   8054.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                         
    879 root      20   0   21.6g  20.4g  72544 S   0.7  65.0 226:22.84 /usr/local/bin/rke2 server                      
   2578 root      20   0 2603984   1.2g  81216 S   9.6   3.9      8,59 kube-apiserver --admission-control-config-file=+
   2314 root      20   0   11.0g 342968 191772 S   9.0   1.0      7,22 etcd --config-file=/var/lib/rancher/rke2/server+
    380 root      20   0  389584 298936 295608 S   0.0   0.9   1:09.86 /lib/systemd/systemd-journald                   
   1433 root      20   0 1356736 143084  41572 S   0.3   0.4  32:01.04 kube-scheduler --permit-port-sharing=true --aut+
   1082 root      20   0 1345244 118440  66816 S   3.0   0.4 179:23.01 kubelet --volume-plugin-dir=/var/lib/kubelet/vo+
   3677 root      20   0 2383748  94176  47724 S   1.7   0.3  94:27.17 calico-node -felix                              
   1045 root      20   0  791928  88868  49780 S   1.0   0.3  63:38.01 containerd -c /var/lib/rancher/rke2/agent/etc/c+
   4851 root      20   0 1347844  88592  62688 S   0.7   0.3  19:36.29 kube-controller-manager --flex-volume-plugin-di+
   1373 root      20   0 1286356  79868  40348 S   0.0   0.2  11:15.56 kube-proxy --cluster-cidr=10.42.0.0/16 --conntr+
   3681 root      20   0 1866344  72388  44660 S   0.0   0.2   1:06.10 calico-node -allocate-tunnel-addrs              
   3683 root      20   0 1866088  71560  42680 S   0.0   0.2   1:05.14 calico-node -status-reporter                    
   3676 root      20   0 1939820  68756  42200 S   0.0   0.2   0:35.75 calico-node -monitor-addresses                  
   3680 root      20   0 1866088  65992  41320 S   0.0   0.2   0:31.11 calico-node -monitor-token                      
   4948 root      20   0 1292736  59024  42068 S   0.3   0.2  13:33.89 cloud-controller-manager                        
    810 root      20   0 1275468  55068  32116 S   2.3   0.2  50:25.00 /usr/local/bin/rancher-system-agent sentinel    
   3523 nobody    20   0  746008  44024  21984 S   0.0   0.1   3:34.64 /bin/system-upgrade-controller                  

That is 20 GByte RSS. On the other control plane nodes it is "just" 3 GByte, which is still way too much for 3 days of uptime. Memory usage keeps increasing over time, until the first control plane node runs into OOM.

The worker nodes seem fine.

Steps To Reproduce:
Set up a cluster using Rancher 2.8.5 and RKE2 1.28.10 and watch the rke2 server memory usage grow. If I use RKE2 on the command line to set up a cluster, there is no such problem.
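
To quantify the growth between restarts, a minimal monitoring sketch (assuming the supervisor shows up as /usr/local/bin/rke2 server, as in the top output above; the log path is arbitrary):

# Log the supervisor's RSS every 10 minutes; pgrep -o picks the oldest
# process matching the pattern, i.e. the rke2 server supervisor.
while true; do
    pid=$(pgrep -o -f 'rke2 server')
    echo "$(date -Is) rss_kib=$(ps -o rss= -p "$pid")" >> /var/log/rke2-rss.log
    sleep 600
done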

@harridu
Author

harridu commented Jul 18, 2024

PS: K3s has the same problem.

@serhiynovos

@harridu I had exactly the same issue #6249

Do you store etcd snapshots on S3 storage?

@brandond
Member

Please upgrade to v1.28.11+rke2r1

@serhiynovos

@brandond BTW, I still don't see an option to upgrade RKE2 to 1.28.11 from Rancher. Do you have any info on when it will be available? In the meantime I still have to go and manually clear my bucket every few weeks.

@harridu
Author

harridu commented Jul 18, 2024

@serhiynovos, yes, I am using a local MinIO to store a copy of the snapshots.

Update: S3 snapshots were on, but they are disabled right now. Only local storage.
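
For reference, a minimal sketch of what the etcd S3 snapshot settings typically look like in the rke2 server config at /etc/rancher/rke2/config.yaml (the flag names are the standard etcd snapshot options; the schedule, retention, endpoint, bucket, and credential values below are placeholders, and Rancher-provisioned clusters normally manage the equivalent settings for you):

# Placeholder values - adjust schedule, retention, endpoint and credentials.
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 18
etcd-s3: true
etcd-s3-endpoint: "minio.example.internal:9000"
etcd-s3-bucket: "rke2-etcd-snapshots"
etcd-s3-access-key: "<access-key>"
etcd-s3-secret-key: "<secret-key>"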

@serhiynovos

@harridu Please check your bucket. There should be a lot of snapshots. You can clean them manually and see if it will resolve the issue.

@harridu
Author

harridu commented Jul 18, 2024

Deleted. How come the S3 storage is still in use, even though it is disabled in the GUI?

@harridu
Author

harridu commented Jul 19, 2024

After removing all backups in S3, rke2 memory usage seems to stay low.

I still have no idea why it backs up to S3 at all. If 1.28.11 provides some fixes, then please make it available in Rancher 2.8.5.

@brandond
Member

I still have no idea why it backs up to S3 at all.

I'm not sure what you mean. Why does it back up to S3 when you configure S3 for backups?

@serhiynovos

I still have no idea why it backs up to S3 at all.

I'm not sure what you mean. Why does it back up to S3 when you configure S3 for backups?

@brandond I think @harridu means that he disabled S3 backups in the Rancher UI, but RKE2 still uploads snapshots to S3 storage.

BTW, I finally got version 1.28.11 on Rancher. The issue with S3 is resolved.

@mikejoh

mikejoh commented Aug 15, 2024

Please upgrade to v1.28.11+rke2r1

Those of us on Rancher v2.8.4 or v2.8.5 without Prime don't have the option to pick 1.28.11+rke2r1; at least, it is not part of the supported RKE2 releases. Technically we could of course deploy 1.28.11, but we have had problems before when deploying a later version than the one specified for the specific Rancher release.

Any suggestions or input on this?

@brandond
Member

Upgrade to 2.9.0, or deal with running newer RKE2 releases that are not technically supported by the patch release of Rancher that you're on.

@mikejoh

mikejoh commented Aug 20, 2024

Upgrade to 2.9.0, or deal with running newer RKE2 releases that are not technically supported by the patch release of Rancher that you're on.

Thanks!

As a side note: we just noticed that when we check RKE2 versions in Rancher (the UI), we can select version 1.28.11, but that version is not mentioned in the release notes for the version we're on, 2.8.4: https://github.com/rancher/rancher/releases/tag/v2.8.4. Is the list of versions we can upgrade to perhaps updated dynamically?

@brandond
Member

It is, yes.

@brandond
Member

Closing as resolved in releases that have a fix for the s3 snapshot prune issue.

@boris-stojnev

@brandond The memory issue is only partially solved. Retention is working fine; it deletes old snapshots from both S3 and local storage, but memory keeps increasing when etcd backup to S3 is enabled.

We noticed that on the etcd leader node, RAM constantly increases by the size of the db on a daily basis. It seems that rke2 caches the upload to s3 and never releases it.

After backup snapshots to S3 are disabled, memory immediately drops. The node was at 90% memory usage and dropped to 15%.

rke2 v1.30.4+rke2r1 with a 3-node etcd cluster
Rancher version v2.9.2.
go version go1.22.5
Rocky Linux 9.4 kernel 5.14.0

@brandond
Member

brandond commented Nov 22, 2024

It seems that rke2 caches the upload to s3 and never releases it.

What do you mean by "caches it and never releases it"? What led you to this conclusion?

memory constantly increases by the size of db

What do you mean by "increases by the size of the db"? Can you provide figures demonstrating this?

@boris-stojnev

Ignore my assumptions.
Here are the facts. If we take the last 11 days:
on Nov 9th at 00h: RAM used 8.2 GiB
on Nov 21st at 00h: RAM used 11.2 GiB
That is a total increase of 3072 MiB over 11 days, and the DB snapshot is around 100 MiB per node.

Pasted image

@serhiynovos

@boris-stojnev Did you try the latest stable version, 1.30.6? I had a similar issue in previous versions but haven't experienced it anymore since upgrading.

@boris-stojnev

@serhiynovos No, I didn’t try it. I can’t upgrade at this point, but I’m not seeing anything related to etcd snapshots in the release notes. :-/

@brandond
Member

brandond commented Nov 22, 2024

I would recommend upgrading to v1.30.6 or newer. As per the upstream release notes:

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#changelog-since-v1305
Fixes a kubelet and kube-apiserver memory leak in default 1.29 configurations related to tracing. (kubernetes/kubernetes#126984, @dashpole) [SIG API Machinery and Node]

You are likely seeing this memory leak in core Kubernetes.

@boris-stojnev

boris-stojnev commented Nov 22, 2024

I'm going to follow up after I enable it again, to buy some time until the next upgrade cycle.

On a side note, you should be consistent in defining snapshot retention. It says per node, for example 30 per node, so my expectation is to have 30 on each node locally and 90 in S3 (for a 3-node etcd cluster), but there are 30 in total on S3, which means I have only the last 10 on S3.

@brandond
Member

brandond commented Nov 22, 2024

but there are 30 in total on S3, which means I have only the last 10 on S3.

That is not really related to the issue under discussion here; see instead:

@sawmod

sawmod commented Jan 17, 2025

@brandond It is not the memory leak in core Kubernetes; please do not mislead your users.

We are experiencing exactly the issue @boris-stojnev described, on RKE2 only - 1.28.14.

One of the RKE2 control plane nodes with ETCD snapshots configured to be sent to S3 every 6 hours:
Image

No such problem is observed on RKE1 - 1.28.13.

So it is definitely RKE2/k3s related.

We will disable ETCD backups to S3 to try confirming the root cause.

@brandond
Member

brandond commented Jan 17, 2025

No one is misleading anyone. Please demonstrate specifically what you are seeing. How much memory does this node have? What is this showing utilization of? Percentages with no units and no specific detail on what is consuming memory provide zero useful information.

I have looked at many pprof profiles and have yet to see any leaks in the k3s/rke2 codebase. If you can provide a heap profile I'd love to be proven wrong.

RKE1 doesn't have a supervisor process like RKE2 does, so there's not really any comparison to be made.

@sawmod

sawmod commented Jan 17, 2025

@brandond

What I can provide currently:

VMs (Proxmox):
each control-plane+etcd node has 16 GB RAM and 8 vCPUs (1 socket); Ubuntu 22.04; 5 VMs in total.

Context:

~64 days ago we configured ETCD to be backed up to S3 for all clusters via the Rancher UI.

Exactly after that, on multiple control-plane+etcd VMs in multiple RKE2 clusters, system-upgrade-controller Job Pods were spawned (50+ of them), with most of them in status ContainerStatusUnknown. Example from one of the affected clusters:

Image

On the VMs that saw gradual RAM usage growth, the system-upgrade-controller Job Pod is still running, with logs showing:

W1114 07:53:36.013987       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-11-14T07:53:38Z" level=info msg="Applying CRD plans.upgrade.cattle.io"
time="2024-11-14T07:53:38Z" level=info msg="Starting /v1, Kind=Node controller"
time="2024-11-14T07:53:38Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2024-11-14T07:53:38Z" level=info msg="Starting batch/v1, Kind=Job controller"
time="2024-11-14T07:53:38Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"

rke2-server.service gradually eats up 14+ GB on these VMs until we see OOM-killed errors for the API server and scheduler:

Image

Image

Reboots fix the issue, but rke2-server.service exhibits the same behavior afterwards.


While the context above is too little to pinpoint the issue exactly within RKE2/k3s and/or their components (e.g. system-upgrade-controller), it is definitely enough to show that the memory leak introduced only in core Kubernetes 1.29 is not to blame.

@serhiynovos

@sawmod How many snapshots do you have uploaded to S3?

@boris-stojnev

I’m still at v1.30.4, but what I can add is that there is no need to restart the rke2 service.
Any change to the etcd snapshot config will free up memory; for example, changing the retention number from 100 to 101 will do the trick.
The more snapshots you keep, the faster memory is used up.

@serhiynovos

serhiynovos commented Jan 17, 2025

@boris-stojnev Yes. If you don't have a big number of snapshots, it runs for months without any need to touch rke2. I remember about 6 months ago I had an issue with memory, but the problem then was that rke2 did not clean up old S3 snapshots.

@brandond
Member

@sawmod Can you open a separate issue? Whatever's going on with your SUC pods is not something I have seen before, and it does not sound like what anyone else has reported. Please fill out the issue template, and note whether or not this system is managed by Rancher (and if so, what version).

@brandond
Member

brandond commented Jan 17, 2025

@boris-stojnev

I’m still at v1.30.4, but what I can add is that there is no need to restart the rke2 service.
Any change to the etcd snapshot config will free up memory; for example, changing the retention number from 100 to 101 will do the trick.

RKE2 does not support dynamic config reloading. If you change the config via the Rancher UI, the service will be restarted to apply the change.
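
A quick way to confirm that on the node (a sketch assuming systemd and the standard rke2-server unit name): check whether the unit's start timestamp changes right after the snapshot config is edited in Rancher.

# If this timestamp moves right after the config change, the "freed" memory is
# simply the supervisor process having been restarted with the new config.
systemctl show rke2-server --property=ActiveEnterTimestamp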

100 snapshots is a lot. Note that you get 1 snapshot per node on disk, plus the snapshots on S3, so if you have 3 etcd nodes that is 400 snapshots total.

Anyone who wants to contribute meaningfully should:

  1. Start the affected node with enable-pprof: true in the config.yaml
  2. Wait for the consumed memory to grow
  3. Collect a heap memory profile: kubectl get --server https://127.0.0.1:9345 --raw '/debug/pprof/heap?gc=1' 1>heap.pprof.gz
  4. Attach the pprof file and output of top -c -o %MEM or ps auxfww
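
Putting those steps together, a rough sketch of the collection workflow on an affected server node (the config path, kubeconfig path, and restart are assumptions based on a standard RKE2 install; analyzing the profile with go tool pprof needs a Go toolchain, locally or on another machine):

# 1. Enable the profiling endpoint; the supervisor must be restarted to pick it up.
echo "enable-pprof: true" >> /etc/rancher/rke2/config.yaml
systemctl restart rke2-server

# 2. After memory has grown, pull a heap profile from the supervisor on port 9345.
#    kubectl is shipped under /var/lib/rancher/rke2/bin/ if it is not already on PATH.
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get --server https://127.0.0.1:9345 --raw '/debug/pprof/heap?gc=1' 1>heap.pprof.gz
top -c -b -n 1 -o %MEM | head -n 30 > top.txt

# 3. Quick look at the largest in-use allocations before attaching the files here.
go tool pprof -top heap.pprof.gz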

@vChrisR

vChrisR commented Mar 4, 2025

We are experiencing the same problem: rke2-server eats up all the RAM when storing etcd snapshots on S3.
Attached are the pprof and a screenshot of top. The memory is eaten up slowly; this pprof was taken about 20 hours after the last rke2-server restart on this node. Snapshots are currently happening every hour and we keep 18 snapshots.

If needed, I can post another pprof here after another day to show the usage increase over time.

heap.pprof.gz

Image

@vChrisR

vChrisR commented Mar 4, 2025

A few hours later.....

Image

@brandond
Member

brandond commented Mar 4, 2025

@vChrisR

  1. You've not said what version you're using
  2. 184 MB doesn't seem particularly excessive.
  3. The processes that top shows using the most memory are etcd and kube-apiserver, NOT the rke2-server supervisor process that you have profiled.

@vChrisR

vChrisR commented Mar 5, 2025

1: rke2 v1.30.7+rke2r1
2: To enable pprof I had to restart rke2-server, so RAM usage went way down. As I said, eating up the RAM takes time, so it will be a while before it shows excessive use. If you compare the pprof output in my first post to the screenshot in the second, you'll notice that flate is already eating up 91 MB after running only about 20 hours.
3: See 2... It will get to the top of the list after a week or so.

I'll keep monitoring this and post another pprof once the RAM usage has increased significantly.

@brandond
Member

brandond commented Mar 5, 2025

1.30.7 is a couple of months old. The transparent transport-level decompression that calls gzip.NewReader was disabled in v1.30.10:

Please upgrade before performing additional testing.

@vChrisR

vChrisR commented Mar 5, 2025

@brandond Thanks! I'll test the new version.
