
Unable to restore etcd snapshot from Netapp StorageGrid S3 endpoint #11603

Closed
brandond opened this issue Jan 14, 2025 · 5 comments
Labels
area/etcd kind/upstream-issue This issue appears to be caused by an upstream bug

Comments

@brandond
Member

brandond commented Jan 14, 2025

From rancher/rke2#6285 (reply in thread)

NetApp replied that they were able to reproduce the error in a lab environment. They inspected the network traffic and determined that the issue is caused by "the minio client library in RKE2 having a bug where gzip streamed data is terminated prematurely". They say that RKE2 sets Accept-Encoding: gzip, but that the header is not honored by AWS or MinIO, so those endpoints send uncompressed data back to RKE2. NetApp says that this is why it works on AWS but not on StorageGRID, which always honors the Accept-Encoding header.

It looks like minio has its own default transport, which disables compression to work around this issue. We don't do the same when setting up the transports that we pass in to minio, which we construct ourselves so that we can configure TLS options.
https://github.com/minio/minio-go/blob/v7.0.83/transport.go#L41-L64

We should disable compression in the http.Transport that we are passing to minio, as they do the same internally when constructing their default transports.

@fmoral2
Contributor

fmoral2 commented Feb 13, 2025

This is validated here, since snapshot upload and restore from S3 were both exercised:
#11609

fmoral2 closed this as completed Feb 13, 2025
github-project-automation bot moved this from To Test to Done Issue in K3s Development Feb 13, 2025
@lindhe

lindhe commented Feb 13, 2025

@fmoral2 Just to clarify: did you validate that it worked with NetApp StorageGRID? Because it worked on other S3 implementations already before, this fix is just for StorageGRID.

@fmoral2
Contributor

fmoral2 commented Feb 13, 2025

> @fmoral2 Just to clarify: did you validate that it worked with NetApp StorageGRID? Because it worked on other S3 implementations already before, this fix is just for StorageGRID.

So we can't really validate it there, right, since it's a paid solution?

As per the comments above, we added the fix from minio:
https://github.com/k3s-io/k3s/pull/11604/files#diff-225890fd4fbceafdd32b01f99473cad767cfcd77f03ded1731c44166c0c7db18R175

and after the fix/update, it is still working.

@lindhe

lindhe commented Feb 13, 2025

That's right, it's an enterprise solution with custom hardware and a paid license, so I don't expect you to be able to validate it.

I'm happy to try and validate this for you (since I posted the original issue), as soon as there's a build ready. I would prefer to have an RKE2 build, so I can keep my config the same as when I tested before. As soon as that has been released, I'll let you know.

@lindhe

lindhe commented Mar 7, 2025

I tried it, and it works now!! 🚀

I first tried with v1.30.9 and it didn't work, then I tried v1.30.10 and it worked! At least for a single-node cluster. Will try out a HA cluster later today.

Thank you so much for this fix, @brandond !

Projects
Status: Done Issue
Development

No branches or pull requests

3 participants