
Redis sentinel going crazy on cpu usage #1664

Open
applike-ss opened this issue Feb 14, 2025 · 4 comments · Fixed by #1674 · May be fixed by #1685

Comments

@applike-ss

applike-ss commented Feb 14, 2025

Describe the bug
The Redis sentinel image that is in use appears to have an issue.
On some occasions, when one of the Redis pods is rescheduled, the redis-sentinel containers on the other pods go crazy on CPU usage: it suddenly jumps to exactly 1 core, even though it was idling before. Since Redis is single-threaded, this looks like the sentinel gets stuck in some endless loop.
The affected redis-sentinel containers write this log message when this happens:

1:X 14 Feb 2025 04:14:04.072 * +reboot slave ip1:6379 ip2 6379 @ argocd ip3 6379

I suspect this happens when the master's pod is rescheduled, but I didn't verify that.
While this is surely not an issue this project can resolve by itself, maybe there is a fixed version that could be used as the new default?
EDIT: I saw that the Redis version in use is 6.2.4, so maybe an upgrade to 6.2.17 (sha256:905c4ee67b8e0aa955331960d2aa745781e6bd89afc44a8584bfd13bc890f0ae) or even higher (if Argo supports it) would already help?
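In case someone wants to try a newer image before a new default ships, here is a minimal sketch of overriding the Redis image through the operator's ArgoCD custom resource. This is an assumption-laden sketch: it presumes the CR exposes spec.redis.image/spec.redis.version, that the redis-ha pods pick up those fields, and that both the CR and the namespace are named argocd; adjust to your setup.

```sh
# Sketch only: point the operator at a newer Redis image.
# Assumes an ArgoCD CR named "argocd" in the "argocd" namespace and that
# the redis-ha pods are built from spec.redis.image/spec.redis.version.
kubectl -n argocd patch argocd argocd --type merge -p '
{
  "spec": {
    "redis": {
      "image": "public.ecr.aws/docker/library/redis",
      "version": "6.2.17-alpine"
    }
  }
}'
```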

To Reproduce
I didn't write up exact reproduction steps, but it should be easy: spin up an Argo instance with this operator, then delete the redis-ha server pods one after the other, waiting for each new pod to become healthy before deleting the next. A rough sketch of this is shown below.
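For completeness, a rough sketch of those steps (namespace, pod names, and labels are assumptions based on the default redis-ha naming; adjust to your installation):

```sh
# Delete the redis-ha server pods one at a time and wait for each
# replacement to become healthy. Resource names below are assumptions.
for i in 0 1 2; do
  kubectl -n argocd delete pod argocd-redis-ha-server-$i
  kubectl -n argocd rollout status statefulset/argocd-redis-ha-server --timeout=10m
done

# Requires metrics-server: look for a sentinel container pinned at ~1 CPU
# on the pods that were not rescheduled.
kubectl -n argocd top pods --containers | grep sentinel
```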

Expected behavior
Rescheduled Redis pods should not cause the other Redis pods to use significantly more CPU.

Screenshots

(two screenshots attached)

Additional context

@applike-ss
Author

applike-ss commented Feb 19, 2025

I'm also seeing this with Redis 7.2.7 (public.ecr.aws/docker/library/redis:7.2.7-alpine).

@mfroembgen
Contributor

It's fixed in the Argo CD Helm chart: argoproj/argo-cd#20645

@svghadi
Collaborator

svghadi commented Mar 7, 2025

Reopening the issue. The fix results in a CrashLoopBackOff state for Redis when custom certificates are used with Redis. Reverting the change for now to unblock the release.

@mfroembgen
Contributor

@svghadi I added a PR that should fix the CrashLoopBackOff and handle Redis custom certificates.
