
Redis sentinel going crazy on cpu usage #1664

Open
applike-ss opened this issue Feb 14, 2025 · 4 comments · Fixed by #1674 · May be fixed by #1685

Comments

@applike-ss

applike-ss commented Feb 14, 2025

Describe the bug
The Redis sentinel image that is in use appears to have an issue.
On some occasions, when one of the Redis pods is rescheduled, the redis-sentinel containers on the other pods go crazy on CPU usage: it suddenly jumps to exactly 1 core, even though it was idling before. Since Redis is single-threaded, this looks like the sentinel gets stuck in some endless loop.
The affected redis-sentinel containers write this log message when this happens:

1:X 14 Feb 2025 04:14:04.072 * +reboot slave ip1:6379 ip2 6379 @ argocd ip3 6379

I suspect this happens when the master's pod is rescheduled, but I didn't verify that.
While this is surely not an issue this project can resolve by itself, maybe there is a fixed version that could be used as the new default?
EDIT: I saw that the Redis version in use is 6.2.4, so maybe an upgrade to 6.2.17 (sha256:905c4ee67b8e0aa955331960d2aa745781e6bd89afc44a8584bfd13bc890f0ae) or even higher (if Argo supports it) would already help?
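In case someone wants to try a newer image before a new default ships, here is a minimal sketch of overriding the Redis image through the operator's ArgoCD custom resource. This is an assumption-laden sketch: it presumes the CR exposes spec.redis.image/spec.redis.version, that the redis-ha pods pick up those fields, and that both the CR and the namespace are named argocd; adjust to your setup.

```sh
# Sketch only: point the operator at a newer Redis image.
# Assumes an ArgoCD CR named "argocd" in the "argocd" namespace and that
# the redis-ha pods are built from spec.redis.image/spec.redis.version.
kubectl -n argocd patch argocd argocd --type merge -p '
{
  "spec": {
    "redis": {
      "image": "public.ecr.aws/docker/library/redis",
      "version": "6.2.17-alpine"
    }
  }
}'
```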

To Reproduce
I didn't write up exact reproduction steps, but it should be easy: spin up an Argo instance with this operator, then delete the redis-ha server pods one after the other, waiting for each new pod to become healthy before deleting the next. A rough sketch of this is shown below.
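For completeness, a rough sketch of those steps (namespace, pod names, and labels are assumptions based on the default redis-ha naming; adjust to your installation):

```sh
# Delete the redis-ha server pods one at a time and wait for each
# replacement to become healthy. Resource names below are assumptions.
for i in 0 1 2; do
  kubectl -n argocd delete pod argocd-redis-ha-server-$i
  kubectl -n argocd rollout status statefulset/argocd-redis-ha-server --timeout=10m
done

# Requires metrics-server: look for a sentinel container pinned at ~1 CPU
# on the pods that were not rescheduled.
kubectl -n argocd top pods --containers | grep sentinel
```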

Expected behavior
Rescheduled Redis pods should not cause the other Redis pods to use significantly more CPU.

Screenshots

(two screenshots attached)

Additional context

@applike-ss
Author

applike-ss commented Feb 19, 2025

I'm also seeing this with Redis 7.2.7 (public.ecr.aws/docker/library/redis:7.2.7-alpine).

@mfroembgen
Contributor

It's fixed in the Argo CD Helm chart: argoproj/argo-cd#20645

@svghadi
Collaborator

svghadi commented Mar 7, 2025

Reopening the issue. The fix results in a CrashLoopBackOff state for Redis when custom certificates are used with Redis. Reverting the change for now to unblock the release.

@mfroembgen
Contributor

@svghadi I added a PR that should fix the CrashLoopBackOff and handle Redis custom certificates.
