Stream does not store messages until restart or leader election #6391
Comments
Does this happen on the latest 2.10.24 version?
You need to be very careful when using Core NATS subs on the same subjects as JetStream streams. If you can avoid doing this then all the better, but if you can't, those Core NATS subs must not send back acks/replies, otherwise you will break clients. You might find that stream
@neilalexander we haven't tried it; we have been fighting this problem since Sept 2024, and at that time 2.10.20 was the latest. Is there a chance that the newer releases contain a fix?
Dealing with the same issue right now. We are also using Core NATS with JetStream, and some of the streams are getting stuck on producing messages with error
@neilalexander we have updated to 2.10.24, but it takes time to prove that it helped, as the issue happens once every 2-3 weeks. @markovichecha since you have a similar issue, could you share your NATS server and NATS client versions?
Observed behavior
We observe that from time to time (with no apparent time correlation) streams stop storing messages.
As a result, the publisher service starts logging "nats: no response from stream" errors, because for some reason it gets a "no responders" error, as if there were no such stream or a leader election were in progress.
Let's call the problematic subject "some.subject"; it is stored in JetStream. It was part of a big stream, but it was the only problematic subject in that stream; all other subjects in the same stream were stored correctly.
To investigate the problem we moved the subject into its own independent stream, but the problem is still there.
For that subject we also had a core subscriber, which for a long time gave us "nats: timeout" and "nats: invalid jetstream response" instead of the normal error (see the reported issue in the client repo). Once we removed the core subscriber, we were able to get the "nats: no response from stream" error.
The incident happens from time to time (approximately once every 2 weeks).
Our observations:
When we needed to fix the problem we restarted NATS, but later we found that it is enough to trigger a leader election for the problematic stream, after which the system behaves well again. Sometimes the problem heals itself, but we suppose that is because a leader election happened.
We would appreciate it if somebody could help us, because we have no idea what went wrong if it is not a NATS server issue.
Expected behavior
Stream stores published messages
Server and client version
server: v2.10.20
client: v1.37.0
Host environment
Terraform definitions for stream:
3-node cluster with similar configs:
Steps to reproduce
No response