Service using shuttle was stuck for a day, without any errors. #2157

Closed
chetankashetti opened this issue Jul 11, 2024 · 1 comment
Labels
t-bug A fix for a bug with the current system

Comments


chetankashetti commented Jul 11, 2024

What is the bug?
Shuttle service was stuck for a day without any error logs or exceptions.

How can it be reproduced?
We have 3 shards running for live subscription. Two of them, shard-0 and shard-2, were stuck.

We noticed that we were no longer receiving data from Shuttle, and when we checked the logs there were no errors.
The metrics we looked at, hubs (CPU and memory), service (CPU and memory), and RDS, all looked fine; in fact they were underutilised.
The screenshots below show no activity: the service stayed in a hanging state for a while, and we are not sure whether the connection was even still open.
image
image
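
A silent stall like the one described above (no events, no errors, healthy resource metrics) can at least be detected from inside the service. The following is a minimal sketch, not part of the shuttle API: `StallWatchdog`, `maxSilenceMs`, and the injected clock are hypothetical names introduced for illustration, and it assumes the service has a handler that is called for every processed event.

```typescript
// Hypothetical stall detector; not part of the shuttle API.
// It only tracks the time of the most recently processed event.
class StallWatchdog {
  private lastEventAt: number;
  private readonly maxSilenceMs: number;
  private readonly now: () => number;

  // `now` is injectable so the class can be tested without real timers.
  constructor(maxSilenceMs: number, now: () => number = () => Date.now()) {
    this.maxSilenceMs = maxSilenceMs;
    this.now = now;
    this.lastEventAt = this.now();
  }

  // Call this from the event handler each time an event is processed.
  recordEvent(): void {
    this.lastEventAt = this.now();
  }

  // True when no event has been recorded for longer than maxSilenceMs.
  isStalled(): boolean {
    return this.now() - this.lastEventAt > this.maxSilenceMs;
  }
}
```

One way to use it would be to call `isStalled()` from a periodic timer and, when it returns true, log an error and exit the process so that the pod is restarted. The right threshold depends on how busy the shard normally is, so it is left as a parameter here.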

After it had been stuck for a day, the first thing we did was restart the pod. It then resumed syncing from the eventId where it had been stuck, and took a few hours to catch up. Once it was live again, I noticed that a cast I had made an hour earlier had not been indexed. Shouldn't it have been indexed, since the live stream holds data for 3 days? It missed my cast, so it may have missed others as well.
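
The expectation behind this question can be written down as a small check. This is only a sketch of the reasoning, assuming the 3-day retention figure mentioned above; `canResumeWithoutGap` and `RETENTION_MS` are hypothetical names, and whether the hub actually replays every retained event on resume is exactly what this report is asking.

```typescript
// Assumption taken from the report: the live stream retains events for 3 days.
const RETENTION_MS = 3 * 24 * 60 * 60 * 1000;

// Returns true when the last processed event is still inside the retention
// window, i.e. when a resume from that point would be expected to replay
// everything that happened during the outage.
function canResumeWithoutGap(
  lastProcessedAtMs: number,
  nowMs: number,
  retentionMs: number = RETENTION_MS,
): boolean {
  return nowMs - lastProcessedAtMs < retentionMs;
}
```

For a one-day outage, as in this case, the check passes, which is why the missing cast was unexpected.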

So, to summarise, we would like to know a couple of things:

  1. Why was the service stuck at an eventId without any error, even though the health of all components looked good?
  2. Does the live event subscription cover all events if it was stopped for an hour or two, or for any period shorter than 3 days?

We have not been able to reproduce the issue; we have observed it only once.
Additional context

@github-actions github-actions bot added the s-triage Needs to be reviewed, designed and prioritized label Jul 11, 2024
Member

sds commented Oct 31, 2024

Thank you for the report, sorry for the delay in response. Shuttle has seen multiple improvements related to issues such as this since this was opened. If you're still seeing this on the latest version of shuttle, feel free to open a new ticket with the latest evidence + details you have, as it is likely a different issue at this point.

Thank you!

@sds sds closed this as completed Oct 31, 2024
@sds sds added t-bug A fix for a bug with the current system and removed s-triage Needs to be reviewed, designed and prioritized labels Oct 31, 2024