
What can cause the queue to be blocked? #29

Open
sabativi opened this issue Oct 16, 2024 · 2 comments

Comments

@sabativi

Hello,

I am running my app on Galaxy, and jobs are becoming more and more important for the app (around 100,000 per day).

I can erase some, but I need to keep others, so the jobs_data collection is around 5 GB and growing. Can this be a problem?

The number of pending jobs keeps growing, and I do not understand why the jobs are not being polled.

The jobs dominator is OK: its date is always close to the current time.

I know this question is vague; I am trying to find a starting point.

Thanks

@wildhart
Owner

I can erase some, but I need to keep others, so the jobs_data collection is around 5 GB and growing. Can this be a problem?

If you need to keep some completed jobs, then yes, the collection will grow indefinitely. I wouldn't expect that to be a problem, though. Why do you need to keep some? If it's for logging purposes, it may be a good idea to keep those logs in a separate collection.

Normally, if you resolve a job with this.remove(), it is removed from the collection automatically.
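To make the `this.remove()` convention concrete, here is a minimal in-memory mock. It assumes job documents with `_id`, `name`, `state` and optional `arguments` fields, and uses a hypothetical `executeJob` helper and a `Map` standing in for the jobs_data collection. It only imitates the calling convention (the job function runs with a `this` context that exposes `remove()`); it is not the package's implementation.

```javascript
// Hypothetical in-memory stand-in for the jobs_data collection: _id -> job document.
const jobsData = new Map();

// Hypothetical helper imitating how a job function is invoked with a `this`
// context. Calling this.remove() deletes the job document from the store.
function executeJob(job, jobFn) {
  let result = null;
  const context = {
    remove() {
      jobsData.delete(job._id);
      result = "remove";
    },
  };
  job.state = "executing";
  jobFn.call(context, ...(job.arguments || []));
  return result; // "remove" if the job removed itself, otherwise null
}

// Example job (name taken from the logs in this thread): do the work, then remove itself.
function sendSignOutReminders() {
  // ... real work would go here ...
  this.remove();
}
```

In this mock, a job function that never calls `this.remove()` leaves its document in the store, which is the situation where completed jobs accumulate.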

  • How many different job names are you running?
  • Are some job names running while others aren't, or do all jobs run for a while and then all get blocked?
  • Are they blocked forever, or does it eventually start running again?
  • Do you have any async jobs?

If you do have some async jobs, take a look at the async/await section of the docs. Do you have awaitAsync: true in your config? Are the async jobs completing properly, or could they stop without resolving or take a long time to resolve? If the answer to both of these is "yes", then an unresolved async job will block other jobs of the same name.
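A simplified model of that blocking rule, assuming (as described above) that with `awaitAsync: true` the queue waits for a job's promise before starting another job of the same name. The `runnableJobs` helper and the `awaitingNames` set are hypothetical and only illustrate the rule; they are not part of the package.

```javascript
// Given pending jobs and the set of job names whose earlier async run has not
// resolved yet, return the due jobs that could start now. A job whose name is
// still awaiting an earlier run is held back; other names are unaffected.
function runnableJobs(pendingJobs, awaitingNames, now) {
  return pendingJobs.filter(
    (job) => job.due <= now && !awaitingNames.has(job.name)
  );
}
```

If an async job never resolves, its name stays in `awaitingNames` indefinitely in this model, so every later job with that name stays pending while jobs with other names keep running.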

  • Do you have logging enabled in Jobs.configure() (it is enabled by default)?

If so, can you report what's happening in the logs when the queue becomes blocked?

When the server is started, or when all due jobs are completed, the dominator should look for the next job and set a timer. This will appear in the logs as:

Jobs: queue.observer added {"due":"2024-10-16T19:00:00.000Z","name":"sendSignOutReminders","_id":"WakNMKX9cF7NWsWWE"} 0.28h
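The number at the end of that line appears to be the time remaining until the job's `due` date, in hours; this reading is an assumption, not confirmed from the package source. Under that assumption, the calculation can be sketched as below (`hoursUntilDue` is a hypothetical name). It also shows why a value such as `-0.00h` means the job was already due by a fraction of a second.

```javascript
// Hours from `now` until `due`, rounded to two decimals
// (assumed format of the trailing value in the "queue.observer added" log line).
function hoursUntilDue(due, now) {
  const MS_PER_HOUR = 60 * 60 * 1000;
  return ((due.getTime() - now.getTime()) / MS_PER_HOUR).toFixed(2) + "h";
}

// 16 min 48 s before the due date:
console.log(hoursUntilDue(new Date("2024-10-16T19:00:00.000Z"), new Date("2024-10-16T18:43:12.000Z"))); // → "0.28h"
// 487 ms after the due date:
console.log(hoursUntilDue(new Date("2024-10-23T10:00:22.513Z"), new Date("2024-10-23T10:00:23.000Z"))); // → "-0.00h"
```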

Then, when that job is executed you should see:

Jobs: executeJobs paused: []
Jobs: queue.observer stop undefined undefined
Jobs:   sendSignOutReminders
Jobs: setJobState WakNMKX9cF7NWsWWE executing 1

Then, when it has completed (with this.remove()), the job is removed, and if no other jobs are due at the current time, the dominator looks for the next job:

Jobs:     Jobs.remove WakNMKX9cF7NWsWWE 1
Jobs:     Done job sendSignOutReminders result: remove
Jobs: queue.start paused: []
Jobs: queue.observer added {"due":"2024-10-16T19:00:00.000Z","name":"sendSignOutReminders","_id":"oNHB4uDwpKzTGwuHX"} 0.24h

Can you see logs like this?

  • Do you have multiple hosts in your galaxy setup?

The job queue can only run on one host at a time, and if the host running the jobs stops, it can take some time for one of the other hosts to take control of the queue. This delay is set by the maxWait configuration parameter.
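As a rough mental model only (not the package's actual code), assume the active host periodically records a heartbeat date in the dominator, and another host claims the queue once that heartbeat is older than `maxWait`. The `lastPing` name and the assumption that `maxWait` is in milliseconds are both hypothetical here.

```javascript
// Conceptual model: may this host take over the job queue?
// lastPing:  Date last written by the host currently running the queue (assumed field).
// maxWaitMs: how long a silent host is tolerated before takeover (assumed to be ms).
function shouldTakeOver(lastPing, now, maxWaitMs) {
  if (!lastPing) return true; // no host has claimed the queue yet
  return now.getTime() - lastPing.getTime() > maxWaitMs;
}
```

In this model, between the moment the active host stops and the moment `maxWait` elapses, no host processes the queue, which would look like a temporarily blocked queue.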

  • Are you sure the 'blocked' jobs have the correct due date - not some date in the future? i.e. can you query the jobs collection for state: 'pending' where the due date is in the past?
  • Do you have Jobs.start(...) or Jobs.stop(...) anywhere in your code?
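For the due-date check in the first of those two questions, here is a sketch. The MongoDB selector uses the `state` and `due` fields that appear in the question and in the logs; the collection name `jobs_data` comes from this thread, so in the mongo shell it could be run as `db.jobs_data.find(selector)`. The function names are hypothetical, and `isOverduePending` mirrors the same condition on a plain document so it can be checked without a database.

```javascript
// MongoDB selector for pending jobs whose due date is already in the past.
function overduePendingSelector(now = new Date()) {
  return { state: "pending", due: { $lt: now } };
}

// The same condition applied to a plain job document, for checking without MongoDB.
function isOverduePending(job, now = new Date()) {
  return job.state === "pending" && job.due < now;
}
```

If this returns many documents while the logs show no `queue.observer added` activity, the queue is genuinely stalled rather than waiting for future jobs.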

If none of this helps, can you share some of your job definition and job scheduling code, plus your Jobs.configure() call if you have one?

@sabativi
Author

Thank you so much for this detailed answer. Here are my answers:

I keep some of them because it is important for our business to be able to demonstrate the data that we handle. I could duplicate this in another collection, but as you mentioned, I do not think this can cause the problem.

How many different jobs names are you running?

Around 200

Are some job names running but some don't, or do all jobs run for a while then all jobs get blocked?

No, either all are blocked or all are running

Are they blocked forever, or does it eventually start running again?

It depends; sometimes it takes ages.

Do you have any async jobs?

Yes, lots of them.

Do you have logging enabled in Jobs.configure() (defaults to enabled)?

I did not, but I do now. I see a lot of lines like this:

Jobs queue.observer added {
  due: 2024-10-23T10:00:22.513Z,
  name: 'email-tracking',
  _id: 'Yt4HZWwRzJsDS4kXy'
} -0.00h

Do you have multiple hosts in your galaxy setup?

Are you sure the 'blocked' jobs have the correct due date - not some date in the future? i.e. can you query the jobs collection for state: 'pending' where the due date is in the past?

Yes, all jobs except one should run ASAP.

Do you have Jobs.start(...) or Jobs.stop(...) anywhere in your code?

No

Of course, since I posted this everything has been working great. I will wait for the next blocking episode to send you the logs.

Thanks again
