fix: race condition with interruptibles #566
Open
+9
−5
Note: this blocking / nonblocking threaded / forked code is hella confusing to me so I might have gotten some stuff wrong here.
The Problem
When running `TARGET_DB=sqlite bin/rails test --seed 38423` I consistently got test failures.
I believe these failures are not SQLite-exclusive but are more noticeable there because it was not running in Docker.
The test:
1. `SolidQueue.on_scheduler_start {...}`
2. `pid = run_supervisor_as_fork(...)`
3. `terminate_process(pid)`
4. Check that `JobResults` from the callbacks were created

Instead of the Scheduler receiving the `TERM` signal, shutting down gracefully and running the callbacks, it would not do anything in response to the signal, hit the timeout, receive a `KILL` signal and never run the callbacks.
Race Condition
It seems like the issue was `self_pipe` being slow. One process would call `run`, and while it was creating the pipe another process would call `stop`.
This is a race condition. Both processes end up creating their own pipes, so when `interrupt` writes to its pipe the write does not go anywhere, because the other process is waiting on a different pipe. If this race condition happens, the process that called `run` (Process A) will sleep until it gets killed.
Solution
I ended up fixing this by getting rid of the lazy initialization of the pipe. That way any process that is accessing the same scheduler will have the same pipe.
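A rough sketch of the eager approach, assuming a plain class with a constructor (`EagerSelfPipe` is a hypothetical name; the actual diff in this PR may place the initialization elsewhere):

```ruby
# Hypothetical illustration of eager pipe creation; not the PR's actual diff.
class EagerSelfPipe
  def initialize
    # The pipe exists before anyone can call #interrupt or #interruptible_sleep,
    # so every caller shares the same reader/writer pair.
    reader, writer = IO.pipe
    @self_pipe = { reader: reader, writer: writer }
  end

  def interrupt
    @self_pipe[:writer].write_nonblock(".")
  rescue IO::WaitWritable
    # Pipe is full, so a wake-up is already pending.
  end

  # Returns true if woken by #interrupt, false if the timeout elapsed.
  def interruptible_sleep(seconds)
    reader = @self_pipe[:reader]
    if IO.select([reader], nil, nil, seconds)
      reader.read_nonblock(16, exception: false) # drain the wake-up byte
      true
    else
      false
    end
  end
end

demo = EagerSelfPipe.new
sleeper     = Thread.new { demo.interruptible_sleep(2) }
interrupter = Thread.new { demo.interrupt }
interrupter.join
puts "woken by interrupt: #{sleeper.value}" # prints "woken by interrupt: true"
```

Because the byte stays in the pipe until it is read, the sleeper wakes up even if `interrupt` runs before it starts waiting.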
Alternatively, maybe there should be a Mutex around the setter for `self_pipe`.
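For comparison, the Mutex alternative could look roughly like this. `LockedSelfPipe` and `@pipe_lock` are hypothetical names, and the `sleep` again only simulates slow pipe creation. Note that the Mutex itself has to be created eagerly, otherwise its own creation would have the same race.

```ruby
# Hypothetical illustration of guarding lazy pipe creation with a Mutex.
class LockedSelfPipe
  def initialize
    @pipe_lock = Mutex.new # must exist before any concurrent access
  end

  def interrupt
    self_pipe[:writer].write_nonblock(".")
  rescue IO::WaitWritable
    # Pipe is full, so a wake-up is already pending.
  end

  # Returns true if woken by #interrupt, false if the timeout elapsed.
  def interruptible_sleep(seconds)
    reader = self_pipe[:reader]
    if IO.select([reader], nil, nil, seconds)
      reader.read_nonblock(16, exception: false) # drain the wake-up byte
      true
    else
      false
    end
  end

  private

  # Only one caller at a time can run the check-then-create step.
  def self_pipe
    @pipe_lock.synchronize { @self_pipe ||= create_self_pipe }
  end

  def create_self_pipe
    sleep 0.2 # assumption: simulates slow pipe creation
    reader, writer = IO.pipe
    { reader: reader, writer: writer }
  end
end

demo = LockedSelfPipe.new
sleeper     = Thread.new { demo.interruptible_sleep(2) }
interrupter = Thread.new { demo.interrupt }
interrupter.join
puts "woken by interrupt: #{sleeper.value}" # prints "woken by interrupt: true"
```

One caveat with this variant: Ruby raises a `ThreadError` if a Mutex is locked from inside a `Signal.trap` block, so it only works if `interrupt` is never called directly from a trap handler. The eager version has no such restriction.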