make python GC always run on thread 1 #520
Conversation
""" | ||
function enable() | ||
ENABLED[] = true | ||
notify(ENABLED_EVENT) |
We could wait for the GC_finished event here to preserve the semantics that this function call only returns after all GC events are processed
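A minimal sketch of that suggestion, assuming a hypothetical `GC_FINISHED_EVENT` that the GC task would notify once its queue is drained (`ENABLED` and `ENABLED_EVENT` mirror the names in the diff; none of this is PythonCall's actual code):

```julia
# Hypothetical sketch: GC_FINISHED_EVENT is an assumed counterpart event that
# the GC loop notifies after processing all queued objects.
const ENABLED = Ref(false)
const ENABLED_EVENT = Threads.Event()
const GC_FINISHED_EVENT = Threads.Event()

function enable()
    ENABLED[] = true
    notify(ENABLED_EVENT)
    # block until the GC task reports that all pending events are processed,
    # preserving the old semantics that `enable` returns only after GC is done
    wait(GC_FINISHED_EVENT)
    return nothing
end
```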
Very cool, thanks. Will take me a while to absorb - I've not used Julia's multithreading much. I'd be very interested to see some benchmarking to see how this affects performance in the single-threaded case. Another worry is what happens if we're running from JuliaCall? An issue with JuliaCall is that Julia's event loop doesn't run while in Python-land (unless you call …).
Do you have any good examples of a single-threaded python-object-heavy workload to benchmark? I think it's something that would benefit from a slightly realistic workload over just calling ….

I also haven't used juliacall before so I'll have to figure out how to get that set up.
```julia
task = Task(gc_loop)
task.sticky = VERSION >= v"1.7" # disallow task migration which was introduced in 1.7
# ensure the task runs from thread 1
ccall(:jl_set_task_tid, Cvoid, (Any, Cint), task, 0)
schedule(task)
```
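As a self-contained illustration of the pinning pattern above (`run_on_thread1` is an illustrative name, not PythonCall API):

```julia
# Sketch of the pattern in the diff: create a task, forbid migration, assign it
# to thread 1 (zero-based index 0) via the internal runtime call, then schedule.
function run_on_thread1(f)
    task = Task(f)
    task.sticky = VERSION >= v"1.7"  # disallow task migration (introduced in 1.7)
    ccall(:jl_set_task_tid, Cvoid, (Any, Cint), task, 0)  # internal API, may change
    schedule(task)
    return task
end
```

For example, `fetch(run_on_thread1(() -> Threads.threadid()))` should return `1` regardless of how many threads Julia was started with.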
in the …:

```
Runtime: 15.873462 seconds (164.15 M allocations: 3.690 GiB, 24.07% gc time)
Python GC time: 2.328511 seconds
Python GC time (waiting): 0.000013 seconds
Next full GC: 2.217151 seconds (9.03% gc time)
Total: 18.104279 seconds (164.15 M allocations: 3.690 GiB, 22.21% gc time)
```

with Julia 1.10, with …. This matches the performance I get on …:

```
Runtime: 16.901158 seconds (164.15 M allocations: 3.690 GiB, 24.29% gc time)
Python GC time: 2.633090 seconds
Next full GC: 2.115611 seconds (9.28% gc time)
Total: 19.023249 seconds (164.15 M allocations: 3.690 GiB, 22.62% gc time)
```

(obtained with these changes + script). The segfault in CI is concerning; it is on the pyjulia side, which I have not looked into yet.
I think the segfault was just because I forgot to acquire the GIL after refactoring to add the fast path; should be fixed now.
I’ve added some off-by-default logging so we can see when this background task kicks in. I’ll try running from pyjulia and see if it seems to be behaving as expected. Btw do you mean the libuv event loop? I think the task runtime is separate from that but also I’m not really sure how it all works under the hood.
I’m happy to hop on a call if that would be helpful btw.
I added a test for pyjulia and addressed the first point of #219 (comment) by turning on signal handling if Julia is multithreaded, since threads + no signal handling is unsupported in Julia and leads to crashes. I also added threads to the CI matrix to prevent regressions on this.
Thanks for your work so far. Would be cool to implement and test the method from JuliaPy/PyCall.jl#883 too - the simplicity (and avoidance of anything async) is appealing.
|
Looks awesome! Regarding testing, it could be nice to try something like rr's chaos mode to assault the garbage collector and reveal any potential thread-safety issues. (Edit: but this would take a while, whereas this PR already fixes some issues, so I don't think such testing is essential for the moment.) However, it is tricky to start Julia with rr from within Python – see JuliaLang/julia#52689. So the test would have to be run from the Julia side of PythonCall.jl, and open Python from there, rather than the other direction. But maybe that would be good enough for testing this. I believe it can be done with `julia --bug-report=rr-local,chaos`.
Btw, @lassepe reports that this PR fixes the issues we ran into here: avik-pal/Wandb.jl#27
I see what you mean, if I stick

```julia
t = Threads.@spawn begin
    while true
        println("Background task running $(time())")
        sleep(1)
    end
end
Base.errormonitor(t)
```

into the …
BTW I wonder if this is an issue with the status quo solution as well? If a finalizer doesn't run (e.g. because GC hasn't run yet), the refcount won't be decremented. Do GC/finalizers run "in the background" properly in a way the tasks don't?
I think another solution here would be to acquire the GIL from non-thread-1 threads (with `PyGILState_Ensure()` and release) and do the decrement eagerly within the finalizer. I'm not sure how performant that would be but it might be worth a try. The other thing is we want to make sure we can't deadlock somehow. Do we currently hold the GIL the whole time from thread 1?
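The eager-decref idea above could look roughly like the following sketch (not PythonCall's actual code; it assumes an already-initialised CPython in the process, and `eager_decref` is a made-up name). `PyGILState_Ensure`/`PyGILState_Release` are the real CPython C-API calls for acquiring the GIL from arbitrary threads, so this is not runnable standalone:

```julia
# Hedged sketch: acquire the GIL from whatever thread the finalizer runs on,
# decrement the refcount immediately, then restore the previous GIL state.
# Assumes libpython is loaded into the process and the interpreter is running.
function eager_decref(ptr::Ptr{Cvoid})
    gstate = ccall(:PyGILState_Ensure, Cint, ())     # safe from any thread
    ccall(:Py_DecRef, Cvoid, (Ptr{Cvoid},), ptr)     # drop the reference eagerly
    ccall(:PyGILState_Release, Cvoid, (Cint,), gstate)
    return nothing
end
```

The deadlock concern in the comment above is real: if thread 1 holds the GIL and blocks waiting on a thread that is itself blocked in `PyGILState_Ensure`, neither can progress.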
Basically yes - thread 1 holds the GIL all the time.

Using PythonCall from Julia, the GIL is acquired implicitly when the Python runtime is initialised and released when it is exited. It is never released anywhere else by PythonCall.

Using JuliaCall from Python, the GIL is managed by Python as usual. Most of the time it is already acquired by the Python thread calling into PythonCall. Again, PythonCall never explicitly handles the GIL.

(Of course, you could call some Python code - such as ctypes or multithreading - which itself releases the GIL. But then it's currently documented that you should only call back into PythonCall from thread 1. Hence the above statement about the GIL always being held on thread 1 is still true while in any PythonCall code. If we're careful about it, we may be able to relax this.)
Yep that's true. However with the status quo, the Python objects do get freed when Julia's GC is run. With the new solution, they only get freed after GC is run AND the async task you added gets yielded to. GC gets called frequently (it's hard to completely avoid allocating when using PythonCall) but most things you might do in JuliaCall probably don't yield.
If you call …
Some alternative approaches:
See #529 for a draft of an alternative approach without tasks.
See #534 for a third approach to this. I think it's my favourite approach.
Closing in favour of #529 |
closes #201
Here, we make the GC queue threadsafe (by using a Channel), so we can safely enqueue objects from any thread. Then we process it from a dedicated task which runs on thread 1, which is the thread in which we have initialized the correct state in the Python runtime (https://docs.python.org/3/c-api/init.html#non-python-created-threads). We continue to respect `GC.enable()` and `GC.disable()`, but they should not be very useful anymore since it is no longer required to disable GC in threaded regions. (However, users must still only interact with the Python runtime via thread 1, unless they set up the right state.)

This was a bit complicated to test unfortunately, but if I have reasoned correctly (if!) there should be no races in the testing.
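The queue-plus-dedicated-task design described here can be sketched as follows (illustrative names, not PythonCall's actual internals; the real consumer task is additionally pinned to thread 1, which is omitted for brevity):

```julia
# Illustrative sketch: a threadsafe Channel that any thread (e.g. a finalizer)
# can push to, drained in order by a single consumer task.
function make_gc_queue(process)
    queue = Channel{Any}(Inf)         # unbounded; put!/take! are threadsafe
    task = Task() do
        for item in queue             # blocks until items arrive; ends on close
            process(item)
        end
    end
    schedule(task)
    return queue, task
end

# usage: items enqueued from any thread are processed in order by the task
seen = Int[]
q, t = make_gc_queue(x -> push!(seen, x))
foreach(x -> put!(q, x), 1:3)
close(q)
wait(t)
```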
Notes:

- … `__init__` during precompilation (will it hang precompilation? is it fine to not GC then)?
- I haven't used `Threads.Condition` or `Threads.Event` before, hopefully I got it right.