
Conversation

takahiro-blab

Code summary:

  1. If JobFactory has a lambda, it now has one queue (@runloop_queue).
  2. Worker threads push their own queue onto JobFactory's queue.
  3. JobFactory pushes jobs into the threads' queues from the main thread (the caller of the Parallel singleton method).
  4. Parallel.in_threads counts finished threads. After all worker threads have finished, the last worker thread calls JobFactory#stopper, which stops the loop in the caller thread.
  5. If worker threads don't have their own queue (e.g. Parallel.work_in_ractors), JobFactory operates as before.
  6. If the source can be accessed by index (like Array), JobFactory operates as before.

Thanks to the clean design of the original code, these changes were possible with minimal modifications.
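To make the flow above concrete, here is a toy, self-contained sketch of the queue-of-queues idea. The names (stop sentinel, worker_queues, my_queue) are illustrative only and do not match lib/parallel.rb exactly:

```ruby
stop = :stop                          # sentinel, stands in for Parallel's Stop
jobs = [1, 2, 3].each                 # the enumerator source (the "@lambda")
worker_queues = Thread::Queue.new     # JobFactory's queue of worker queues

workers = 2.times.map do
  Thread.new do
    my_queue = Thread::Queue.new      # reused for every job this worker runs
    results = []
    loop do
      worker_queues << my_queue       # step 2: worker pushes its own queue
      item = my_queue.pop             # wait for the main thread to hand out a job
      break if item == stop
      results << item * 10            # do the work
    end
    results
  end
end

# step 3: the main thread pops worker queues and feeds them jobs,
# then a stop marker once the source is exhausted
done = 0
while done < workers.size
  queue = worker_queues.pop
  begin
    queue << jobs.next
  rescue StopIteration
    queue << stop
    done += 1
  end
end

workers.map(&:value).flatten.sort # => [10, 20, 30]
```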

All existing RSpec tests pass in my environment.
(I ran the tests on CRuby 3.4.3 on an x86_64 machine, excluding pending tests.)

Limitations:

  • These changes have not been benchmarked, so they may introduce some performance degradation.
  • The code has only been tested with CRuby (MRI), and has not been verified on other Ruby implementations such as JRuby.

This implementation uses several Thread::Queue instances. While this may not be the optimal approach, it helps address a few specific issues.

I have added some tests, but I wasn't completely sure where they should be placed in parallel_spec.rb, so I added them where they seemed to fit best. Feel free to move or adjust them if you think there's a better spot.

Comment on lines 110 to 112
return if @stopped
item = runloop_enq(queue_for_thread)
return if item == Stop
Owner
why is it not like this ?

Suggested change
- return if @stopped
- item = runloop_enq(queue_for_thread)
- return if item == Stop
+ item = runloop_enq(queue_for_thread)
+ @stopped = (item == Stop)
+ return if @stopped

Author
@takahiro-blab Apr 25, 2025

@stopped is set in JobFactory#runloop.

JobFactory#next may be called from several threads at the same time, so your @stopped = (item == Stop) would need exclusive control via Mutex#synchronize.

Taking a value out of @lambda and setting the result to @stopped must be handled in a critical section or by a single thread, I think. Otherwise, one thread could clear the @stopped flag set by another, which would be a bug.

That is why the previous implementation of Parallel used @mutex.synchronize. This PR's code handles @lambda and the item == Stop check in #runloop on a single thread, so @mutex is still passed in, but it's not used.
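For illustration, here is a deterministic replay of the race that the unsynchronized assignment could cause (stop_marker, item_a, and item_b are hypothetical names, not Parallel's internals):

```ruby
# Two workers each run `item = queue.pop; stopped = (item == stop_marker)`
# without a mutex. This replays one losing interleaving sequentially.
stop_marker = :stop
stopped = false
queue = Thread::Queue.new
queue << :job
queue << stop_marker

item_a = queue.pop                  # worker A takes :job
item_b = queue.pop                  # worker B takes the stop marker
stopped = (item_b == stop_marker)   # B runs its assignment first: stopped = true
stopped = (item_a == stop_marker)   # A's late assignment clears the flag again
stopped # the stop signal was lost
```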

Owner

ah thx, yeah this is a tricky section :)
can you leave a bit of inline comment for the gotchas

Author

I have added a comment to this code.

lib/parallel.rb Outdated
[item, index]
end

def runloop
Owner

would this make sense ?

Suggested change
- def runloop
+ def run_runloop

Owner

some method comments would help here too, what does it do exactly

Author

You're right. Please feel free to change anything.

array.respond_to?(:num_waiting) && array.respond_to?(:pop) && -> { array.pop(false) }
end

def enum_wrapper(source)
Owner

maybe give some examples of what types this is trying to detect

Author

You are asking about #enum_wrapper, aren't you?

This method aims to convert Enumerator instances and objects that include Enumerable into a Method instance that calls #next, but objects that are accessible via the [] method shouldn't be converted, because access by index is faster and avoids serialization problems.

So, first it checks for the [] method; if [] is available, it returns false. Next, if #next is available, it returns a Method instance.

For example:

enum_wrapper([1,2,3]) # -> false
enum_wrapper(1..5) # -> Method ( (1..5).method(:next) )
enum_wrapper(Prime.to_enum) # -> Method (See infinite_sequece.rb test case)
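A hypothetical re-sketch of that detection order (the real enum_wrapper in lib/parallel.rb differs in detail, e.g. in how it obtains the #next method):

```ruby
# Simplified stand-in for the detection logic described above.
def enum_wrapper(source)
  return false if source.respond_to?(:[])  # Arrays etc.: index access is faster
  enum = source.is_a?(Enumerator) ? source : source.to_enum
  enum.method(:next)                       # a Method that pulls items lazily
end

enum_wrapper([1, 2, 3])   # => false ([] is available)
enum_wrapper(1..5).call   # => 1 (first item via the wrapped #next)
```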

lib/parallel.rb Outdated
threads = []
- count, = extract_count_from_options(options)
+ count, options = extract_count_from_options(options)
+ finished_monitor = options[:runloop] && Queue.new(1..(count - 1)) # Insert values, one less in count than the number of threads.
Owner

explain why 1 less

Author

It's because the last thread must raise a ThreadError by calling #pop on the empty queue.

For example, if there are 5 worker threads (when options[:count] is 5), the finished_monitor queue holds 4 values.
Each of the five worker threads executes finished_monitor.pop(true), so 4 of them can get a value from finished_monitor.
But the last thread doesn't get a value, and a ThreadError exception is raised.
So JobFactory#stopper is called only once.
(I just realized that, perhaps, multiple calls of JobFactory#stopper may have no side effect... If so, this could be written more concisely without the finished_monitor queue.)

In the rescue section, the last thread calls stopper.call, which stops JobFactory#runloop.

Queue#pop raises a ThreadError when true is given as the argument and the queue is empty.
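As a standalone illustration of this trick (simplified, not the actual Parallel code):

```ruby
# 5 threads, but only 4 values in the queue: exactly one thread hits
# ThreadError and would be the one to call the stopper.
count = 5
finished_monitor = Thread::Queue.new
(1..(count - 1)).each { |i| finished_monitor.push(i) } # 4 values for 5 threads

stopper_calls = 0
threads = count.times.map do
  Thread.new do
    begin
      # ... the worker would do its job here ...
      finished_monitor.pop(true) # non-blocking pop: raises ThreadError when empty
    rescue ThreadError
      stopper_calls += 1         # only the one thread that finds the queue empty
    end
  end
end
threads.each(&:join)
stopper_calls # => 1, so the stopper would run exactly once
```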

Owner

ah got it, very complicated ... can you leave some short inline comment to explain a bit

Author

I have added a comment to this code, along with the question below.

lib/parallel.rb Outdated
yield(i)
ensure
begin
finished_monitor&.pop(true) # This must be executed even if the worker thread is killed (by #work_in_processes).
Owner

explain why it needs to be executed

Author

If that were not the case, JobFactory#runloop would block in queue = @runloop_queue.pop and could never return from #runloop.
To make sure the main thread's options[:runloop]&.call (JobFactory#stopper) finishes, each worker thread must call finished_monitor&.pop(true) even if it is killed. That is why it is in the ensure section.
(Please see the question above.)

(JobFactory#stopper pushes Stop onto JobFactory's @runloop_queue, which makes #runloop finish.)

This logic is also necessary when the run is terminated by Ctrl+C or by a worker throwing Parallel::Kill.
(When Parallel::Kill or Break is thrown, worker threads are killed by UserInterruptHandler.kill in #work_in_processes.)
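A small demonstration (CRuby) of the property this relies on: an ensure block still runs when a thread is killed:

```ruby
# Kill a sleeping thread and observe that its ensure block ran anyway.
ran_ensure = false
t = Thread.new do
  begin
    sleep # simulate a worker blocked on a job
  ensure
    ran_ensure = true # executed even though the thread is killed below
  end
end
sleep 0.01 until t.status == "sleep" # wait until the worker is blocked
t.kill
t.join
ran_ensure # => true
```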

Owner

thx, can you leave a bit of this inline for future archeologists :)

Author

I have added a comment, along with the question above.

lib/parallel.rb Outdated

UserInterruptHandler.kill_on_ctrl_c(workers.map(&:pid), options) do
- in_threads(options) do |i|
+ in_threads(options.merge(runloop: job_factory.method(:runloop), stopper: job_factory.method(:stopper))) do |i|
Owner

can we reuse the options from line 408 ?

Author

Sorry, where is line 408?
If you mean the "thread_options = options.merge(runloop: job_factory.method(:runloop), stopper: job_factory.method(:stopper))" from the discussion above, then of course we can extract that part into a variable and reuse it.

Author

I have fixed it and pushed the change in the same way as the other call site. Is this correct?

lib/parallel.rb Outdated
loop do
break if exception
- item, index = job_factory.next
+ item, index = job_factory.next(queue_for_thread)
Owner
@grosser Apr 25, 2025

could the factory take care of the queue handling by using Thread.current[:parallel_queue] ||= Thread::Queue.new or so ?

Author

The value of Thread#[] is fiber-local and varies when the fiber changes, so it may be better to use Thread#thread_variable_get and #thread_variable_set. Either way, using thread-local variables may handle the queue more simply.
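A quick sketch of the difference (fiber-local Thread#[] vs thread-local thread variables):

```ruby
# Set both kinds of variable, then read them from a new Fiber on the same
# thread: only the thread-local one is visible there.
t = Thread.new do
  Thread.current[:a] = 1                    # fiber-local storage
  Thread.current.thread_variable_set(:b, 2) # thread-local storage
  Fiber.new do
    [Thread.current[:a], Thread.current.thread_variable_get(:b)]
  end.resume
end
t.value # => [nil, 2] -- the fiber-local value is invisible inside a new fiber
```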

Author

I have made the change to use Thread#thread_variable_get and #thread_variable_set for the queue handling, and I have pushed it.

@grosser
Owner

grosser commented Apr 25, 2025

thanks, looks all very well thought out to fix these edge-cases :)

mostly looks good, but more comments would help make this easier to understand
runloop is kinda vague, if you have a better name that would be great

@takahiro-blab
Author

Thank you for your comments and reviews.
Handling threads is very difficult.

Certainly, runloop may be vague.
How about poploop, lambda_loop, feeder_loop?

@takahiro-blab
Author

I have created a commit accepting some of the suggestions.
In addition, I have committed changes to Readme.md related to this pull request.

lib/parallel.rb Outdated
[item, index]
end

def runloop
Owner

Suggested change
- def runloop
+ def consume_enumerator_queues

does this work ?

# consume items from enumerator queues until they stop producing

Author

I noticed the comment says "enumerator queues", but as far as I can tell, there's only one queue being consumed from in this method.
Does "enumerator queues" refer to JobFactory's @lambda?
Should this be singular instead?

@grosser
Owner

grosser commented Apr 28, 2025

would producer_queues or enumerator_queues make sense ?
and then consume_from_<x>_queues as method

@takahiro-blab
Author

Sorry, I am not very good at English,

would producer_queues or enumerator_queues make sense ?

but is this a reference to JobFactory's instance variable @runloop_queue?

If so, @runloop_queue is the line where workers queue up to get a job. (Actually, workers push their own queue onto @runloop_queue.)

So do these make sense?
Instance variable @runloop_queue -> @workers_line_queue
Method runloop -> distribute_work

@grosser
Owner

grosser commented May 1, 2025

maybe worker_queues ?

distribute_work -> consume_worker_queue ?

if the method will be used by everything, then something general like distribute_work is fine,
but if it only deals with the enumerator queues, then I'd like to be specific and consistent (use a consistent prefix/suffix like enumerator_queues) to make it clear which part of the codebase belongs to that feature

@grosser
Owner

grosser commented May 1, 2025

basically make it easy to ignore a big chunk of the code if debugging something unrelated and make it easy to spot all the things related when debugging enum bugs

- @runloop_queue -> @worker_queues
- runloop -> consume_worker_queues
- stopper -> stop
@takahiro-blab
Author

I have renamed JobFactory's instance variable @runloop_queue and the #runloop method.
And I have fixed the case where CI on Ruby 2.7 was failing.

@grosser
Owner

grosser commented May 3, 2025

thx for all the update ✨

I can make a separate PR to get rid of ruby 2.7 if that is causing issues, meant to do that for a while but never really broke anything.

Can you rename the last few runloop leftovers to worker_queue or something similar ?

Can it check queue.empty? instead of relying on an exception ? (because they are expensive and can lead to warnings)

@takahiro-blab
Author

takahiro-blab commented May 4, 2025

Can it check queue.empty? instead of relying on an exception ? (because they are expensive and can lead to warnings)

Well, without the exception, a mutex is necessary. Like this:

# In Parallel.in_threads method
  if options[:runloop]
    finished_monitor = Queue.new # In Ruby 3.0 or earlier, Queue#initialize doesn't accept initial values.
    (1..(count - 1)).each { |i| finished_monitor.push(i) }
  end
  runloop_stopper = options[:stopper]
  mutex = options[:runloop] ? Mutex.new : nil # ADD MUTEX

  Thread.handle_interrupt(Exception => :never) do
    Thread.handle_interrupt(Exception => :immediate) do
      count.times do |i|
        threads << Thread.new do
          yield(i)
        ensure
          mutex&.synchronize { # Add critical section
            if finished_monitor
              if finished_monitor.empty?
                runloop_stopper&.call
              else
                finished_monitor.pop
              end
            end
          }
        end
##### Omitted below #####

It's because several threads could check whether the queue is empty and call Queue#pop at the same time.
Without a mutex, after one thread checks whether the queue is empty, another thread may check and call Queue#pop, causing a race condition. So a mutex is necessary, and its overhead must be considered.
Queue#pop(true), on the other hand, is atomic at the Ruby level, so no additional mutex is needed.

However, thinking about it, if a mutex is used, it's not necessary to use a Queue, which is thread-safe; a mutex and a simple counter variable are enough.
So the above code can be rewritten like this:

# In Parallel.in_threads method
  counter = count - 1 # COUNTER
  runloop_stopper = options[:stopper]
  mutex = options[:runloop] ? Mutex.new : nil # ADD MUTEX

  Thread.handle_interrupt(Exception => :never) do
    Thread.handle_interrupt(Exception => :immediate) do
      count.times do |i|
        threads << Thread.new do
          yield(i)
        ensure
          mutex&.synchronize { # Add critical section
            runloop_stopper&.call if counter <= 0 # the last thread to finish calls the stopper
            counter -= 1
          }
        end
##### Omitted below #####

I haven't tested the two snippets above. However, I think my original implementation, which uses one queue and an exception, is simpler. There is no single right answer because this is a subjective issue, isn't it?
Which do you think is better?

You are welcome to push any code you think is good to the topic branch, not just for this point but also for the variable names.

@takahiro-blab
Author

After considering it, I think using a mutex and a counter is good, so I have pushed the change.
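For reference, a standalone sanity check of the mutex + counter variant (simplified, not the actual Parallel code):

```ruby
# 5 threads, counter starts one below the thread count; the mutex serializes
# the ensure blocks, so only the last thread to finish sees counter <= 0.
count = 5
counter = count - 1        # one fewer than the number of threads
stopper_calls = 0
mutex = Mutex.new

threads = count.times.map do
  Thread.new do
    begin
      # ... the worker would do its job here ...
    ensure
      mutex.synchronize do
        stopper_calls += 1 if counter <= 0 # only the last thread to finish
        counter -= 1
      end
    end
  end
end
threads.each(&:join)
stopper_calls # => 1
```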

Comment on lines +114 to +116
# so it's not necessary to check for Stop here.
item = worker_queues_enq(queue_for_thread)
return if item == Stop
Owner

comment says that we don't need to check for stop but then we check for stop ?

Author
@takahiro-blab May 12, 2025

This "check for Stop" means assigning item == Stop to @stopped. The comment means "we must not assign the check result to @stopped"; it does not refer to the early return.

def consume_worker_queue
return unless @worker_queues

loop do
Owner

is this right ?

Suggested change
- loop do
+ # every time a thread wants to start work, it adds a new queue; we pop a queue here until everything is done (Stop),
+ # then push a new item into that queue for the thread to read and work on
+ loop do

Stop
end

def worker_queues_enq(queue_for_thread)
Owner

would this also work if we did:

def worker_queues_enq
  queue = Thread::Queue.new
  @worker_queues.push(queue)
  queue.pop # Wait for @lambda to give us an item to work on
end

so the caller has less state to take care of

Owner

... also maybe method name wait_for_item ?

Author

Your code also works. However, it creates as many queues as there are jobs, and those queues remain in memory until the GC collects them. I don't consider creating a new queue for every job a good implementation.
Furthermore, when we look back at this later, we would wonder, "Why does this code create a new queue here?", wouldn't we?

In a straightforward implementation, I would expect the queues to be reused, so why make a new queue every time?

Owner

makes sense to reuse the queues, was mostly trying to understand if that's how it is supposed to work :)

@grosser
Owner

grosser commented May 7, 2025

code is getting more obvious 👍 (or I read it too many times already :D)

@grosser
Owner

grosser commented May 13, 2025

rubocop needs a small fix, otherwise looks good 🤞

Successfully merging this pull request may close these issues:

  • Lambda producer should be run in the main thread
  • Infinite series expansion
  • Does not work with enumerators
2 participants