The goal of this ticket is to implement a variant of testForkGrouping that performs work-stealing: rather than splitting test classes into fixed groups, with each group running serially in a separate JVM, we want to spawn one or more JVMs for each test group, with those JVMs work-stealing the group's test classes and shutting down once all of that group's test classes have been completed.
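A minimal sketch of the idea, where plain in-process threads stand in for the forked JVMs and every name (runGroup, runTestClass) is hypothetical rather than anything in Mill today: each worker keeps pulling test classes off the group's shared queue until it is empty, then exits.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical sketch only: threads stand in for forked JVMs, and the
// in-process queue stands in for whatever cross-process mechanism the
// real runner would use to hand out test classes.
object WorkStealingSketch {
  def runGroup(testClasses: Seq[String], jvmCount: Int): Unit = {
    val queue = new ConcurrentLinkedQueue[String]()
    testClasses.foreach(queue.add)

    val workers = (1 to jvmCount).map { id =>
      new Thread(() => {
        var cls = queue.poll()
        while (cls != null) {    // keep stealing until the group's queue is drained
          runTestClass(id, cls)  // placeholder for "run this test class in this JVM"
          cls = queue.poll()
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())    // the group is done once every worker has exited
  }

  private def runTestClass(workerId: Int, cls: String): Unit =
    println(s"worker $workerId ran $cls")
}
```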
Currently, testForkGrouping only allows static allocation of test classes to the various subprocesses. This means it cannot easily be turned on by default: there will always be codebases with many fast test classes where forking is a net negative due to JVM startup overhead, and others with fewer slow test classes where the JVM startup overhead matters less. A work-stealing test runner as described above would avoid this problem by letting fast test classes share a JVM, forking only as many parallel JVMs as necessary to saturate Mill's thread-count config (--jobs), which defaults to NUM_CORES. This would likely be self-tuning enough to turn on by default, so everyone can benefit from running test classes in parallel regardless of the runtime characteristics of their test classes.
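One way to read that sizing rule, expressed as a hypothetical helper rather than anything Mill currently exposes: never fork more JVMs for a group than it has test classes, and never more than the configured parallelism. (A real self-tuning runner might spawn JVMs lazily as work remains; this only captures the upper bound.)

```scala
// Hypothetical sizing helper: `jobs` stands for Mill's --jobs value.
def jvmsForGroup(groupSize: Int, jobs: Int): Int =
  math.max(1, math.min(groupSize, jobs))

// e.g. a group of 3 test classes under --jobs=8 forks at most 3 JVMs, not 8:
// jvmsForGroup(3, 8) == 3
```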
As described above, work-stealing would only occur within each test group. Thus a user would still be able to use testForkGrouping to separate test classes that should never run in the same JVM, but if there are no such restrictions they could stick with the default "everything in one group" config and Mill's test runners would work-steal the test classes to complete them as soon as possible. See earlier discussion in #4419.
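For illustration, a build.sc sketch of that usage, assuming Mill's existing testForkGrouping: T[Seq[Seq[String]]] and discoveredTestClasses signatures (the module layout, Scala/utest versions, and the "DatabaseTests" suffix are all placeholders): suites that must not share a JVM each get a single-class group, and everything else stays in one group for the work-stealing runner to spread across JVMs.

```scala
// build.sc (sketch, not a verified configuration)
import mill._, scalalib._

object foo extends ScalaModule {
  def scalaVersion = "2.13.14"

  object test extends ScalaTests with TestModule.Utest {
    def ivyDeps = Agg(ivy"com.lihaoyi::utest:0.8.3")

    // Isolate suites that must not share a JVM; leave the rest in one group
    // so the work-stealing runner can spread them across JVMs as needed.
    def testForkGrouping = T {
      val (isolated, shared) =
        discoveredTestClasses().partition(_.endsWith("DatabaseTests"))
      isolated.map(Seq(_)) ++ Seq(shared)
    }
  }
}
```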
Hello, I've submitted my PR in an attempt to handle this. It's currently a PoC, as it uses a simple cached thread pool. I want to clarify the direction first and then follow up with the work-stealing thread pool implementation if everything is good to go. Please take a look and we can have some conversation about the approach.
From the maintainer Li Haoyi: I'm putting a 1500 USD bounty on this issue, payable by bank transfer on a merged PR implementing this.
See https://github.com/orgs/com-lihaoyi/discussions/6 for other bounties and the terms and conditions that bounties operate under