
Use process pool for test runner #4614


Merged: 7 commits, Mar 10, 2025

Conversation

@HollandDM (Contributor) commented Feb 23, 2025

fixes #4590.

This PR introduces a testEnableWorkStealing setting for TestModule. When enabled, the tests in the module are distributed by a work-stealing process pool and run in parallel with each other.

Internally, when testEnableWorkStealing is set to true, Mill will try to spawn at most --jobs test runners for each test group. Each runner competes to "steal" test classes from a shared folder to run. This increases the throughput of the whole test group, as slow tests no longer block fast tests from running. Each test runner has its own working folder and log files used to communicate with the Mill parent process.
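For reference, opting in looks roughly like the following build file sketch. The module layout, Scala version, and test framework below are made up for illustration; only the `def testEnableWorkStealing = true` override comes from this PR.

```scala
// Build file sketch (hypothetical module; only testEnableWorkStealing is from this PR)
import mill._, scalalib._

object foo extends ScalaModule {
  def scalaVersion = "2.13.15"

  object test extends ScalaTests with TestModule.Utest {
    def ivyDeps = Agg(ivy"com.lihaoyi::utest:0.8.4")
    // opt in to the work-stealing test runner process pool for this module
    def testEnableWorkStealing = true
  }
}
```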

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from c09aa01 to 20eccb3 on February 23, 2025 16:10
@HollandDM (Contributor Author)

A local run on my machine yields similar test times when running scalalib.test.testForked with grouping on and off. But do I need some kind of benchmark for this?

@lihaoyi (Member) commented Feb 24, 2025

Let's go with the multiple-JVMs-one-thread-each architecture described in the original ticket, rather than the one-JVM-multiple-threads architecture you have in this PR.

Having each JVM be single-threaded running tests makes things like resource usage much more attributable and deterministic, since the resource footprint of each JVM can only be due to the one test it is currently running. Whereas if you put a bunch of tests into a single JVM and it crashes due to OOM, you have no idea which test is at fault.

In terms of performance, I think the goal should be:

  1. Running heavyweight test classes like scalalib.test.testForked, it should have no significant performance difference with testForkGrouping enabled or disabled

  2. Running lightweight test classes, say 1000 lightweight test classes that do nothing, it should be much more performant than having testForkGrouping configured to one-test-per-class, and comparable to testForkGrouping disabled (although maybe not quite the same, depending on whether the overhead of N different JVMs or the speedup from running N-way in parallel wins out)

@HollandDM (Contributor Author)

Hmm, looks like I misunderstood the idea of the original ticket. From what you said, it seems like each test group can now spawn multiple subprocesses (with respect to the total --jobs config), and they can perform work stealing between them, right?
Also, for the performance goals, number 1 is easy to set up. But for number 2, do we have an existing test like this, or should I create a new one and add it to this PR as well?

@lihaoyi (Member) commented Feb 24, 2025

@HollandDM that's right.

In terms of the (2) benchmarks, feel free to include it in this PR as an integration test. You can have the test code generate the necessary project files before running the test to avoid having to commit all the boilerplate code
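As a sketch of that idea, the integration test could write out a large number of trivial test sources before invoking Mill. Everything below (paths, package, suite names, the choice of utest) is hypothetical:

```scala
// Sketch: generate N do-nothing test classes on disk instead of committing them
def generateLightweightTests(testSrcDir: os.Path, count: Int = 1000): Unit = {
  os.makeDir.all(testSrcDir)
  for (i <- 0 until count) {
    os.write.over(
      testSrcDir / s"Dummy${i}Tests.scala",
      s"""package bench
         |import utest._
         |object Dummy${i}Tests extends TestSuite {
         |  val tests = Tests { test("noop") { assert(true) } }
         |}
         |""".stripMargin
    )
  }
}
```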

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from 040daa7 to d96c887 on February 28, 2025 06:01
@HollandDM (Contributor Author)

@lihaoyi I've updated this to the multiple-JVMs-one-thread-each architecture, with the work-stealing capability.
The PR is still something of a PoC to make sure the approach is correct, so please verify it first.
I'll try to clean up and pass all the test cases in the following commits.

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch 2 times, most recently from 6612052 to c9ca9c4 on February 28, 2025 16:27
@lihaoyi (Member) commented Mar 3, 2025

@HollandDM can you write an English summary of how the implementation works? That would make it much easier to review than trying to reverse-engineer your diff from scratch.

@lihaoyi (Member) commented Mar 3, 2025

At a first glance, the IPC protocol looks far too complicated for what this task requires. For example:

  1. In terms of stealing work, we can get around the whole locking-stealing-file protocol by writing each test group's test class names to files on disk, that each subprocess tries to atomically os.move to claim the class. Such a simple disk-based queue would be more than sufficient for the scales we're looking at (O(num-test-source-files)) and would have a lot fewer moving parts than the mem-mapped in-memory version you have here

  2. Do we really need 8 different cluster states? Since the test module has to complete running every test class eventually anyway, each subprocess could just continually take work off its (possibly shared) queue until there is no more work, and then shut down. There should be no need for subprocesses ever to be blocked, stopped, or in other states: either it is actively running tests, or it's done when there are no more test classes to claim. We never add test classes to the queue that weren't there to start, so we don't need to worry about keeping a subprocess around idle to use it later

  3. In terms of managing concurrency, if the parent Mill process runs each subprocess in a ctx.fork.async(dest, key, msg){...} block, it will automatically claim the async slot before each subprocess starts, and release it when the subprocess ends. The subprocess shouldn't need to worry about this: the fact that the subprocess exists means that a slot is claimed for it, and the fact that the subprocess hasn't exited means it is doing work running some test class. Once the subprocess exits, the Mill parent process can release the async slot, which using ctx.fork.async happens automatically

Before coding everything up, it's worth sketching out exactly what dataflows are necessary between the parent and worker processes. What I can think of are:

  1. Parent needs to pass shared test class queue to workers (can be done via files on disk)
  2. Workers need to signal completion to parent (can be done by exiting)
  3. Workers need to pass test results to parent (can be done at the end via the same code path we currently use, writing JSON to a file on disk)
  4. Workers need to pass name of currently-running test suite to parent (maybe write it to a file on disk the parent polls?)
  5. Workers need to stream logs to parent (can be done via stdout/stderr just as it is done now)

I think those are the only ways in which the parent and worker processes need to interact? If we limit our implementation to these dataflows, that will greatly simplify the implementation:

Parent:

  1. The parent assigns each test group a queue folder and writes that test group's classes as files within it
  2. For i in NUM_CORES:
    1. The parent loops through every test group, spawning a ctx.fork.async that first checks if the test queue folder is empty, exits early if there are no tests left, otherwise runs a single subprocess worker for that test group
  3. While each ctx.fork.async is running, polls a file on disk assigned to each worker containing the running test class, and updates the status prompt via setPromptDetail
  4. Once all ctx.fork.asyncs have completed, collect the test result files on disk, parse/aggregate them, and return them as the result of the test command

Worker:

  1. List the files in the test queue folder
    1. If the list is non-empty, try to claim a file with os.move(file, os.tmp())
      1. If you successfully claimed the file, then write the test class name to a file on disk for the parent to pick up, then run the test class associated with that file
      2. If you failed to claim the file, loop around and try again
    2. If the list is empty, exit.

If we want more features in future that our implementation doesn't support we can refactor the code when the time comes.
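A condensed sketch of that parent/worker split, using os-lib calls only. The helper names and signatures below are placeholders rather than this PR's actual code, and the ctx.fork.async wiring on the parent side is elided:

```scala
import scala.util.Try

// Parent side (sketch): one queue folder per test group, one file per test class
def prepareQueue(queueFolder: os.Path, testClasses: Seq[String]): Unit = {
  os.makeDir.all(queueFolder)
  // the file name doubles as the unit of work to be claimed
  testClasses.foreach(cls => os.write.over(queueFolder / cls, cls))
}

// Worker side (sketch): claim classes one at a time until the queue folder is empty
def workerLoop(queueFolder: os.Path, claimedFolder: os.Path, statusFile: os.Path)
              (runTestClass: String => Unit): Unit = {
  os.makeDir.all(claimedFolder)
  var done = false
  while (!done) {
    os.list(queueFolder).headOption match {
      case None => done = true // queue drained: exit, letting the parent release the slot
      case Some(file) =>
        // an atomic rename either claims the class or fails because another worker won
        val claimed = Try(os.move(file, claimedFolder / file.last)).isSuccess
        if (claimed) {
          os.write.over(statusFile, file.last) // let the parent's poller show the running class
          runTestClass(file.last)
        }
    }
  }
}
```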

@HollandDM (Contributor Author) commented Mar 3, 2025

Let me explain the IPC protocol, as it is the heart of this implementation.
In this implementation:

  • the host Mill process decides: 1. when to spawn a subprocess; 2. when to signal a subprocess to stop; 3. which subprocess a stealer will steal from.
  • each subprocess signals the cluster: 1. whether it is running normally or slowly (blocking); 2. whether it wants to steal work.
  • each subprocess signals the other subprocesses: 1. whether it wants to steal; 2. whether it permits others to steal from it.

For the host: at first it spawns one subprocess to handle all the test cases. As time goes on, if any subprocess wants to steal, the host finds another subprocess to be the victim, prioritizing blocked ones. The host also keeps track of the state each subprocess is in at any moment: if more subprocesses are blocking than stealing, the host will try to spawn a new one to accommodate the need to offload work; if more are stealing than blocking, it will signal denial to the stealers.

For the subprocesses: they periodically notify the host of their running state. If a subprocess is allowed to steal from a victim, a second channel between the pair is used to coordinate the steal (the tests to steal are written to disk, and the other side is signalled to check them). If a subprocess is denied from stealing, it can keep retrying for a while, but ultimately it will stop.

Yes, this is quite a complicated work-stealing system. The reasons I went this way are:

  • It is a genuine work-stealing system: the stealing itself happens without going through the host. The host only pairs up the stealer and the victim, and the pain of coordinating the steal is offloaded to that pair of subprocesses.
  • The stealing happens in batches, and if one test is blocked because it is running too slowly, the other tests are guaranteed to be picked up by other subprocesses. If we instead coordinated by using the host as the test provider, we could not guarantee this behavior unless we handed out one test at a time (which I think is worse because of the I/O overhead compared to running a test). This of course has more moving parts, and a test can be moved multiple times (but at most O(log(num-test-source-files)) times).
  • The number of states is quite overwhelming, but it is needed to convey the state of each party, and because I did not want to use any locking mechanism while doing IPC, the number of states is expected to be larger than with locking.
  • In terms of managing concurrency, I did it like this to respect the --jobs parameter to the fullest. When the host decides to spawn a new subprocess, it needs to "ask" the executor for a slot, and it can be denied. Spawning normally via ctx.fork.async (which is a decorated new Future[A]) leaves the number of subprocesses for one test group at up to --jobs. At first I thought this could lead to jobs * jobs subprocesses being spawned in the worst case, but after a closer look I don't think that is the case, so the concurrency-management problem can indeed be reduced to just using ctx.fork.async.

Your suggestion is way simpler and should work fine. The only thing I'm not too sure about is how many tests we should write to a single file for a subprocess to pick up. If it is one, then subprocesses can end up scanning and contending for files on disk heavily. If it is more, then quick test cases can be blocked behind one slow test case, and I think work stealing is required in that situation, which would lead to something similar to this design.

@lihaoyi (Member) commented Mar 3, 2025

Got it. What you say makes sense, but let's go with the approach I suggested and see if we can make that work without the complexity of a full peer-to-peer work-stealing model.

> The only thing that I'm not too sure about is how many tests we should write to a single file for a subprocess to pick up. If it is one, then subprocesses can end up scanning and contending for files on disk heavily. If it is more, then quick test cases can be blocked behind one slow test case, and I think work stealing is required in that situation, which would lead to something similar to this design.

Most things work on the granularity of test classes, so that would be the natural unit of work here. In that case the workers can just pick work directly from the shared queue folder, until the folder is empty.

I don't think we need to worry too much about the performance of the disk queue; we won't have more than a few thousand test classes per module, and the operations we need to do are cheap ones (just checking whether the folder is empty, and trying an os.move on an arbitrary file to claim it).

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from c9ca9c4 to 33ba34b on March 5, 2025 02:00
@HollandDM (Contributor Author) commented Mar 5, 2025

@lihaoyi I updated the PR to use the simplified version you came up with.
The performance is also great; it seems we indeed don't need the full-blown work-stealing cluster I designed originally.
The only downside for now is that the number of awaiting threads gets very large when testGroup is used.
I'll try to fix the CI in the following commits if we're good to go with this implementation.

@lihaoyi (Member) commented Mar 5, 2025

@HollandDM thanks, will take a look

if (files.nonEmpty) {
var shouldRetry = false
val offset = Random.nextInt(files.size)
try {
@lihaoyi (Member), Mar 5, 2025

We can do val shouldRetry = try { to avoid the var here
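i.e. roughly the following shape, where the body and exception handling are placeholders for whatever the original try block does:

```scala
// Sketch: let the try expression itself produce the flag instead of mutating a var
val shouldRetry =
  try {
    // ... original claim attempt goes here ...
    false // claim succeeded, no retry needed
  } catch {
    case _: Throwable => true // lost the race for this file, retry with another one
  }
```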

val taskQueue = mutable.Queue.empty[Task]

@tailrec
def stealFromSelectorFolder(): Boolean = {
Member:

Can we move stealFromSelectorFolder out of stealTasks into its own top-level method and make it return an Option[String] with the successfully stolen class name? That will help encapsulate things a bit more and keep the code easier to understand
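Something along these lines, for example; the folder parameters and the way the claimed file is parked are illustrative only:

```scala
// Sketch: a standalone helper that claims one selector file and returns the stolen
// class name, or None once the shared queue folder is exhausted
def stealFromSelectorFolder(selectorFolder: os.Path, stealFolder: os.Path): Option[String] = {
  os.list(selectorFolder).iterator
    .map { file =>
      try { os.move(file, stealFolder / file.last); Some(file.last) }
      catch { case _: Throwable => None } // another runner claimed this one first
    }
    .collectFirst { case Some(className) => className }
}
```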

// we successfully stole a selector
val selectors = os.read.lines(stealFolder / s"selector-stolen")
val classFilter = TestRunnerUtils.globFilter(selectors)
val tasks = runner.tasks(
Member:

Is tasks always of length 1 here? If so we can add an assert and simplify the code accordingly

Contributor Author:

I'm not really sure whether the tasks returned always have the same length as the taskdefs input.

Comment on lines 266 to 275
@tailrec
def stealTaskLoop(): Unit = {
if (taskQueue.nonEmpty) {
val next = taskQueue.dequeue().execute(
new EventHandler {
def handle(event: Event) = {
testReporter.logStart(event)
events.add(event)
testReporter.logFinish(event)
}
},
Array(new Logger {
def debug(msg: String) = ctx.log.outputStream.println(msg)
def error(msg: String) = ctx.log.outputStream.println(msg)
def ansiCodesSupported() = true
def warn(msg: String) = ctx.log.outputStream.println(msg)
def trace(t: Throwable) = t.printStackTrace(ctx.log.outputStream)
def info(msg: String) = ctx.log.outputStream.println(msg)
})
)

taskQueue.enqueueAll(next)
stealTaskLoop()
} else if (stealFromSelectorFolder()) {
stealTaskLoop()
} else {
()
}
}
@lihaoyi (Member), Mar 5, 2025

This @tailrec method can probably be more easily understood written as nested loops, something like:

while({
  stealFromSelectorFolder() match{
    case Some(className) => 
      for(cls <- classes) executeTaskForCls(cls)
      true
    case None => false
  }
})()

The inner loop is always over a fixed collection of elements while the outer loop runs until a condition is met, so having it be two separate loops makes these properties a bit more obvious than having them mixed together in one big tailrec function

Contributor Author:

This is more about the coding style used in these repositories. In my other open-source PRs, people tend to prefer tailrec over while loops. It looks like it's the other way around here. I'm happy to change all the tailrec to while loops if you'd like. I don't really have a preference.

val events = new ConcurrentLinkedQueue[Event]()
val doneMessage = {

val taskQueue = mutable.Queue.empty[Task]
Member:

With the changes suggested for stealFromSelectorFolder and stealTaskLoop, we should be able to remove this mutable taskQueue as well

val selectorFolder = base / "selectors"
os.makeDir.all(selectorFolder)
selectors2.zipWithIndex.foreach { case (s, i) =>
os.write.over(selectorFolder / s"selector-$i", s)
Member:

How about we name the files according to the class name s rather than the index i? If we need to debug things being able to ls to see the queued test classes would be much easier than having to dig into each file individually
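A minimal version of that suggestion, assuming selectors2 holds the class-name strings as in the snippet above:

```scala
// Sketch: key each queue file by the class name itself, so `ls` shows what is still pending
selectors2.foreach { className =>
  os.write.over(selectorFolder / className, className)
}
```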

Comment on lines 108 to 116
def runTestRunnerSubprocesses(selectors2: Seq[String], base: os.Path) = {
val selectorFolder = prepareTestSelectorFolder(selectors2, base)

val resultFutures = Range(0, selectors2.length).map { index =>
val indexStr = index.toString
Task.fork.async(base / indexStr, indexStr, s"Test process $indexStr") {
(indexStr, runTestRunnerSubprocess(index, base / indexStr, selectorFolder))
}
}
@lihaoyi (Member), Mar 5, 2025

We can use some heuristics to optimize the way we spawn the worker processes here and avoid the large number of processes you mentioned:

  1. We don't ever want to have more processes per group than testClassList.length, since they won't have anything to do even if every process takes one test class

  2. We don't ever want to have more processes per group than ctx.jobs, since that's the parallelism limit

  3. For each Task.fork.async we can check the selectors folder once before running runTestRunnerSubprocess, so that if that group's tests are all claimed before the process starts we can skip the process overhead.

There are probably some other heuristics we can add, but all together this should be enough to mitigate the problem with too many processes: any Task.fork.asyncs that do not have any work to do will be cheap and exit quickly without subprocess overhead, so even if we queue up a lot of them there shouldn't be much cost
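A sketch of those caps, reusing the names from the snippet above (selectors2, base, selectorFolder, Task.fork.async, runTestRunnerSubprocess) and the ctx.jobs value mentioned in the comment; wrapping the result in Option to skip the empty case is a sketch-level simplification, not the PR's actual code:

```scala
// Sketch: never spawn more workers per group than there are test classes or than --jobs,
// and bail out cheaply if the queue is already drained by the time the async block runs
val numWorkers = math.min(ctx.jobs, selectors2.length)

val resultFutures = Range(0, numWorkers).map { index =>
  val indexStr = index.toString
  Task.fork.async(base / indexStr, indexStr, s"Test process $indexStr") {
    if (os.list(selectorFolder).isEmpty) (indexStr, None) // all classes already claimed
    else (indexStr, Some(runTestRunnerSubprocess(index, base / indexStr, selectorFolder)))
  }
}
```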

@HollandDM (Contributor Author), Mar 5, 2025

The implementation already has the check before spawning, so it's true that the number of meaningful threads that spawn subprocesses is small.
What I meant was that when using testGroup, the server calls runTestRunnerSubprocesses for each test group. These Futures then only await runTestRunnerSubprocess (the real test-running work) to return. So we get a lot of awaiting Futures hanging around, and they all show up in the prompt.

@HollandDM (Contributor Author), Mar 5, 2025

It looks exactly like the thing you saw in #4611

Member:

Got it. Since runTestRunnerSubprocesses already spawns fork.async blocks, what if we move the caller out into normal synchronous code? So instead of

          Task.fork.async(Task.dest / folderName, paddedIndex, groupPromptMessage) {
            (folderName, runTestRunnerSubprocesses(testClassList, Task.dest / folderName))
          }

Just

          (folderName, runTestRunnerSubprocesses(testClassList, Task.dest / folderName))

That way there won't be a middle-layer of async blocks that do nothing except wait for their children, which should reduce the number of blocked tasks showing up in the command line prompt

Contributor Author:

This would make the test groups run sequentially with each other rather than concurrently.
But I understand your thinking now, and I think I can make use of it: maybe I'll flatten all the work out to the outer layer and fork from there.

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from b2edc51 to 8af8cbc on March 5, 2025 09:01
@HollandDM (Contributor Author) commented Mar 5, 2025

@lihaoyi Updated. I replaced the @tailrec methods with while loops, and also flattened all the subprocess calls out to the outermost level and awaited all of them there.

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from d346139 to 9ae5330 on March 5, 2025 13:40
@HollandDM (Contributor Author) commented Mar 6, 2025

CI failed massively, so I did some investigation.
It seems that the example tests can be troublesome: because we now always have multiple test runners per group, the order of the output log can be non-deterministic, and that would make all the example tests fail if they relied on a deterministic output log.
One way to keep the same behavior is, for each subprocess in a group, to sort the logs into the correct order (where to find it? the result of discoveredTestClasses maybe?). This can be done with two strategies:

- Defer all the logs and, after everything ends, write them out in the correct order. This delays the output until the end, and I don't think we want that.
- Write the logs in order as soon as possible, e.g. if we receive logs in the order 1 4 2 6 3 5, then we output four times: [1], [2], [3, 4], [5, 6]. This is more responsive, but requires some kind of concurrent algorithm.

Another way is to change ExampleTester to check expectedLine by existence only, but I'm afraid this could affect too many areas outside of the Scala tests (Kotlin, Python, JavaScript, etc.).

Or we could update all related tests to use -j 1, but this would require explicit knowledge from contributors every time they add new examples.

Also, in order to avoid some move contention, each subprocess currently picks files totally at random, so we would need to update that too.

False alarm

@lihaoyi (Member) commented Mar 6, 2025

@HollandDM could you elaborate on the CI failures? AFAIK the example tests already check output in a non-ordered fashion: as long as each line in the example expected output appears in the actual output, the test passes, regardless of what order the actual output is in. So if your tests are failing, it seems like there must be some other issue there other than the difference in output ordering

@HollandDM (Contributor Author) commented Mar 6, 2025

Oh true, now that I check the code carefully, it does check by existence. I guess looking at all the --- Expected output ---------- markers made me dizzy. Sorry for the false alarm.
Let me try fixing the CI.

Comment on lines 333 to 334
new Runnable {
override def run(): Unit = {
Member:

I think this can be shortened using the lambda syntax () => {...} without explicitly instantiating an anonymous subclass of Runnable
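i.e. since Runnable has a single abstract method, a Scala 2.12+ lambda can stand in for the anonymous class:

```scala
// Sketch: SAM conversion lets a lambda replace the explicit anonymous Runnable subclass
val task: Runnable = () => {
  // ... same body as before ...
}
```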

stealFolder: os.Path,
selectorFolder: os.Path
): Array[Task] = {
val stealLog = stealFolder / os.up / "steal.log"
Member:

This can now be written stealFolder / "../steal.log", or rather stealFolder / "../status.txt" since it's not really a log file but a snapshot of the current status

@HollandDM (Contributor Author), Mar 8, 2025

The file is appended to while the runner is working, so it can be used to check the order of the tests that were executed by the runner.

@HollandDM (Contributor Author), Mar 8, 2025

I also like "/ os.up" more; I think it's more verbose and explicit.

Member:

Got it, it makes sense then. We can leave it as is

Member:

Maybe have a comment somewhere explaining how this works?

Comment on lines 244 to 245
while (true) {
val files = os.list(selectorFolder)
Member:

I think we can simplify this further given the current design: we just need to run os.list once and then loop over all the tests to try and steal each one in order. Since the set of files in selectorFolder never grows, running os.list a second time will never pick up anything the first list didn't.

Let's also get rid of the Random.nextInt selection and just pick up the tests in order from top to bottom. That will give a bit more determinism to how things run, and although it is still nondeterministic due to parallelism, it would help people follow the order in which tests are being run.

Contributor Author:

I did it like this because I want to check whether we have any files left, and only steal from the leftovers.

Member:

The os.move already effectively checks whether the file is still there before taking it, so we don't need to run an entire os.list each time we steal a single file. We could add an os.exists check before the os.move if we want to avoid throwing and catching an error in the common case.
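A minimal form of that loop, with the surrounding names (selectorFolder, stealFolder, the run step) treated as placeholders:

```scala
// Sketch: list once, walk the files in order, and only attempt the claim if the file
// still exists, so the common already-taken case does not throw at all
for (file <- os.list(selectorFolder) if os.exists(file)) {
  try {
    os.move(file, stealFolder / file.last)
    runTestClass(file.last) // hypothetical: run the class we just claimed
  } catch {
    case _: Throwable => () // another worker claimed it between the check and the move
  }
}
```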

val paddedGroupIndex = mill.internal.Util.leftPad(groupIndex.toString, maxGroupLength, '0')
val paddedProcessIndex =
mill.internal.Util.leftPad(processIndex.toString, maxProcessLength, '0')
val processFolder = baseFolder / processIndex.toString
Member:

Let's call this baseFolder / s"worker-$processIndex", so if someone looks on disk it's a bit more obvious what is going on

@lihaoyi (Member) commented Mar 7, 2025

Right now the ctx.fork.async tasks all have no name. We should fix that for the default case when work stealing is disabled, and when work stealing is enabled let's give them simple names worker-$processIndex if there's one group and worker-$groupIndex-$processIndex if there's multiple groups. We don't need to list the test classes, since we don't know up front which test classes each worker will take, and the in-progress test will show up in the ticker on the right anyway

@lihaoyi (Member) commented Mar 7, 2025

I think this looks pretty good. Adding def testEnableWorkStealing = true in my manual testing does indeed make it work, though the missing labels when testEnableWorkStealing is not enabled need to be fixed. Left some last comments, but once those are resolved I think the code is ready to merge.

Next steps are probably:

  1. Update the PR description to accurately reflect the design of the PR after all the revisions and updates
  2. Open a second PR backporting it to the 0.12.x branch. Looking at the code I think all the APIs changed are internal or private so there shouldn't be binary compat issues.

Once both PRs are ready we can merge it and close out the bounty

@HollandDM (Contributor Author) commented Mar 8, 2025

> Right now the ctx.fork.async tasks all have no name. We should fix that for the default case when work stealing is disabled, and when work stealing is enabled let's give them simple names worker-$processIndex if there's one group and worker-$groupIndex-$processIndex if there's multiple groups. We don't need to list the test classes, since we don't know up front which test classes each worker will take, and the in-progress test will show up in the ticker on the right anyway

I think there is a bug in MultiLogger that causes this:
when creating a new MultiLogger, it does not get its own message/keySuffix values, so when we call setPromptLine, it uses the default value of "".

I tried a simple fix like this:

  private[mill] override def message = logger1.message ++ logger2.message
  private[mill] override def keySuffix = logger1.keySuffix ++ logger2.keySuffix

And the log is good again, but I'm not sure this is the right way to do it.

Would you like me to include this fix in this PR? If not, I can skip the problem here since it'll be fixed in another PR.

@lihaoyi (Member) commented Mar 8, 2025

@HollandDM your proposed fix looks reasonable. Let's include it in this PR.

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from 747de59 to 47dff00 on March 8, 2025 14:51
@HollandDM (Contributor Author) commented Mar 8, 2025

@lihaoyi updated, please check whether you are happy with the current state of the PR.

In the meantime, I'll assume the PR is good to go and create a backport for the 0.12.x branch using this PR as the source.

@HollandDM HollandDM requested a review from lihaoyi March 8, 2025 14:53
Comment on lines 6 to 19
@Test
public void test1() throws Exception {
testGreeting("Storm", 38);
}

@Test
public void test2() throws Exception {
testGreeting("Bella", 25);
}

@Test
public void test3() throws Exception {
testGreeting("Cameron", 32);
}
Member:

Do we really need more than one test per file? If not we could consolidate each file to a single test case to greatly reduce the verbosity of these test examples

@@ -0,0 +1,38 @@
// Test stealing is an opt-in, powerful feature that enables parallel test execution while maintaining complete reliability and debuggability.
Member:

"while maintaining complete reliability and debuggability" is meaningless. You should write why people would turn this on: because it provides both the speedups from parallelism and the efficiency of re-using JVMs between tests, allowing it to be used on all sorts of test suites without downside. And we will likely make it the default in future.

@HollandDM HollandDM force-pushed the work-stealing-test-runner branch from 75f6fa8 to 43225cf on March 9, 2025 04:22
@HollandDM HollandDM requested a review from lihaoyi March 9, 2025 04:24
@HollandDM (Contributor Author)

Updated the test examples to only have one test case each, and fixed some wording in the adoc.

@HollandDM HollandDM marked this pull request as ready for review March 9, 2025 15:41
@HollandDM (Contributor Author)

@lihaoyi the backport for 0.12.x is ready at #4697

@lihaoyi (Member) commented Mar 10, 2025

I think this looks good, thanks @HollandDM! Let's get #4679 green and merged as well, and send me your international bank transfer details and I will send you the bounty

@lihaoyi lihaoyi merged commit f733779 into com-lihaoyi:main Mar 10, 2025
41 of 42 checks passed
lihaoyi pushed a commit that referenced this pull request Mar 10, 2025
ported for 0.12.x from #4614

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@lefou lefou added this to the 0.13.0 milestone Mar 10, 2025
lihaoyi added a commit that referenced this pull request Mar 11, 2025
Turns on #4614 by default in
0.13.x, leaving the default as off in 0.12.x
lihaoyi added a commit that referenced this pull request Mar 13, 2025
…nc task and de-prioritizing subsequent ones (#4714)

This is a follow up improvement to
#4701 which uses an explicit
priority queue rather than relying on ad-hoc LIFO ordering. By
prioritizing the first async task and de-prioritizing subsequent async
tasks, this encourages Mill to re-use the same async task JVM to do more
work, minimizing JVM startup overhead

When testing this out on the Netty example with `testParallelism`,
prioritizing the first testrunner JVM over all others does seem to make
a material difference. I manually performed ad-hoc benchmarks of the
time taken to run the following command, basically running all the tests
that we run in the example integration test:

```
 /Users/lihaoyi/Github/mill/out/dist/launcher.dest/run 'codec-{dns,haproxy,http,http2,memcache,mqtt,redis,smtp,socks,stomp,xml}.test' + 'transport-{blockhound-tests,native-unix-common,sctp}.test'
```

- `priority = -1` (`fork.async`s always run before other tasks): ~20s
- `priority = 1` (`fork.async`s always run after other tasks): ~20s
- `priority = if (processIndex == 0) -1 else processIndex` (The first
`fork.async` in a test module runs before all other tasks, subsequent
`fork.async`s run after all other tasks): ~11s

- As a comparison, `testForkGrouping` with 1-element groups: 50s

Notably, this first-JVM priority has similar timings with
`testParallelism = false` on this benchmark, where the previous approach
had some penalty (though not as bad as `testForkGrouping`). With these
improvements, the parallel test runner should be self-tuning enough to
turn on by default (#4614)
without needing manual configuration in most cases

This also lays the foundation to other prioritization strategies later.
For example, we may want to prioritize historically-slower tasks over
faster tasks, all else being equal, to avoid long-running stragglers
delaying a command from completing after everything else has finished.
But such improvements can come later

Successfully merging this pull request may close these issues.

Implement a dynamic work-stealing test fork runner (1500USD Bounty)