When using TCP transport, StopRun is never received #1187

CharliePoole · 2022-06-12T15:20:56Z

NOTE: I'm calling this a bug, because it is one, but some of the fallout may lead to major changes to our basic runner interfaces. So I'm looking for comments from @nunit/engine-team before I actually do anything.

Up to now, we have been dealing with the problem of cancelling a run using StopRun(true) as it relates to .NET Core not supporting killing threads. However, I just discovered an additional problem: when we use the TCP transport, the StopRun command can't be received while the test is running.

The reason for this is that the command loop, which receives and executes commands, runs those commands synchronously. So the command to stop the run won't be seen until the run has already completed. One fix for this - maybe the easiest - would be for the command-loop to create a task to run the tests when the Run command is received.

Thinking about it a bit more, I wondered what would happen if we used RunAsync instead of Run, so I did an experiment. It didn't work and I still saw synchronous Run commands being issued. I realized (or reminded myself) that MasterTestRunner translates RunAsync into a task, which basically calls Run. None of the lower-level runners RunAsync methods are ever actually used.

I'm sure we (or I) had a good reason for this back in the day, but right now it doesn't make much sense. We should either not have lower-level RunAsync methods or we should actually use them. That's one of the big decisions I think we should make as a team.

Of course, this also gets into the interfaces we use in 4.0, which we have never finalized. I'm currently thinking that we should eliminate RunAsync from ITestRunner and make Run an async method. The calling runner could then decide whether or not to await the call.
That gets beyond the scope of a bug, of course, but we should keep in mind what we ultimately want to have as we work this issue.

@rprouse @jnm2 @mikkelbu @manfred-brands What do you think? Where should we go with this?

The text was updated successfully, but these errors were encountered:

rprouse · 2022-06-13T01:47:22Z

We should either not have lower-level RunAsync methods or we should actually use them

I think we should use them. If I am calling RunAsync then I expect it to be async all the way down.

interfaces we use in 4.0, which we have never finalized. I'm currently thinking that we should eliminate RunAsync from ITestRunner and make Run an async method.

Personally, I'd prefer to keep the async method and deprecate or remove the run method. Normally I'm not a fan of naming methods XxxAsync, but changing the current synchronous Run method to async is a major change and many people don't read the docs. If it is going to be a breaking change, then I think we should be as explicit as possible and break them in the most obvious way possible.

CharliePoole · 2022-06-13T04:28:03Z

@rprouse When you say "keep the async method" do you man keep it as it is... i.e. not true async and therefore not awaitable?

On the general principal of not changing too much in this major release, considering the breaking changes we've already made, I feel that ship has pretty much sailed. :-)

CharliePoole · 2022-06-13T04:54:15Z

Bear in mind, we're talking about a major release, with lots breaking changes already in place. It's our third major upgrade in 22 years, so nobody can say we break things indiscriminately!

Assuming there won't be a 5.0 release for 7 to 10 years, I think now is the time to change anything we don't want to live with for that long.

That said, the particular thing we're talking about is only relevant to those who write runners. I can keep using the RunAsync pattern, which I'm used to from my work in the previous century. :-)

Regarding "async all the way down" I agree with you. It will need lots of testing. Regarding deprecating Run, I wouldn't want to do that because it's the simplest possible way to use the engine: you call it and it returns a result. For anyone using the engine from a script, it's ideal.

CharliePoole · 2022-06-13T15:04:56Z

BTW, one reason this problem exists in the first place is that the console runner never uses RunAsync. When I started working on the TestCentric GUI, I found I needed to use it in order to keep the display updating and responsive. That worked for a long time and it was only when I started dealing with cancelling a run that I discovered the negative impact of using Run in the lower level runners.

CharliePoole · 2022-06-13T15:16:06Z

@nunit/engine-team @OsirisTerje I'm wondering what the true priority of the issue ought to be for this project.

I made it "high" because my GUI needs it to be fixed if I'm ever going to stop having my own engine and use NUnit's. OTOH, it's hard to see how to show this is a problem when using the console runner or the adapter. Can anyone think of a way to make it visible in either of those contexts?

jnm2 · 2022-06-13T17:07:14Z

The priority seems high to me even if it's contrived or impossible to demonstrate the problem with the console runner or adapter. If my memory is right, the same issue will also come back in the console runner if it begins to support connecting to agents which the console doesn't start and stop. Possibly an agent running in another machine or in an Android emulator.

I'd like true async. 4.0 seems like an ideal time to clean up the interface.

It's also technically feasible to run the TCP message loop on a different thread from the thread on which ITestRunner calls are dispatched, like you're saying, and if that is somehow more practical in the short term then that seems like a valid option.

jnm2 · 2022-06-13T17:17:04Z

A side note on the GUI: I would be concerned about running the engine on the UI thread. I'd be vaguely afraid of side effects. One being that the responsiveness of the GUI is now at the mercy of the engine. All the way up to, maybe the engine or an addin or something sets some thread local or variable which now affects the GUI. I'd even be afraid that the engine might set static configuration settings in .NET, like supported TLS level or something, which subtly interact with the GUI. For my own peace of mind, I'd be tempted to have the GUI start the engine out of process unless the engine provides guarantees that it won't do anything that could possibly influence the runtime behavior of the host app, within reason. Anyway, only interacting with the engine on a background thread would be where I'd probably start, and then you don't have the concern about keeping the display updating and responsive.

CharliePoole · 2022-06-13T19:15:19Z

@jnm2

The current setup is that the thread used by the runner to invoke the engine is blocked in the agent, waiting for any commands to be executed. A thread is created, which connects to the caller either through remoting or TCP and waits to be told what to do. In the case of remoting, it's told via a call to a method - the magic is behind the scenes. For TCP, a separate command loop thread is started, which tries to read messages. That's where this issue arises.

In all cases, the action taken when a message is received is synchronous. (As a reminder, possible actions are Load, Reload, Unload, Explore, CountTestCases, Run, RunAsync and StopRun.) This is by design, since no actions were intended to be invoked in parallel with other actions.

What I missed in designing this is that StopRun, which is ony allowed while a Run is in process, cannot be received because the command loop calls Run directly. In my working code, that's fixed by creating a thread on which to perform the run. That allows StopRun to be received and call the corresponding engine commands on the executing runner.

Regarding the GUI, the engine is not run on the UI thread. However, the synchronicity of the command loop prevents messages from being sent to the GUI at certain times. For a short-term fix, I need to fix that WRT Run and RunAsync. Longer term, I think it will be better if each action listed above is performed on a separate thread, even if I don't currently plan to have more than one of them going at one time.

I have to solve the problem of a hard cancel for the GUI. But for the console and engine, I guess it depends on the consensus of the team. We've managed to expose a difference of opinion there. On top of that, see my last comment in nunit/nunit#3276. Let's wait to see what others think about both issues.

rprouse · 2022-06-14T21:16:45Z

@rprouse When you say "keep the async method" do you man keep it as it is... i.e. not true async and therefore not awaitable?

I meant keep the method named RunAsync but make it properly async. There will be many breaking changes, i'm just suggesting that they be as explicit as possible.

manfred-brands · 2022-06-18T09:40:05Z

If the TCP transport is expected to handle command whilst running, by definition the run must be asynchronous.
I presume we only support a single run at a time, so the TCP transport must reject all but stop until the run is finished.

CharliePoole · 2022-06-18T11:21:56Z

That's quite true from an implementation point of view. I think the discussion here relates to whether the call to the public API blocks or not. Currently, we have both types of calls for running tests and I think we need to retain both.

But on the agent side of the communication link, asynchronicity is needed if we want to support stopping an action. Currently, we only support stopping a test run but it's conceivable that we might eventually support cancelling an explore or load request as well, so I'd say that most operations should be run on a separate thread from the command loop.

CharliePoole · 2024-12-03T10:14:38Z

There seems to be no non-breaking way to improve the current implementation, so I have moved this to V4.

CharliePoole added Bug High Priority labels Jun 12, 2022

CharliePoole added this to the 3.19.0 milestone Sep 19, 2024

CharliePoole modified the milestones: 3.19.0, 4.0 Dec 3, 2024

CharliePoole modified the milestones: 4.0, 4.0.0-beta.1 Dec 29, 2024

CharliePoole added the V4 All issues related to V4 - use -label:V4 to get non-V4 issues label Feb 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When using TCP transport, StopRun is never received #1187

When using TCP transport, StopRun is never received #1187

CharliePoole commented Jun 12, 2022

rprouse commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

jnm2 commented Jun 13, 2022 •

edited

Loading

jnm2 commented Jun 13, 2022 •

edited

Loading

CharliePoole commented Jun 13, 2022

rprouse commented Jun 14, 2022

manfred-brands commented Jun 18, 2022

CharliePoole commented Jun 18, 2022

CharliePoole commented Dec 3, 2024

When using TCP transport, StopRun is never received #1187

When using TCP transport, StopRun is never received #1187

Comments

CharliePoole commented Jun 12, 2022

rprouse commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

CharliePoole commented Jun 13, 2022

jnm2 commented Jun 13, 2022 • edited Loading

jnm2 commented Jun 13, 2022 • edited Loading

CharliePoole commented Jun 13, 2022

rprouse commented Jun 14, 2022

manfred-brands commented Jun 18, 2022

CharliePoole commented Jun 18, 2022

CharliePoole commented Dec 3, 2024

jnm2 commented Jun 13, 2022 •

edited

Loading

jnm2 commented Jun 13, 2022 •

edited

Loading