-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When using TCP transport, StopRun is never received #1187
Comments
I think we should use them. If I am calling
Personally, I'd prefer to keep the async method and deprecate or remove the run method. Normally I'm not a fan of naming methods XxxAsync, but changing the current synchronous |
@rprouse When you say "keep the async method" do you man keep it as it is... i.e. not true async and therefore not awaitable? On the general principal of not changing too much in this major release, considering the breaking changes we've already made, I feel that ship has pretty much sailed. :-) |
Bear in mind, we're talking about a major release, with lots breaking changes already in place. It's our third major upgrade in 22 years, so nobody can say we break things indiscriminately! Assuming there won't be a 5.0 release for 7 to 10 years, I think now is the time to change anything we don't want to live with for that long. That said, the particular thing we're talking about is only relevant to those who write runners. I can keep using the Regarding "async all the way down" I agree with you. It will need lots of testing. Regarding deprecating Run, I wouldn't want to do that because it's the simplest possible way to use the engine: you call it and it returns a result. For anyone using the engine from a script, it's ideal. |
BTW, one reason this problem exists in the first place is that the console runner never uses |
@nunit/engine-team @OsirisTerje I'm wondering what the true priority of the issue ought to be for this project. I made it "high" because my GUI needs it to be fixed if I'm ever going to stop having my own engine and use NUnit's. OTOH, it's hard to see how to show this is a problem when using the console runner or the adapter. Can anyone think of a way to make it visible in either of those contexts? |
The priority seems high to me even if it's contrived or impossible to demonstrate the problem with the console runner or adapter. If my memory is right, the same issue will also come back in the console runner if it begins to support connecting to agents which the console doesn't start and stop. Possibly an agent running in another machine or in an Android emulator. I'd like true async. 4.0 seems like an ideal time to clean up the interface. It's also technically feasible to run the TCP message loop on a different thread from the thread on which ITestRunner calls are dispatched, like you're saying, and if that is somehow more practical in the short term then that seems like a valid option. |
A side note on the GUI: I would be concerned about running the engine on the UI thread. I'd be vaguely afraid of side effects. One being that the responsiveness of the GUI is now at the mercy of the engine. All the way up to, maybe the engine or an addin or something sets some thread local or variable which now affects the GUI. I'd even be afraid that the engine might set static configuration settings in .NET, like supported TLS level or something, which subtly interact with the GUI. For my own peace of mind, I'd be tempted to have the GUI start the engine out of process unless the engine provides guarantees that it won't do anything that could possibly influence the runtime behavior of the host app, within reason. Anyway, only interacting with the engine on a background thread would be where I'd probably start, and then you don't have the concern about keeping the display updating and responsive. |
The current setup is that the thread used by the runner to invoke the engine is blocked in the agent, waiting for any commands to be executed. A thread is created, which connects to the caller either through remoting or TCP and waits to be told what to do. In the case of remoting, it's told via a call to a method - the magic is behind the scenes. For TCP, a separate command loop thread is started, which tries to read messages. That's where this issue arises. In all cases, the action taken when a message is received is synchronous. (As a reminder, possible actions are Load, Reload, Unload, Explore, CountTestCases, Run, RunAsync and StopRun.) This is by design, since no actions were intended to be invoked in parallel with other actions. What I missed in designing this is that StopRun, which is ony allowed while a Run is in process, cannot be received because the command loop calls Run directly. In my working code, that's fixed by creating a thread on which to perform the run. That allows StopRun to be received and call the corresponding engine commands on the executing runner. Regarding the GUI, the engine is not run on the UI thread. However, the synchronicity of the command loop prevents messages from being sent to the GUI at certain times. For a short-term fix, I need to fix that WRT Run and RunAsync. Longer term, I think it will be better if each action listed above is performed on a separate thread, even if I don't currently plan to have more than one of them going at one time. I have to solve the problem of a hard cancel for the GUI. But for the console and engine, I guess it depends on the consensus of the team. We've managed to expose a difference of opinion there. On top of that, see my last comment in nunit/nunit#3276. Let's wait to see what others think about both issues. |
I meant keep the method named |
If the TCP transport is expected to handle command whilst running, by definition the run must be asynchronous. |
That's quite true from an implementation point of view. I think the discussion here relates to whether the call to the public API blocks or not. Currently, we have both types of calls for running tests and I think we need to retain both. But on the agent side of the communication link, asynchronicity is needed if we want to support stopping an action. Currently, we only support stopping a test run but it's conceivable that we might eventually support cancelling an explore or load request as well, so I'd say that most operations should be run on a separate thread from the command loop. |
There seems to be no non-breaking way to improve the current implementation, so I have moved this to V4. |
NOTE: I'm calling this a bug, because it is one, but some of the fallout may lead to major changes to our basic runner interfaces. So I'm looking for comments from @nunit/engine-team before I actually do anything.
Up to now, we have been dealing with the problem of cancelling a run using
StopRun(true)
as it relates to .NET Core not supporting killing threads. However, I just discovered an additional problem: when we use the TCP transport, theStopRun
command can't be received while the test is running.The reason for this is that the command loop, which receives and executes commands, runs those commands synchronously. So the command to stop the run won't be seen until the run has already completed. One fix for this - maybe the easiest - would be for the command-loop to create a task to run the tests when the
Run
command is received.Thinking about it a bit more, I wondered what would happen if we used
RunAsync
instead ofRun
, so I did an experiment. It didn't work and I still saw synchronousRun
commands being issued. I realized (or reminded myself) thatMasterTestRunner
translatesRunAsync
into a task, which basically callsRun
. None of the lower-level runnersRunAsync
methods are ever actually used.I'm sure we (or I) had a good reason for this back in the day, but right now it doesn't make much sense. We should either not have lower-level
RunAsync
methods or we should actually use them. That's one of the big decisions I think we should make as a team.Of course, this also gets into the interfaces we use in 4.0, which we have never finalized. I'm currently thinking that we should eliminate
RunAsync
from ITestRunner and makeRun
an async method. The calling runner could then decide whether or not to await the call.That gets beyond the scope of a bug, of course, but we should keep in mind what we ultimately want to have as we work this issue.
@rprouse @jnm2 @mikkelbu @manfred-brands What do you think? Where should we go with this?
The text was updated successfully, but these errors were encountered: