Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running tests in parallel sporadically fails with exit code 255 #705

Open
asendimitrov opened this issue Nov 5, 2019 · 10 comments
Open

Comments

@asendimitrov
Copy link

asendimitrov commented Nov 5, 2019

Hi, I'm using NUnit 24/7 at work and I'm sporadically hitting an issue when running tests in parallel.

NUnit Framework version: 3.12 (NuGet)
NUnit ConsoleRunner version: 3.10 (NuGet)
Console command: %nunit.console.path% --trace=Debug --inprocess --result=%nunit.result.path% %tests.dlls.list% --workers=10
ParallelizableScope: Children (on assembly level)

The issue occurs around 10% of the time on our TeamCity agents and the error is Process exited with code 255. It looks to be affected by the hardware of the box the tests are ran on - there's no hard proof of this - but I've never managed to reproduce it locally (the agents are VMs with weakish dual core processors whereas mine has 8).

Lowering the workers doesn't seem to affect as I've seen the same result with both 5 and not specified at all (would default to 2 when ran on TC).

I'm always using --inprocess ever since I first hit the issue in order to see the real reason why this happens but there's nothing specific either as an exception by the console or in the logs. Omitting --inprocess results in a SocketException issue (#255 )

What I've noticed is that this (mostly) occurrs when test number X is running, where X is equal to the number of workers. Here's an example with both 5 and 10:

5 workers:

[11:38:46]MyCompany.dll (23s)
[11:38:53]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("1") (16s)
[11:38:53]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("2") (16s)
[11:38:53]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("3") (16s)
[11:38:53]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("4") (16s)
[11:38:53]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("5") (16s)
[11:39:10]Process exited with code 255
[11:39:10]Process exited with code 255 (Step: Run tests (Command Line))
[11:39:10]Step Run tests (Command Line) failed

logs:
5 workers - InternalTrace.5864.zip

10 workers:

[11:03:42]MyCompany.dll (11s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("1") (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("2") (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("3") (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("4") (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("5") (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("6") (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test2(50,111.07d,50.1d) (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test3("1", 50,41.5d) (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test3("2", 50,41.5d) (5s)
[11:03:48]MyCompany.Tests.GetBalances.GetBalancesTests.Test4(50,"Default") (5s)
[11:03:53]Process exited with code 255
[11:03:53]Process exited with code 255 (Step: Run tests (Command Line))
[11:03:53]Step Run tests (Command Line) failed

logs:
10 workers - InternalTrace.2948.zip

Please let me know if you need anything else. Thanks.

@ChrisMaddock
Copy link
Member

There's nothing in the logs to help here, unfortunately. Do you get anything additional logged when you run without --inprocess - other than the SocketException, of course?

@asendimitrov
Copy link
Author

asendimitrov commented Nov 6, 2019

I don't think anything unusual other than default SocketException appears in the logs in that case.
SocketException - InternalTrace.2136.zip

Example is from a build running with --workers=10 and without --inprocess

TeamCity logs:

[12:08:40]MyCompany.dll (36s)
[12:08:46]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("1") (31s)
[12:08:46]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("2") (31s)
[12:08:46]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("3") (31s)
[12:08:46]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("4") (31s)
[12:08:46]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("5") (31s)
[12:08:46]MyCompany.Tests.GetBalances.GetBalancesTests.Test1("6") (31s)
[12:08:46]MyCompany.Tests.ReserveFunds.ReserveFundsTests.Test1("1","BalanceTooLow","The requested amount cannot be reserved due to insufficient funds.",BadRequest) (31s)
[12:08:46]MyCompany.Tests.ReserveFunds.ReserveFundsTests.Test1("2","BalanceTooLow","The requested amount cannot be reserved due to insufficient funds.",BadRequest) (31s)
[12:08:46]MyCompany.Tests.ReserveFunds.ReserveFundsTests.Test1("3","BalanceTooLow","The requested amount cannot be reserved due to insufficient funds.",BadRequest) (31s)
[12:08:46]MyCompany.Tests.ReserveFunds.ReserveFundsTests.Test1("4","BalanceTooLow","The requested amount cannot be reserved due to insufficient funds.",BadRequest) (31s)
[12:09:15]
[12:09:15]Unhandled Exception: NUnit.Engine.NUnitEngineException: Remote test agent exited with non-zero exit code 255
[12:09:15]   at NUnit.Engine.Services.TestAgency.OnAgentExit(Object sender, EventArgs e)
[12:09:15]   at System.Diagnostics.Process.OnExited()
[12:09:15]   at System.Diagnostics.Process.RaiseOnExited()
[12:09:15]   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[12:09:15]   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[12:09:15]   at System.Threading._ThreadPoolWaitOrTimerCallback.PerformWaitOrTimerCallback(Object state, Boolean timedOut)
[12:09:17]
[12:09:17]Errors, Failures and Warnings
[12:09:17]
[12:09:17]1) Error : 
[12:09:17]System.Net.Sockets.SocketException : An existing connection was forcibly closed by the remote host
[12:09:17]--SocketException
[12:09:17]An existing connection was forcibly closed by the remote host
[12:09:17]
[12:09:17]Server stack trace: 
[12:09:17]   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
[12:09:17]   at System.Runtime.Remoting.Channels.SocketStream.Read(Byte[] buffer, Int32 offset, Int32 size)
[12:09:17]   at System.Runtime.Remoting.Channels.SocketHandler.ReadFromSocket(Byte[] buffer, Int32 offset, Int32 count)
[12:09:17]   at System.Runtime.Remoting.Channels.SocketHandler.Read(Byte[] buffer, Int32 offset, Int32 count)
[12:09:17]   at System.Runtime.Remoting.Channels.SocketHandler.ReadAndMatchFourBytes(Byte[] buffer)
[12:09:17]   at System.Runtime.Remoting.Channels.Tcp.TcpSocketHandler.ReadAndMatchPreamble()
[12:09:17]   at System.Runtime.Remoting.Channels.Tcp.TcpSocketHandler.ReadVersionAndOperation(UInt16& operation)
[12:09:17]   at System.Runtime.Remoting.Channels.Tcp.TcpClientSocketHandler.ReadHeaders()
[12:09:17]   at System.Runtime.Remoting.Channels.Tcp.TcpClientTransportSink.ProcessMessage(IMessage msg, ITransportHeaders requestHeaders, Stream requestStream, ITransportHeaders& responseHeaders, Stream& responseStream)
[12:09:17]   at System.Runtime.Remoting.Channels.BinaryClientFormatterSink.SyncProcessMessage(IMessage msg)
[12:09:17]
[12:09:17]Exception rethrown at [0]: 
[12:09:17]   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
[12:09:17]   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
[12:09:17]   at NUnit.Engine.ITestEngineRunner.Run(ITestEventListener listener, TestFilter filter)
[12:09:17]   at NUnit.Engine.Runners.ProcessRunner.RunTests(ITestEventListener listener, TestFilter filter)
[12:09:17]
[12:09:17]Test Run Summary
[12:09:17]  Overall result: Failed
[12:09:17]  Test Count: 0, Passed: 0, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 0
[12:09:17]  Start time: 2019-11-06 12:08:39Z
[12:09:17]    End time: 2019-11-06 12:09:15Z
[12:09:17]    Duration: 35.610 seconds
[12:09:17]
[12:09:17]Results (nunit3) saved as src/NUnitTestResult.xml
[12:09:17]
[12:09:17]NUnit.Engine.NUnitEngineUnloadException : Agent Process was terminated successfully after error.
[12:09:17]  ----> System.Net.Sockets.SocketException : No connection could be made because the target machine actively refused it 127.0.0.1:53782
[12:09:17]Process exited with code -100
[12:09:17]Process exited with code -100 (Step: Run tests (Command Line))
[12:09:17]Step Run tests (Command Line) failed

@ChrisMaddock
Copy link
Member

You're right, there's no extra clues here. I'm afraid I'm really not sure what would cause this, perhaps others on @nunit/engine-team might have some ideas?

@jnm2
Copy link
Collaborator

jnm2 commented Nov 9, 2019

The SocketException means the agent process died unceremoniously. What's the error message when the agent is not in the picture (use --inprocess)?

@ChrisMaddock
Copy link
Member

Nothing, it seems - see the top post. That's the bit that's confusing me! 🙂

@jnm2
Copy link
Collaborator

jnm2 commented Nov 9, 2019

Oh, sorry. This is a long shot but I wonder if it could possibly be fixed by #660. How about trying the latest from the prerelease feed?

@jnm2
Copy link
Collaborator

jnm2 commented Nov 9, 2019

The internal trace just stops, similar to what I was seeing with the .NET Framework race condition that #660 avoids.

(Btw, the race condition is fixed in the November .NET Framework updates.)

@asendimitrov
Copy link
Author

asendimitrov commented Nov 11, 2019

Doesn't seem to help in any way :(

Tried several times with NUnit.ConsoleRunner.3.11.0-dev-04648 and results are exactly the same:
Process breaks at the 10th test (with --workers=10) and logs contain just the SocketException or nothing at all if running with --inprocess.

@CharliePoole
Copy link
Member

@nunit/engine-team Doesn't the engine get distributed with access to source code for debugging? Maybe we should make --agent-debug available in release builds so people can step through it?

@asendimitrov Just to be sure... have you looked at the separate internal trace log produced for the agent process? Each run may give you three or four logs.

@asendimitrov
Copy link
Author

@CharliePoole thanks for tuning in. Yes, I've checked all 3 log files and there doesn't see to be a hint in either of them. Above I've shared a zip containing the nunit agent process log, and in it the last messages are:

12:08:40.441 Info  [ 5] NUnitFrameworkDriver: Loading MyCompany.dll - see separate log file
12:08:40.661 Info  [ 5] NUnitFrameworkDriver: Loaded MyCompany.dll
12:08:40.662 Info  [ 5] NUnitFrameworkDriver: Running MyCompany.dll - see separate log file

The process breaks after those timestamps, unless I'm missing anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants