[SUREFIRE-2049] Fix SHUTDOWN type lost during command serialization.#3270

Merged
olamy merged 1 commit into apache:master from aghoussaini:fix/forcible-kill-on-fork-timeout
Feb 28, 2026

@aghoussaini
Contributor

Problem

When forkedProcessTimeoutInSeconds is configured, the plugin correctly detects the timeout and sends a SHUTDOWN(KILL) command to the forked JVM through the binary command channel. However, the forked JVM receives SHUTDOWN(DEFAULT) instead of SHUTDOWN(KILL) due to a data format mismatch in the serialization. This causes the forked JVM to ignore the kill signal entirely — the build hangs for the full duration of the test instead of terminating after the configured timeout.

Root Cause

The Shutdown enum has two string representations:

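| Enum constant | name() (Java enum name) | getParam() (wire protocol value) |
|---------------|-------------------------|----------------------------------|
| DEFAULT       | "DEFAULT"               | "testset"                        |
| EXIT          | "EXIT"                  | "exit"                           |
| KILL          | "KILL"                  | "kill"                           |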

The sending side and receiving side used different representations:

Sending side (plugin):
Command.toShutdown(Shutdown.KILL) stored shutdownType.name() → "KILL", which was then serialized by CommandEncoder.sendShutdown("KILL") and sent over the binary command channel.

Receiving side (forked JVM):
CommandDecoder.toMessage() extracted the string "KILL" from the wire and passed it to Shutdown.parameterOf("KILL"), which iterates through enum values comparing against shutdown.param:

  • "testset".equals("KILL") → false
  • "exit".equals("KILL") → false
  • "kill".equals("KILL") → false (case mismatch)

Since nothing matched, parameterOf() fell through and returned Shutdown.DEFAULT. The forked JVM's ForkedBooter.createExitHandler() then entered the else branch (neither isKill() nor isExit()), which only dumps a thread trace — the JVM continues running.
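
The fall-through described above can be reproduced in isolation. The following is a minimal sketch, not the actual Surefire source: ShutdownSketch is a simplified stand-in for the Shutdown enum (only the constant names, the param values, getParam() and parameterOf() are taken from this description), and describeAction() is a hypothetical helper that mirrors the three branches of the exit handler.

```java
// Simplified stand-in for Surefire's Shutdown enum (assumption: only the
// constants, param values, getParam() and parameterOf() mirror the PR text).
enum ShutdownSketch {
    DEFAULT("testset"),
    EXIT("exit"),
    KILL("kill");

    private final String param;

    ShutdownSketch(String param) {
        this.param = param;
    }

    String getParam() {
        return param;
    }

    // Case-sensitive lookup on the wire value; unknown values fall back to DEFAULT.
    static ShutdownSketch parameterOf(String parameter) {
        for (ShutdownSketch shutdown : values()) {
            if (shutdown.param.equals(parameter)) {
                return shutdown;
            }
        }
        return DEFAULT;
    }
}

class ShutdownMismatchDemo {
    // Hypothetical helper: names the branch an exit handler would take.
    static String describeAction(ShutdownSketch type) {
        if (type == ShutdownSketch.KILL) {
            return "halt the JVM";
        }
        if (type == ShutdownSketch.EXIT) {
            return "exit the JVM";
        }
        return "dump threads and keep running";
    }

    public static void main(String[] args) {
        // Before the fix the sender put name() on the wire.
        ShutdownSketch beforeFix = ShutdownSketch.parameterOf(ShutdownSketch.KILL.name());
        // After the fix the sender puts getParam() on the wire.
        ShutdownSketch afterFix = ShutdownSketch.parameterOf(ShutdownSketch.KILL.getParam());

        System.out.println(beforeFix + " -> " + describeAction(beforeFix));
        System.out.println(afterFix + " -> " + describeAction(afterFix));
    }
}
```

Run as written, the first line reports DEFAULT with the thread-dump branch and the second reports KILL, which matches the behavior observed before and after the change.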

Fix

  1. toShutdown(): Changed shutdownType.name() to shutdownType.getParam() so the wire protocol sends "kill" instead of "KILL"
  2. toShutdownData(): Changed Shutdown.valueOf(data) to Shutdown.parameterOf(data) so the reverse conversion is consistent

This makes the sending side consistent with what the receiving side's CommandDecoder already expects via Shutdown.parameterOf().
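
One way to see why both changes belong together is a round-trip check over every constant. The sketch below is illustrative only: WireShutdown is a simplified stand-in for the Shutdown enum, and encode() and decode() are hypothetical names for the getParam() and parameterOf() steps, not the real Command or CommandDecoder code. It also shows that once "kill" is the stored value, the standard Enum valueOf() lookup would reject it, which is why toShutdownData() had to move to parameterOf() as well.

```java
// Simplified stand-in for the Shutdown enum (assumption, not the Surefire source).
enum WireShutdown {
    DEFAULT("testset"),
    EXIT("exit"),
    KILL("kill");

    private final String param;

    WireShutdown(String param) {
        this.param = param;
    }

    String getParam() {
        return param;
    }

    static WireShutdown parameterOf(String parameter) {
        for (WireShutdown shutdown : values()) {
            if (shutdown.param.equals(parameter)) {
                return shutdown;
            }
        }
        return DEFAULT;
    }
}

class ShutdownRoundTrip {
    // Hypothetical encode step: after the fix, the stored value is getParam().
    static String encode(WireShutdown type) {
        return type.getParam();
    }

    // Hypothetical decode step: both the decoder and toShutdownData() use parameterOf().
    static WireShutdown decode(String wireValue) {
        return WireShutdown.parameterOf(wireValue);
    }

    // True when every constant survives encode followed by decode.
    static boolean roundTripsAll() {
        for (WireShutdown type : WireShutdown.values()) {
            if (decode(encode(type)) != type) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("round trip holds for all constants: " + roundTripsAll());
        try {
            // valueOf() matches constant names only, so the param string is rejected.
            WireShutdown.valueOf(encode(WireShutdown.KILL));
        } catch (IllegalArgumentException e) {
            System.out.println("valueOf(\"kill\") rejected: " + e.getMessage());
        }
    }
}
```

A check of this shape could also serve as a regression test, since it fails as soon as the sending and receiving sides disagree on the representation.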

Verification

Tested with a project containing a single test that sleeps for 5 minutes with forkedProcessTimeoutInSeconds=5. For the record, it is the sample that was shared here: #2049.

  • Before fix: Build hangs for the full 5 minutes
  • After fix: Build completes in ~6 seconds with the message:
    Surefire is going to kill self fork JVM. Received SHUTDOWN {KILL} command from Maven shutdown hook.

Checklist

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean install to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • You have run the integration tests successfully (mvn -Prun-its clean install).

If your pull request is about ~20 lines of code, you don't need to sign an
Individual Contributor License Agreement. If you are unsure,
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004,
you have to acknowledge this by using the following check-box.

@aghoussaini
Contributor Author

For the record, take a look at the old decode() method (pre-M5):

public static Command decode(DataInputStream is) throws IOException {
    ...
    String data = command.toDataTypeAsString(buffer);
    return new Command(command, data);   // stores raw string directly
}

The old protocol created new Command(command, data) directly with the raw decoded string. It never went through Shutdown.parameterOf(). The Command stored "KILL" (from shutdownType.name()), and toShutdownData() called Shutdown.valueOf("KILL") which worked perfectly because valueOf() matches enum names.

The new binary protocol (introduced in SUREFIRE-1847) added CommandDecoder which introduced Shutdown.parameterOf() in the decoding path — but parameterOf() matches on param strings ("kill"), not enum names ("KILL"). The mismatch was introduced at that point.
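
The difference between the two decode paths comes down to which lookup receives the string "KILL". As a rough sketch under the same assumptions as the description (LegacyShutdown is a simplified stand-in enum, and oldPath() and newPath() are hypothetical names for the pre-M5 and binary-protocol lookups):

```java
// Simplified stand-in for the Shutdown enum (assumption, not the Surefire source).
enum LegacyShutdown {
    DEFAULT("testset"),
    EXIT("exit"),
    KILL("kill");

    private final String param;

    LegacyShutdown(String param) {
        this.param = param;
    }

    static LegacyShutdown parameterOf(String parameter) {
        for (LegacyShutdown shutdown : values()) {
            if (shutdown.param.equals(parameter)) {
                return shutdown;
            }
        }
        return DEFAULT;
    }
}

class DecodePathComparison {
    // Hypothetical model of the old path: the raw string reaches valueOf().
    static LegacyShutdown oldPath(String raw) {
        return LegacyShutdown.valueOf(raw);
    }

    // Hypothetical model of the new path: the raw string reaches parameterOf().
    static LegacyShutdown newPath(String raw) {
        return LegacyShutdown.parameterOf(raw);
    }

    public static void main(String[] args) {
        // Both protocols sent the enum name before this PR.
        String sent = LegacyShutdown.KILL.name();
        System.out.println("old path: " + oldPath(sent));
        System.out.println("new path: " + newPath(sent));
    }
}
```

With the sender unchanged, the old path resolves to KILL while the new path silently resolves to DEFAULT, so no exception or log line pointed at the regression.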

@aghoussaini
Contributor Author

Would appreciate a second opinion. This is highly needed as our CIs often suffer from long hanging tests :(

olamy added the bug label Feb 24, 2026
@kevinnammour
Contributor

kevinnammour commented Feb 24, 2026

> Would appreciate a second opinion. This is highly needed as our CIs often suffer from long hanging tests :(

Indeed, we're suffering from the same issue as well, and it's been like this since we upgraded to the latest version. The change looks good to me. Do you think we can include it in the 3.5.6 milestone, @olamy @Tibor17?

@aghoussaini
Contributor Author

@olamy @elharo

If this essentially fixes a bug that's been present since 3.0.0-M5, should we cherry-pick it to surefire-3.5.x so that it can ship in 3.5.6?

@olamy
Member

olamy commented Feb 24, 2026

Yes, that's the goal.
Once it is merged, I will cherry-pick it to the maintenance branch via a PR.

@aghoussaini
Contributor Author

@olamy

Sounds great :) What's the ETA on that generally?

@olamy
Member

olamy commented Feb 25, 2026

> @olamy
>
> Sounds great :) What's the ETA on that generally?

Your PR looks great. I'm ready to merge. I'm just waiting a bit to see if there are any other comments (valuable, interesting or nitpicking...)

olamy merged commit 5687822 into apache:master Feb 28, 2026
14 checks passed
github-actions bot added this to the 3.5.6 milestone Feb 28, 2026
olamy pushed a commit to olamy/maven-surefire that referenced this pull request Feb 28, 2026
@olamy
Member

olamy commented Feb 28, 2026

@aghoussaini Thanks for the fix. Backport PR to 3.5.x: #3289

olamy added a commit that referenced this pull request Mar 1, 2026
…3270) (#3289)

Co-authored-by: Amine Ghoussaini <54037376+aghoussaini@users.noreply.github.com>

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[SUREFIRE-1722] JVM is not killed after forkedProcessTimeoutInSeconds has elapsed

3 participants