Skip to content

Supervisor::terminate() removes repository entry before stopping processes, causing false orphan detection #1740

@JoshSalway

Description

@JoshSalway

Description

Supervisor::terminate() calls SupervisorRepository::forget() before terminating its worker processes. During the shutdown window, the still-running workers become invisible to ProcessInspector::monitoring() and can be incorrectly flagged as orphans by horizon:purge, which force-kills them — potentially interrupting jobs mid-execution.

The code (src/Supervisor.php)

public function terminate($status = 0)
{
    $this->working = false;

    // Repository entry removed HERE — workers are now invisible
    app(SupervisorRepository::class)->forget($this->name);

    // But processes are terminated AFTER
    $this->processPools->each(function ($pool) {
        $pool->processes()->each(function ($process) {
            $process->terminate();
        });
    });

    // Waits for graceful shutdown (can take minutes for long-running jobs)
    if ($this->shouldWait()) {
        while ($this->processPools->map->runningProcesses()->collapse()->count()) {
            sleep(1);
        }
    }

    $this->exit($status);
}

The problem

  1. forget() removes the supervisor from Redis
  2. ProcessInspector::monitoring() builds the "known process" list from SupervisorRepository::all() — this supervisor's PID is now missing
  3. Worker processes are still running (not yet terminated, or terminating gracefully)
  4. horizon:purge calls ProcessInspector::orphaned() which computes current PIDs - known PIDs
  5. The still-running workers appear as orphans and get SIGTERM/SIGKILL

When shouldWait() is true (graceful shutdown), the window can be significant — workers processing long-running jobs could be waiting minutes to finish, during which they're flagged as orphans.

How this was found

During a codebase review examining the interaction between Supervisor::terminate(), ProcessInspector, and horizon:purge. The comment in the code says "mark this supervisor as terminating" but forget() doesn't mark it — it completely removes it.

Suggested fix

Move forget() to after processes have stopped:

public function terminate($status = 0)
{
    $this->working = false;

    $this->processPools->each(function ($pool) {
        $pool->processes()->each(function ($process) {
            $process->terminate();
        });
    });

    if ($this->shouldWait()) {
        while ($this->processPools->map->runningProcesses()->collapse()->count()) {
            sleep(1);
        }
    }

    app(SupervisorRepository::class)->forget($this->name);

    $this->exit($status);
}

Impact

During deployments or restarts, workers can be force-killed mid-job instead of completing gracefully. This is especially problematic when fast_termination is disabled (the default) and jobs take a long time to process.

Related symptoms:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions