Skip to content

Non-atomic HKEYS/HDEL in RedisProcessRepository::orphaned() can miss orphaned processes #1750

@JoshSalway

Description

@JoshSalway

Description

RedisProcessRepository::orphaned() uses a non-atomic read-modify-write pattern on the {master}:orphans hash. If two horizon:purge commands overlap (e.g., cron fires while a previous run is still executing), one can delete entries the other intends to preserve.

The code

public function orphaned($master, array $processIds)
{
    $time = CarbonImmutable::now()->getTimestamp();

    // Step 1: Read all current orphan keys
    $shouldRemove = array_diff($this->connection()->hkeys(
        $key = "{$master}:orphans"
    ), $processIds);

    // Step 2: Delete keys no longer in the orphan list
    if (! empty($shouldRemove)) {
        $this->connection()->hdel($key, ...$shouldRemove);
    }

    // Step 3: Add/preserve current orphans
    $this->connection()->pipeline(function ($pipe) use ($key, $time, $processIds) {
        foreach ($processIds as $processId) {
            $pipe->hsetnx($key, $processId, $time);
        }
    });
}

The race

Between HKEYS (step 1) and HDEL (step 2), another process can modify the hash:

  1. Process A calls orphaned($master, [PID-1, PID-2, PID-3])
  2. Process B calls orphaned($master, [PID-1, PID-2, PID-4]) (PID-3 died, PID-4 started)
  3. Both read HKEYS -> [PID-1, PID-2, PID-3, PID-4, PID-5]
  4. A computes: delete [PID-4, PID-5] (not in A's list)
  5. B computes: delete [PID-3, PID-5] (not in B's list)
  6. A executes HDEL for PID-4 — removes an entry B wants to preserve
  7. B executes HDEL for PID-3 — removes an entry A wants to preserve
  8. HSETNX in step 3 can re-add some entries, but the timestamps are reset, disrupting orphanedFor() expiry tracking

When this happens

  • horizon:purge is scheduled via cron with no withoutOverlapping() protection
  • If a purge run takes longer than the cron interval, two instances overlap
  • Multi-server deployments where multiple servers run horizon:purge against the same Redis

Impact

  • Orphaned processes can be missed during cleanup, accumulating over time
  • orphanedFor() returns stale data because timestamps get reset by HSETNX
  • No errors are logged — the orphan hash silently drifts from reality

Suggested fix

A Lua script that atomically reads the current hash, computes the diff, deletes stale entries, and adds new ones — all in a single EVAL. This follows the existing pattern used by LuaScripts::updateMetrics().

Part of #1749 (non-atomic Redis patterns overview).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions