Description
RedisProcessRepository::orphaned() uses a non-atomic read-modify-write pattern on the {master}:orphans hash. If two horizon:purge commands overlap (e.g., cron fires while a previous run is still executing), one can delete entries the other intends to preserve.
The code
public function orphaned($master, array $processIds)
{
$time = CarbonImmutable::now()->getTimestamp();
// Step 1: Read all current orphan keys
$shouldRemove = array_diff($this->connection()->hkeys(
$key = "{$master}:orphans"
), $processIds);
// Step 2: Delete keys no longer in the orphan list
if (! empty($shouldRemove)) {
$this->connection()->hdel($key, ...$shouldRemove);
}
// Step 3: Add/preserve current orphans
$this->connection()->pipeline(function ($pipe) use ($key, $time, $processIds) {
foreach ($processIds as $processId) {
$pipe->hsetnx($key, $processId, $time);
}
});
}
The race
Between HKEYS (step 1) and HDEL (step 2), another process can modify the hash:
- Process A calls
orphaned($master, [PID-1, PID-2, PID-3])
- Process B calls
orphaned($master, [PID-1, PID-2, PID-4]) (PID-3 died, PID-4 started)
- Both read HKEYS ->
[PID-1, PID-2, PID-3, PID-4, PID-5]
- A computes: delete
[PID-4, PID-5] (not in A's list)
- B computes: delete
[PID-3, PID-5] (not in B's list)
- A executes HDEL for PID-4 — removes an entry B wants to preserve
- B executes HDEL for PID-3 — removes an entry A wants to preserve
- HSETNX in step 3 can re-add some entries, but the timestamps are reset, disrupting
orphanedFor() expiry tracking
When this happens
horizon:purge is scheduled via cron with no withoutOverlapping() protection
- If a purge run takes longer than the cron interval, two instances overlap
- Multi-server deployments where multiple servers run
horizon:purge against the same Redis
Impact
- Orphaned processes can be missed during cleanup, accumulating over time
orphanedFor() returns stale data because timestamps get reset by HSETNX
- No errors are logged — the orphan hash silently drifts from reality
Suggested fix
A Lua script that atomically reads the current hash, computes the diff, deletes stale entries, and adds new ones — all in a single EVAL. This follows the existing pattern used by LuaScripts::updateMetrics().
Part of #1749 (non-atomic Redis patterns overview).
Description
RedisProcessRepository::orphaned()uses a non-atomic read-modify-write pattern on the{master}:orphanshash. If twohorizon:purgecommands overlap (e.g., cron fires while a previous run is still executing), one can delete entries the other intends to preserve.The code
The race
Between HKEYS (step 1) and HDEL (step 2), another process can modify the hash:
orphaned($master, [PID-1, PID-2, PID-3])orphaned($master, [PID-1, PID-2, PID-4])(PID-3 died, PID-4 started)[PID-1, PID-2, PID-3, PID-4, PID-5][PID-4, PID-5](not in A's list)[PID-3, PID-5](not in B's list)orphanedFor()expiry trackingWhen this happens
horizon:purgeis scheduled via cron with nowithoutOverlapping()protectionhorizon:purgeagainst the same RedisImpact
orphanedFor()returns stale data because timestamps get reset by HSETNXSuggested fix
A Lua script that atomically reads the current hash, computes the diff, deletes stale entries, and adds new ones — all in a single
EVAL. This follows the existing pattern used byLuaScripts::updateMetrics().Part of #1749 (non-atomic Redis patterns overview).