Description
RedisJobRepository::updateRetryInformationOnParent() and storeRetryReference() both use a non-atomic read-modify-write pattern on the retried_by hash field. If two retry jobs for the same parent complete concurrently, one update silently overwrites the other.
The code
updateRetryInformationOnParent() (line ~495):
if ($retries = $this->connection()->hget($payload->retryOf(), 'retried_by')) {
$retries = $this->updateRetryStatus($payload, json_decode($retries, true), $failed);
$this->connection()->hset($payload->retryOf(), 'retried_by', json_encode($retries));
}
storeRetryReference() (line ~700):
$retries = json_decode($this->connection()->hget($id, 'retried_by') ?: '[]');
$retries[] = ['id' => $retryId, 'status' => 'pending', ...];
$this->connection()->hmset($id, ['retried_by' => json_encode($retries)]);
The race
Both methods: HGET -> modify in PHP -> HSET. Two concurrent operations:
- Retry A reads
[retry1: pending]
- Retry B reads
[retry1: pending] (same state)
- Retry A updates retry1 to completed, HSETs
[retry1: completed]
- Retry B appends retry2, HSETs
[retry1: pending, retry2: pending]
- Retry A's status update is silently lost
How this was found
Found during a systematic audit of Horizon's Redis usage, searching for non-atomic read-modify-write patterns (HGET → modify in PHP → HSET) that are vulnerable to concurrent access. This is the same class of bug as the SETNX/EXPIRE race in Lock::get() (#1738) and the pipeline race in RedisHorizonCommandQueue::pending() (#1739).
The existing LuaScripts::updateMetrics() method already uses a Lua script for exactly this reason — atomic read-modify-write inside Redis — which confirmed these two methods should follow the same pattern.
Real-world evidence
Several existing issues describe symptoms consistent with this race condition under high concurrency:
The bug is silent — no errors are thrown, entries are simply dropped from the retried_by array or status updates are reverted. Users see missing or incorrect retry entries on the Horizon dashboard but would likely attribute it to other causes.
Suggested fix
Use Lua scripts (via the existing LuaScripts class) to perform the read-modify-write atomically inside Redis. This matches the existing pattern used by LuaScripts::updateMetrics() which does the same kind of atomic HMGET/modify/HMSET.
Impact
Retry status shown in the Horizon dashboard can be incorrect — a retry marked as "completed" may revert to "pending", or a retry reference may disappear entirely. This is a dashboard data integrity issue. Lower severity than #1738/#1739 but same root cause.
Description
RedisJobRepository::updateRetryInformationOnParent()andstoreRetryReference()both use a non-atomic read-modify-write pattern on theretried_byhash field. If two retry jobs for the same parent complete concurrently, one update silently overwrites the other.The code
updateRetryInformationOnParent()(line ~495):storeRetryReference()(line ~700):The race
Both methods: HGET -> modify in PHP -> HSET. Two concurrent operations:
[retry1: pending][retry1: pending](same state)[retry1: completed][retry1: pending, retry2: pending]How this was found
Found during a systematic audit of Horizon's Redis usage, searching for non-atomic read-modify-write patterns (HGET → modify in PHP → HSET) that are vulnerable to concurrent access. This is the same class of bug as the SETNX/EXPIRE race in
Lock::get()(#1738) and the pipeline race inRedisHorizonCommandQueue::pending()(#1739).The existing
LuaScripts::updateMetrics()method already uses a Lua script for exactly this reason — atomic read-modify-write inside Redis — which confirmed these two methods should follow the same pattern.Real-world evidence
Several existing issues describe symptoms consistent with this race condition under high concurrency:
retried_bywould silently lose data.The bug is silent — no errors are thrown, entries are simply dropped from the
retried_byarray or status updates are reverted. Users see missing or incorrect retry entries on the Horizon dashboard but would likely attribute it to other causes.Suggested fix
Use Lua scripts (via the existing
LuaScriptsclass) to perform the read-modify-write atomically inside Redis. This matches the existing pattern used byLuaScripts::updateMetrics()which does the same kind of atomic HMGET/modify/HMSET.Impact
Retry status shown in the Horizon dashboard can be incorrect — a retry marked as "completed" may revert to "pending", or a retry reference may disappear entirely. This is a dashboard data integrity issue. Lower severity than #1738/#1739 but same root cause.