Summary
A Celo mainnet celo-reth / op-reth datadir developed a local gap where blocks 67211634 through 67211637 became unreadable even though logs show they were received, executed, added, and committed earlier. The unsafe chain continued past the gap and matched Forno at later blocks, but op-node could not advance safe/finalized because the engine could not read the next safe block after 67211633.
This appears to be a storage/provider/static-file consistency issue, not a canonical chain divergence.
Observed local gap
On the affected op-reth RPC:
- 67211633 was present.
- 67211634 through 67211637 returned null.
- 67211638 was present.
Forno had the missing blocks, and local 67211638.parentHash matched Forno block 67211637, so the local node did not appear to diverge from Celo mainnet. Instead, the local store had an unreadable gap inside an otherwise canonical chain.
Known hashes from the investigation:
- 67211633 (local/Forno): 0x96d6fe1e1b81f43e786cde3d44f4c0d5c9e0da930fb4e49611b45cc89d06f3bc
- 67211634 (Forno): 0x2fbd551dea2d53f20df1966a339d417291a207161f906a935c61f506a3f1fa06
- 67211635 (Forno): 0xbae1c8dd32ad6b250e42c95d34ac2067646a131cab3c5d66478ab06c8a3d4a53
- 67211636 (Forno): 0x66593291d1ced88758edb74b3be6015c4c768ec08f8cf6938a63edbd7744fda4
- 67211637 (Forno): 0xb1cdd7ef255a42f555169ca41d7cab87e4a93e7e2cd18dbe6009a3df45533f4f
- 67211638 (local/Forno): 0xd022d7c4517d3c76f2b70a81f7baf793a8ae091d41af2ea00e5861a4cb2d97b1
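The presence and continuity checks described in this section can be reproduced with a short script. This is a minimal sketch, not the tooling used in the investigation. The helper names (get_block, find_gaps, links_across_gap, scan) are hypothetical, and the only external interface assumed is the standard eth_getBlockByNumber JSON-RPC method.

```python
import json
import urllib.request


def get_block(rpc_url, number):
    """Fetch a block via eth_getBlockByNumber; returns None when the node answers null."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": [hex(number), False],  # False: transaction hashes only
    }).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("result")


def find_gaps(blocks):
    """Given {number: block-or-None} over a contiguous range, return inclusive
    (first_missing, last_missing) ranges."""
    gaps, start, prev = [], None, None
    for n in sorted(blocks):
        if blocks[n] is None:
            if start is None:
                start = n
            prev = n
        elif start is not None:
            gaps.append((start, prev))
            start = None
    if start is not None:
        gaps.append((start, prev))
    return gaps


def links_across_gap(local_after, reference_last_missing):
    """True if the first readable local block after a gap points at the
    reference chain's last missing block, i.e. no divergence across the gap."""
    return local_after["parentHash"].lower() == reference_last_missing["hash"].lower()


def scan(rpc_url, first, last):
    """Query every block in [first, last] and report the unreadable ranges."""
    return find_gaps({n: get_block(rpc_url, n) for n in range(first, last + 1)})
```

Calling scan against the affected node's RPC endpoint over 67211633 to 67211638 would be expected to report the single range (67211634, 67211637). links_across_gap can then compare the local block 67211638 with block 67211637 fetched from a reference endpoint such as Forno.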
Logs indicate the missing blocks were committed
The missing block timestamps were approximately:
67211634: 2026-05-18T13:53:12Z
67211635: 2026-05-18T13:53:13Z
67211636: 2026-05-18T13:53:14Z
67211637: 2026-05-18T13:53:15Z
op-reth logs showed each of these blocks being received, added, and committed:
67211634: received 13:53:12.035324Z, added 13:53:12.072262Z, committed 13:53:12.074077Z
67211635: received 13:53:13.043232Z, added 13:53:13.162668Z, committed 13:53:13.164519Z
67211636: received 13:53:14.037280Z, added 13:53:14.146090Z, committed 13:53:14.147863Z
67211637: received 13:53:15.034068Z, added 13:53:15.047042Z, committed 13:53:15.048456Z
op-node also inserted/processed those payloads around the same timestamps.
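As a sanity check on the log excerpts above, the receive-to-commit latency of each missing block can be computed directly from the quoted timestamps. This is an illustrative sketch only. The names EVENTS, parse, and receive_to_commit_us are hypothetical, and the timestamps are copied from this report rather than re-read from the logs.

```python
from datetime import datetime, timedelta

# (received, added, committed) per block, as quoted from the op-reth logs (UTC).
EVENTS = {
    67211634: ("13:53:12.035324Z", "13:53:12.072262Z", "13:53:12.074077Z"),
    67211635: ("13:53:13.043232Z", "13:53:13.162668Z", "13:53:13.164519Z"),
    67211636: ("13:53:14.037280Z", "13:53:14.146090Z", "13:53:14.147863Z"),
    67211637: ("13:53:15.034068Z", "13:53:15.047042Z", "13:53:15.048456Z"),
}


def parse(ts):
    """Parse a time-of-day log timestamp with microsecond precision."""
    return datetime.strptime(ts, "%H:%M:%S.%fZ")


def receive_to_commit_us(received, committed):
    """Elapsed time between 'received' and 'committed', in whole microseconds."""
    return (parse(committed) - parse(received)) // timedelta(microseconds=1)


LATENCIES_US = {
    number: receive_to_commit_us(received, committed)
    for number, (received, _added, committed) in EVENTS.items()
}

for number, micros in LATENCIES_US.items():
    print(f"{number}: {micros / 1000:.3f} ms from received to committed")
```

All four blocks were logged as committed between roughly 14 ms and 121 ms after being received, well inside the one-second spacing of their timestamps, so nothing in the timings themselves suggests a stalled or partial commit.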
Safe derivation symptom
Later, op-node advanced safe to 67211633 and then immediately reset because it could not read the next block from the engine:
Deriver system is resetting
err="expected engine was synced and had unsafe block to reconcile, but cannot find the block: not found"
The first reset happened around 2026-05-18T13:56:37Z, shortly after op-node recorded safe head 67211633. This lines up with the first missing block being 67211634.
Static-file/provider symptoms
A db get against static-file headers also reported an inconsistent static-file segment:
Error: File is in an inconsistent state
The affected segment was the headers segment covering 67000000_67499999.
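The segment name can be related to the missing blocks with simple arithmetic. The sketch below assumes fixed-size static-file ranges of 500,000 blocks, a value inferred here from the 67000000_67499999 range itself rather than confirmed against this node's configuration. The function name static_file_range is hypothetical.

```python
# Assumption: each static-file segment file covers a fixed 500,000-block range,
# inferred from the 67000000_67499999 range named in this report.
BLOCKS_PER_STATIC_FILE = 500_000


def static_file_range(block_number, blocks_per_file=BLOCKS_PER_STATIC_FILE):
    """Return the inclusive (start, end) block range of the file holding block_number."""
    start = (block_number // blocks_per_file) * blocks_per_file
    return start, start + blocks_per_file - 1


# Blocks on both sides of the gap, plus the gap itself.
for number in range(67211633, 67211639):
    start, end = static_file_range(number)
    print(f"{number} -> {start}_{end}")
```

Under that assumption, the last readable block before the gap, the four missing blocks, and the first readable block after the gap all map to the same 67000000_67499999 range, which is consistent with a gap inside a single segment file rather than at a file boundary.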
A later recovery attempt with a Celo-aware stage command made it past the earlier CIP-64 transaction decoding problem (tracked separately in #180), but failed during Execution unwind because header 67211634 is missing:
2026-05-19T15:54:57.421615Z INFO Unwinding{stage=Execution}: Starting unwind from=67293475 to=67211633 bad_block=None
2026-05-19T16:02:41.526782Z ERROR shutting down due to error
Error: database integrity error occurred: no header found for Number(67211634)
Caused by:
no header found for Number(67211634)
Location:
/usr/local/cargo/git/checkouts/reth-e231042ee7db3fb7/d6324d6/crates/cli/commands/src/stage/unwind.rs:74:9
This reinforces that the local database/static files are missing the header for 67211634, despite earlier commit logs.
Host/process signals checked
Grafana/host checks did not show an obvious OOM, process restart, or filesystem device error around the incident window:
- op-reth/op-node process start timestamps appeared constant around the failure window.
- node_vmstat_oom_kill did not indicate an OOM event.
- node_filesystem_device_error was zero for /var/lib/op-reth and other checked mounts.
ExEx relevance
The node had the proofs-history ExEx enabled, and the ExEx later logged "Missing block 67211634". That looks like an additional detector of the gap rather than its root cause. Similar behavior may have been observed on a Celo Sepolia node without ExEx enabled, so this should probably be investigated first as a storage/static-file/provider consistency issue independent of ExEx.
Impact
The node can continue serving/holding later unsafe blocks that match the canonical chain, but op-node safe/finalized derivation cannot cross the missing local block. Recovery by unwind is also blocked because the execution unwind expects the missing header.
Questions / investigation points
- How can op-reth log a block as committed and later be unable to read its header by number?
- Can static-file production or pruning leave a small header/body gap while later headers remain available?
- Is there a known failure mode where static-file segment metadata/indexes become inconsistent without a process restart/OOM/filesystem error?
- Should stage unwind have a repair path for this kind of partial static-file/header gap?
- Is this specific to Celo primitives/storage routing, or inherited from upstream reth static-file behavior?