Severity: HIGH — entire blockchain service crash-loops, taking the whole Teranode stack down via docker-compose depends_on cascade
Affected version: Confirmed on v0.13.1 (commit 247dfd5fe). Re-verified on v0.15.0 — the unsafe pattern is unchanged at source line 177 of stores/blockchain/sql/GetBestBlockHeader.go:
bits, _ := model.NewNBitFromSlice(nBits)
blockHeader.Bits = *bits
The panic does not reproduce on v0.15.0 when postgres data is clean; it would reproduce identically if any operator hits the same malformed-bytea condition (or any other case where NewNBitFromSlice returns nil + non-nil error). The defensive nil-check fix is still applicable.
Component: stores/blockchain/sql/GetBestBlockHeader.go
Summary
GetBestBlockHeader queries the blocks table for the chain tip and scans the result into Go structs. The function ignores the error from model.NewNBitFromSlice(nBits) and immediately dereferences the returned pointer (blockHeader.Bits = *bits). If nBits from the database is malformed (wrong length, ASCII-encoded hex stored as bytes, etc.), NewNBitFromSlice returns (nil, error) and *bits panics with a nil-pointer dereference.
Because the panic happens in startSubscriptions.func2 early in service startup, the blockchain service crashes immediately and is restarted by Docker. The restart loop continued for 104+ cycles in the affected scenario before the underlying data was repaired.
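The failure mode can be reproduced in isolation. The sketch below is not Teranode code: nBit and newNBitFromSlice are hypothetical stand-ins that only mimic the behavior described above (return nil plus an error for any input that is not 4 bytes long), and unsafeBits / safeBits are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// nBit is a hypothetical stand-in for the 4-byte compact target type.
type nBit [4]byte

// newNBitFromSlice is a stand-in (an assumption, not the real function):
// it returns (nil, error) for any input that is not exactly 4 bytes.
func newNBitFromSlice(b []byte) (*nBit, error) {
	if len(b) != 4 {
		return nil, fmt.Errorf("nBits must be 4 bytes, got %d", len(b))
	}
	var n nBit
	copy(n[:], b)
	return &n, nil
}

// unsafeBits discards the error and dereferences, as in the reported pattern.
func unsafeBits(raw []byte) nBit {
	bits, _ := newNBitFromSlice(raw)
	return *bits // panics when bits is nil
}

// safeBits returns an error instead of dereferencing a nil pointer.
func safeBits(raw []byte) (nBit, error) {
	bits, err := newNBitFromSlice(raw)
	if err != nil {
		return nBit{}, fmt.Errorf("malformed n_bits (length=%d, expected 4): %w", len(raw), err)
	}
	if bits == nil {
		return nBit{}, errors.New("malformed n_bits: parser returned nil without an error")
	}
	return *bits, nil
}

// didPanic reports whether f panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	bad := []byte("1d00ffff") // 8 ASCII bytes where 4 binary bytes are expected
	fmt.Println("unsafe path panics:", didPanic(func() { unsafeBits(bad) }))
	_, err := safeBits(bad)
	fmt.Println("safe path error:", err)
}
```

In the real service nothing recovers the panic on that goroutine, so the process exits rather than reporting an error.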
Stack trace (from running v0.13.1 binary)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2006d26]
goroutine 226 [running]:
github.com/bsv-blockchain/teranode/stores/blockchain/sql.(*SQL).GetBestBlockHeader(0xc000511600, {0x5a924b0, 0x82727e0})
github.com/bsv-blockchain/teranode/stores/blockchain/sql/GetBestBlockHeader.go:176 +0x786
github.com/bsv-blockchain/teranode/services/blockchain.(*Blockchain).startSubscriptions.func2({{0x5aa4e80, 0xc0019840f0}, {0xc00198e000, 0x3}, 0xc001a92000})
github.com/bsv-blockchain/teranode/services/blockchain/Server.go:638 +0x67
created by github.com/bsv-blockchain/teranode/services/blockchain.(*Blockchain).startSubscriptions in goroutine 95
github.com/bsv-blockchain/teranode/services/blockchain/Server.go:637 +0x386
Trigger scenario (in the affected node)
An operator performed a DELETE + INSERT round-trip on a subset of blocks rows. The restoration SQL used quote_literal(encode(bytea_col, 'hex')) for bytea columns — which produced INSERT statements containing the ASCII representation of the hex string, NOT the binary value:
-- Wrong (what the operator did):
INSERT INTO blocks VALUES (..., '1b45fad0a6f21e9b2db0528e0e443cafb37dff8c536f1d000000000000000000', ...)
-- Postgres stored the 64 ASCII characters '1','b','4','5',... as the 64-byte content of the bytea column.
-- Right:
INSERT INTO blocks VALUES (..., decode('1b45fad0a6f21e9b2db0528e0e443cafb37dff8c536f1d000000000000000000', 'hex'), ...)
-- Postgres decodes the hex string to its 32-byte binary value before storing.
After the bad restore, octet_length(hash) returned 64 for restored rows vs the expected 32. octet_length(n_bits) returned 8 vs the expected 4. When GetBestBlockHeader read these rows and called NewNBitFromSlice(<8-byte-buffer>), the function returned (nil, error). The next line blockHeader.Bits = *bits then panicked.
This is an operator error (the SQL was wrong), but Teranode crash-looped instead of returning a clean error to higher layers OR refusing to start with an explanatory log message.
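The length doubling is easy to confirm outside the database. The following sketch uses the hash literal from the INSERT statements above; looksHexEncoded is a hypothetical helper written for this report, not something Teranode provides.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// looksHexEncoded reports whether raw has the signature of hex text stored
// in place of binary data: twice the expected length, all ASCII hex digits.
func looksHexEncoded(raw []byte, wantLen int) bool {
	if len(raw) != 2*wantLen {
		return false
	}
	_, err := hex.DecodeString(string(raw))
	return err == nil
}

func main() {
	// Hash literal taken from the INSERT statements above.
	const hashHex = "1b45fad0a6f21e9b2db0528e0e443cafb37dff8c536f1d000000000000000000"

	// What the bad restore wrote: the ASCII characters themselves.
	stored := []byte(hashHex)

	// What decode(..., 'hex') would have stored: the binary value.
	decoded, err := hex.DecodeString(hashHex)
	if err != nil {
		panic(err)
	}

	fmt.Println("stored length:", len(stored), "hex-encoded:", looksHexEncoded(stored, 32))
	fmt.Println("decoded length:", len(decoded), "hex-encoded:", looksHexEncoded(decoded, 32))
}
```

The same check with wantLen set to 4 flags the 8-byte n_bits values.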
Suggested fix
In stores/blockchain/sql/GetBestBlockHeader.go, replace the unsafe pattern:
// Before — current code
bits, _ := model.NewNBitFromSlice(nBits)
blockHeader.Bits = *bits // ← panics if bits is nil
// After — handle the error
bits, err := model.NewNBitFromSlice(nBits)
if err != nil || bits == nil {
	return nil, nil, errors.NewStorageError(
		"GetBestBlockHeader: malformed n_bits (length=%d, expected 4) for block id=%d height=%d: %w",
		len(nBits), blockHeaderMeta.ID, blockHeaderMeta.Height, err)
}
blockHeader.Bits = *bits
Apply the same defensive pattern to subsequent calls in the function (chainhash.NewHash, bt.NewTxFromBytes, etc. — the error is checked for those but the surrounding flow still assumes valid binary data).
Optionally: add a sanity check at startup that scans blocks for any row with non-32-byte hash / non-4-byte n_bits / non-32-byte chain_work, and refuses to start with a clear error message pointing the operator at the corrupted rows.
Why severity is HIGH
The panic in the blockchain service triggers cascading restarts through docker-compose depends_on: blockassembly, legacy, and kafka-shared cannot complete startup because their dependency (blockchain) is unavailable. In the observed incident, blockchain restarted 104 times in ~15 minutes; the entire Teranode stack was unable to make any progress until the underlying data was repaired by hand.
A defensive nil check + descriptive error would convert this from "stack crash-loop" to "service exits with operator-actionable log message."
Workaround
- Detect via octet_length queries on bytea columns
- Repair via DELETE of affected rows + COPY FROM of the same data in proper CSV bytea format (\xHEX prefix)
- Restart services in dependency order (blockchain → blockassembly → legacy)
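For the detect and repair steps, a small sketch of the two pieces involved. The query text assumes an id column alongside the hash, n_bits and chain_work columns named in this report, and byteaHex is a hypothetical helper; neither is part of Teranode.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// detectQuery lists rows whose bytea columns have an unexpected length.
// The id column name is an assumption; adjust to the actual schema.
const detectQuery = `SELECT id, octet_length(hash), octet_length(n_bits), octet_length(chain_work)
FROM blocks
WHERE octet_length(hash) <> 32
   OR octet_length(n_bits) <> 4
   OR octet_length(chain_work) <> 32`

// byteaHex renders b in PostgreSQL's hex bytea input format:
// a literal \x followed by two hex digits per byte.
func byteaHex(b []byte) string {
	return `\x` + hex.EncodeToString(b)
}

func main() {
	fmt.Println(detectQuery)
	// Illustrative 4-byte value, formatted for a CSV field.
	fmt.Println(byteaHex([]byte{0x1d, 0x00, 0xff, 0xff}))
}
```

The \x form can be written as-is into a file loaded with COPY ... FROM in CSV format. COPY's default text format treats the backslash as an escape character, so the same value would need a doubled backslash there.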
Note on operator culpability vs Teranode robustness
The trigger was operator error in the restoration SQL. However:
- Teranode's CLI does not provide a restore-blocks-from-csv or similar admin path; operators are forced to write raw SQL
- The panic was indistinguishable from a real bug for the duration of the incident (operator had no signal that the corruption was self-inflicted vs an internal Teranode issue)
- A defensive nil check costs O(1) and turns a multi-cycle crash-loop into an immediate operator-actionable error