[bug] GetBestBlockHeader panics with nil-pointer dereference on malformed blocks row #884

@moseljack

Severity: HIGH — entire blockchain service crash-loops, taking the whole Teranode stack down via docker-compose depends_on cascade
Affected version: Confirmed on v0.13.1 (commit 247dfd5fe). Re-verified on v0.15.0 — the unsafe pattern is unchanged at source line 177 of stores/blockchain/sql/GetBestBlockHeader.go:

bits, _ := model.NewNBitFromSlice(nBits)
blockHeader.Bits = *bits

The panic does not reproduce on v0.15.0 while the Postgres data is clean. Because the pattern is unchanged, it would reproduce identically for any operator who hits the same malformed-bytea condition, or any other case where NewNBitFromSlice returns nil together with a non-nil error. The defensive nil-check fix therefore still applies.

Component: stores/blockchain/sql/GetBestBlockHeader.go

Summary

GetBestBlockHeader queries the blocks table for the chain tip and scans the result into Go structs. The function ignores errors from model.NewNBitFromSlice(nBits) and immediately dereferences the returned pointer (blockHeader.Bits = *bits). If nBits from the database is malformed (wrong length, ASCII-encoded hex stored as bytes, etc.), NewNBitFromSlice returns (nil, error) and *bits panics with nil-pointer dereference.

Because the panic happens in startSubscriptions.func2 early in service startup, the blockchain service crashes immediately and Docker restarts it. In the affected scenario the restart loop ran for at least 104 cycles before the underlying data was repaired.
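The failure mode can be reproduced in isolation. The sketch below is illustrative only: nBit and newNBitFromSlice are hypothetical stand-ins for model.NBit and model.NewNBitFromSlice, assumed to reject any input that is not exactly 4 bytes. It contrasts the discarded-error pattern with a guarded variant.

```go
package main

import (
	"errors"
	"fmt"
)

// nBit is a hypothetical stand-in for Teranode's model.NBit (assumed 4 bytes).
type nBit [4]byte

// newNBitFromSlice is a hypothetical stand-in for model.NewNBitFromSlice.
// Assumption: it returns (nil, error) for any input that is not 4 bytes long.
func newNBitFromSlice(b []byte) (*nBit, error) {
	if len(b) != 4 {
		return nil, fmt.Errorf("nBits must be 4 bytes, got %d", len(b))
	}
	var n nBit
	copy(n[:], b)
	return &n, nil
}

// unsafeBits mirrors the pattern from the issue: the error is discarded and
// the pointer is dereferenced unconditionally.
func unsafeBits(raw []byte) nBit {
	bits, _ := newNBitFromSlice(raw)
	return *bits // panics when bits is nil
}

// safeBits returns an error instead of panicking.
func safeBits(raw []byte) (nBit, error) {
	bits, err := newNBitFromSlice(raw)
	if err != nil {
		return nBit{}, fmt.Errorf("malformed n_bits (length=%d, expected 4): %w", len(raw), err)
	}
	if bits == nil {
		return nBit{}, errors.New("malformed n_bits: parser returned nil without an error")
	}
	return *bits, nil
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	bad := []byte("1d00ffff") // 8 ASCII bytes, mimicking a corrupted n_bits value
	fmt.Println("unsafe path panics:", panics(func() { _ = unsafeBits(bad) }))
	_, err := safeBits(bad)
	fmt.Println("safe path error:", err)
}
```

Splitting the err and nil checks keeps a nil error from ever being wrapped with %w.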

Stack trace (from running v0.13.1 binary)

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2006d26]

goroutine 226 [running]:
github.com/bsv-blockchain/teranode/stores/blockchain/sql.(*SQL).GetBestBlockHeader(0xc000511600, {0x5a924b0, 0x82727e0})
    github.com/bsv-blockchain/teranode/stores/blockchain/sql/GetBestBlockHeader.go:176 +0x786
github.com/bsv-blockchain/teranode/services/blockchain.(*Blockchain).startSubscriptions.func2({{0x5aa4e80, 0xc0019840f0}, {0xc00198e000, 0x3}, 0xc001a92000})
    github.com/bsv-blockchain/teranode/services/blockchain/Server.go:638 +0x67
created by github.com/bsv-blockchain/teranode/services/blockchain.(*Blockchain).startSubscriptions in goroutine 95
    github.com/bsv-blockchain/teranode/services/blockchain/Server.go:637 +0x386

Trigger scenario (in the affected node)

An operator performed a DELETE + INSERT round-trip on a subset of blocks rows. The restoration SQL used quote_literal(encode(bytea_col, 'hex')) for bytea columns — which produced INSERT statements containing the ASCII representation of the hex string, NOT the binary value:

-- Wrong (what the operator did):
INSERT INTO blocks VALUES (..., '1b45fad0a6f21e9b2db0528e0e443cafb37dff8c536f1d000000000000000000', ...)
-- Postgres stored the 64 ASCII characters '1','b','4','5',... as the 64-byte content of the bytea column.

-- Right:
INSERT INTO blocks VALUES (..., decode('1b45fad0a6f21e9b2db0528e0e443cafb37dff8c536f1d000000000000000000', 'hex'), ...)
-- Postgres decodes the hex string to its 32-byte binary value before storing.

After the bad restore, octet_length(hash) returned 64 for restored rows vs the expected 32. octet_length(n_bits) returned 8 vs the expected 4. When GetBestBlockHeader read these rows and called NewNBitFromSlice(<8-byte-buffer>), the function returned (nil, error). The next line blockHeader.Bits = *bits then panicked.

This is an operator error (the SQL was wrong), but Teranode crash-looped instead of either returning a clean error to higher layers or refusing to start with an explanatory log message.

Suggested fix

In stores/blockchain/sql/GetBestBlockHeader.go, replace the unsafe pattern:

// Before: current code
bits, _ := model.NewNBitFromSlice(nBits)
blockHeader.Bits = *bits   // ← panics if bits is nil

// After: handle the error
bits, err := model.NewNBitFromSlice(nBits)
if err != nil || bits == nil {
    return nil, nil, errors.NewStorageError(
        "GetBestBlockHeader: malformed n_bits (length=%d, expected 4) for block id=%d height=%d: %w",
        len(nBits), blockHeaderMeta.ID, blockHeaderMeta.Height, err)
}
blockHeader.Bits = *bits

Audit the subsequent calls in the function (chainhash.NewHash, bt.NewTxFromBytes, etc.) in the same way. Their errors are already checked, but the surrounding flow still assumes valid binary data.

Optionally: add a sanity check at startup that scans blocks for any row with non-32-byte hash / non-4-byte n_bits / non-32-byte chain_work, and refuses to start with a clear error message pointing the operator at the corrupted rows.
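One possible shape for that startup check, as a rough sketch rather than a proposal for the actual Teranode code: rowWidths, checkRow and checkRows are hypothetical names, and the widths are assumed to come from an octet_length query over the blocks table that is not shown here. The expected sizes (32, 4, 32) are the ones stated above.

```go
package main

import "fmt"

// Expected octet lengths, taken from the issue text.
const (
	wantHash      = 32
	wantNBits     = 4
	wantChainWork = 32
)

// rowWidths is a hypothetical record holding the octet lengths of the bytea
// columns of one blocks row.
type rowWidths struct {
	ID        int64
	Hash      int
	NBits     int
	ChainWork int
}

// checkRow returns one error per column whose width is wrong.
func checkRow(r rowWidths) []error {
	var errs []error
	check := func(col string, got, want int) {
		if got != want {
			errs = append(errs, fmt.Errorf("blocks id=%d: %s is %d bytes, expected %d", r.ID, col, got, want))
		}
	}
	check("hash", r.Hash, wantHash)
	check("n_bits", r.NBits, wantNBits)
	check("chain_work", r.ChainWork, wantChainWork)
	return errs
}

// checkRows collects the problems across all rows so that startup can log
// every corrupted row before exiting.
func checkRows(rows []rowWidths) []error {
	var errs []error
	for _, r := range rows {
		errs = append(errs, checkRow(r)...)
	}
	return errs
}

func main() {
	rows := []rowWidths{
		{ID: 1, Hash: 32, NBits: 4, ChainWork: 32}, // clean row
		{ID: 2, Hash: 64, NBits: 8, ChainWork: 64}, // made-up row with doubled widths
	}
	for _, err := range checkRows(rows) {
		fmt.Println(err)
	}
}
```

Reporting every bad row in one pass, rather than stopping at the first, gives the operator the full repair list from a single failed start.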

Why severity is HIGH

The panic in the blockchain service triggers a docker-compose depends_on cascade: blockassembly, legacy, and kafka-shared cannot complete startup because their dependency (blockchain) is unavailable. In the observed incident, blockchain restarted 104 times in ~15 minutes, and the entire Teranode stack made no progress until the underlying data was repaired by hand.

A defensive nil check + descriptive error would convert this from "stack crash-loop" to "service exits with operator-actionable log message."

Workaround

  • Detect via octet_length queries on bytea columns
  • Repair via DELETE of affected rows + COPY FROM of the same data in proper CSV bytea format (\xHEX prefix)
  • Restart services in dependency order (blockchain → blockassembly → legacy)
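For the repair step, the corrupted values still contain the original data as hex text, so they can be converted back mechanically. The sketch below assumes the corruption is exactly the ASCII-hex case described above; repairASCIIHex and byteaCSVField are hypothetical helper names, and the sample value is a placeholder.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// repairASCIIHex converts a value that was stored as ASCII hex text back to
// the binary it was meant to be. It fails if the stored bytes are not hex
// text or do not decode to exactly want bytes.
func repairASCIIHex(stored []byte, want int) ([]byte, error) {
	raw, err := hex.DecodeString(string(stored))
	if err != nil {
		return nil, fmt.Errorf("stored value is not hex text: %w", err)
	}
	if len(raw) != want {
		return nil, fmt.Errorf("decoded to %d bytes, expected %d", len(raw), want)
	}
	return raw, nil
}

// byteaCSVField renders binary data in the \x-prefixed hex form used for
// bytea values in a CSV file.
func byteaCSVField(raw []byte) string {
	return `\x` + hex.EncodeToString(raw)
}

func main() {
	stored := []byte("1d00ffff") // placeholder: 8 ASCII bytes read from a corrupted n_bits column
	raw, err := repairASCIIHex(stored, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(byteaCSVField(raw)) // prints \x1d00ffff
}
```

The length check guards against repairing a row that was damaged in some other way.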

Note on operator culpability vs Teranode robustness

The trigger was operator error in the restoration SQL. However:

  • Teranode's CLI does not provide a restore-blocks-from-csv or similar admin path; operators are forced to write raw SQL
  • The panic was indistinguishable from a real bug for the duration of the incident (operator had no signal that the corruption was self-inflicted vs an internal Teranode issue)
  • A defensive nil check costs O(1) and turns a multi-cycle crash-loop into an immediate operator-actionable error
