High-performance P2P networking library for blockchain applications in Go.
go-p2p is a production-grade P2P networking library designed for blockchain and distributed systems. It features lock-free data structures, optimized I/O patterns, and extensive benchmarking to validate performance under high concurrency.
- Lock-free Data Structures: Benchmarked implementations with detailed performance analysis
- Optimized Peer Management: `sync.Map`-based concurrent peer tracking
- Efficient Message Broadcasting: Gossip protocol with duplicate filtering
- Zero-allocation Networking: `sync.Pool`-based buffer reuse (see the sketch after this list)
- Pluggable Discovery: Seed nodes, gossip-based peer exchange
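The zero-allocation claim rests on reusing receive buffers. A minimal sketch of a `sync.Pool`-backed buffer pool, assuming a `bufPool` helper in `pkg/transport` and a 4 KiB buffer size (the names and size are illustrative, not the library's actual API):

```go
package transport

import "sync"

// bufPool recycles fixed-size read buffers so the receive path does
// not allocate a fresh slice per message. Sketch only; pool name and
// buffer size are assumptions.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 4096)
		return &b
	},
}

// getBuf borrows a buffer from the pool.
func getBuf() *[]byte { return bufPool.Get().(*[]byte) }

// putBuf returns a buffer for reuse by later reads.
func putBuf(b *[]byte) { bufPool.Put(b) }
```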
┌─────────────────────────────────────────────────────────────┐
│ Node │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Server │ │ Dialer │ │ PeerManager │ │
│ │ (Accept) │ │ (Connect) │ │ (sync.Map) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────┘ │
│ │ │ │ │
│ └────────┬───────┘ │ │
│ ▼ │ │
│ ┌───────────────────────────────┐ │ │
│ │ Peer Connections │◄───────────┘ │
│ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ │Peer A│ │Peer B│ │Peer C│ │ │
│ │ └──┬───┘ └──┬───┘ └──┬───┘ │ │
│ └─────┼────────┼────────┼───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────────────────┐ │
│ │ BroadcastManager │ │
│ │ (sync.Map cache) │ │
│ └───────────────┬───────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────┐ │
│ │ Mempool │ │
│ │ (Mutex + Heap) │ │
│ └───────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Comprehensive benchmarks comparing lock-free vs mutex-based implementations.
| Spec | Value |
|---|---|
| OS | Windows 10 |
| CPU | AMD Ryzen 5 3500 (6-Core) |
| Go | 1.21+ |
| Command | `go test -bench=. -benchmem -count=5` |
┌────────────────────┬─────────────┬─────────────────────────────────┐
│ Data Structure │ Winner │ Notes │
├────────────────────┼─────────────┼─────────────────────────────────┤
│ Queue │ Mutex │ Go mutex highly optimized │
│ HashMap (read) │ sync.Map │ No contention between readers │
│ HashMap (write) │ Mutex │ CAS retry overhead │
│ Priority Queue │ Mutex │ Heap more cache-friendly │
└────────────────────┴─────────────┴─────────────────────────────────┘
Operation Mutex Lock-free
─────────────────────────────────────
Enqueue 85 ns 80 ns
Dequeue 15 ns 18 ns
Goroutines Mutex Lock-free Diff
───────────────────────────────────────────
1 88 ns 144 ns +64%
4 83 ns 149 ns +79%
8 83 ns 149 ns +79%
16 85 ns 143 ns +68%
32 85 ns 140 ns +65%
64 82 ns 146 ns +78%
Result: Mutex wins. Lock-free queue allocates per-enqueue (32 B/op).
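For context on the allocation number: a linked lock-free queue in the Michael–Scott style allocates one node per `Enqueue`. The sketch below is illustrative only (not the library's exact MPSC implementation) and shows where a per-enqueue heap allocation like the 32 B/op comes from:

```go
package lockfree

import "sync/atomic"

// node is one queue element; every Enqueue allocates one of these,
// which is the source of the per-enqueue B/op reported above.
type node struct {
	value any
	next  atomic.Pointer[node]
}

// LFQueue is a Michael–Scott style linked queue (sketch only).
type LFQueue struct {
	head atomic.Pointer[node]
	tail atomic.Pointer[node]
}

func NewLFQueue() *LFQueue {
	q := &LFQueue{}
	sentinel := &node{} // empty sentinel keeps head/tail non-nil
	q.head.Store(sentinel)
	q.tail.Store(sentinel)
	return q
}

func (q *LFQueue) Enqueue(v any) {
	n := &node{value: v} // per-enqueue heap allocation
	for {
		tail := q.tail.Load()
		next := tail.next.Load()
		if next == nil {
			// Try to link the new node at the end of the list.
			if tail.next.CompareAndSwap(nil, n) {
				q.tail.CompareAndSwap(tail, n) // swing tail forward
				return
			}
		} else {
			// Another producer is mid-enqueue; help it, then retry.
			q.tail.CompareAndSwap(tail, next)
		}
	}
}
```

A mutex-guarded slice or ring buffer, by contrast, amortizes its allocations across many enqueues, which is one reason the mutex variant wins here.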
Goroutines Mutex Lock-free Diff
───────────────────────────────────────────
1 377 ns 194 ns -49%
4 375 ns 189 ns -50%
8 377 ns 188 ns -50%
16 371 ns 184 ns -50%
32 377 ns 197 ns -48%
Result: Lock-free (sync.Map) wins by 2x for read-heavy workloads.
Goroutines Mutex(Heap) Lock-free(SkipList)
─────────────────────────────────────────────
1 166 ns ~200 ns
4 184 ns ~200 ns
8 182 ns ~200 ns
16 184 ns ~200 ns
Result: Mutex+Heap wins. Skip list has high memory overhead (311 B/op per node).
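As an illustration of the winning approach, here is a minimal mutex-guarded priority queue built on `container/heap`; the `item`/`PQ` names and fee-based ordering are assumptions, not the library's exact API:

```go
package mempool

import (
	"container/heap"
	"sync"
)

// item is a fee-prioritized entry; fields are illustrative.
type item struct {
	fee uint64
	tx  []byte
}

// txHeap implements heap.Interface over a contiguous slice,
// which is what keeps the hot path cache-friendly.
type txHeap []item

func (h txHeap) Len() int           { return len(h) }
func (h txHeap) Less(i, j int) bool { return h[i].fee > h[j].fee } // max-heap by fee
func (h txHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *txHeap) Push(x any)        { *h = append(*h, x.(item)) }
func (h *txHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// PQ is the mutex-guarded wrapper the benchmarks compare against.
type PQ struct {
	mu sync.Mutex
	h  txHeap
}

func (p *PQ) Push(it item) {
	p.mu.Lock()
	heap.Push(&p.h, it)
	p.mu.Unlock()
}

func (p *PQ) Pop() (item, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.h.Len() == 0 {
		return item{}, false
	}
	return heap.Pop(&p.h).(item), true
}
```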
Based on benchmark results, we selected optimal implementations for each component:
| Component | Implementation | Rationale |
|---|---|---|
| PeerManager | `sync.Map` | 90%+ reads, lock-free reads |
| BroadcastCache | `sync.Map` | Duplicate check is read-heavy |
| Mempool | Mutex + Heap | Heap is cache-friendly, simple |
| MessageQueue | Channel | Go channels well-optimized |
```go
// Simplified view of sync.Map's internal structure
type Map struct {
	mu     Mutex          // protects dirty map
	read   atomic.Value   // read-only map (lock-free!)
	dirty  map[any]*entry // write map
	misses int
}

// Load operation (read path):
//   - If the key exists in the read map → atomic load, NO LOCK
//   - Only falls back to the dirty map (taking mu) if not found
```

RWMutex Problem: Even `RLock()` requires an atomic increment of `readerCount`, causing cache line contention across cores.
sync.Map Solution: Reads from read map use only atomic.Load - no shared counter, no contention.
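A minimal sketch of a peer manager wrapping `sync.Map` so lookups stay lock-free on the hot path; the method names are assumptions, and `Peer` refers to the connection type in `pkg/peer`:

```go
package peer

import "sync"

// Manager tracks connected peers keyed by address.
// Gets and Ranges never take a lock on the fast path.
// (Peer is the connection type defined elsewhere in this package.)
type Manager struct {
	peers sync.Map // map[string]*Peer
}

func (m *Manager) Add(addr string, p *Peer) { m.peers.Store(addr, p) }

func (m *Manager) Get(addr string) (*Peer, bool) {
	v, ok := m.peers.Load(addr)
	if !ok {
		return nil, false
	}
	return v.(*Peer), true
}

func (m *Manager) Remove(addr string) { m.peers.Delete(addr) }

// Range iterates peers without blocking writers.
func (m *Manager) Range(fn func(addr string, p *Peer) bool) {
	m.peers.Range(func(k, v any) bool { return fn(k.(string), v.(*Peer)) })
}
```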
The message broadcasting hot path was optimized for maximum throughput. All benchmarks run on AMD Ryzen 5 3500 (6-Core).
Message deduplication requires hashing every incoming message. We replaced cryptographic SHA256 with non-cryptographic xxhash.
| Implementation | ns/op | B/op | allocs/op | Speedup |
|---|---|---|---|---|
| SHA256 + hex.EncodeToString | 398.9 | 160 | 3 | baseline |
| xxhash + sync.Pool | 41.24 | 0 | 0 | 9.7x |
| xxhash.Sum64 (direct) | 23.22 | 0 | 0 | 17.2x |
Why 0 B/op?

- `sync.Pool` reuses `xxhash.Digest` instances across calls (pool sketch below)
- After pool warmup, no new allocations occur in steady state
- The direct `Sum64` benchmark pre-allocates its buffer before timing
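The pool itself is not shown in the snippets below; a minimal sketch of such a `digestPool`, assuming the `github.com/cespare/xxhash/v2` package:

```go
package node

import (
	"sync"

	"github.com/cespare/xxhash/v2"
)

// digestPool recycles xxhash.Digest instances so steady-state
// hashing performs zero heap allocations.
var digestPool = sync.Pool{
	New: func() any { return xxhash.New() },
}
```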
```go
// Before: 426.8 ns/op, 160 B/op, 3 allocs/op
hasher := sha256.New()
hasher.Write([]byte{byte(msg.Type)})
hasher.Write(msg.Payload)
hash := hex.EncodeToString(hasher.Sum(nil)) // string key
```

```go
// After: 45.0 ns/op, 0 B/op, 0 allocs/op
h := digestPool.Get().(*xxhash.Digest)
h.Reset()
h.Write([]byte{byte(msg.Type)})
h.Write(msg.Payload)
hash := h.Sum64() // uint64 key
digestPool.Put(h)
```

The duplicate message cache is read-heavy (90%+ of lookups are cache hits).
| Scenario | sync.Map | RWMutex | Speedup |
|---|---|---|---|
| ReadOnly (100% read) | 7.74 ns/op | 27.79 ns/op | 3.6x |
| ReadHeavy (90% read, 10% write) | 66.66 ns/op | 109.9 ns/op | 1.6x |
| WriteHeavy (50% read, 50% write) | 279.1 ns/op | 150.2 ns/op | 0.5x (RWMutex wins) |
Key insight: P2P networks see the same message multiple times from different peers. Most MarkSeen() calls hit existing entries → read-heavy workload → sync.Map wins.
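A minimal sketch of `MarkSeen` built on `sync.Map.LoadOrStore`; the `MarkSeen` name comes from the text above, while the surrounding `seenCache` type is an assumption:

```go
package node

import "sync"

// seenCache deduplicates gossiped messages by their xxhash key.
type seenCache struct {
	seen sync.Map // map[uint64]struct{}
}

// MarkSeen records the hash and reports whether it was already present.
// On the common path (duplicate arriving from another peer) this is a
// lock-free read inside LoadOrStore.
func (c *seenCache) MarkSeen(hash uint64) (alreadySeen bool) {
	_, loaded := c.seen.LoadOrStore(hash, struct{}{})
	return loaded
}
```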
Gossip target selection requires shuffling peer lists.
| Implementation | ns/op | Speedup |
|---|---|---|
| `rand.Shuffle()` (global) | 84.71 | baseline |
| `rng.Shuffle()` (local) | 55.78 | 1.5x |
```go
// Before: global RNG with an internal mutex
rand.Shuffle(len(candidates), func(i, j int) { ... })

// After: goroutine-local RNG (no shared lock; safe as long as each
// goroutine owns its own rng)
rng := rand.New(rand.NewSource(time.Now().UnixNano()))
rng.Shuffle(len(candidates), func(i, j int) { ... })
```

Message receive path: Receive → Hash → MarkSeen → Broadcast
| Component | Before | After | Improvement |
|---|---|---|---|
| MessageHash | 398.9 ns | 41.24 ns | 9.7x faster |
| MarkSeen (cache hit) | 27.79 ns | 7.74 ns | 3.6x faster |
| FullFlow (single) | 418.0 ns | 167.6 ns | 2.5x faster |
| FullFlow (parallel) | - | 51.89 ns | 8.0x faster |
| Memory allocation | 160 B/msg | 32 B/msg | 80% reduction |
Total hot path: ~8x faster in parallel workloads
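The parallel numbers come from standard Go parallel benchmarks; a sketch of how a run like FullFlow (parallel) can be written with `testing.B.RunParallel`, reusing the `digestPool` and `seenCache` sketches above (the benchmark body is an assumption, not the library's actual test):

```go
package node

import (
	"testing"

	"github.com/cespare/xxhash/v2"
)

// BenchmarkFullFlowParallel exercises the hash → MarkSeen hot path
// from many goroutines at once. Illustrative only.
func BenchmarkFullFlowParallel(b *testing.B) {
	var cache seenCache          // dedup cache (sync.Map-based sketch above)
	payload := make([]byte, 256) // representative message payload

	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			h := digestPool.Get().(*xxhash.Digest)
			h.Reset()
			h.Write(payload)
			sum := h.Sum64()
			digestPool.Put(h)
			cache.MarkSeen(sum)
		}
	})
}
```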
```bash
# All broadcast benchmarks
go test -bench=. -benchmem ./pkg/node/... -run=^$

# Hash comparison only
go test -bench=BenchmarkHash -benchmem ./pkg/node/...

# SeenCache comparison only
go test -bench=BenchmarkSeenCache -benchmem ./pkg/node/...

# Full flow comparison
go test -bench=BenchmarkFullFlow -benchmem ./pkg/node/...
```

```
$ go test ./...
ok      github.com/go-p2p-network/go-p2p/pkg/mempool    0.222s
ok      github.com/go-p2p-network/go-p2p/pkg/node       6.736s
```

| Package | Tests | Covers |
|---|---|---|
| `pkg/mempool` | 14 tests | Transaction pool, priority, validation |
| `pkg/node` | 11 tests | P2P connections, broadcast, discovery |
```bash
go get github.com/go-p2p-network/go-p2p
```

```bash
# Start node on port 3000
go run cmd/node/main.go --port 3000

# Start second node, connect to first
go run cmd/node/main.go --port 3001 --seed localhost:3000

# Start third node
go run cmd/node/main.go --port 3002 --seed localhost:3000
```

```go
package main

import (
	"github.com/go-p2p-network/go-p2p/pkg/node"
	"github.com/go-p2p-network/go-p2p/pkg/peer"
	"github.com/go-p2p-network/go-p2p/pkg/protocol"
)

func main() {
	cfg := node.Config{
		ListenAddr: ":3000",
		MaxPeers:   50,
		Seeds:      []string{"seed1.example.com:3000"},
	}

	n := node.New(cfg)

	// Register message handler
	n.OnMessage(protocol.MsgTx, func(p *peer.Peer, msg *protocol.Message) {
		// Handle transaction
	})

	n.Start()
}
```

```bash
# All tests
go test ./...

# Verbose output
go test -v ./...
```

```bash
# Node tests (P2P integration)
go test -v ./pkg/node/...

# Mempool tests
go test -v ./pkg/mempool/...
```

```bash
# Race detector
go test -race ./...
```

```bash
# All data structure benchmarks
go test -bench=. -benchmem ./pkg/ds/...

# Queue only
go test -bench=BenchmarkQueue -benchmem ./pkg/ds/...

# HashMap only
go test -bench=BenchmarkHashMap -benchmem ./pkg/ds/...

# Priority Queue only
go test -bench=BenchmarkPriorityQueue -benchmem ./pkg/ds/...

# Stable results across runs
go test -bench=. -benchmem -count=5 ./pkg/ds/...
```

```bash
# CPU profiling
go test -bench=BenchmarkQueue -cpuprofile=cpu.prof ./pkg/ds/...
go tool pprof cpu.prof
```

```bash
# Install benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Run and compare
go test -bench=. -count=10 ./pkg/ds/sync/ > mutex.txt
go test -bench=. -count=10 ./pkg/ds/lockfree/ > lockfree.txt
benchstat mutex.txt lockfree.txt
```

go-p2p/
├── cmd/
│ └── node/
│ └── main.go # Node entry point
│
├── pkg/
│ ├── node/
│ │ ├── node.go # Core node logic
│ │ ├── broadcast.go # Gossip broadcast (sync.Map)
│ │ ├── config.go # Configuration
│ │ └── node_test.go # P2P integration tests
│ │
│ ├── peer/
│ │ ├── peer.go # Peer connection
│ │ └── manager.go # Peer management (sync.Map)
│ │
│ ├── transport/
│ │ ├── tcp.go # TCP transport
│ │ ├── connection.go # Connection handling
│ │ └── buffer_pool.go # sync.Pool buffers
│ │
│ ├── protocol/
│ │ ├── message.go # Message types
│ │ ├── codec.go # Encoding/decoding
│ │ └── handshake.go # Handshake protocol
│ │
│ ├── discovery/
│ │ ├── seed.go # Seed node discovery
│ │ └── gossip.go # Gossip discovery
│ │
│ ├── mempool/
│ │ ├── mempool.go # Transaction pool (Mutex+Heap)
│ │ ├── tx.go # Transaction structure
│ │ └── mempool_test.go # Mempool tests
│ │
│ └── ds/ # Data structures
│ ├── lockfree/
│ │ ├── queue.go # Lock-free MPSC queue
│ │ ├── hashmap.go # Lock-free hashmap
│ │ └── priority_queue.go # Lock-free skip list
│ ├── sync/
│ │ ├── queue.go # Mutex-based queue
│ │ ├── hashmap.go # Mutex-based hashmap
│ │ └── priority_queue.go # Mutex-based heap
│ └── benchmark_test.go # Comparison benchmarks
│
└── docs/
└── benchmarks.md # Detailed benchmark analysis
┌──────────────────────────────────────┐
│ Magic (4 bytes) │ 0x50325031 │
├───────────────────┼──────────────────┤
│ Type (1 byte) │ Message type │
├───────────────────┼──────────────────┤
│ Length (4 bytes) │ Payload length │
├───────────────────┼──────────────────┤
│ Checksum (4 B) │ CRC32 │
├───────────────────┴──────────────────┤
│ Payload (variable) │
└──────────────────────────────────────┘
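As a sketch of how this frame could be encoded (a hedged illustration, not necessarily `pkg/protocol`'s actual codec; the big-endian byte order is an assumption):

```go
package protocol

import (
	"encoding/binary"
	"hash/crc32"
)

const Magic uint32 = 0x50325031

// Encode serializes a message using the wire layout above:
// Magic(4) | Type(1) | Length(4) | Checksum(4) | Payload.
func Encode(msgType byte, payload []byte) []byte {
	buf := make([]byte, 13+len(payload))
	binary.BigEndian.PutUint32(buf[0:4], Magic)
	buf[4] = msgType
	binary.BigEndian.PutUint32(buf[5:9], uint32(len(payload)))
	binary.BigEndian.PutUint32(buf[9:13], crc32.ChecksumIEEE(payload))
	copy(buf[13:], payload)
	return buf
}
```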
| Type | Value | Description |
|---|---|---|
| Handshake | 0x00 | Connection initialization |
| Ping | 0x01 | Keepalive check |
| Pong | 0x02 | Ping response |
| GetPeers | 0x10 | Request peer list |
| Peers | 0x11 | Peer list response |
| Tx | 0x20 | Transaction broadcast |
| Block | 0x21 | Block broadcast |
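The table maps directly to one-byte constants; a sketch of how they might be declared in `pkg/protocol` (`MsgTx` appears in the usage example above, the remaining names are assumptions based on this table):

```go
package protocol

// MessageType identifies the payload carried by a frame.
type MessageType byte

const (
	MsgHandshake MessageType = 0x00 // connection initialization
	MsgPing      MessageType = 0x01 // keepalive check
	MsgPong      MessageType = 0x02 // ping response
	MsgGetPeers  MessageType = 0x10 // request peer list
	MsgPeers     MessageType = 0x11 // peer list response
	MsgTx        MessageType = 0x20 // transaction broadcast
	MsgBlock     MessageType = 0x21 // block broadcast
)
```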

