go-p2p

High-performance P2P networking library for blockchain applications in Go.

Overview

go-p2p is a production-grade P2P networking library designed for blockchain and distributed systems. It features lock-free data structures, optimized I/O patterns, and extensive benchmarking that drives each implementation choice for high-concurrency workloads.

Key Features

  • Lock-free Data Structures: Benchmarked implementations with detailed performance analysis
  • Optimized Peer Management: sync.Map based concurrent peer tracking
  • Efficient Message Broadcasting: Gossip protocol with duplicate filtering
  • Zero-allocation Networking: sync.Pool based buffer reuse (see the sketch after this list)
  • Pluggable Discovery: Seed nodes, gossip-based peer exchange
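
The buffer reuse behind the zero-allocation claim is a plain sync.Pool of byte slices. A minimal sketch, assuming a 4 KB buffer and illustrative GetBuffer/PutBuffer names (not necessarily the actual API in pkg/transport/buffer_pool.go):

package transport

import "sync"

const bufferSize = 4096 // assumed size

var bufferPool = sync.Pool{
    New: func() any {
        b := make([]byte, bufferSize)
        return &b // store a pointer so Put does not allocate an interface box
    },
}

// GetBuffer fetches a reusable buffer from the pool.
func GetBuffer() *[]byte { return bufferPool.Get().(*[]byte) }

// PutBuffer hands a buffer back for reuse once the caller is done with it.
func PutBuffer(b *[]byte) { bufferPool.Put(b) }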

Architecture

┌─────────────────────────────────────────────────────────────┐
│                           Node                               │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Server    │  │   Dialer    │  │    PeerManager      │  │
│  │  (Accept)   │  │  (Connect)  │  │    (sync.Map)       │  │
│  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘  │
│         │                │                     │             │
│         └────────┬───────┘                     │             │
│                  ▼                             │             │
│  ┌───────────────────────────────┐            │             │
│  │        Peer Connections        │◄───────────┘             │
│  │  ┌──────┐ ┌──────┐ ┌──────┐   │                          │
│  │  │Peer A│ │Peer B│ │Peer C│   │                          │
│  │  └──┬───┘ └──┬───┘ └──┬───┘   │                          │
│  └─────┼────────┼────────┼───────┘                          │
│        │        │        │                                   │
│        ▼        ▼        ▼                                   │
│  ┌───────────────────────────────┐                          │
│  │     BroadcastManager          │                          │
│  │      (sync.Map cache)         │                          │
│  └───────────────┬───────────────┘                          │
│                  │                                           │
│                  ▼                                           │
│  ┌───────────────────────────────┐                          │
│  │          Mempool              │                          │
│  │      (Mutex + Heap)           │                          │
│  └───────────────────────────────┘                          │
└─────────────────────────────────────────────────────────────┘

Benchmark Results

Comprehensive benchmarks comparing lock-free vs mutex-based implementations.

Test Environment

Spec      Value
─────────────────────────────────────────────
OS        Windows 10
CPU       AMD Ryzen 5 3500 (6-Core)
Go        1.21+
Command   go test -bench=. -benchmem -count=5

Summary

┌────────────────────┬─────────────┬─────────────────────────────────┐
│ Data Structure     │ Winner      │ Notes                           │
├────────────────────┼─────────────┼─────────────────────────────────┤
│ Queue              │ Mutex       │ Go mutex highly optimized       │
│ HashMap (read)     │ sync.Map    │ No contention between readers   │
│ HashMap (write)    │ Mutex       │ CAS retry overhead              │
│ Priority Queue     │ Mutex       │ Heap more cache-friendly        │
└────────────────────┴─────────────┴─────────────────────────────────┘

Detailed Benchmarks

Queue (Single Thread)

Operation   Mutex       Lock-free
─────────────────────────────────────
Enqueue     85 ns       80 ns
Dequeue     15 ns       18 ns

Queue (Concurrent - Mixed Operations)

Goroutines  Mutex       Lock-free   Diff
───────────────────────────────────────────
1           88 ns       144 ns      +64%
4           83 ns       149 ns      +79%
8           83 ns       149 ns      +79%
16          85 ns       143 ns      +68%
32          85 ns       140 ns      +65%
64          82 ns       146 ns      +78%

Result: Mutex wins. Lock-free queue allocates per-enqueue (32 B/op).
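
The allocation is inherent to linked CAS queues: every Enqueue heap-allocates a node that the CAS links in. A sketch of the classic Michael-Scott enqueue for illustration (the repo's queue is an MPSC variant, so details differ, but the per-call allocation is the same):

package lockfree

import "sync/atomic"

type node struct {
    value any
    next  atomic.Pointer[node]
}

type Queue struct {
    head atomic.Pointer[node] // sentinel-first linked list
    tail atomic.Pointer[node]
}

func NewQueue() *Queue {
    q := &Queue{}
    s := &node{} // sentinel
    q.head.Store(s)
    q.tail.Store(s)
    return q
}

func (q *Queue) Enqueue(v any) {
    n := &node{value: v} // one heap allocation per call: the B/op cost in the table
    for {
        tail := q.tail.Load()
        next := tail.next.Load()
        if next == nil {
            // Tail really is last: try to link the new node.
            if tail.next.CompareAndSwap(nil, n) {
                q.tail.CompareAndSwap(tail, n) // swing tail; fine if another goroutine already did
                return
            }
        } else {
            q.tail.CompareAndSwap(tail, next) // help a lagging enqueuer advance tail
        }
    }
}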

HashMap (Concurrent - 75% Read, 25% Write)

Goroutines  Mutex       Lock-free   Diff
───────────────────────────────────────────
1           377 ns      194 ns      -49%
4           375 ns      189 ns      -50%
8           377 ns      188 ns      -50%
16          371 ns      184 ns      -50%
32          377 ns      197 ns      -48%

Result: Lock-free (sync.Map) wins by 2x for read-heavy workloads.

Priority Queue (Concurrent)

Goroutines  Mutex(Heap) Lock-free(SkipList)
─────────────────────────────────────────────
1           166 ns      ~200 ns
4           184 ns      ~200 ns
8           182 ns      ~200 ns
16          184 ns      ~200 ns

Result: Mutex+Heap wins. Skip list has high memory overhead (311 B/op per node).
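
The memory overhead comes from the tower of forward pointers every skip-list node carries. A sketch of the node shape (field names illustrative):

import "sync/atomic"

type slNode struct {
    key   uint64
    value any
    next  []atomic.Pointer[slNode] // one pointer per level; taller towers cost more per node
}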

Benchmark Screenshots

(Screenshots omitted: BroadcastManager optimization runs and raw benchmark output.)


Implementation Decisions

Based on benchmark results, we selected optimal implementations for each component:

Component       Implementation  Rationale
────────────────────────────────────────────────────────
PeerManager     sync.Map        90%+ reads, lock-free reads
BroadcastCache  sync.Map        Duplicate check is read-heavy
Mempool         Mutex + Heap    Heap is cache-friendly, simple
MessageQueue    Channel         Go channels well-optimized
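
The Mutex + Heap mempool is the standard container/heap pattern behind a single lock. A minimal sketch, with Tx and its Fee field as illustrative stand-ins for the repo's transaction type:

package mempool

import (
    "container/heap"
    "sync"
)

type Tx struct{ Fee uint64 }

// txHeap is a max-heap ordered by fee, so the best-paying tx pops first.
type txHeap []*Tx

func (h txHeap) Len() int           { return len(h) }
func (h txHeap) Less(i, j int) bool { return h[i].Fee > h[j].Fee }
func (h txHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *txHeap) Push(x any)        { *h = append(*h, x.(*Tx)) }
func (h *txHeap) Pop() any {
    old := *h
    x := old[len(old)-1]
    *h = old[:len(old)-1]
    return x
}

type Mempool struct {
    mu  sync.Mutex
    txs txHeap
}

func (m *Mempool) Add(tx *Tx) {
    m.mu.Lock()
    defer m.mu.Unlock()
    heap.Push(&m.txs, tx)
}

func (m *Mempool) PopBest() *Tx {
    m.mu.Lock()
    defer m.mu.Unlock()
    if len(m.txs) == 0 {
        return nil
    }
    return heap.Pop(&m.txs).(*Tx)
}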

Why sync.Map for Read-Heavy Workloads?

// sync.Map internal structure (simplified; details vary by Go version)
type Map struct {
    mu     Mutex          // protects dirty map
    read   atomic.Value   // read-only map (lock-free!)
    dirty  map[any]*entry // write map
    misses int
}

// Load operation (read)
// If key exists in read map → atomic load, NO LOCK
// Only falls back to dirty map if not found

RWMutex Problem: Even RLock() requires atomic increment of readerCount, causing cache line contention across cores.

sync.Map Solution: Reads from read map use only atomic.Load - no shared counter, no contention.
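
The difference is easy to see in a parallel read benchmark. A minimal sketch (not the repo's benchmark code):

package ds_test

import (
    "sync"
    "testing"
)

func BenchmarkSyncMapRead(b *testing.B) {
    var m sync.Map
    m.Store("key", 42)
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            m.Load("key") // served from the read map: one atomic load, nothing shared is written
        }
    })
}

func BenchmarkRWMutexRead(b *testing.B) {
    var mu sync.RWMutex
    m := map[string]int{"key": 42}
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            mu.RLock() // atomic add on readerCount: every reader bounces the same cache line
            _ = m["key"]
            mu.RUnlock()
        }
    })
}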


BroadcastManager Optimizations

The message broadcasting hot path was optimized for maximum throughput. All benchmarks run on AMD Ryzen 5 3500 (6-Core).

Hash Function: SHA256 → xxhash

Message deduplication requires hashing every incoming message. We replaced cryptographic SHA256 with non-cryptographic xxhash.

Implementation                 ns/op   B/op  allocs/op  Speedup
────────────────────────────────────────────────────────────────
SHA256 + hex.EncodeToString    398.9   160   3          baseline
xxhash + sync.Pool             41.24   0     0          9.7x
xxhash.Sum64 (direct)          23.22   0     0          17.2x

Why 0 B/op?

  • sync.Pool reuses xxhash.Digest instances across calls
  • After pool warmup, no new allocations occur in steady state
  • Direct Sum64 benchmark pre-allocates buffer before timing
// Before: 426.8 ns/op, 160 B/op, 3 allocs/op
hasher := sha256.New()
hasher.Write([]byte{byte(msg.Type)})
hasher.Write(msg.Payload)
hash := hex.EncodeToString(hasher.Sum(nil))  // string key

// After: 45.0 ns/op, 0 B/op, 0 allocs/op
h := digestPool.Get().(*xxhash.Digest)
h.Reset()
h.Write([]byte{byte(msg.Type)})
h.Write(msg.Payload)
hash := h.Sum64()  // uint64 key
digestPool.Put(h)
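
The digestPool referenced above is just a sync.Pool of reusable digests. A sketch, assuming the github.com/cespare/xxhash/v2 package:

import (
    "sync"

    "github.com/cespare/xxhash/v2"
)

var digestPool = sync.Pool{
    New: func() any {
        return xxhash.New() // allocated only when the pool is empty, then recycled
    },
}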

SeenCache: RWMutex → sync.Map

The duplicate message cache is read-heavy (90%+ lookups are cache hits).

Scenario                          sync.Map     RWMutex      Speedup
─────────────────────────────────────────────────────────────────────
ReadOnly (100% read)              7.74 ns/op   27.79 ns/op  3.6x
ReadHeavy (90% read, 10% write)   66.66 ns/op  109.9 ns/op  1.6x
WriteHeavy (50% read, 50% write)  279.1 ns/op  150.2 ns/op  0.5x (RWMutex wins)

Key insight: P2P networks see the same message multiple times from different peers. Most MarkSeen() calls hit existing entries → read-heavy workload → sync.Map wins.
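
With sync.Map the duplicate check collapses into a single LoadOrStore call, whose loaded result is true exactly when the hash was already present. A sketch of the pattern, with illustrative names rather than the repo's exact code:

package node

import "sync"

// SeenCache tracks message hashes that have already been broadcast.
type SeenCache struct {
    seen sync.Map // uint64 hash → struct{}
}

// MarkSeen records a hash and reports whether it was already known.
func (c *SeenCache) MarkSeen(hash uint64) (alreadySeen bool) {
    _, loaded := c.seen.LoadOrStore(hash, struct{}{})
    return loaded // true → duplicate; skip re-broadcasting
}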

RNG: Global → Goroutine-Local

Gossip target selection requires shuffling peer lists.

Implementation            ns/op   Speedup
───────────────────────────────────────────
rand.Shuffle() (global)   84.71   baseline
rng.Shuffle() (local)     55.78   1.5x
// Before: global RNG with internal mutex
rand.Shuffle(len(candidates), func(i, j int) { ... })

// After: goroutine-local RNG (race-free)
rng := rand.New(rand.NewSource(time.Now().UnixNano()))
rng.Shuffle(len(candidates), func(i, j int) { ... })
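
Note that constructing a rand.Rand per call still allocates, so if shuffling sits on a hot path the generators themselves can be pooled. A sketch of that variant (an assumption, not necessarily what the repo does):

import (
    "math/rand"
    "sync"
    "time"
)

var rngPool = sync.Pool{
    New: func() any {
        return rand.New(rand.NewSource(time.Now().UnixNano()))
    },
}

// shufflePeers borrows a pooled RNG, shuffles in place, and returns the RNG.
func shufflePeers(candidates []string) {
    rng := rngPool.Get().(*rand.Rand)
    rng.Shuffle(len(candidates), func(i, j int) {
        candidates[i], candidates[j] = candidates[j], candidates[i]
    })
    rngPool.Put(rng)
}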

Full Hot Path Improvement

Message receive path: Receive → Hash → MarkSeen → Broadcast

Component              Before      After       Improvement
────────────────────────────────────────────────────────────
MessageHash            398.9 ns    41.24 ns    9.7x faster
MarkSeen (cache hit)   27.79 ns    7.74 ns     3.6x faster
FullFlow (single)      418.0 ns    167.6 ns    2.5x faster
FullFlow (parallel)    -           51.89 ns    8.0x faster (vs. the 418.0 ns single-threaded baseline)
Memory allocation      160 B/msg   32 B/msg    80% reduction

Total hot path: ~8x faster in parallel workloads

Run BroadcastManager Benchmarks

# All broadcast benchmarks
go test -bench=. -benchmem ./pkg/node/... -run=^$

# Hash comparison only
go test -bench=BenchmarkHash -benchmem ./pkg/node/...

# SeenCache comparison only
go test -bench=BenchmarkSeenCache -benchmem ./pkg/node/...

# Full flow comparison
go test -bench=BenchmarkFullFlow -benchmem ./pkg/node/...

Test Results

Unit Tests

$ go test ./...

ok  	github.com/go-p2p-network/go-p2p/pkg/mempool    0.222s
ok  	github.com/go-p2p-network/go-p2p/pkg/node       6.736s

Test Coverage

Package       Tests     Scope
──────────────────────────────────────────────────────
pkg/mempool   14 tests  Transaction pool, priority, validation
pkg/node      11 tests  P2P connections, broadcast, discovery

Test Screenshots

(Screenshot omitted: test run output.)


Quick Start

Installation

go get github.com/go-p2p-network/go-p2p

Run a Node

# Start node on port 3000
go run cmd/node/main.go --port 3000

# Start second node, connect to first
go run cmd/node/main.go --port 3001 --seed localhost:3000

# Start third node
go run cmd/node/main.go --port 3002 --seed localhost:3000

Use as Library

package main

import (
    "github.com/go-p2p-network/go-p2p/pkg/node"
    "github.com/go-p2p-network/go-p2p/pkg/peer"
    "github.com/go-p2p-network/go-p2p/pkg/protocol"
)

func main() {
    cfg := node.Config{
        ListenAddr: ":3000",
        MaxPeers:   50,
        Seeds:      []string{"seed1.example.com:3000"},
    }

    n := node.New(cfg)

    // Register message handler
    n.OnMessage(protocol.MsgTx, func(p *peer.Peer, msg *protocol.Message) {
        // Handle transaction
    })

    n.Start()
}

Running Tests

All Tests

go test ./...

Verbose Output

go test -v ./...

Specific Package

# Node tests (P2P integration)
go test -v ./pkg/node/...

# Mempool tests
go test -v ./pkg/mempool/...

With Race Detector

go test -race ./...

Running Benchmarks

All Benchmarks

go test -bench=. -benchmem ./pkg/ds/...

Specific Benchmarks

# Queue only
go test -bench=BenchmarkQueue -benchmem ./pkg/ds/...

# HashMap only
go test -bench=BenchmarkHashMap -benchmem ./pkg/ds/...

# Priority Queue only
go test -bench=BenchmarkPriorityQueue -benchmem ./pkg/ds/...

With Multiple Iterations (Stable Results)

go test -bench=. -benchmem -count=5 ./pkg/ds/...

CPU Profiling

go test -bench=BenchmarkQueue -cpuprofile=cpu.prof ./pkg/ds/...
go tool pprof cpu.prof

Compare Results

# Install benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Run and compare
go test -bench=. -count=10 ./pkg/ds/sync/ > mutex.txt
go test -bench=. -count=10 ./pkg/ds/lockfree/ > lockfree.txt
benchstat mutex.txt lockfree.txt

Project Structure

go-p2p/
├── cmd/
│   └── node/
│       └── main.go              # Node entry point
│
├── pkg/
│   ├── node/
│   │   ├── node.go              # Core node logic
│   │   ├── broadcast.go         # Gossip broadcast (sync.Map)
│   │   ├── config.go            # Configuration
│   │   └── node_test.go         # P2P integration tests
│   │
│   ├── peer/
│   │   ├── peer.go              # Peer connection
│   │   └── manager.go           # Peer management (sync.Map)
│   │
│   ├── transport/
│   │   ├── tcp.go               # TCP transport
│   │   ├── connection.go        # Connection handling
│   │   └── buffer_pool.go       # sync.Pool buffers
│   │
│   ├── protocol/
│   │   ├── message.go           # Message types
│   │   ├── codec.go             # Encoding/decoding
│   │   └── handshake.go         # Handshake protocol
│   │
│   ├── discovery/
│   │   ├── seed.go              # Seed node discovery
│   │   └── gossip.go            # Gossip discovery
│   │
│   ├── mempool/
│   │   ├── mempool.go           # Transaction pool (Mutex+Heap)
│   │   ├── tx.go                # Transaction structure
│   │   └── mempool_test.go      # Mempool tests
│   │
│   └── ds/                      # Data structures
│       ├── lockfree/
│       │   ├── queue.go         # Lock-free MPSC queue
│       │   ├── hashmap.go       # Lock-free hashmap
│       │   └── priority_queue.go # Lock-free skip list
│       ├── sync/
│       │   ├── queue.go         # Mutex-based queue
│       │   ├── hashmap.go       # Mutex-based hashmap
│       │   └── priority_queue.go # Mutex-based heap
│       └── benchmark_test.go    # Comparison benchmarks
│
└── docs/
    └── benchmarks.md            # Detailed benchmark analysis

Protocol

Message Format

┌──────────────────────────────────────┐
│  Magic (4 bytes)  │  0x50325031      │
├───────────────────┼──────────────────┤
│  Type (1 byte)    │  Message type    │
├───────────────────┼──────────────────┤
│  Length (4 bytes) │  Payload length  │
├───────────────────┼──────────────────┤
│  Checksum (4 B)   │  CRC32           │
├───────────────────┴──────────────────┤
│  Payload (variable)                  │
└──────────────────────────────────────┘
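
A sketch of writing this frame with the standard library. Byte order (big-endian here) and the checksum's coverage (payload only) are assumptions not confirmed by the repo:

package protocol

import (
    "encoding/binary"
    "hash/crc32"
)

const magic uint32 = 0x50325031

// EncodeFrame lays out magic | type | length | checksum | payload.
func EncodeFrame(msgType byte, payload []byte) []byte {
    buf := make([]byte, 13+len(payload)) // 4+1+4+4 header bytes
    binary.BigEndian.PutUint32(buf[0:4], magic)
    buf[4] = msgType
    binary.BigEndian.PutUint32(buf[5:9], uint32(len(payload)))
    binary.BigEndian.PutUint32(buf[9:13], crc32.ChecksumIEEE(payload))
    copy(buf[13:], payload)
    return buf
}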

Message Types

Type       Value  Description
───────────────────────────────────────
Handshake  0x00   Connection initialization
Ping       0x01   Keepalive check
Pong       0x02   Ping response
GetPeers   0x10   Request peer list
Peers      0x11   Peer list response
Tx         0x20   Transaction broadcast
Block      0x21   Block broadcast
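
In code these map naturally to byte constants. MsgTx appears in the library example above; the remaining names are illustrative:

const (
    MsgHandshake byte = 0x00
    MsgPing      byte = 0x01
    MsgPong      byte = 0x02
    MsgGetPeers  byte = 0x10
    MsgPeers     byte = 0x11
    MsgTx        byte = 0x20
    MsgBlock     byte = 0x21
)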
