Merged
31 commits
51f9b68
run parallel snapsync using docker-compose
rodrigo-o Dec 10, 2025
883701f
Try to run the 3 networks in parallel
rodrigo-o Dec 12, 2025
737a868
update the monitor while syncing also
rodrigo-o Dec 12, 2025
2d64e7c
Some changes related to timing
rodrigo-o Dec 12, 2025
7040831
renamed lighthouse to consensus
rodrigo-o Dec 12, 2025
2dff5be
simplify docker monitor and add a check for how much blocks do we pro…
rodrigo-o Dec 12, 2025
a9ef138
add more time to docker monitor
rodrigo-o Dec 12, 2025
62f8ae9
check containers already running to avoid false failure messages
rodrigo-o Dec 12, 2025
c8b5d5d
test reruns
rodrigo-o Dec 12, 2025
5ef8997
reduce block execution time check and add rerun
rodrigo-o Dec 13, 2025
df2e7c9
Add loop task to the makefile
rodrigo-o Dec 13, 2025
d978f22
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Dec 18, 2025
fdc8a10
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Dec 19, 2025
45fbd26
improved history logging and separated runs
rodrigo-o Dec 22, 2025
cccda90
renamed docker compose file and make targets
rodrigo-o Dec 23, 2025
67d6bcb
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Dec 23, 2025
a6dbdad
make service configurable
rodrigo-o Dec 23, 2025
821216e
simplify network and port management
rodrigo-o Dec 23, 2025
556facf
fix the naming in the service derive
rodrigo-o Dec 23, 2025
e6bcd38
ensure cleanup is done for all networks
rodrigo-o Dec 23, 2025
9e93631
enhance slack notification
rodrigo-o Dec 23, 2025
379f44c
enhance logging and slack notification summary
rodrigo-o Dec 23, 2025
804819a
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Dec 23, 2025
57c1b58
added logs path to the notification
rodrigo-o Dec 23, 2025
b967a77
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Dec 23, 2025
e2da46f
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Dec 23, 2025
82c84bc
Merge branch 'main' into parallel-snapsync-test
rodrigo-o Jan 7, 2026
185526c
Added docuemntation to the tooling/sync README
rodrigo-o Jan 7, 2026
5210d1f
fix some issues regarding getattr, formatting and unnecesary and miss…
rodrigo-o Jan 7, 2026
9501604
Fix a Makefile complex command and added comments in the compose
rodrigo-o Jan 7, 2026
48fd423
Added exception comment and fixed sys.exit
rodrigo-o Jan 7, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ tooling/ef_tests/state/runner_v2/success_report.txt
tooling/reorgs/data

tooling/sync/logs/
tooling/sync/multisync_logs/

# Repos checked out by make target
/hive/
77 changes: 76 additions & 1 deletion tooling/sync/Makefile
@@ -2,7 +2,9 @@
flamegraph-branch flamegraph-inner flamegraph-mainnet flamegraph-sepolia flamegraph-holesky \
flamegraph-hoodi start-lighthouse start-ethrex backup-db start-mainnet-metrics-docker \
start-sepolia-metrics-docker start-holesky-metrics-docker start-hoodi-metrics-docker \
start-metrics-docker tail-syncing-logs tail-metrics-logs copy_flamegraph import-with-metrics
start-metrics-docker tail-syncing-logs tail-metrics-logs copy_flamegraph import-with-metrics \
multisync-up multisync-down multisync-clean multisync-logs multisync-status \
multisync-restart multisync-monitor multisync-run multisync-loop

ETHREX_DIR ?= "../.."
EVM ?= levm
@@ -220,3 +222,76 @@ server-sync:
sleep 0.2

tmux new-window -t sync:2 -n ethrex "cd ../../metrics && docker stop metrics-ethereum-metrics-exporter-1 || true && docker compose -f docker-compose-metrics.yaml -f docker-compose-metrics-l1.overrides.yaml up -d && cd .. && ulimit -n 1000000 && rm -rf ~/.local/share/ethrex && RUST_LOG=info,ethrex_p2p::sync=debug $(if $(DEBUG_ASSERT),RUSTFLAGS='-C debug-assertions=yes') $(if $(HEALING),SKIP_START_SNAP_SYNC=1) cargo run --release --bin ethrex --features rocksdb -- --http.addr 0.0.0.0 --metrics --metrics.port 3701 --network $(SERVER_SYNC_NETWORK) $(if $(MEMORY),--datadir memory) --authrpc.jwtsecret ~/secrets/jwt.hex $(if $(or $(FULL_SYNC),$(HEALING)),--syncmode full) 2>&1 | tee $(LOGS_FILE)"

# ==============================================================================
# Docker Compose Multi-Network Snapsync
# ==============================================================================

MULTISYNC_COMPOSE = docker compose -f docker-compose.multisync.yaml
MULTISYNC_NETWORKS ?= hoodi,sepolia,mainnet
comma := ,
MULTISYNC_NETWORK_LIST := $(subst $(comma), ,$(MULTISYNC_NETWORKS))
MULTISYNC_SERVICES := $(foreach n,$(MULTISYNC_NETWORK_LIST),setup-jwt-$(n) ethrex-$(n) consensus-$(n))

multisync-up: ## Start all networks specified in MULTISYNC_NETWORKS via Docker Compose.
	$(MULTISYNC_COMPOSE) up -d $(MULTISYNC_SERVICES)

multisync-down: ## Stop and remove all snapsync containers.
	$(MULTISYNC_COMPOSE) down

multisync-clean: ## Stop, remove containers AND volumes (full reset).
	$(MULTISYNC_COMPOSE) down -v

multisync-logs: ## Tail logs from all networks.
	$(MULTISYNC_COMPOSE) logs -f

multisync-logs-%: ## Tail logs for a specific network (e.g., multisync-logs-hoodi).
	$(MULTISYNC_COMPOSE) logs -f ethrex-$* consensus-$*

multisync-logs-ethrex-%: ## Tail only ethrex logs for a network (e.g., multisync-logs-ethrex-hoodi).
	$(MULTISYNC_COMPOSE) logs -f ethrex-$*

multisync-logs-consensus-%: ## Tail only consensus logs for a network (e.g., multisync-logs-consensus-hoodi).
	$(MULTISYNC_COMPOSE) logs -f consensus-$*

multisync-restart: ## Restart the cycle (clean volumes + start fresh).
	$(MULTISYNC_COMPOSE) down -v
	$(MULTISYNC_COMPOSE) up -d $(MULTISYNC_SERVICES)

multisync-monitor: ## Monitor all networks (one-shot, exits on completion).
	python3 docker_monitor.py --networks $(MULTISYNC_NETWORKS) --exit-on-success

multisync-run: ## Full run: start + monitor (one-shot, exits on completion).
	$(MULTISYNC_COMPOSE) up -d $(MULTISYNC_SERVICES)
	@echo "Waiting 10s for containers to start..."
	@sleep 10
	python3 docker_monitor.py --networks $(MULTISYNC_NETWORKS) --exit-on-success

multisync-loop: ## Continuous loop: sync all networks, restart on success, repeat forever.
	$(MULTISYNC_COMPOSE) up -d $(MULTISYNC_SERVICES)
	@echo "Waiting 10s for containers to start..."
	@sleep 10
	python3 docker_monitor.py --networks $(MULTISYNC_NETWORKS) --compose-file docker-compose.multisync.yaml --compose-dir $(CURDIR)

multisync-history: ## View the run history log.
	@if [ -f multisync_logs/run_history.log ]; then \
		cat multisync_logs/run_history.log; \
	else \
		echo "No run history found. Run 'make multisync-loop' first."; \
	fi

multisync-list-logs: ## List all saved run logs.
	@if [ -d multisync_logs ]; then \
		echo "=== Saved Run Logs ===" && \
		ls -la multisync_logs/ && \
		echo "" && \
		for dir in multisync_logs/run_*/; do \
			if [ -d "$$dir" ]; then \
				echo "$$dir:"; \
				ls "$$dir"; \
				echo ""; \
			fi; \
		done; \
	else \
		echo "No logs directory found."; \
	fi
123 changes: 123 additions & 0 deletions tooling/sync/README.md
@@ -61,3 +61,126 @@ It's advisable to only run flamegraphs on blocks that have already been synced,
- `make copy-flamegraph` can be used to quickly copy the flamegraph generated by the flamegraph commands from the `ethrex` repo folder to the `tooling/sync/flamegraphs` folder so it isn't overwritten by future flamegraph runs. `GRAPHNAME` can be provided to give the file a custom name.

- `make import-with-metrics` can be used to import blocks from an RLP file with metrics enabled, which is especially useful for profiling block processing. The path to the RLP file can be passed with the `RLP_FILE` environment variable, while the network can be provided with the `NETWORK` variable.

## Multi-Network Parallel Snapsync

This feature runs multiple ethrex nodes in parallel, one per network (hoodi, sepolia, and mainnet by default), via Docker Compose, with automated monitoring, Slack notifications, and a history log of runs.

### Overview

The parallel snapsync system:
- Spawns multiple networks simultaneously via Docker Compose
- Monitors snapsync progress with a 4-hour timeout
- Verifies block processing for 22 minutes after sync completion
- Sends Slack notifications on success/failure
- Maintains a history log of all runs
- On success: restarts containers and begins a new sync cycle
- On failure: keeps containers running for debugging
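
The success/failure policy above can be sketched as a small decision function. This is only an illustration: the function name `next_compose_steps` and the status labels are hypothetical, and mapping "restart" to `down -v` followed by `up -d` is an assumption based on the `multisync-restart` target, not a description of what `docker_monitor.py` actually executes.

```python
from typing import Dict, List


def next_compose_steps(statuses: Dict[str, str]) -> List[str]:
    """Return the compose subcommands to run once a cycle has ended.

    `statuses` maps a network name to "success" or "failed"
    (hypothetical labels chosen for this sketch).
    """
    if statuses and all(s == "success" for s in statuses.values()):
        # Every network passed: wipe volumes and start a fresh cycle,
        # mirroring what `make multisync-restart` does.
        return ["down -v", "up -d"]
    # At least one network failed: touch nothing so the containers
    # stay up for debugging.
    return []
```

For example, `next_compose_steps({"hoodi": "success", "mainnet": "failed"})` returns an empty list, so nothing is torn down.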

### Requirements

- Docker and Docker Compose
- Python 3 with the `requests` library (`pip install requests`)
- (Optional) Slack webhook URLs for notifications

### Quick Start

```bash
# Start a continuous monitoring loop (recommended for servers)
make multisync-loop

# Or run a single sync cycle
make multisync-run
```

### Docker Compose Setup

The `docker-compose.multisync.yaml` file defines services for each network with isolated volumes. Each network uses Lighthouse as the consensus client with checkpoint sync.

Host port mapping:
- **hoodi**: `localhost:8545`
- **sepolia**: `localhost:8546`
- **mainnet**: `localhost:8547`
- **hoodi-2**: `localhost:8548` (for additional testing)
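
With the port mapping above, each node can be queried from the host over JSON-RPC. The sketch below uses only the standard library and the standard Ethereum methods `eth_syncing` and `eth_blockNumber`; whether `docker_monitor.py` uses these exact calls is an assumption, and the helper names (`rpc_request`, `rpc_call`) are hypothetical.

```python
import json
import urllib.request

# Host ports as listed in the mapping above.
RPC_PORTS = {"hoodi": 8545, "sepolia": 8546, "mainnet": 8547, "hoodi-2": 8548}


def rpc_request(network: str, method: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC POST request for one network."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": []}
    return urllib.request.Request(
        f"http://localhost:{RPC_PORTS[network]}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


def rpc_call(network: str, method: str, timeout: float = 5.0):
    """Send the request and return the `result` field of the response."""
    with urllib.request.urlopen(rpc_request(network, method), timeout=timeout) as resp:
        return json.load(resp)["result"]
```

Once the containers are up, `rpc_call("sepolia", "eth_syncing")` would report the sync status of the sepolia node, and `rpc_call("sepolia", "eth_blockNumber")` its latest block as a hex string.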

### Environment Variables

Create a `.env` file in `tooling/sync/` with:

```bash
# Slack notifications (optional)
SLACK_WEBHOOK_URL_SUCCESS=https://hooks.slack.com/services/...
SLACK_WEBHOOK_URL_FAILED=https://hooks.slack.com/services/...
```
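
As a rough sketch of how the two webhook variables could be consumed, the helper below picks the URL that matches the run outcome and returns `None` when it is unset, which is what makes notifications optional. The function name `webhook_for` is hypothetical, and the sketch assumes the variables are already present in the process environment; how `docker_monitor.py` loads the `.env` file is not shown here.

```python
import os
from typing import Mapping, Optional


def webhook_for(success: bool, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Slack webhook URL for this outcome, or None if unset."""
    key = "SLACK_WEBHOOK_URL_SUCCESS" if success else "SLACK_WEBHOOK_URL_FAILED"
    # An empty value is treated the same as a missing variable.
    return env.get(key) or None
```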

The `MULTISYNC_NETWORKS` variable controls which networks to sync (default: `hoodi,sepolia,mainnet`):

```bash
# Sync only hoodi and sepolia
make multisync-loop MULTISYNC_NETWORKS=hoodi,sepolia
```
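
The Makefile turns this comma-separated value into Compose service names with `subst` and `foreach`, producing `setup-jwt-<network>`, `ethrex-<network>`, and `consensus-<network>` for each entry. The same expansion is sketched in Python below for clarity; the function name `services_for` is hypothetical.

```python
from typing import List

# Service name prefixes, in the order used by MULTISYNC_SERVICES in the Makefile.
SERVICE_PREFIXES = ("setup-jwt", "ethrex", "consensus")


def services_for(networks: str) -> List[str]:
    """Expand "hoodi,sepolia" into the per-network service names."""
    names = [n.strip() for n in networks.split(",") if n.strip()]
    return [f"{prefix}-{name}" for name in names for prefix in SERVICE_PREFIXES]
```

So `MULTISYNC_NETWORKS=hoodi,sepolia` starts six services and leaves the mainnet ones untouched.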

### Monitoring Behavior

The `docker_monitor.py` script manages the sync lifecycle:

1. **Waiting**: Node container starting up
2. **Syncing**: Snapsync in progress (4-hour timeout)
3. **Block Processing**: Sync complete, verifying block processing (22 minutes)
4. **Success**: Network synced and processing blocks
5. **Failed**: Timeout, stall, or error detected

The monitor checks for:
- Sync timeout (default 4 hours)
- Block processing stall (10 minutes without new blocks)
- Node unresponsiveness
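
The lifecycle and checks above can be read as a small per-network state machine. The following is a minimal sketch under stated assumptions, not the actual `docker_monitor.py` logic: the class and field names are hypothetical, the three durations come from this section, and treating a single unanswered poll as a failure is a simplification.

```python
from dataclasses import dataclass

SYNC_TIMEOUT_S = 4 * 60 * 60   # snapsync timeout (4 hours)
BLOCK_CHECK_S = 22 * 60        # block processing verification window
STALL_TIMEOUT_S = 10 * 60      # max time without a new block


@dataclass
class Observation:
    responsive: bool  # did the node answer the last poll?
    syncing: bool     # is snapsync still in progress?
    block: int        # latest block number reported


@dataclass
class Tracker:
    state: str = "waiting"
    started_at: float = 0.0        # when syncing began
    synced_at: float = 0.0         # when snapsync completed
    last_block: int = -1
    last_progress_at: float = 0.0  # when the block number last advanced


def step(t: Tracker, obs: Observation, now: float) -> str:
    """Advance one network's state from a single poll; returns the new state."""
    if t.state in ("success", "failed"):
        return t.state
    if t.state == "waiting":
        if obs.responsive:
            t.state, t.started_at = "syncing", now
        return t.state
    if not obs.responsive:
        t.state = "failed"
        return t.state
    if t.state == "syncing":
        if not obs.syncing:
            t.state = "block_processing"
            t.synced_at = t.last_progress_at = now
            t.last_block = obs.block
        elif now - t.started_at > SYNC_TIMEOUT_S:
            t.state = "failed"
        return t.state
    # state == "block_processing"
    if obs.block > t.last_block:
        t.last_block, t.last_progress_at = obs.block, now
    elif now - t.last_progress_at > STALL_TIMEOUT_S:
        t.state = "failed"
        return t.state
    if now - t.synced_at >= BLOCK_CHECK_S:
        t.state = "success"
    return t.state
```

A caller would poll each node on a fixed interval, feed the result to `step`, and stop once every tracker reaches `success` or any tracker reaches `failed`.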

### Logs and History

Logs are saved to `tooling/sync/multisync_logs/`:

```
multisync_logs/
├── run_history.log          # Append-only history of all runs
└── run_YYYYMMDD_HHMMSS/     # Per-run folder
    ├── summary.txt          # Run summary
    ├── ethrex-hoodi.log     # Ethrex logs per network
    ├── consensus-hoodi.log  # Lighthouse logs per network
    └── ...
```
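
The layout above can be produced with a timestamped folder name and an append-only history file. The sketch below illustrates that idea; the function names and the exact format of the history line are hypothetical and not taken from `docker_monitor.py`.

```python
from datetime import datetime
from pathlib import Path


def run_dir(base: Path, started: datetime) -> Path:
    """Return the per-run folder path, e.g. multisync_logs/run_20260107_090503."""
    return base / f"run_{started:%Y%m%d_%H%M%S}"


def record_run(base: Path, started: datetime, summary: str) -> Path:
    """Write summary.txt for this run and append one line to run_history.log."""
    folder = run_dir(base, started)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "summary.txt").write_text(summary + "\n")
    # Opening in append mode keeps earlier runs intact.
    with (base / "run_history.log").open("a") as history:
        history.write(f"{folder.name}: {summary}\n")
    return folder
```

Because the history file is only ever appended to, `make multisync-history` can simply print it to show every run in order.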

### Commands

**Starting and Stopping:**

- `make multisync-up` starts all networks via Docker Compose.
- `make multisync-down` stops and removes containers (preserves volumes).
- `make multisync-clean` stops containers and removes volumes (full reset).
- `make multisync-restart` restarts the cycle (clean volumes + start fresh).

**Monitoring:**

- `make multisync-loop` runs continuous sync cycles (recommended for servers). On success, it restarts the containers and syncs again. On failure, the loop stops and the containers are left running for debugging.
- `make multisync-run` runs a single sync cycle and exits on completion.
- `make multisync-monitor` monitors already-running containers (one-shot).

**Logs:**

- `make multisync-logs` tails logs from all networks.
- `make multisync-logs-hoodi` tails logs for a specific network.
- `make multisync-logs-ethrex-hoodi` tails only ethrex logs for a network.
- `make multisync-logs-consensus-hoodi` tails only consensus logs for a network.
- `make multisync-history` prints the run history log.
- `make multisync-list-logs` lists all saved run logs.

### Slack Notifications

When configured, notifications are sent:
- On **success**: All networks synced and processing blocks
- On **failure**: Any network failed (timeout, stall, or error)

Notifications include:
- Run ID and count
- Host, branch, and commit info
- Per-network status with sync time and blocks processed
- Link to the commit on GitHub
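
To make the notification contents concrete, here is a hedged sketch of how such a message could be assembled for a Slack incoming webhook, which accepts a JSON body with a `text` field. The function name, argument names, per-network dictionary keys, and message wording are all hypothetical; the real message format produced by `docker_monitor.py` may differ.

```python
from typing import Dict


def build_message(run_id: str, run_count: int, host: str, branch: str,
                  commit: str, commit_url: str,
                  networks: Dict[str, dict]) -> dict:
    """Build a Slack webhook payload summarizing one run.

    `networks` maps a network name to a dict with the hypothetical keys
    "ok" (bool), "sync_minutes" (int) and "blocks" (int).
    """
    all_ok = all(n["ok"] for n in networks.values())
    lines = [
        f"{'SUCCESS' if all_ok else 'FAILED'}: run {run_id} (#{run_count})",
        # <url|label> is Slack's link syntax.
        f"host={host} branch={branch} commit=<{commit_url}|{commit[:7]}>",
    ]
    for name, n in sorted(networks.items()):
        status = "ok" if n["ok"] else "failed"
        lines.append(f"- {name}: {status}, sync {n['sync_minutes']} min, {n['blocks']} blocks")
    return {"text": "\n".join(lines)}
```

The resulting dictionary would be serialized as JSON and POSTed to the success or failure webhook, depending on the outcome.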