Commit 931b4c6

add some docs on troubleshooting batch seal perf
1 parent 6daa769 commit 931b4c6

1 file changed, +66 -2 lines changed

documentation/en/supraseal.md

````diff
@@ -23,7 +23,10 @@ SupraSeal is an optimized batch sealing implementation for Filecoin that allows
 - NVMe drives with high IOPS (10-20M total IOPS recommended)
 - GPU for PC2 phase (NVIDIA RTX 3090 or better recommended)
 - 1GB hugepages configured (minimum 36 pages)
-- Ubuntu 22.04 or compatible Linux distribution
+- Ubuntu 22.04 or compatible Linux distribution (gcc-11 required, doesn't need to be system-wide)
+- At least 256GB RAM, ALL MEMORY CHANNELS POPULATED
+  - Without **all** memory channels populated, sealing **performance will suffer drastically**
+- NUMA-Per-Socket (NPS) set to 1
 
 ## Setup
 
````
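Two of the requirements above, the 1GB hugepage count and NPS=1, can be spot-checked from Linux sysfs before starting a node. The following is a minimal sketch that is not part of Curio: it reads the 1GiB hugepage count from `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages` and compares the number of NUMA nodes with the number of CPU sockets, assuming NPS=1 yields exactly one NUMA node per socket and that nothing else (such as sub-NUMA clustering or CPU-less memory nodes) adds nodes. It does not check memory-channel population.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

const (
	// sysfs counter for 1 GiB hugepages on Linux.
	hugepages1G = "/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages"
	// From the requirements list: minimum 36 x 1GB hugepages.
	minHugepages = 36
)

// parseCount converts a sysfs counter such as "36\n" into an int.
func parseCount(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// hugepagesOK reports whether enough 1 GiB pages are configured.
func hugepagesOK(n int) bool { return n >= minHugepages }

// npsLooksLikeOne reports whether there is exactly one NUMA node per socket.
// ASSUMPTION: NPS=1 is the only thing affecting the node count (no sub-NUMA
// clustering, no CPU-less memory nodes).
func npsLooksLikeOne(numaNodes, sockets int) bool {
	return sockets > 0 && numaNodes == sockets
}

// countSockets counts distinct physical_package_id values across the CPUs
// listed in sysfs.
func countSockets() (int, error) {
	files, err := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*/topology/physical_package_id")
	if err != nil {
		return 0, err
	}
	ids := make(map[string]bool)
	for _, f := range files {
		b, err := os.ReadFile(f)
		if err != nil {
			return 0, err
		}
		ids[strings.TrimSpace(string(b))] = true
	}
	return len(ids), nil
}

func main() {
	raw, err := os.ReadFile(hugepages1G)
	if err != nil {
		fmt.Println("hugepages: cannot read 1GiB page count:", err)
	} else if n, perr := parseCount(string(raw)); perr != nil {
		fmt.Println("hugepages: unexpected value:", perr)
	} else if !hugepagesOK(n) {
		fmt.Printf("hugepages: FAIL, %d configured, need at least %d\n", n, minHugepages)
	} else {
		fmt.Printf("hugepages: OK, %d configured\n", n)
	}

	nodes, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")
	sockets, err := countSockets()
	switch {
	case err != nil:
		fmt.Println("NUMA: cannot read CPU topology:", err)
	case npsLooksLikeOne(len(nodes), sockets):
		fmt.Printf("NUMA: OK, %d node(s) for %d socket(s)\n", len(nodes), sockets)
	default:
		fmt.Printf("NUMA: %d node(s) for %d socket(s); NPS=1 should give one node per socket\n", len(nodes), sockets)
	}
}
```

A mismatch in the NUMA line is a prompt to review the BIOS NPS setting, not proof of misconfiguration, given the assumptions above.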
````diff
@@ -68,6 +71,13 @@ LayerNVMEDevices = [
   # Add PCIe addresses for all NVMe devices to use
 ]
 
+# Set to your desired batch size: no larger than what the batch-cpu command says your CPU supports AND what your NVMe space allows
+BatchSealBatchSize = 32
+
+# Pipelines can be either 1 or 2; 2 pipelines double the storage requirements, but in correctly balanced systems they keep
+# layer hashing running 100% of the time, nearly doubling throughput
+BatchSealPipelines = 2
+
 # Set to true for Zen2 or older CPUs for compatibility
 SingleHasherPerThread = false
 ```
````
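The batch-size comment combines two limits (CPU and NVMe space), and `BatchSealPipelines = 2` doubles the storage side of that calculation. A minimal Go sketch of the arithmetic follows. `perSectorBytes`, the NVMe space one sector needs in one pipeline, is a hypothetical input that you must measure or look up yourself, and the numbers in `main` are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// maxBatchSize returns the largest batch size allowed by both limits in the
// config comments: the CPU limit (as reported by the batch-cpu command) and the
// NVMe space available for layers.
//
// HYPOTHETICAL INPUT: perSectorBytes is the NVMe space one sector needs in one
// pipeline. It is not taken from SupraSeal; measure or look it up yourself.
func maxBatchSize(cpuLimit int, nvmeBytes, perSectorBytes uint64, pipelines int) (int, error) {
	if pipelines != 1 && pipelines != 2 {
		return 0, errors.New("BatchSealPipelines must be 1 or 2")
	}
	if cpuLimit <= 0 || perSectorBytes == 0 {
		return 0, errors.New("cpuLimit and perSectorBytes must be positive")
	}
	// Two pipelines double the storage requirement.
	fits := nvmeBytes / (perSectorBytes * uint64(pipelines))
	if fits == 0 {
		return 0, errors.New("not enough NVMe space for even one sector")
	}
	if fits < uint64(cpuLimit) {
		return int(fits), nil
	}
	return cpuLimit, nil
}

func main() {
	const tib = uint64(1) << 40
	// Hypothetical numbers: CPU supports 128, 48 TiB of NVMe, 1 TiB per sector.
	batch, err := maxBatchSize(128, 48*tib, tib, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("max BatchSealBatchSize:", batch) // prints "max BatchSealBatchSize: 24"
}
```

With two pipelines the NVMe limit halves, which is why the example lands well below the CPU limit. If only specific batch sizes are supported on your setup, round the result down to the nearest one.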
````diff
@@ -139,10 +149,64 @@ curio seal start --now --cc --count 32 --actor f01234 --layers cluster --duratio
 * Monitor hasher core utilisation
 
 ## Troubleshooting
+
+### Node doesn't start / isn't visible in the UI
 * Ensure hugepages are configured correctly
 * Check NVMe device IOPS and capacity
 * If spdk setup fails, try to `wipefs -a` the NVMe devices (this will wipe partitions from the devices, be careful!)
-* Benchmark iops with:
+
+### Performance issues
+
+You can monitor performance by watching "hasher" core utilisation in a tool such as `htop`.
+
+To identify hasher cores, run `curio calc supraseal-config --batch-size 128` (with your actual batch size) and look for the `coordinators` section:
+
+```
+topology:
+...
+{
+  pc1: {
+    writer = 1;
+...
+    hashers_per_core = 2;
+
+    sector_configs: (
+      {
+        sectors = 128;
+        coordinators = (
+          { core = 59;
+            hashers = 8; },
+          { core = 64;
+            hashers = 14; },
+          { core = 72;
+            hashers = 14; },
+          { core = 80;
+            hashers = 14; },
+          { core = 88;
+            hashers = 14; }
+        )
+      }
+
+    )
+  },
+
+  pc2: {
+...
+  }
+
+```
+
+In this example, cores 59, 64, 72, 80, and 88 are "coordinators", and there are two hashers per core (`hashers_per_core = 2`), meaning that:
+* In the first group, core 59 is the coordinator and cores 60-63 are hashers (4 hasher cores / 8 hasher threads)
+* In the second group, core 64 is the coordinator and cores 65-71 are hashers (7 hasher cores / 14 hasher threads)
+* And so on: coordinators 72, 80, and 88 are followed by hasher cores 73-79, 81-87, and 89-95 respectively
+
````
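The coordinator-to-hasher mapping above can be derived mechanically from the `curio calc supraseal-config` output. The sketch below is not part of Curio: it pulls `hashers_per_core` and the `coordinators` entries out of the text with regular expressions and assumes, as in the example, that each coordinator's hasher cores are the ceil(hashers / hashers_per_core) cores immediately after it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Group is one coordinator plus the hasher cores assumed to follow it.
type Group struct {
	Coordinator int
	Threads     int // the "hashers" value, i.e. hasher threads
	FirstHasher int
	LastHasher  int
}

var (
	perCoreRe = regexp.MustCompile(`hashers_per_core\s*=\s*(\d+);`)
	coordRe   = regexp.MustCompile(`\{\s*core\s*=\s*(\d+);\s*hashers\s*=\s*(\d+);\s*\}`)
)

// hasherGroups extracts coordinator entries from supraseal-config output.
// ASSUMPTION: hasher cores are contiguous and start right after the coordinator.
func hasherGroups(cfg string) ([]Group, error) {
	m := perCoreRe.FindStringSubmatch(cfg)
	if m == nil {
		return nil, fmt.Errorf("hashers_per_core not found")
	}
	perCore, err := strconv.Atoi(m[1])
	if err != nil || perCore == 0 {
		return nil, fmt.Errorf("invalid hashers_per_core %q", m[1])
	}
	var groups []Group
	for _, c := range coordRe.FindAllStringSubmatch(cfg, -1) {
		core, err1 := strconv.Atoi(c[1])
		threads, err2 := strconv.Atoi(c[2])
		if err1 != nil || err2 != nil {
			return nil, fmt.Errorf("bad coordinator entry %q", c[0])
		}
		cores := (threads + perCore - 1) / perCore // round up
		groups = append(groups, Group{
			Coordinator: core,
			Threads:     threads,
			FirstHasher: core + 1,
			LastHasher:  core + cores,
		})
	}
	return groups, nil
}

func main() {
	// Excerpt of the example output above.
	cfg := `
    hashers_per_core = 2;
        coordinators = (
          { core = 59;
            hashers = 8; },
          { core = 64;
            hashers = 14; },
          { core = 72;
            hashers = 14; },
          { core = 80;
            hashers = 14; },
          { core = 88;
            hashers = 14; }
        )`
	groups, err := hasherGroups(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range groups {
		fmt.Printf("coordinator %d: hasher cores %d-%d (%d threads)\n",
			g.Coordinator, g.FirstHasher, g.LastHasher, g.Threads)
	}
}
```

For the example output this prints hasher cores 60-63, 65-71, 73-79, 81-87, and 89-95, matching the list above.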
````diff
+Coordinator cores will usually sit at 100% utilisation. Hasher threads **SHOULD** also sit at 100% utilisation; anything less
+indicates a bottleneck in the system, such as insufficient NVMe IOPS, insufficient memory bandwidth, or an incorrect NUMA setup.
+
````
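To check hasher utilisation without eyeballing `htop`, you can sample `/proc/stat` twice and compute the busy fraction per CPU. This is a minimal sketch, assuming the core numbers from the supraseal config are Linux logical CPU ids as they appear in `/proc/stat`; the core list and the 95% threshold in `main` are hypothetical. With two hashers per core, the second hasher thread may run on the core's SMT sibling, which you can look up in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` and add to the list.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// cpuTimes holds cumulative busy and total jiffies for one logical CPU.
type cpuTimes struct{ busy, total uint64 }

// parseStat extracts per-CPU counters from /proc/stat content. Only the first
// eight fields (user..steal) are summed, because guest time is already counted
// in user/nice. idle and iowait count as not busy.
func parseStat(content string) map[int]cpuTimes {
	out := make(map[int]cpuTimes)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 9 || f[0] == "cpu" || !strings.HasPrefix(f[0], "cpu") {
			continue // skip the aggregate line and non-CPU lines
		}
		id, err := strconv.Atoi(strings.TrimPrefix(f[0], "cpu"))
		if err != nil {
			continue
		}
		var total, idle uint64
		for i := 1; i <= 8; i++ {
			v, _ := strconv.ParseUint(f[i], 10, 64)
			total += v
			if i == 4 || i == 5 { // idle, iowait
				idle += v
			}
		}
		out[id] = cpuTimes{busy: total - idle, total: total}
	}
	return out
}

// utilisation returns each requested core's busy fraction (0..1) between samples.
func utilisation(before, after map[int]cpuTimes, cores []int) map[int]float64 {
	res := make(map[int]float64, len(cores))
	for _, c := range cores {
		b, okB := before[c]
		a, okA := after[c]
		if !okB || !okA || a.total <= b.total {
			continue
		}
		var busy uint64
		if a.busy > b.busy { // iowait is not guaranteed to be monotonic
			busy = a.busy - b.busy
		}
		res[c] = float64(busy) / float64(a.total-b.total)
	}
	return res
}

func readStat() (map[int]cpuTimes, error) {
	b, err := os.ReadFile("/proc/stat")
	if err != nil {
		return nil, err
	}
	return parseStat(string(b)), nil
}

func main() {
	// HYPOTHETICAL: hasher cores of the first group in the example above and
	// an arbitrary 95% cut-off for "not at 100%".
	hashers := []int{60, 61, 62, 63}
	const threshold = 0.95

	before, err := readStat()
	if err != nil {
		fmt.Println("cannot read /proc/stat:", err)
		return
	}
	time.Sleep(2 * time.Second)
	after, err := readStat()
	if err != nil {
		fmt.Println("cannot read /proc/stat:", err)
		return
	}
	util := utilisation(before, after, hashers)
	for _, c := range hashers {
		u, ok := util[c]
		switch {
		case !ok:
			fmt.Printf("cpu%d: no data\n", c)
		case u < threshold:
			fmt.Printf("cpu%d: %.1f%% (below target: check NVMe IOPS, memory bandwidth, NUMA)\n", c, u*100)
		default:
			fmt.Printf("cpu%d: %.1f%%\n", c, u*100)
		}
	}
}
```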
````diff
+To troubleshoot:
+* Read the requirements at the top of this page very carefully
+* Benchmark IOPS with:
 ```bash
 cd extern/supra_seal/deps/spdk-v22.09/
 
````
