@@ -23,13 +23,18 @@ SupraSeal is an optimized batch sealing implementation for Filecoin that allows
- NVMe drives with high IOPS (10-20M total IOPS recommended)
- GPU for PC2 phase (NVIDIA RTX 3090 or better recommended)
- 1GB hugepages configured (minimum 36 pages)
- - Ubuntu 22.04 or compatible Linux distribution
+ - Ubuntu 22.04 or compatible Linux distribution (gcc-11 required; it doesn't need to be installed system-wide)
+ - At least 256GB RAM, ALL MEMORY CHANNELS POPULATED
+   - Without **all** memory channels populated, sealing **performance will suffer drastically**
+ - NUMA-Per-Socket (NPS) set to 1
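
The hugepage requirement in the list above can be checked before starting a node. The following is a minimal sketch, not part of Curio: the function name `check_hugepages` is hypothetical, and it assumes 1GB is the system's default hugepage size, so that the `HugePages_Total` and `Hugepagesize` fields of `/proc/meminfo` describe the 1GB pool.

```python
import re

REQUIRED_PAGES = 36          # minimum page count from the requirements list
ONE_GIB_IN_KB = 1024 * 1024  # a 1GB hugepage, as reported in kB


def check_hugepages(meminfo_text, required=REQUIRED_PAGES):
    """Return (ok, total_pages, page_size_kb) parsed from /proc/meminfo text.

    Assumption: 1GB is the default hugepage size, so these two fields
    describe the pool that batch sealing would use.
    """
    total = re.search(r"^HugePages_Total:\s+(\d+)", meminfo_text, re.MULTILINE)
    size = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if total is None or size is None:
        return (False, 0, 0)
    pages, size_kb = int(total.group(1)), int(size.group(1))
    ok = size_kb == ONE_GIB_IN_KB and pages >= required
    return (ok, pages, size_kb)
```

On a Linux host, passing the contents of `/proc/meminfo` to `check_hugepages` gives a quick yes/no answer. If 1GB is not the default hugepage size, the count would instead have to be read from `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages`.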
## Setup
### Dependencies
- Cuda 12.x is required
+ CUDA 12.x is required; 11.x won't work.
+
+ ``` bash
The build process depends on GCC 11.x system-wide or gcc-11/g++-11 installed locally.
* On Arch install https://aur.archlinux.org/packages/gcc11
@@ -68,6 +73,13 @@ LayerNVMEDevices = [
# Add PCIe addresses for all NVMe devices to use
]

+ # Set to your desired batch size (what the batch-cpu command says your CPU supports AND what you have NVMe space for)
+ BatchSealBatchSize = 32
+
+ # Pipelines can be either 1 or 2; 2 pipelines double storage requirements, but in correctly balanced systems they keep
+ # layer hashing running 100% of the time, nearly doubling throughput
+ BatchSealPipelines = 2
+
# Set to true for Zen2 or older CPUs for compatibility
SingleHasherPerThread = false
```
@@ -139,10 +151,64 @@ curio seal start --now --cc --count 32 --actor f01234 --layers cluster --duratio
* Monitor hasher core utilisation
## Troubleshooting
+
+ ### Node doesn't start / isn't visible in the UI
* Ensure hugepages are configured correctly
* Check NVMe device IOPS and capacity
* If spdk setup fails, try to `wipefs -a` the NVMe devices (this will wipe partitions from the devices, be careful!)
- * Benchmark iops with:
+
+ ### Performance issues
+
+ You can monitor performance by watching "hasher" core utilisation in a tool such as `htop`.
+
+ To identify hasher cores, run `curio calc supraseal-config --batch-size 128` (with your actual batch size) and look for `coordinators`:
+
+ ```go
+ topology:
+ ...
+ {
+   pc1: {
+     writer = 1;
+     ...
+     hashers_per_core = 2;
+
+     sector_configs: (
+       {
+         sectors = 128;
+         coordinators = (
+           { core = 59;
+             hashers = 8; },
+           { core = 64;
+             hashers = 14; },
+           { core = 72;
+             hashers = 14; },
+           { core = 80;
+             hashers = 14; },
+           { core = 88;
+             hashers = 14; }
+         )
+       }
+
+     )
+   },
+
+   pc2: {
+     ...
+   }
+
+ ```
+
+ In this example, cores 59, 64, 72, 80, and 88 are "coordinators". With two hashers per core, this means:
+ * In the first group, core 59 is the coordinator and cores 60-63 are hashers (4 hasher cores / 8 hasher threads)
+ * In the second group, core 64 is the coordinator and cores 65-71 are hashers (7 hasher cores / 14 hasher threads)
+ * And so on
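
The coordinator-to-hasher mapping described in the list above can be worked out mechanically. The sketch below is illustrative only: the helper `hasher_cores` is hypothetical, and it assumes that hasher cores are the contiguous cores immediately following each coordinator, which is inferred from the two groups spelled out above rather than taken from the tool's documentation.

```python
def hasher_cores(coordinators, hashers_per_core):
    """Map each coordinator core to the hasher cores that follow it.

    coordinators: list of (core, hashers) pairs, mirroring the
    `coordinators` entries printed by `curio calc supraseal-config`.
    Assumption: hasher cores directly follow their coordinator.
    """
    layout = {}
    for core, hashers in coordinators:
        # Round up in case the hasher threads do not fill the last core.
        n_cores = -(-hashers // hashers_per_core)
        layout[core] = list(range(core + 1, core + 1 + n_cores))
    return layout


example = [(59, 8), (64, 14), (72, 14), (80, 14), (88, 14)]
for coordinator, cores in hasher_cores(example, hashers_per_core=2).items():
    print(f"coordinator {coordinator}: hasher cores {cores[0]}-{cores[-1]}")
```

For the example configuration this reproduces the groups listed above (60-63 for coordinator 59, 65-71 for coordinator 64) and extends the same pattern to the remaining coordinators.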
+
+ Coordinator cores will usually sit at 100% utilisation, and hasher threads **SHOULD** also sit at 100% utilisation. Anything less
+ indicates a bottleneck in the system, such as insufficient NVMe IOPS, insufficient memory bandwidth, or an incorrect NUMA setup.
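
As a rough illustration of the check described above, the sketch below flags hasher cores that fall short of full utilisation. The function name `underutilised` and the 95% threshold are assumptions chosen for the example, not values from Curio, and the core numbers are assumed to be zero-based logical CPU indices.

```python
def underutilised(hasher_cores, per_core_percent, threshold=95.0):
    """Return the hasher cores whose utilisation is below `threshold` percent.

    per_core_percent: per-CPU utilisation samples, indexed by logical CPU
    number (assumed zero-based). The threshold is an arbitrary example value.
    """
    return [core for core in hasher_cores if per_core_percent[core] < threshold]


# Hypothetical sample: 64 CPUs, all busy except core 61.
sample = [100.0] * 64
sample[61] = 40.0
print(underutilised([60, 61, 62, 63], sample))
```

One way to obtain the per-core samples is the third-party `psutil` package, via `psutil.cpu_percent(interval=1, percpu=True)`; reading the same figures off `htop` by eye works just as well.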
+
+ To troubleshoot:
+ * Read the requirements at the top of this page very carefully
+ * Benchmark IOPS with:
```bash
cd extern/supra_seal/deps/spdk-v22.09/