> For backwards compatibility reasons, IPFS is the default if no network is specified.
The crawler can store its results as JSON documents, in a [Postgres](https://www.postgresql.org/), or in a [Clickhouse](https://clickhouse.com/) database -
the `--dry-run` flag prevents it from doing any of it. Nebula will just print a
summary of the crawl at the end instead. For the IPFS network, a crawl takes ~5-10 min depending on
your internet connection. You can also specify the network you want to crawl by
appending, e.g., `--network FILECOIN` and limit the number of peers to crawl by
providing the `--limit` flag with the value of, e.g., `1000`. Example:
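A sketch of such an invocation (the exact flag placement may differ between Nebula versions, so double-check with `nebula crawl --help`):

```shell
# Dry run: crawl up to 1,000 Filecoin peers without writing anything to a database
nebula --dry-run crawl --network FILECOIN --limit 1000
```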
If you want to store the information in a proper database, you could run `just start-postgres` to start a local postgres instance via docker in the background and run Nebula like:
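A sketch, assuming the global `--db-user`/`--db-name`/`--db-password` connection flags and the default local credentials listed in the Development section below (verify the exact flag names with `nebula --help`):

```shell
# Start the local Postgres container in the background
just start-postgres

# Crawl and persist the results to the local Postgres instance
nebula --db-user nebula_local --db-name nebula_local --db-password password_local crawl
```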
At this point, you can also start Nebula's monitoring process, which would periodically probe the discovered peers to track their uptime. Run in another terminal:
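A sketch with the same assumed connection flags as above:

```shell
# Periodically re-dial the previously crawled peers to track their uptime
nebula --db-user nebula_local --db-name nebula_local --db-password password_local monitor
```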
When Nebula is configured to store its results in a Postgres database, it also tracks session information of remote peers. A session is one continuous streak of uptime (see below).
However, this is not implemented for all supported networks. The [ProbeLab](https://probelab.network) team is using the monitoring feature for the IPFS, Celestia, Filecoin, and Avail networks. Most notably, the Ethereum discv4/discv5 and Bitcoin monitoring implementations still need work.
---
There are a few more command line flags that are documented when you run `nebula --help`.
### `crawl`
The `crawl` sub-command starts by connecting to a set of bootstrap nodes and then
requesting the information of other peers in the network using the network-native
discovery protocol. For most supported networks these are several Kademlia
`FIND_NODE` RPCs. For Bitcoin-related networks it's a `getaddr` RPC.

For Kademlia-based networks Nebula constructs the routing tables (kademlia _k_-buckets)
of the remote peer based on its [`PeerID`](https://docs.libp2p.io/concepts/peer-id/). Then `nebula` builds
random `PeerIDs` with common prefix lengths (CPL) that fall in each of the peers' buckets, and asks if it knows any peers that are
closer (XOR distance) to the ones `nebula` just generated. This will effectively yield a list of all `PeerIDs` that a peer has
in its routing table. The process repeats for all found peers until `nebula` does not find any new `PeerIDs`.

At the end of each crawl `nebula` persists general statistics about the crawl like the total duration, dialable peers, encountered errors, agent versions etc.

> [!TIP]
> You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations.
Command line help page:

```shell
NAME:
   nebula crawl - Crawls the entire network starting with a set of bootstrap nodes.

USAGE:
   nebula crawl [command options]

OPTIONS:
   --addr-dial-type value          Which type of addresses should Nebula try to dial (private, public, any) (default: "public") [$NEBULA_CRAWL_ADDR_DIAL_TYPE]
   Network Specific Configuration:
   --check-exposed                 IPFS/AMINO: Whether to check if the Kubo API is exposed. Checking also includes crawling the API. (default: false) [$NEBULA_CRAWL_CHECK_EXPOSED]
   --keep-enr                      ETHEREUM_CONSENSUS: Whether to keep the full ENR. (default: false) [$NEBULA_CRAWL_KEEP_ENR]
   --udp-response-timeout value    ETHEREUM_EXECUTION: The response timeout for UDP requests in the disv4 DHT (default: 3s) [$NEBULA_CRAWL_UDP_RESPONSE_TIMEOUT]
```
### `monitor`
The `monitor` sub-command is only implemented for libp2p-based networks and with the Postgres database backend.
Every 10 seconds it polls the database for all sessions (see above) that are due to be dialed
in the next 10 seconds (based on the `next_visit_due_at` timestamp). It attempts to dial all peers using their previously
saved multi-addresses and updates their `session` instances depending on whether they were dialable or not.
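For example, monitoring the sessions of a previous Filecoin crawl could look like this (a sketch that reuses the assumed `--db-*` connection flags from the quick-start section; `--network` and `--workers` are taken from the help output below):

```shell
# Dial due sessions of a Filecoin crawl with 500 concurrent workers
nebula --db-user nebula_local --db-name nebula_local --db-password password_local monitor --network FILECOIN --workers 500
```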
Command line help page:

```shell
NAME:
   nebula monitor - Monitors the network by periodically dialing previously crawled peers.

USAGE:
   nebula monitor [command options]

OPTIONS:
   --workers value   How many concurrent workers should dial peers. (default: 1000) [$NEBULA_MONITOR_WORKER_COUNT]
   --network value   Which network the database sessions belong to. Relevant for parsing peer IDs and multi addresses. (default: "IPFS") [$NEBULA_MONITOR_NETWORK]
   --help, -h        show help
```
### `resolve`
The `resolve` sub-command is only available when using the Postgres database backend. It goes through all multi addresses that are present in the database and resolves them to their respective IP addresses. Behind one multi address there can be multiple IP addresses due to, e.g., the [`dnsaddr` protocol](https://github.com/multiformats/multiaddr/blob/master/protocols/DNSADDR.md).
Further, it queries the GeoLite2 database from [Maxmind](https://www.maxmind.com/en/home) to extract country information about the IP addresses and [UdgerDB](https://udger.com/) to detect datacenters. The command saves all information alongside the resolved addresses.
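Running it against the local development database could look like this (a sketch with the same assumed `--db-*` connection flags as above; the GeoLite2 and UdgerDB lookups require the corresponding database files to be available to Nebula):

```shell
# Resolve the multi addresses stored in Postgres to IP addresses and enrich them
nebula --db-user nebula_local --db-name nebula_local --db-password password_local resolve
```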
## Development
To develop this project, you need Go `1.23` and the following tools:
- [`golang-migrate/migrate`](https://github.com/golang-migrate/migrate) to manage the SQL migrations `v4.15.2`
- [`volatiletech/sqlboiler`](https://github.com/volatiletech/sqlboiler) to generate the Go ORM `v4.14.1`
- `docker` to run local Postgres and ClickHouse instances
To install the necessary tools you can run `just tools`. This will use the `go install` command to download and install the tools into your `$GOPATH/bin` directory. So make sure you have it in your `$PATH` environment variable.
### Database
You need a running Postgres or ClickHouse instance to persist and/or read the crawl results.
Run `just start-postgres` or `just start-clickhouse` respectively. For convenience, there are also the `just restart-postgres` and `just restart-clickhouse` recipes.
> [!TIP]
> You can use the `crawl` sub-command with the global `--dry-run` option to skip any database operations, or store the results as JSON files with the `--json-out` flag.
The default database settings for local development are:
```toml
Name = "nebula_local"
Password = "password_local"
User = "nebula_local"
Host = "localhost"
# postgres
Port = 5432
# clickhouse
Port = 9000
```
Migrations are applied automatically when `nebula` starts and successfully establishes a database connection.
To run them manually you can run:

```shell
# Up migrations
just migrate-postgres up
just migrate-clickhouse up

# Down migrations
just migrate-postgres down
just migrate-clickhouse down

# Generate the ORMs with SQLBoiler (only postgres)
```
> Make sure to adjust the `chlocal` migration and copy it over to the `chcluster` folder. In a clustered clickhouse deployment the table engines need to be prefixed with `Replicated`, like `ReplicatedMergeTree` as opposed to just `MergeTree`.
### Tests
To run the tests you need a running test database instance. The following command
starts test postgres and clickhouse containers, runs the tests and tears them
down again:
```shell
just test
```
The test database containers won't interfere with other local containers as
all names etc. are suffixed with `_test` as opposed to `_local`.
To speed up running database tests you can do the following:
```shell
just start-postgres test
just start-clickhouse test
```
Then run the plain tests (without starting database containers):