
Commit 5e53b57

update README.md
1 parent 3e59777 commit 5e53b57

2 files changed (+101, -78 lines)

README.md (+100, -77 lines)
@@ -16,14 +16,15 @@ networks, but this restriction was lifted.
- [IPFS](https://ipfs.network) - [_Amino DHT_](https://blog.ipfs.tech/2023-09-amino-refactoring/)
- [Bitcoin](https://bitcoin.org/) | [Litecoin](https://litecoin.org/) | [Dogecoin](https://dogecoin.com/) (alpha)
- [Ethereum](https://ethereum.org/en/) - [_Consensus Layer (discv5)_](https://ethereum.org/uz/developers/docs/networking-layer/#consensus-discovery) | [_Execution Layer (discv4)_](https://ethereum.org/uz/developers/docs/networking-layer/#discovery)
- [Optimism](https://www.optimism.io/) compatible chains
- [Portal](https://www.portal.network/) - (_alpha - [wire protocol](https://github.com/ethereum/portal-network-specs/blob/master/portal-wire-protocol.md) not implemented_)
- [Filecoin](https://filecoin.io)
- [Polkadot](https://polkadot.network/) - [_Kusama_](https://kusama.network/) | [_Rococo_](https://substrate.io/developers/rococo-network/) | [_Westend_](https://wiki.polkadot.network/docs/maintain-networks#westend-test-network)
- [Avail](https://www.availproject.org/) - [_Mainnet_](https://docs.availproject.org/docs/networks#mainnet) | [_Turing_](https://docs.availproject.org/docs/networks#turing-testnet) | _<small>Light Client + Full Node versions</small>_
- [Celestia](https://celestia.org/) - [_Mainnet_](https://blog.celestia.org/celestia-mainnet-is-live/) | [_Mocha_](https://docs.celestia.org/nodes/mocha-testnet) | [_Arabica_](https://github.com/celestiaorg/celestia-node/blob/9c0a5fb0626ada6e6cdb8bcd816d01a3aa5043ad/nodebuilder/p2p/bootstrap.go#L40)
- [Pactus](https://pactus.org)
- [Dria](https://dria.co/)
- [Gnosis](https://www.gnosis.io/)
- ... your network? Get in touch [[email protected]]([email protected]).
@@ -106,11 +107,11 @@ nebula --dry-run crawl
```

> [!NOTE]
> For backwards compatibility reasons, IPFS is the default if no network is specified.

The crawler can store its results as JSON documents, in a [Postgres](https://www.postgresql.org/), or in a [ClickHouse](https://clickhouse.com/) database -
the `--dry-run` flag prevents it from doing any of that. Nebula will just print a
summary of the crawl at the end instead. For the IPFS network, a crawl takes ~5-10 min depending on
your internet connection. You can also specify the network you want to crawl by
appending, e.g., `--network FILECOIN` and limit the number of peers to crawl by
providing the `--limit` flag with the value of, e.g., `1000`. Example:
@@ -163,18 +164,18 @@ jq -r '.NeighborIDs[] as $neighbor | [.PeerID, $neighbor] | @csv' ./results/2025
If you want to store the information in a proper database, you could run `just start-postgres` to start a local postgres instance via docker in the background and run Nebula like:

```shell
nebula --db-user nebula_local --db-name nebula_local crawl --neighbors
```

At this point, you can also start Nebula's monitoring process, which would periodically probe the discovered peers to track their uptime. Run in another terminal:

```shell
nebula --db-user nebula_local --db-name nebula_local monitor
```

When Nebula is configured to store its results in a postgres database, it also tracks session information of remote peers. A session is one continuous streak of uptime (see below, and the sketch after this paragraph).

However, this is not implemented for all supported networks. The [ProbeLab](https://probelab.network) team is using the monitoring feature for the IPFS, Celestia, Filecoin, and Avail networks. Most notably, the Ethereum discv4/discv5 and Bitcoin monitoring implementations still need work.
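
To make the session concept concrete: a session opens with the first successful dial, may go `pending` when the peer misses a dial, and closes once the peer is considered offline. The following is a minimal, illustrative Go sketch of that lifecycle; the three states and their meaning come from Nebula's `session_state` enum, but the transition logic here is simplified, not Nebula's actual code:

```go
package main

import "fmt"

// SessionState mirrors Nebula's session_state enum (open, pending, closed).
type SessionState string

const (
	StateOpen    SessionState = "open"    // peer currently considered online
	StatePending SessionState = "pending" // peer missed a dial and is about to be closed
	StateClosed  SessionState = "closed"  // peer considered offline, session complete
)

// next returns the state after a dial attempt (simplified transition logic).
func next(s SessionState, dialSucceeded bool) SessionState {
	switch {
	case dialSucceeded:
		return StateOpen // a successful dial (re)opens, i.e. recovers, the session
	case s == StateOpen:
		return StatePending // first missed dial: grace period
	default:
		return StateClosed // another miss: close the session
	}
}

func main() {
	s := StateOpen
	for _, ok := range []bool{true, false, true, false, false} {
		s = next(s, ok)
		fmt.Println(s) // open, pending, open, pending, closed
	}
}
```
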
---

@@ -184,62 +185,16 @@ There are a few more command line flags that are documented when you run `nebula

### `crawl`

The `crawl` sub-command starts by connecting to a set of bootstrap nodes and then
requesting the information of other peers in the network using the network-native
discovery protocol. For most supported networks these are several Kademlia
`FIND_NODE` RPCs. For Bitcoin-related networks it's a `getaddr` RPC.

For Kademlia-based networks, Nebula constructs the routing tables (Kademlia _k_-buckets)
of the remote peer based on its [`PeerID`](https://docs.libp2p.io/concepts/peer-id/). Then `nebula` builds
random `PeerIDs` with common prefix lengths (CPL) that fall into each of the peer's buckets, and asks if it knows any peers that are
closer (XOR distance) to the ones `nebula` just generated. This effectively yields a list of all `PeerIDs` that a peer has
in its routing table. The process repeats for all found peers until `nebula` does not find any new `PeerIDs`.
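
The bucket-targeting step can be illustrated with a short, self-contained Go sketch. It derives a random 256-bit Kademlia key that shares exactly `cpl` leading bits with a target peer's key (the SHA-256 of its `PeerID`). Note this is a simplification for illustration: the real crawler has to produce valid `PeerIDs` whose *hashes* land in the right bucket (go-libp2p-kad-dht solves this with precomputed prefix tables), not raw keys:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// randomKeyWithCPL returns a random key sharing exactly cpl leading bits
// with target, i.e. a key that falls into the target peer's bucket no. cpl.
// cpl must be in [0, 255].
func randomKeyWithCPL(target [32]byte, cpl int) [32]byte {
	var out [32]byte
	if _, err := rand.Read(out[:]); err != nil {
		panic(err)
	}
	full, rem := cpl/8, cpl%8

	copy(out[:full], target[:full])       // whole bytes of the shared prefix
	prefixMask := byte(0xFF) << (8 - rem) // high rem bits of the next byte
	bit := byte(0x80) >> rem              // the first bit AFTER the shared prefix

	// Keep target's bits in the prefix, force the next bit to differ so the
	// common prefix length is exact, and leave the remaining bits random.
	out[full] = (target[full] & prefixMask) | (^target[full] & bit) | (out[full] &^ (prefixMask | bit))
	return out
}

func main() {
	target := sha256.Sum256([]byte("...peer ID bytes...")) // stand-in for a real PeerID
	key := randomKeyWithCPL(target, 12)
	fmt.Printf("target: %08b %08b\nkey:    %08b %08b\n", target[0], target[1], key[0], key[1])
}
```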

> [!TIP]
> You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations.
@@ -251,7 +206,7 @@ NAME:
   nebula crawl - Crawls the entire network starting with a set of bootstrap nodes.

USAGE:
   nebula crawl [command options]

OPTIONS:
   --addr-dial-type value  Which type of addresses should Nebula try to dial (private, public, any) (default: "public") [$NEBULA_CRAWL_ADDR_DIAL_TYPE]
@@ -265,13 +220,16 @@ OPTIONS:

Network Specific Configuration:
   --check-exposed               IPFS/AMINO: Whether to check if the Kubo API is exposed. Checking also includes crawling the API. (default: false) [$NEBULA_CRAWL_CHECK_EXPOSED]
   --keep-enr                    ETHEREUM_CONSENSUS: Whether to keep the full ENR. (default: false) [$NEBULA_CRAWL_KEEP_ENR]
   --udp-response-timeout value  ETHEREUM_EXECUTION: The response timeout for UDP requests in the discv4 DHT (default: 3s) [$NEBULA_CRAWL_UDP_RESPONSE_TIMEOUT]
```

### `monitor`

The `monitor` sub-command is only implemented for libp2p-based networks and the postgres database backend.
Every 10 seconds it polls all sessions from the database (see above) that are due to be dialed
in the next 10 seconds (based on the `next_visit_due_at` timestamp). It attempts to dial all peers using previously
saved multi-addresses and updates their `session` instances to record whether or not they were dialable.
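
For intuition, a stripped-down version of that polling step could look like the sketch below. This is hypothetical code, not Nebula's implementation; only the `sessions` table, its `next_visit_due_at` column, and the local database defaults are taken from this README:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // postgres driver
)

func main() {
	db, err := sql.Open("postgres",
		"host=localhost port=5432 user=nebula_local password=password_local dbname=nebula_local sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Every 10 seconds, fetch sessions due to be dialed in the next 10 seconds.
	for range time.Tick(10 * time.Second) {
		rows, err := db.Query(
			`SELECT peer_id FROM sessions WHERE next_visit_due_at < now() + interval '10 seconds'`)
		if err != nil {
			log.Fatal(err)
		}
		for rows.Next() {
			var peerID int64
			if err := rows.Scan(&peerID); err != nil {
				log.Fatal(err)
			}
			// hand peerID over to a dial worker here
		}
		rows.Close()
	}
}
```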

@@ -286,16 +244,18 @@ NAME:
   nebula monitor - Monitors the network by periodically dialing previously crawled peers.

USAGE:
   nebula monitor [command options]

OPTIONS:
   --workers value  How many concurrent workers should dial peers. (default: 1000) [$NEBULA_MONITOR_WORKER_COUNT]
   --network value  Which network do the database sessions belong to. Relevant for parsing peer IDs and multi addresses. (default: "IPFS") [$NEBULA_MONITOR_NETWORK]
   --help, -h       show help
```

### `resolve`

The `resolve` sub-command is only available when using the postgres database backend. It goes through all multi addresses that are present in the database and resolves them to their respective IP addresses. Behind one multi address there can be multiple IP addresses due to, e.g., the [`dnsaddr` protocol](https://github.com/multiformats/multiaddr/blob/master/protocols/DNSADDR.md).
Further, it queries the GeoLite2 database from [Maxmind](https://www.maxmind.com/en/home) to extract country information about the IP addresses and [UdgerDB](https://udger.com/) to detect datacenters. The command saves all information alongside the resolved addresses.
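
To see what such a resolution involves, here is a stand-alone Go sketch (not Nebula's code) that expands a `dnsaddr` multi address into its concrete addresses using the go-multiaddr-dns library:

```go
package main

import (
	"context"
	"fmt"
	"log"

	ma "github.com/multiformats/go-multiaddr"
	madns "github.com/multiformats/go-multiaddr-dns"
)

func main() {
	// A single dnsaddr multi address can hide many concrete addresses.
	addr, err := ma.NewMultiaddr("/dnsaddr/bootstrap.libp2p.io")
	if err != nil {
		log.Fatal(err)
	}
	resolved, err := madns.DefaultResolver.Resolve(context.Background(), addr)
	if err != nil {
		log.Fatal(err)
	}
	for _, a := range resolved {
		fmt.Println(a) // e.g. /ip4/.../tcp/4001/p2p/... entries
	}
}
```
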
Command line help page:
@@ -316,32 +276,62 @@ OPTIONS:
## Development

To develop this project, you need Go `1.23` and the following tools:

```
github.com/golang-migrate/migrate/v4/cmd/[email protected]
github.com/volatiletech/sqlboiler/[email protected]
github.com/volatiletech/sqlboiler/v4/drivers/[email protected]
go.uber.org/mock/[email protected]
```

To install the necessary tools you can run `just tools`. This will use the `go install` command to download and install the tools into your `$GOPATH/bin` directory, so make sure you have it in your `$PATH` environment variable.


### Database

You need a running Postgres or ClickHouse instance to persist and/or read the crawl results.
Run `just start-postgres` or `just start-clickhouse` respectively, or use one of the following commands:

```shell
# for postgres
docker run --rm -d --name nebula-postgres-local -p 5432:5432 -e POSTGRES_DB=nebula_local -e POSTGRES_USER=nebula_local -e POSTGRES_PASSWORD=password_local postgres:14

# for clickhouse
docker run --rm -d --name nebula-clickhouse-local -p 8123:8123 -p 9000:9000 -e CLICKHOUSE_DB=nebula_local -e CLICKHOUSE_USER=nebula_local -e CLICKHOUSE_PASSWORD=password_local clickhouse/clickhouse-server:24.12
```

Then you can connect to the database with:

```shell
just repl-postgres
# or
just repl-clickhouse
```

To stop the containers:

```shell
just stop-postgres
# or
just stop-clickhouse
```

For convenience there are also the `just restart-postgres` and `just restart-clickhouse` recipes.
> [!TIP]
> You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations, or store the results as JSON files with the `--json-out` flag.

The default database settings for local development are:

```toml
Name = "nebula_local"
Password = "password_local"
User = "nebula_local"
Host = "localhost"
# postgres
Port = 5432
# clickhouse
Port = 9000
```
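
As a quick sanity check of the ClickHouse defaults above, a throwaway Go program could ping the local instance like this (a sketch assuming the `clickhouse-go` v2 client; Nebula's own driver setup may differ):

```go
package main

import (
	"fmt"
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	// Matches the default local development settings from this README.
	db := clickhouse.OpenDB(&clickhouse.Options{
		Addr: []string{"localhost:9000"},
		Auth: clickhouse.Auth{
			Database: "nebula_local",
			Username: "nebula_local",
			Password: "password_local",
		},
	})
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to nebula_local")
}
```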

Migrations are applied automatically when `nebula` starts and successfully establishes a database connection.
@@ -351,27 +341,60 @@ To run them manually you can run:
```shell
# Up migrations
just migrate-postgres up
just migrate-clickhouse up

# Down migrations
just migrate-postgres down
just migrate-clickhouse down

# Generate the ORMs with SQLBoiler (only postgres)
just models # runs: sqlboiler
```

```shell
# Create new migration
# postgres
migrate create -ext sql -dir db/migrations/pg -seq some_migration_name

# clickhouse
migrate create -ext sql -dir db/migrations/chlocal -seq some_migration_name
```


> [!NOTE]
> Make sure to adjust the `chlocal` migration and copy it over to the `chcluster` folder. In a clustered ClickHouse deployment the table engines need to be prefixed with `Replicated`, like `ReplicatedMergeTree` as opposed to just `MergeTree`.

### Tests

To run the tests you need a running test database instance. The following command
starts test postgres and clickhouse containers, runs the tests, and tears them
down again:

```shell
just test
```

The test database containers won't interfere with other local containers, as
all names etc. are suffixed with `_test` as opposed to `_local`.

To speed up running database tests you can start the test containers once:

```shell
just start-postgres test
just start-clickhouse test
```

Then run the plain tests (without starting database containers):

```shell
just test-plain
```

Finally, stop the containers again:

```shell
just stop-postgres test
just stop-clickhouse test
```
## Release Checklist

- [ ] Merge everything into `main`

justfile (+1, -1 lines)
@@ -71,7 +71,7 @@ start-postgres env="local" detached="true":
    -p {{ if env == "local" { "5432" } else { "5433" } }}:5432 \
    -e POSTGRES_DB={{postgres_dbname_prefix}}{{env}} \
    -e POSTGRES_USER={{postgres_user_prefix}}{{env}} \
    -e POSTGRES_PASSWORD={{postgres_pass_prefix}}{{env}} {{postgres_image}} > /dev/null 2>&1 || true

    @echo "Waiting for Postgres to become ready..."
    @while ! docker exec {{postgres_container_prefix}}{{env}} pg_isready > /dev/null 2>&1; do sleep 1; done
