feat(code/starknet): Starknet interoperability #868

Draft
wants to merge 30 commits into
base: main

ancazamfir
Collaborator

Closes: #XXX

Temporary PR for progress on interop with Starknet


@ancazamfir
Collaborator Author

ancazamfir commented Feb 19, 2025

Setup and run

Start one starknet (bootstrap) node and one Malachite node:

  • Starknet code
git clone https://github.com/starkware-libs/sequencer.git
cd sequencer
git checkout shahak/for_informalsystems/mock_batcher_to_return_empty_proposals
  • export NUM_VALIDATORS=3
  • Start the Starknet bootstrap node with:
cargo run --bin starknet_sequencer_node -- --chain_id MY_CUSTOM_CHAIN_ID --eth_fee_token_address 0x1001 --strk_fee_token_address 0x1002 --recorder_url http://invalid_address.com --base_layer_config.node_url http://invalid_address.com --batcher_config.storage.db_config.path_prefix ./batcher_data --class_manager_config.class_storage_config.class_hash_storage_config.path_prefix ./class_manager_data --state_sync_config.storage_config.db_config.path_prefix ./sync_data --consensus_manager_config.network_config.tcp_port 27000 --mempool_p2p_config.network_config.tcp_port 11000 --state_sync_config.network_config.tcp_port 12000 --http_server_config.port 13000 --monitoring_endpoint_config.port 14000 --consensus_manager_config.network_config.secret_key 0x1111111111111111111111111111111111111111111111111111111111111111 --state_sync_config.network_config.secret_key 0x2222222222222222222222222222222222222222222222222222222222222222 --validator_id 0x64 --consensus_manager_config.context_config.num_validators $NUM_VALIDATORS
  • Initialize a 3 node malachite network:
cargo run --bin informalsystems-malachitebft-starknet-app -- testnet --home nodes --nodes 3
  • Start malachite node 2:
cargo run --bin informalsystems-malachitebft-starknet-app -q -- start --home nodes/2

Current state

  • Starknet bootstrap closing connection:
2025-02-19T19:16:14.230928Z  WARN papyrus_network::network_manager: Incoming connection error. connection id: ConnectionId(1), local addr: /ip4/127.0.0.1/tcp/27000, send back addr: /ip4/127.0.0.1/tcp/58862, error: Transport(Other(Custom { kind: Other, error: Other(Transport(Left(Right(Apply(InvalidKey(DecodingError { msg: "cargo feature `ecdsa` is not enabled", source: None })))))) }))
  • Malachite node noticing the closed connection:
2025-02-19T19:16:14.230664Z DEBUG node{moniker=test-2}:network{peer=QmRw9Db7fvy1R2Ge9Q5jKkZeYEixyUEazTuGByX8mPAp72}: Connected to 12D3KooWPqT2nMDSiXUSx5D7fasaxhxKigVhcqfkKqrLghCq9jxz with connection 7
2025-02-19T19:16:14.231095Z  WARN node{moniker=test-2}:network{peer=QmRw9Db7fvy1R2Ge9Q5jKkZeYEixyUEazTuGByX8mPAp72}: Connection closed with 12D3KooWPqT2nMDSiXUSx5D7fasaxhxKigVhcqfkKqrLghCq9jxz, reason: Connection error: I/O error: i/o error: Broken pipe (os error 32)

cc @romac

@romac
Member

romac commented Feb 19, 2025

@romac
Member

romac commented Feb 20, 2025

I can do the switch to Ed25519, but will keep the ECDSA implementation around in case we ever need to switch back to it.

@romac changed the title from "code(interop): starknet interop" to "feat(code/starknet): Starknet interoperability" on Feb 20, 2025

codecov bot commented Feb 27, 2025

❌ 15 Tests Failed:

Tests completed: 155 (failed: 15, passed: 140, skipped: 0)
View the top 3 failed test(s) by shortest run time
informalsystems-malachitebft-starknet-test tests::full_nodes::full_node_crash_and_sync
Stack Traces | 0.238s run time
No failure message available
informalsystems-malachitebft-starknet-test tests::full_nodes::mixed_validator_and_full_node_failures
Stack Traces | 0.248s run time
No failure message available
informalsystems-malachitebft-starknet-test tests::full_nodes::basic_full_node
Stack Traces | 0.252s run time
No failure message available


@romac
Member

romac commented Feb 28, 2025

By making sure to

a) use the LAN address for the sequencer multiaddr in our persistent_peers (eg. 192.168.1.74 instead of 127.0.0.1)
b) wait until this log line shows up in the sequencer logs: Found new external address of this node: /ip4/192.168.1.74/tcp/27000

I got Malachite to connect to the sequencer a few times, but it always gets disconnected right away:

2025-02-28T11:13:00.028977Z  INFO node{moniker=test-1}:network: Dialing peer at /ip4/192.168.1.74/tcp/27000, retry #0
2025-02-28T11:13:00.036898Z DEBUG node{moniker=test-1}:network: Connected to 12D3KooWPqT2nMDSiXUSx5D7fasaxhxKigVhcqfkKqrLghCq9jxz with connection 2
2025-02-28T11:13:00.038695Z  WARN node{moniker=test-1}:network: Connection closed with 12D3KooWPqT2nMDSiXUSx5D7fasaxhxKigVhcqfkKqrLghCq9jxz at /ip4/192.168.1.74/tcp/27000, reason: unknown

On the sequencer side:

2025-02-28T11:13:00.039085Z  INFO papyrus_network::peer_manager: Peer Manager found new peer PeerId("12D3KooWBqpUgg9cy3J93VfCDixuTu44nqbNM6ZYN1FE8dJrPCZL")
2025-02-28T11:13:04.709878Z  INFO run_consensus:run_height: starknet_consensus::manager: running consensus for height BlockNumber(0). is_observer: false, validators: [ContractAddress(PatriciaKey(0x64)), ContractAddress(PatriciaKey(0x65)), ContractAddress(PatriciaKey(0x66))] validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0
2025-02-28T11:13:04.710016Z  INFO run_consensus:run_height:start: starknet_consensus::state_machine: Starting round 0 as Proposer validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0
2025-02-28T11:13:04.710173Z  INFO run_consensus:run_height:start:build_proposal: starknet_consensus_orchestrator::sequencer_consensus_context: Building proposal proposal_init=ProposalInit { height: BlockNumber(0), round: 0, valid_round: None, proposer: ContractAddress(PatriciaKey(0x64)) } timeout=3s proposal_id=0 validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0
2025-02-28T11:13:05.712033Z  INFO run_consensus:run_height:start:build_proposal:consensus_build_proposal: starknet_consensus_orchestrator::sequencer_consensus_context: Finished building proposal proposal_commitment=BlockHash(0x0) num_txs=0 validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0 proposal_id=0 round=0
2025-02-28T11:13:05.712416Z  INFO run_consensus:run_height:handle_event: starknet_consensus::single_height_consensus: Broadcasting Vote { vote_type: Prevote, height: 0, round: 0, block_hash: Some(BlockHash(0x0)), voter: ContractAddress(PatriciaKey(0x64)) } validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0

Moreover, this is very finicky and I have not managed to get them to connect this way since…
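
For step (a) above, the persistent peer entry can be derived from the loopback one by swapping only the `/ip4/...` component. The following is a minimal sketch using plain string handling; `with_lan_ip` is a hypothetical helper, not part of Malachite or the sequencer, and it assumes the multiaddr starts with an `/ip4/` component.

```rust
use std::net::Ipv4Addr;

/// Replace the leading `/ip4/<addr>` component of a textual multiaddr with
/// `lan_ip`, leaving the remaining components (tcp port, peer id) untouched.
/// Returns `None` if the input does not start with a valid `/ip4/<addr>`.
/// NOTE: hypothetical helper for illustration only.
fn with_lan_ip(multiaddr: &str, lan_ip: Ipv4Addr) -> Option<String> {
    let rest = multiaddr.strip_prefix("/ip4/")?;
    // Split the old address from whatever follows it, e.g. "/tcp/27000/p2p/...".
    let (old_ip, tail) = match rest.find('/') {
        Some(idx) => rest.split_at(idx),
        None => (rest, ""),
    };
    // Reject inputs whose address component is not a valid IPv4 address.
    old_ip.parse::<Ipv4Addr>().ok()?;
    Some(format!("/ip4/{lan_ip}{tail}"))
}
```

Calling it with the loopback entry from `persistent_peers` and `192.168.1.74` yields the same multiaddr with only the IP changed, which can then be pasted into the Malachite config.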

@romac
Member

romac commented Mar 3, 2025

When this works, in the peer manager, add_peer is called with our peer id.

https://github.com/starkware-libs/sequencer/blob/shahak%2Ffor_informalsystems%2Fmock_batcher_to_return_empty_proposals/crates/papyrus_network/src/peer_manager/mod.rs#L122

But sessions_received_when_no_peers is empty so we are never assigned a session.

https://github.com/starkware-libs/sequencer/blob/shahak%2Ffor_informalsystems%2Fmock_batcher_to_return_empty_proposals/crates/papyrus_network/src/peer_manager/mod.rs#L127-L129

Looking at the logs, there should be a pending outbound session with id 0:

2025-03-03T09:25:26.439015Z  INFO papyrus_network::peer_manager: No peers. Waiting for a new peer to be connected for OutboundSessionId { value: 0 }

Forcing a call to self.assign_peer_to_session(OutboundSessionId { value: 0 }); does not help and instead yields the following errors:

2025-03-03T09:33:43.753168Z ERROR papyrus_network::sqmr::behaviour: Outbound session assigned peer but it isn't in outbound_sessions_pending_peer_assignment. Not running query.
2025-03-03T09:33:43.753380Z ERROR papyrus_network::network_manager: Session OutboundSessionId(OutboundSessionId { value: 0 }) failed on ConnectionClosed

I wonder if we need to go through the Starknet discovery protocol to make sure that the sequencer's peer manager expects us to connect?
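
To make the ordering issue described above concrete, here is a toy model of a peer manager that queues sessions while no peer is connected and drains the queue when a peer is added. It is only a sketch of the behaviour inferred from the logs: `ToyPeerManager` and its fields are hypothetical and greatly simplified, and the real logic lives in `papyrus_network`'s peer manager linked above.

```rust
use std::collections::HashMap;

/// Toy model only: NOT the papyrus_network implementation.
#[derive(Default)]
struct ToyPeerManager {
    peers: Vec<String>,
    /// Sessions requested while no peer was connected.
    pending_sessions: Vec<u64>,
    /// Session id -> peer id.
    assignments: HashMap<u64, String>,
}

impl ToyPeerManager {
    /// Assign the first known peer to `session`, or queue the session
    /// if no peer is connected yet.
    fn assign_peer_to_session(&mut self, session: u64) -> Option<&str> {
        match self.peers.first() {
            Some(peer) => {
                self.assignments.insert(session, peer.clone());
            }
            None => {
                self.pending_sessions.push(session);
                return None;
            }
        }
        self.assignments.get(&session).map(String::as_str)
    }

    /// Register a peer and hand it every session that was waiting for one.
    fn add_peer(&mut self, peer: &str) {
        self.peers.push(peer.to_owned());
        for session in std::mem::take(&mut self.pending_sessions) {
            self.assign_peer_to_session(session);
        }
    }
}
```

In this model, a peer added while the pending queue is empty ends up with no session at all, which matches the symptom described above; a session only gets a peer if it was queued before `add_peer` ran.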

@ancazamfir
Collaborator Author

I wonder if we need to go through the Starknet discovery protocol to make sure that the sequencer's peer manager expects us to connect?

iirc, shahak mentioned that we need to run at minimum the identify protocol

@romac
Member

romac commented Mar 3, 2025

iirc, shahak mentioned that we need to run at minimum the identify protocol

Yes, I noticed and changed the identify protocol version to match Starknet's, but that did not help.

@romac
Member

romac commented Mar 14, 2025

Update: With the latest changes, the two engines can now connect to each other, provided that Malachite discovery is disabled in the config. See an example config below.

Here is the sequencer processing our vote and ignoring it because we are not in the validator set:

2025-03-14T08:35:09.576407Z DEBUG run_consensus:run_height:handle_vote: starknet_consensus::single_height_consensus: Received Vote { vote_type: Prevote, height: 0, round: 0, block_hash: None, voter: ContractAddress(PatriciaKey(0x616b3e446ebd5597554a34e5419db6b3b3907923258df0fd2ac64984da8227c)) } validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0
2025-03-14T08:35:09.576559Z DEBUG run_consensus:run_height:handle_vote: starknet_consensus::single_height_consensus: Ignoring vote from non validator: vote=Vote { vote_type: Prevote, height: 0, round: 0, block_hash: None, voter: ContractAddress(PatriciaKey(0x616b3e446ebd5597554a34e5419db6b3b3907923258df0fd2ac64984da8227c)) } validator_id=0x0000000000000000000000000000000000000000000000000000000000000064 height=0

The sequencer now sees the vote sent by Malachite, but Malachite does not see votes or proposals broadcast by the sequencer.
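
The "Ignoring vote from non validator" log line boils down to a membership check against the validator set. The sketch below illustrates that check with hypothetical types (`ValidatorSet`, `VoteOutcome`); it is not the sequencer's code, and addresses are kept as plain strings for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical result of processing a vote.
#[derive(Debug, PartialEq, Eq)]
enum VoteOutcome {
    Accepted,
    IgnoredNonValidator,
}

/// Hypothetical validator set keyed by textual address.
struct ValidatorSet {
    members: HashSet<String>,
}

impl ValidatorSet {
    fn new(addresses: &[&str]) -> Self {
        Self {
            members: addresses.iter().map(|a| a.to_string()).collect(),
        }
    }

    /// A vote only counts if its voter is a known validator.
    fn handle_vote(&self, voter: &str) -> VoteOutcome {
        if self.members.contains(voter) {
            VoteOutcome::Accepted
        } else {
            VoteOutcome::IgnoredNonValidator
        }
    }
}
```

With the set `0x64, 0x65, 0x66` from the sequencer logs, a vote from Malachite's `0x616b…` address is ignored, which is why both validator sets need to list each other.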

Next steps:

  • Figure out why Malachite does not see any consensus messages from the sequencer
  • Modify the validator sets on both sides to include each other
  • Change the Address type to contain a ContractAddress wrapping a PatriciaKey
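
As a rough illustration of the last item, the nesting could look like the sketch below. The `PatriciaKey` and `ContractAddress` types here are simplified local stand-ins, not the `starknet_api` types (a real `PatriciaKey` is a bounded field element, not an arbitrary 32-byte array), and `from_u64` is a hypothetical convenience constructor.

```rust
use std::fmt;

/// Stand-in for `PatriciaKey`: a 32-byte big-endian value (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct PatriciaKey([u8; 32]);

/// Stand-in for `ContractAddress`, wrapping a `PatriciaKey`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct ContractAddress(PatriciaKey);

/// Consensus address type wrapping a `ContractAddress`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Address(ContractAddress);

impl Address {
    /// Build an address from a small integer such as 0x64 (hypothetical helper).
    fn from_u64(value: u64) -> Self {
        let mut bytes = [0u8; 32];
        // Place the integer in the last 8 bytes (big-endian).
        bytes[24..].copy_from_slice(&value.to_be_bytes());
        Address(ContractAddress(PatriciaKey(bytes)))
    }
}

impl fmt::Display for Address {
    /// Render as `0x` followed by 64 zero-padded hex digits.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Address(ContractAddress(PatriciaKey(bytes))) = self;
        write!(f, "0x")?;
        for byte in bytes {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}
```

Formatting `Address::from_u64(0x64)` gives the same zero-padded form as the `validator_id` shown in the sequencer logs.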
Malachite config:
moniker = "test-1"

[logging]
log_level = "debug"
log_format = "plaintext"

[consensus]
value_payload = "parts-only"

timeout_propose = "10s"
timeout_propose_delta = "500ms"
timeout_prevote = "5s"
timeout_prevote_delta = "500ms"
timeout_precommit = "5s"
timeout_precommit_delta = "500ms"
timeout_commit = "0s"
timeout_step = "300s"

[consensus.p2p]
listen_addr = "/ip4/127.0.0.1/tcp/27001"
persistent_peers = ["/ip4/127.0.0.1/tcp/27000/p2p/12D3KooWPqT2nMDSiXUSx5D7fasaxhxKigVhcqfkKqrLghCq9jxz"]
# persistent_peers = []
transport = "tcp"
pubsub_max_size = "4.2 MB"
rpc_max_size = "10.5 MB"

[consensus.p2p.discovery]
enabled = false
bootstrap_protocol = "kademlia"
selector = "random"
num_outbound_peers = 1
num_inbound_peers = 1
ephemeral_connection_timeout = "5s"
connect_request_max_retries = 10000

[consensus.p2p.protocol]
type = "gossipsub"
mesh_n = 6
mesh_n_high = 12
mesh_n_low = 4
mesh_outbound_min = 2

[consensus.vote_sync]
mode = "rebroadcast"

[mempool]
max_tx_count = 10000
gossip_batch_size = 0

[mempool.p2p]
listen_addr = "/ip4/127.0.0.1/tcp/28001"
persistent_peers = []
transport = "tcp"
pubsub_max_size = "4.2 MB"
rpc_max_size = "10.5 MB"

[mempool.p2p.discovery]
enabled = false
bootstrap_protocol = "full"
selector = "random"
num_outbound_peers = 20
num_inbound_peers = 20
ephemeral_connection_timeout = "5s"

[mempool.p2p.protocol]
type = "gossipsub"
mesh_n = 6
mesh_n_high = 12
mesh_n_low = 4
mesh_outbound_min = 2

[value_sync]
enabled = true
status_update_interval = "10s"
request_timeout = "10s"

[metrics]
enabled = true
listen_addr = "127.0.0.1:29001"

[runtime]
flavor = "single_threaded"

[test]
max_block_size = "1048.6 KB"
tx_size = "1.0 KB"
txs_per_part = 256
time_allowance_factor = 0.5
exec_time_per_tx = "1ms"
max_retain_blocks = 1000

[test.vote_extensions]
enabled = false
size = "0 B"

romac added 2 commits March 17, 2025 11:46
- Use `starknet_api`'s `ContractAddress` type for addresses
- Propose empty blocks
- Fix `BlockInfo` proposal part to match the sequencer's version
- Add validator address in genesis and private key file
@romac
Member

romac commented Mar 17, 2025

The two nodes are now able to take turns proposing (empty) blocks and can decide on them.

For this to work, pull the latest changes from this branch, and use the following config.toml, genesis.json and priv_validator_key.json for node 1, i.e. copy them into sn/1/config.

config.toml
moniker = "test-1"

[logging]
log_level = "debug"
log_format = "plaintext"

[consensus]
value_payload = "parts-only"

timeout_propose = "10s"
timeout_propose_delta = "500ms"
timeout_prevote = "5s"
timeout_prevote_delta = "500ms"
timeout_precommit = "5s"
timeout_precommit_delta = "500ms"
timeout_commit = "0s"
timeout_step = "300s"

[consensus.p2p]
listen_addr = "/ip4/127.0.0.1/tcp/27001"
persistent_peers = ["/ip4/127.0.0.1/tcp/27000/p2p/12D3KooWPqT2nMDSiXUSx5D7fasaxhxKigVhcqfkKqrLghCq9jxz"]
transport = "tcp"
pubsub_max_size = "4.2 MB"
rpc_max_size = "10.5 MB"

[consensus.p2p.discovery]
enabled = false
bootstrap_protocol = "kademlia"
selector = "random"
num_outbound_peers = 1
num_inbound_peers = 1
ephemeral_connection_timeout = "5s"
connect_request_max_retries = 10000

[consensus.p2p.protocol]
type = "gossipsub"
mesh_n = 6
mesh_n_high = 12
mesh_n_low = 4
mesh_outbound_min = 2

[consensus.vote_sync]
mode = "rebroadcast"

[mempool]
max_tx_count = 10000
gossip_batch_size = 0

[mempool.p2p]
listen_addr = "/ip4/127.0.0.1/tcp/28001"
persistent_peers = []
transport = "tcp"
pubsub_max_size = "4.2 MB"
rpc_max_size = "10.5 MB"

[mempool.p2p.discovery]
enabled = false
bootstrap_protocol = "full"
selector = "random"
num_outbound_peers = 20
num_inbound_peers = 20
ephemeral_connection_timeout = "5s"

[mempool.p2p.protocol]
type = "gossipsub"
mesh_n = 6
mesh_n_high = 12
mesh_n_low = 4
mesh_outbound_min = 2

[value_sync]
enabled = false
status_update_interval = "10s"
request_timeout = "10s"

[metrics]
enabled = true
listen_addr = "127.0.0.1:29001"

[runtime]
flavor = "single_threaded"

[test]
max_block_size = "1048.6 KB"
tx_size = "1.0 KB"
txs_per_part = 256
time_allowance_factor = 0.5
exec_time_per_tx = "1ms"
max_retain_blocks = 1000

[test.vote_extensions]
enabled = false
size = "0 B"
genesis.json
{
  "validator_set": {
    "validators": [
      {
        "address": "0x0000000000000000000000000000000000000000000000000000000000000065",
        "public_key": {
          "type": "tendermint/PubKeyEd25519",
          "value": "Hhaz5Ebr1Yx1VKNOVBnbazs5B5IyWN8P0qxkmE2oIn8="
        },
        "voting_power": 1
      },
      {
        "address": "0x0000000000000000000000000000000000000000000000000000000000000064",
        "public_key": {
          "type": "tendermint/PubKeyEd25519",
          "value": "xZpDiwDi7+uP/4Goz1QChct3qvDZ+IyHr89EqZIgrk0="
        },
        "voting_power": 1
      }
    ]
  }
}
priv_validator_key.json
{
  "private_key": {
    "type": "tendermint/PrivKeyEd25519",
    "value": "z1exB2b1hS8Ajtr/Qf/PvjD0ES/m7RoEwzaT69t2+Ow="
  },
  "public_key": {
    "type": "tendermint/PubKeyEd25519",
    "value": "Hhaz5Ebr1Yx1VKNOVBnbazs5B5IyWN8P0qxkmE2oIn8="
  },
  "address": "0x0000000000000000000000000000000000000000000000000000000000000065"
}

Then, reset the state and start Malachite with:

$ rm sn/1/db/* sn/1/wal/*;
$ cargo run --bin informalsystems-malachitebft-starknet-app -- start --home sn/1

Once Malachite starts trying to connect to the sequencer, reset the sequencer state and start the sequencer with:

$ rm -rf batcher_data class_manager_data sync_data
$ RUST_LOG=starknet_consensus=debug,starknet=info,papyrus_network=debug,papyrus=info cargo run --bin starknet_sequencer_node -- --chain_id MY_CUSTOM_CHAIN_ID --eth_fee_token_address 0x1001 --strk_fee_token_address 0x1002 --recorder_url http://invalid_address.com --base_layer_config.node_url http://invalid_address.com --batcher_config.storage.db_config.path_prefix ./batcher_data --class_manager_config.class_storage_config.class_hash_storage_config.path_prefix ./class_manager_data --state_sync_config.storage_config.db_config.path_prefix ./sync_data --consensus_manager_config.network_config.tcp_port 27000 --mempool_p2p_config.network_config.tcp_port 11000 --state_sync_config.network_config.tcp_port 12000 --http_server_config.port 13000 --monitoring_endpoint_config.port 14000 --consensus_manager_config.network_config.secret_key 0x1111111111111111111111111111111111111111111111111111111111111111 --state_sync_config.network_config.secret_key 0x2222222222222222222222222222222222222222222222222222222222222222 --validator_id 0x64 --consensus_manager_config.context_config.num_validators 2

The two nodes will start taking turns proposing empty blocks and deciding on them.

After killing the two processes, if one does not clear the state of each node, the nodes are able to restart, pick up where they left off, and produce blocks again :)

tx_part.send(part).await?;
sequence += 1;
}
// let max_block_size = params.max_block_size.as_u64() as usize;

If max_block_size is set to 0, this should yield empty blocks without reaping any txes from the mempool.
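
A minimal sketch of why a zero size budget produces empty blocks, assuming transactions are taken from the mempool in order until the next one would exceed the budget. `reap_txs` is a hypothetical function for illustration, not the code under review.

```rust
/// Take transactions from the front of `mempool` until adding the next one
/// would exceed `max_block_size` bytes. Hypothetical helper.
fn reap_txs(mempool: &[Vec<u8>], max_block_size: usize) -> Vec<Vec<u8>> {
    let mut block = Vec::new();
    let mut used = 0usize;
    for tx in mempool {
        // Stop as soon as the next transaction no longer fits.
        if used.saturating_add(tx.len()) > max_block_size {
            break;
        }
        used += tx.len();
        block.push(tx.clone());
    }
    block
}
```

With `max_block_size = 0`, the first non-empty transaction already exceeds the budget, so the loop exits immediately and the block stays empty.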

// }
// }

// // Proposal Commitment

This can stay commented out until the sequencer updates to the latest protos
