Initial support work for dist-connected ractor nodes #32

Merged: 2 commits into main from distributed, Jan 27, 2023

Conversation

@slawlor (Owner) commented Jan 24, 2023

This is the initial work on dist-connected nodes in ractor-cluster, along with a number of associated changes in ractor needed to support the node() protocol. This PR includes changes to:

  1. The way the ractor::Message trait is handled globally when the cluster feature is active.
  2. Cleanup of BoxedMessage and an extension so it can carry either a Box of the raw message (for local actor communication) or a Vec<u8> of binary data representing a "remote" message, which must be deserialized after being transmitted over a network link. Deserialization is left up to each message implementation (see the first sketch after this list).
  3. Extensions to much of the creation logic of ActorCell and ActorRef to support creating an actor without a dynamically generated ActorId, since remote actor IDs will already have been created and transmitted from the remote system.
  4. A PID-based actor registry: a global map of all currently active actors keyed by their PID (local actors only). The NodeSession will use this registry to route messages from a remote system to the local system.
  5. Threading the timeout (if set) into the RpcReplyPort so downstream handlers can also time out within a reasonable window. This is helpful when talking to a remote actor: the reply to an RPC of some complex type comes back serialized on a separate Vec<u8> RPC port, gets decoded, and only then does the original port receive the decoded reply. It is effectively two RPC ports linked together with a converter in the middle, and that linkage is spawned as a background tokio task (see the second sketch after this list). We don't want that task potentially living forever (though it should clean itself up when the channel is dropped), so this lets us thread the timeout through wherever necessary, including over the network link (see ractor-cluster/src/protocol/node.proto).
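
For illustration only, here is a rough sketch of the idea in (2). It is not the actual ractor types: `Payload`, `CounterMsg`, `to_bytes`, and `from_payload` are made-up names, and the byte format is arbitrary. The point is simply that a payload is either the raw boxed value (local delivery) or bytes that the message type itself knows how to deserialize (remote delivery).

```rust
// Hedged sketch of the BoxedMessage idea, not the real ractor implementation.
use std::any::Any;

enum Payload {
    /// Local delivery: the message stays a plain boxed value.
    Local(Box<dyn Any + Send>),
    /// Remote delivery: the message crossed a network link as bytes.
    Remote(Vec<u8>),
}

#[derive(Debug, PartialEq)]
struct CounterMsg {
    count: u64,
}

impl CounterMsg {
    fn to_bytes(&self) -> Vec<u8> {
        self.count.to_be_bytes().to_vec()
    }

    /// Each message type decides how to recover itself from either payload form.
    fn from_payload(payload: Payload) -> Option<Self> {
        match payload {
            Payload::Local(boxed) => boxed.downcast::<CounterMsg>().ok().map(|b| *b),
            Payload::Remote(bytes) => {
                let arr: [u8; 8] = bytes.try_into().ok()?;
                Some(CounterMsg { count: u64::from_be_bytes(arr) })
            }
        }
    }
}

fn main() {
    let local = Payload::Local(Box::new(CounterMsg { count: 7 }));
    let remote = Payload::Remote(CounterMsg { count: 7 }.to_bytes());
    assert_eq!(CounterMsg::from_payload(local), CounterMsg::from_payload(remote));
}
```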

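And a second sketch for the linked-RPC-ports idea in (5), again not the actual ractor code: plain tokio oneshot channels stand in for the two reply ports, `decode` is a hypothetical helper, and the timeout value is arbitrary. The converter task waits (bounded by the threaded timeout) for the serialized reply, decodes it, and forwards it on the caller's original typed port; on timeout it simply exits and the typed channel is dropped.

```rust
// Hedged sketch of "two RPC ports linked by a converter task with a timeout".
use tokio::sync::oneshot;
use tokio::time::{timeout, Duration};

fn decode(bytes: Vec<u8>) -> Option<u64> {
    // Illustrative decoder: interpret the payload as a big-endian u64.
    let arr: [u8; 8] = bytes.try_into().ok()?;
    Some(u64::from_be_bytes(arr))
}

#[tokio::main]
async fn main() {
    // The caller's original typed reply port and the serialized "remote" port.
    let (typed_tx, typed_rx) = oneshot::channel::<u64>();
    let (raw_tx, raw_rx) = oneshot::channel::<Vec<u8>>();
    let reply_timeout = Duration::from_millis(100);

    // The converter task linking the two ports, bounded by the threaded timeout.
    tokio::spawn(async move {
        if let Ok(Ok(bytes)) = timeout(reply_timeout, raw_rx).await {
            if let Some(value) = decode(bytes) {
                let _ = typed_tx.send(value);
            }
        }
        // On timeout or decode failure the task exits and the typed channel
        // is dropped, so the caller sees an error rather than hanging forever.
    });

    // Simulate the remote node's serialized reply arriving.
    let _ = raw_tx.send(42u64.to_be_bytes().to_vec());
    println!("decoded reply: {:?}", typed_rx.await);
}
```
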
Beyond the changes to ractor listed above, this includes the initial design for ractor-cluster, which owns and maintains the inter-node links and remoting protocols. So far we have sketched out:

  1. Inter-node authentication handshakes (see the sketch after this list)
  2. TCP server socket + session management
  3. The "RemoteActor" implementation which represents an actor on a remote node() locally and allows for communication to a remote actor.
  4. Node cast + call operations and RPC forwarding
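
For (1), here is a minimal sketch of the kind of challenge/response digest such handshakes use, loosely in the spirit of the Erlang distribution handshake the protocol is modeled on. The shared cookie, the SHA-256 choice, and the byte layout are assumptions for illustration only, not the actual ractor-cluster scheme; it also assumes the sha2 crate as a dependency.

```rust
// Hedged sketch: prove knowledge of a shared cookie by hashing it together
// with a random challenge sent by the peer. Not the real ractor-cluster digest.
use sha2::{Digest, Sha256};

fn challenge_digest(cookie: &str, challenge: u32) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(cookie.as_bytes());
    hasher.update(challenge.to_be_bytes());
    hasher.finalize().to_vec()
}

fn main() {
    // e.g. the kind of server challenge value visible in the log below
    let challenge: u32 = 1_694_482_506;
    let digest = challenge_digest("shared-cluster-cookie", challenge);
    println!("challenge {challenge} -> {}-byte digest {:02x?}", digest.len(), digest);
}
```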

However, it is not complete and still requires much more:

  1. Both unit and integration testing (set up a real link, dist-connect two hosts, authenticate, etc.)
  2. Control logic for inter-node control messages (listing actors, forwarding actor lifecycle events, deciding whether to handle remote-link supervision, etc.). UPDATE: everything here is done except remote supervision; actor death doesn't report the exit or panic reason, only that an actor exited on the remote host, and it synchronizes the local RemoteActor.
  3. Remote nodes in the global node registry. Because nodes are spawned locally and the registry isn't live over the remote link, we don't currently register remote actors in the local named registry (ractor::registry).
  4. Probably some more things I can't think of right now..

Associated with issue #16

@slawlor (Owner, Author) commented Jan 25, 2023

Update

Some bugs were found in the native TCP handling, along with some other tracing issues in the dist protocol. A recent update fixes many of the problems there and adds an integration test that gets two nodes dist-connected (at least through authentication):

[2023-01-25T20:47:11.597Z INFO  ractor_cluster::node::client] TCP Session opened for 127.0.0.1:8198
[2023-01-25T20:47:11.597Z INFO  ractor_cluster::net::listener] TCP Session opened for 127.0.0.1:53675
[2023-01-25T20:47:11.597Z INFO  ractor_playground::distributed] Client connected NodeServer b to NodeServer a
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(Name(NameMessage { name: "node_b@localhost", flags: Some(NodeFlags { version: 1 }) })) })) }'
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(Name(NameMessage { name: "node_b@localhost", flags: Some(NodeFlags { version: 1 }) })) })) }'
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::node::node_session] Next server auth state: WaitingOnClientChallengeReply(1694482506, [140, 78, 61, 120, 70, 66, 59, 112, 236, 183, 82, 218, 175, 137, 165, 255, 254, 207, 228, 19, 21, 13, 211, 155, 92, 64, 37, 154, 36, 51, 100, 228])
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ServerStatus(ServerStatus { status: Ok })) })) }'
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ServerChallenge(Challenge { name: "node_a@localhost", flags: Some(NodeFlags { version: 1 }), challenge: 1694482506 })) })) }'
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ServerStatus(ServerStatus { status: Ok })) })) }'
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ServerChallenge(Challenge { name: "node_a@localhost", flags: Some(NodeFlags { version: 1 }), challenge: 1694482506 })) })) }'
[2023-01-25T20:47:11.598Z DEBUG ractor_cluster::node::node_session] Next client auth state: WaitingForServerChallenge(ServerStatus { status: Ok })
[2023-01-25T20:47:11.599Z DEBUG ractor_cluster::node::node_session] Next client auth state: WaitingForServerChallengeAck(Challenge { name: "node_a@localhost", flags: Some(NodeFlags { version: 1 }), challenge: 1694482506 }, [140, 78, 61, 120, 70, 66, 59, 112, 236, 183, 82, 218, 175, 137, 165, 255, 254, 207, 228, 19, 21, 13, 211, 155, 92, 64, 37, 154, 36, 51, 100, 228], 3175486454, [240, 248, 217, 115, 205, 95, 242, 36, 155, 10, 123, 123, 240, 55, 226, 236, 106, 170, 222, 69, 121, 33, 200, 160, 94, 252, 102, 243, 104, 48, 166, 1])
[2023-01-25T20:47:11.599Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ClientChallenge(ChallengeReply { challenge: 3175486454, digest: [140, 78, 61, 120, 70, 66, 59, 112, 236, 183, 82, 218, 175, 137, 165, 255, 254, 207, 228, 19, 21, 13, 211, 155, 92, 64, 37, 154, 36, 51, 100, 228] })) })) }'
[2023-01-25T20:47:11.599Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ClientChallenge(ChallengeReply { challenge: 3175486454, digest: [140, 78, 61, 120, 70, 66, 59, 112, 236, 183, 82, 218, 175, 137, 165, 255, 254, 207, 228, 19, 21, 13, 211, 155, 92, 64, 37, 154, 36, 51, 100, 228] })) })) }'
[2023-01-25T20:47:11.599Z INFO  ractor_cluster::node::node_session] Node Session 0 is authenticated
[2023-01-25T20:47:11.599Z DEBUG ractor_cluster::node::node_session] Next server auth state: Ok([240, 248, 217, 115, 205, 95, 242, 36, 155, 10, 123, 123, 240, 55, 226, 236, 106, 170, 222, 69, 121, 33, 200, 160, 94, 252, 102, 243, 104, 48, 166, 1])
[2023-01-25T20:47:11.599Z INFO  ractor_cluster::node::node_session] Session authenticated on NodeSession 0 - (Some(127.0.0.1:53675))
[2023-01-25T20:47:11.599Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ServerAck(ChallengeAck { digest: [240, 248, 217, 115, 205, 95, 242, 36, 155, 10, 123, 123, 240, 55, 226, 236, 106, 170, 222, 69, 121, 33, 200, 160, 94, 252, 102, 243, 104, 48, 166, 1] })) })) }'
[2023-01-25T20:47:11.600Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Auth(AuthenticationMessage { msg: Some(ServerAck(ChallengeAck { digest: [240, 248, 217, 115, 205, 95, 242, 36, 155, 10, 123, 123, 240, 55, 226, 236, 106, 170, 222, 69, 121, 33, 200, 160, 94, 252, 102, 243, 104, 48, 166, 1] })) })) }'
[2023-01-25T20:47:11.600Z INFO  ractor_cluster::node::node_session] Node Session 0 is authenticated
[2023-01-25T20:47:11.600Z DEBUG ractor_cluster::node::node_session] Next client auth state: Ok
[2023-01-25T20:47:11.600Z INFO  ractor_cluster::node::node_session] Session authenticated on NodeSession 0 - (Some(127.0.0.1:8198))
[2023-01-25T20:47:14.265Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 265196000 }) })) })) }'
[2023-01-25T20:47:14.267Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 265196000 }) })) })) }'
[2023-01-25T20:47:14.267Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 265196000 }) })) })) }'
[2023-01-25T20:47:14.267Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 265196000 }) })) })) }'
[2023-01-25T20:47:14.268Z DEBUG ractor_cluster::node::node_session] Ping -> Pong took 2ms
[2023-01-25T20:47:14.397Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 397114000 }) })) })) }'
[2023-01-25T20:47:14.397Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 397114000 }) })) })) }'
[2023-01-25T20:47:14.398Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 397114000 }) })) })) }'
[2023-01-25T20:47:14.398Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679634, nanos: 397114000 }) })) })) }'
[2023-01-25T20:47:14.398Z DEBUG ractor_cluster::node::node_session] Ping -> Pong took 1ms
[2023-01-25T20:47:15.521Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679635, nanos: 521425000 }) })) })) }'
[2023-01-25T20:47:15.522Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679635, nanos: 521425000 }) })) })) }'
[2023-01-25T20:47:15.523Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679635, nanos: 521425000 }) })) })) }'
[2023-01-25T20:47:15.523Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679635, nanos: 521425000 }) })) })) }'
[2023-01-25T20:47:15.523Z DEBUG ractor_cluster::node::node_session] Ping -> Pong took 2ms
[2023-01-25T20:47:18.805Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679638, nanos: 805199000 }) })) })) }'
[2023-01-25T20:47:18.806Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679638, nanos: 805199000 }) })) })) }'
[2023-01-25T20:47:18.806Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679638, nanos: 805199000 }) })) })) }'
[2023-01-25T20:47:18.807Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679638, nanos: 805199000 }) })) })) }'
[2023-01-25T20:47:18.807Z DEBUG ractor_cluster::node::node_session] Ping -> Pong took 1ms
[2023-01-25T20:47:19.759Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:8198 -> 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679639, nanos: 759635000 }) })) })) }'
[2023-01-25T20:47:19.760Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:53675 <- 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Ping(Ping { timestamp: Some(Timestamp { seconds: 1674679639, nanos: 759635000 }) })) })) }'
[2023-01-25T20:47:19.760Z DEBUG ractor_cluster::net::session] SEND: 127.0.0.1:53675 -> 127.0.0.1:8198 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679639, nanos: 759635000 }) })) })) }'
[2023-01-25T20:47:19.760Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:8198 <- 127.0.0.1:53675 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Pong(Pong { timestamp: Some(Timestamp { seconds: 1674679639, nanos: 759635000 }) })) })) }'
[2023-01-25T20:47:19.760Z DEBUG ractor_cluster::node::node_session] Ping -> Pong took 1ms
[2023-01-25T20:47:21.599Z WARN  ractor_playground::distributed] Terminating test
[2023-01-25T20:47:21.604Z INFO  ractor_cluster::net::session] TCP Session closed for 127.0.0.1:53675
[2023-01-25T20:47:21.604Z INFO  ractor_cluster::net::session] TCP Session closed for 127.0.0.1:8198

The test spawns two node servers and runs through the authentication protocol, logging the various steps in the process at trace and debug levels.
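
As a side note on the "Ping -> Pong took Nms" lines above, here is a minimal sketch of how such a round-trip figure can be computed from the echoed timestamp. The `Timestamp` struct mirrors the protobuf shape seen in the log but is an illustrative stand-in, not the actual ractor-cluster type, and the 2ms sleep just simulates network and handling delay.

```rust
// Hedged sketch: the pong echoes the ping's original timestamp, and the
// receiver subtracts it from the current time to get the round trip.
use std::time::{Duration, SystemTime, UNIX_EPOCH};

#[derive(Clone, Copy)]
struct Timestamp {
    seconds: i64,
    nanos: i32,
}

fn now_timestamp() -> Timestamp {
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
    Timestamp { seconds: now.as_secs() as i64, nanos: now.subsec_nanos() as i32 }
}

fn rtt_since(sent: Timestamp) -> Duration {
    let sent_time = UNIX_EPOCH + Duration::new(sent.seconds as u64, sent.nanos as u32);
    SystemTime::now().duration_since(sent_time).unwrap_or_default()
}

fn main() {
    let ping_sent = now_timestamp();              // stamped into the outgoing Ping
    std::thread::sleep(Duration::from_millis(2)); // stand-in for network + handling delay
    let rtt = rtt_since(ping_sent);               // computed when the Pong echoes the timestamp
    println!("Ping -> Pong took {}ms", rtt.as_millis());
}
```
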

@slawlor force-pushed the distributed branch 5 times, most recently from af3823a to 274cc2c on January 25, 2023 at 17:26
@codecov-commenter commented Jan 25, 2023

Codecov Report

Base: 86.18% // Head: 86.70% // Increases project coverage by +0.52% 🎉

Coverage data is based on head (12db101) compared to base (59e6ff8).
Patch coverage: 86.16% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
+ Coverage   86.18%   86.70%   +0.52%     
==========================================
  Files          25       27       +2     
  Lines        2331     2942     +611     
==========================================
+ Hits         2009     2551     +542     
- Misses        322      391      +69     
| Impacted Files | Coverage Δ |
|---|---|
| ractor/src/actor/errors.rs | 69.04% <0.00%> (+1.60%) ⬆️ |
| ractor/src/concurrency.rs | 96.42% <ø> (ø) |
| ractor/src/port/output/tests.rs | 96.38% <ø> (-0.09%) ⬇️ |
| ractor/src/actor/messages.rs | 44.64% <30.00%> (-12.12%) ⬇️ |
| ractor/src/registry/mod.rs | 78.26% <33.33%> (-3.56%) ⬇️ |
| ractor/src/pg/mod.rs | 72.32% <56.75%> (+0.58%) ⬆️ |
| ractor/src/message.rs | 67.56% <67.56%> (ø) |
| ractor/src/rpc/mod.rs | 64.78% <71.42%> (+0.12%) ⬆️ |
| ractor/src/registry/pid_registry.rs | 73.01% <73.01%> (ø) |
| ractor/src/actor/mod.rs | 90.68% <79.54%> (+0.64%) ⬆️ |

... and 22 more


@slawlor force-pushed the distributed branch 2 times, most recently from 2efc282 to ec90324 on January 25, 2023 at 18:07
…y on the Erlang protocol.

This is a collection of TCP-managing actors and session management for automated session handling.

Related issue: #16
@slawlor marked this pull request as ready for review on January 25, 2023 at 22:41
@slawlor force-pushed the distributed branch 6 times, most recently from 38c0e78 to 2940034 on January 26, 2023 at 18:01