Conversation
… resolution if the IP address (`struct in_addr`) is already provided.
INET_ADDRSTRLEN limit for symbolic hostnames and optimize socket connections.
There was a problem hiding this comment.
Pull request overview
This PR updates the socket-based network abstraction used by the federated runtime to (1) avoid truncating symbolic hostnames, and (2) optimize certain federate-to-federate connections by using an already-available IPv4 address directly instead of round-tripping through presentation format + DNS resolution.
Changes:
- Removed persisted
server_hostnamestorage fromsocket_priv_t(fixes hostname truncation caused byINET_ADDRSTRLEN-sized buffers). - Extended socket connection parameters to optionally pass a raw
struct in_addr*and bypassgetaddrinfo()when available. - Cleaned up ineffective “set local pointer to NULL after shutdown” code paths.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
network/impl/src/socket_common.c |
Removes debug hostname storage; adds connect_to_socket() fast-path using struct in_addr. |
network/impl/src/lf_socket_support.c |
Stops copying hostname into a fixed-size buffer; passes hostname/IP through to connect_to_socket(). |
network/api/socket_common.h |
Adds optional server_ip_addr to connection params; removes server_hostname from socket_priv_t; updates API docs/signature. |
core/federated/RTI/rti_remote.h |
Updates comments to reflect removal of server_hostname tracking. |
core/federated/RTI/rti_remote.c |
Removes hostname-based debug log and ineffective NULL assignments after shutdown. |
core/federated/federate.c |
Uses server_ip_addr when connecting to federates; avoids terminating accept loop on transient accept failure. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (ip_addr != NULL) { | ||
| char host_str[INET_ADDRSTRLEN]; | ||
| inet_ntop(AF_INET, ip_addr, host_str, INET_ADDRSTRLEN); | ||
| lf_print_info("Connected to %s:%d.", host_str, used_port); | ||
| } else { | ||
| lf_print_info("Connected to %s:%d.", hostname, used_port); |
There was a problem hiding this comment.
connect_to_socket() logs "Connected to ..." unconditionally, even when ret remains < 0 due to timeout or repeated failures. This can mislead operators and test logs. Consider only printing the success message when ret == 0, and print a failure/timeout message (or nothing) otherwise; also handle inet_ntop() failure before using host_str.
| if (ip_addr != NULL) { | |
| char host_str[INET_ADDRSTRLEN]; | |
| inet_ntop(AF_INET, ip_addr, host_str, INET_ADDRSTRLEN); | |
| lf_print_info("Connected to %s:%d.", host_str, used_port); | |
| } else { | |
| lf_print_info("Connected to %s:%d.", hostname, used_port); | |
| if (ret == 0) { | |
| if (ip_addr != NULL) { | |
| char host_str[INET_ADDRSTRLEN]; | |
| if (inet_ntop(AF_INET, ip_addr, host_str, INET_ADDRSTRLEN) != NULL) { | |
| lf_print_info("Connected to %s:%d.", host_str, used_port); | |
| } else { | |
| lf_print_info("Connected to port %d.", used_port); | |
| } | |
| } else { | |
| lf_print_info("Connected to %s:%d.", hostname, used_port); | |
| } |
There was a problem hiding this comment.
I don't think we need this? On line 267, which is not shown here,
if (ret < 0) {
lf_sleep(CONNECT_RETRY_INTERVAL);
lf_print_warning("Could not connect. Will try again every " PRINTF_TIME " nanoseconds. Connecting to port %d.\n",
CONNECT_RETRY_INTERVAL, used_port);
continue;
} else {
break;
}
}
If ret<0, it continues, so it wouldn't get to this part if ret<0.
network/impl/src/socket_common.c
Outdated
| // Get address structure matching hostname and hints criteria, and | ||
| // set port to the port number provided in str. There should only | ||
| // ever be one matching address structure, and we connect to that. | ||
| if (getaddrinfo(hostname, (const char*)&str, &hints, &result)) { |
There was a problem hiding this comment.
In the getaddrinfo() call, (const char*)&str is a confusing cast of an array address to char* (and can hide type issues). Passing str directly is clearer and avoids relying on an unnecessary cast.
| if (getaddrinfo(hostname, (const char*)&str, &hints, &result)) { | |
| if (getaddrinfo(hostname, str, &hints, &result)) { |
| if (net == NULL) { | ||
| lf_print_warning("Federate failed to accept the network abstraction."); | ||
| return NULL; | ||
| continue; |
There was a problem hiding this comment.
If accept_net() returns NULL (e.g., because the server socket was closed and accept() starts failing with EBADF), continue will immediately retry in a tight loop, potentially spamming logs and burning CPU until _lf_termination_executed flips. Consider breaking/returning on NULL (as before), or adding an explicit backoff + exit condition based on errno/termination state.
| continue; | |
| break; |
There was a problem hiding this comment.
void* lf_handle_p2p_connections_from_federates(void* env_arg) {
...
while (received_federates < _fed.number_of_inbound_p2p_connections && !_lf_termination_executed) {
if (rti_failed()) {
break;
}
// Wait for an incoming connection request.
net_abstraction_t net = accept_net(_fed.server_net);
if (net == NULL) {
lf_print_warning("Federate failed to accept the network abstraction.");
continue;
}
...
}
LF_PRINT_LOG("All %zu remote federates are connected.", _fed.number_of_inbound_p2p_connections);
return NULL;
}
I think it should still be continue, however, as it says it will keep looping when there are spamming connect() attempts.
How about adding a short sleep like 100 ms? @edwardalee
Also besides of that after rti_failed(), I think it should be return NULL;, because it shouldn't print the log that all remote federates are connected.
| /** @brief Hostname of the remote server. */ | ||
| const char* server_hostname; | ||
|
|
||
| /** @brief IP address of the remote server. If provided, bypasses DNS resolution. */ | ||
| struct in_addr* server_ip_addr; | ||
| } socket_connection_params_t; |
There was a problem hiding this comment.
server_ip_addr is treated as an input-only value (and connect_to_socket() does not modify it). Declaring this as const struct in_addr* (in both socket_connection_params_t and connect_to_socket() signature) would better communicate intent and allow callers to pass pointers to const data without casts.
Summary
server_hostnamefromsocket_priv_t, which was actually not necessary to save in the struct.net_absin stack to NULL aftershutdown_net().connect_to_socket()usein_addrinstead ofchar* hostnamewhen available.1. Remove
server_hostnamefromsocket_priv_tThere was a bug found in #576, which happened after merging #508, not allowing symbolic hostnames longer than 16 bytes (
INET_ADDRSTRLEN) from being connected.This happened because there was a
memcpy()to save the host_name inchar server_hostname[INET_ADDRSTRLEN];.While investigating this, the code only used this for printing debug messages on the hostname, and it was never used again.
I removed the
server_hostnamefield fromsocket_priv_t.2. Bug Fix: Remove unnecessary NULL setting to pointers.
After
shutdown_net(net_abs), I set allnet_absto NULL, however some were just pointers from the stack, so had no effect. Removed all of these. Thanks for reporting this @edwardalee.3. Support
connect_to_socket()usein_addrinstead ofchar* hostnamewhen available.When a federate tries to connect another federate, the RTI sends the target federate's IP address in a network format (
uint32_t), not the presentation format (192.168.0.1orlocalhost).However,
connect_to_socket()only supported the presentation format as the input parameter.So there was some inefficiency where
federate.cdoesinet_ntop()to convert the recieved network format into the presentation format, and insideconnect_to_socket, doesgetaddrinfoto convert that presentation format backward into the network format (struct sockaddr_in) to make the actual system callconnect().So, by updating
connect_to_socket()to accept an optionalstruct in_addr*, this PR removes this redundant Network -> Presentation -> Network conversion, which removes unnecessary DNS resolution overhead.Why we couldn't remove
getaddrinfo()entirely?The redundancy above was from
lf_connect_to_federate, however,connect_to_netis used once again inlf_connect_to_rti.However, when used in
lf_connect_to_rti, this always uses thechar* hostname, because the RTI's address is provided strictly as a human-readable string (e.g., "localhost", "192.168.0.1" or "rti-server.domain.com"), performing DNS resolution viagetaddrinfo()is mandatory to discover its actual network routing address. Therefore,connect_to_socket()was updated to selectively bypass DNS only when a raw IP is available.This suggestion was also made by @edwardalee, thanks for your insight!