Skip to content

Remove INET_ADDRSTRLEN limit for symbolic hostnames and optimize socket connections.#578

Open
Jakio815 wants to merge 9 commits intomainfrom
ipv4-fix
Open

Remove INET_ADDRSTRLEN limit for symbolic hostnames and optimize socket connections.#578
Jakio815 wants to merge 9 commits intomainfrom
ipv4-fix

Conversation

@Jakio815
Copy link
Copy Markdown
Collaborator

@Jakio815 Jakio815 commented Apr 13, 2026

Summary

  • Remove server_hostname from socket_priv_t, which was actually not necessary to save in the struct.
  • Fix bug setting pointers of net_abs in stack to NULL after shutdown_net().
  • Support connect_to_socket() use in_addr instead of char* hostname when available.

1. Remove server_hostname from socket_priv_t

There was a bug found in #576, which happened after merging #508, not allowing symbolic hostnames longer than 16 bytes (INET_ADDRSTRLEN) from being connected.
This happened because there was a memcpy() to save the host_name in char server_hostname[INET_ADDRSTRLEN];.

While investigating this, the code only used this for printing debug messages on the hostname, and it was never used again.
I removed the server_hostname field from socket_priv_t.

2. Bug Fix: Remove unnecessary NULL setting to pointers.

After shutdown_net(net_abs), I set all net_abs to NULL, however some were just pointers from the stack, so had no effect. Removed all of these. Thanks for reporting this @edwardalee.

3. Support connect_to_socket() use in_addr instead of char* hostname when available.

When a federate tries to connect another federate, the RTI sends the target federate's IP address in a network format (uint32_t), not the presentation format (192.168.0.1 or localhost).
However, connect_to_socket() only supported the presentation format as the input parameter.
So there was some inefficiency where federate.c does inet_ntop() to convert the recieved network format into the presentation format, and inside connect_to_socket, does getaddrinfo to convert that presentation format backward into the network format (struct sockaddr_in) to make the actual system call connect().

So, by updating connect_to_socket() to accept an optional struct in_addr*, this PR removes this redundant Network -> Presentation -> Network conversion, which removes unnecessary DNS resolution overhead.

Why we couldn't remove getaddrinfo() entirely?

The redundancy above was from lf_connect_to_federate, however, connect_to_net is used once again in lf_connect_to_rti.

However, when used in lf_connect_to_rti, this always uses the char* hostname, because the RTI's address is provided strictly as a human-readable string (e.g., "localhost", "192.168.0.1" or "rti-server.domain.com"), performing DNS resolution via getaddrinfo() is mandatory to discover its actual network routing address. Therefore, connect_to_socket() was updated to selectively bypass DNS only when a raw IP is available.

This suggestion was also made by @edwardalee, thanks for your insight!

… resolution if the IP address (`struct in_addr`) is already provided.
@Jakio815 Jakio815 changed the title Ipv4 fix Remove INET_ADDRSTRLEN limit for symbolic hostnames and optimize socket connections. Apr 13, 2026
@Jakio815 Jakio815 marked this pull request as ready for review April 13, 2026 14:07
@edwardalee edwardalee requested a review from Copilot April 13, 2026 14:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the socket-based network abstraction used by the federated runtime to (1) avoid truncating symbolic hostnames, and (2) optimize certain federate-to-federate connections by using an already-available IPv4 address directly instead of round-tripping through presentation format + DNS resolution.

Changes:

  • Removed persisted server_hostname storage from socket_priv_t (fixes hostname truncation caused by INET_ADDRSTRLEN-sized buffers).
  • Extended socket connection parameters to optionally pass a raw struct in_addr* and bypass getaddrinfo() when available.
  • Cleaned up ineffective “set local pointer to NULL after shutdown” code paths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
network/impl/src/socket_common.c Removes debug hostname storage; adds connect_to_socket() fast-path using struct in_addr.
network/impl/src/lf_socket_support.c Stops copying hostname into a fixed-size buffer; passes hostname/IP through to connect_to_socket().
network/api/socket_common.h Adds optional server_ip_addr to connection params; removes server_hostname from socket_priv_t; updates API docs/signature.
core/federated/RTI/rti_remote.h Updates comments to reflect removal of server_hostname tracking.
core/federated/RTI/rti_remote.c Removes hostname-based debug log and ineffective NULL assignments after shutdown.
core/federated/federate.c Uses server_ip_addr when connecting to federates; avoids terminating accept loop on transient accept failure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +277 to +282
if (ip_addr != NULL) {
char host_str[INET_ADDRSTRLEN];
inet_ntop(AF_INET, ip_addr, host_str, INET_ADDRSTRLEN);
lf_print_info("Connected to %s:%d.", host_str, used_port);
} else {
lf_print_info("Connected to %s:%d.", hostname, used_port);
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connect_to_socket() logs "Connected to ..." unconditionally, even when ret remains < 0 due to timeout or repeated failures. This can mislead operators and test logs. Consider only printing the success message when ret == 0, and print a failure/timeout message (or nothing) otherwise; also handle inet_ntop() failure before using host_str.

Suggested change
if (ip_addr != NULL) {
char host_str[INET_ADDRSTRLEN];
inet_ntop(AF_INET, ip_addr, host_str, INET_ADDRSTRLEN);
lf_print_info("Connected to %s:%d.", host_str, used_port);
} else {
lf_print_info("Connected to %s:%d.", hostname, used_port);
if (ret == 0) {
if (ip_addr != NULL) {
char host_str[INET_ADDRSTRLEN];
if (inet_ntop(AF_INET, ip_addr, host_str, INET_ADDRSTRLEN) != NULL) {
lf_print_info("Connected to %s:%d.", host_str, used_port);
} else {
lf_print_info("Connected to port %d.", used_port);
}
} else {
lf_print_info("Connected to %s:%d.", hostname, used_port);
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this? On line 267, which is not shown here,

    if (ret < 0) {
      lf_sleep(CONNECT_RETRY_INTERVAL);
      lf_print_warning("Could not connect. Will try again every " PRINTF_TIME " nanoseconds. Connecting to port %d.\n",
                       CONNECT_RETRY_INTERVAL, used_port);
      continue;
    } else {
      break;
    }
  }

If ret<0, it continues, so it wouldn't get to this part if ret<0.

// Get address structure matching hostname and hints criteria, and
// set port to the port number provided in str. There should only
// ever be one matching address structure, and we connect to that.
if (getaddrinfo(hostname, (const char*)&str, &hints, &result)) {
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the getaddrinfo() call, (const char*)&str is a confusing cast of an array address to char* (and can hide type issues). Passing str directly is clearer and avoids relying on an unnecessary cast.

Suggested change
if (getaddrinfo(hostname, (const char*)&str, &hints, &result)) {
if (getaddrinfo(hostname, str, &hints, &result)) {

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (net == NULL) {
lf_print_warning("Federate failed to accept the network abstraction.");
return NULL;
continue;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If accept_net() returns NULL (e.g., because the server socket was closed and accept() starts failing with EBADF), continue will immediately retry in a tight loop, potentially spamming logs and burning CPU until _lf_termination_executed flips. Consider breaking/returning on NULL (as before), or adding an explicit backoff + exit condition based on errno/termination state.

Suggested change
continue;
break;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void* lf_handle_p2p_connections_from_federates(void* env_arg) {
...
  while (received_federates < _fed.number_of_inbound_p2p_connections && !_lf_termination_executed) {
    if (rti_failed()) {
      break;
    }
    // Wait for an incoming connection request.
    net_abstraction_t net = accept_net(_fed.server_net);
    if (net == NULL) {
      lf_print_warning("Federate failed to accept the network abstraction.");
      continue;
    }
...
  }

  LF_PRINT_LOG("All %zu remote federates are connected.", _fed.number_of_inbound_p2p_connections);
  return NULL;
}

I think it should still be continue, however, as it says it will keep looping when there are spamming connect() attempts.

How about adding a short sleep like 100 ms? @edwardalee

Also besides of that after rti_failed(), I think it should be return NULL;, because it shouldn't print the log that all remote federates are connected.

Comment on lines 134 to 139
/** @brief Hostname of the remote server. */
const char* server_hostname;

/** @brief IP address of the remote server. If provided, bypasses DNS resolution. */
struct in_addr* server_ip_addr;
} socket_connection_params_t;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server_ip_addr is treated as an input-only value (and connect_to_socket() does not modify it). Declaring this as const struct in_addr* (in both socket_connection_params_t and connect_to_socket() signature) would better communicate intent and allow callers to pass pointers to const data without casts.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants