Fix segfault on closing down network connections by edwardalee · Pull Request #579 · lf-lang/reactor-c

edwardalee · 2026-04-13T15:56:00Z

The problem was diagnosed and fixed by Claude Code. Below is its diagnosis:

Root Cause: Use-After-Free from `shutdown_net` Freeing Heap Memory

The segfault is caused by a use-after-free race condition introduced by PR #508's switch from raw socket file descriptors to heap-allocated socket_priv_t structs.

The Key Change

Before PR #508, connections were plain int socket file descriptors:

// Old lf_terminate_execution — no memory to free:
shutdown_socket(&_fed.sockets_for_inbound_p2p_connections[i], false);
_fed.sockets_for_inbound_p2p_connections[i] = -1;

After PR #508, connections are net_abstraction_t (pointers to malloc'd socket_priv_t):

int shutdown_net(net_abstraction_t net_abs, bool read_before_closing) {
  if (net_abs == NULL) {
    LF_PRINT_LOG("Socket already closed.");
    return 0;
  }
  socket_priv_t* priv = (socket_priv_t*)net_abs;
  int ret = shutdown_socket(&priv->socket_descriptor, read_before_closing);
  free_net(net_abs);  // <-- FREES the socket_priv_t!
  return ret;
}

The Race

In lf_terminate_execution, inbound P2P connections are shut down (and freed) before the listener threads are joined:

  LF_PRINT_DEBUG("Closing incoming P2P network abstractions.");
  // Close any incoming P2P network abstractions that are still open.
  for (int i = 0; i < NUMBER_OF_FEDERATES; i++) {
    shutdown_net(_fed.net_for_inbound_p2p_connections[i], false);  // frees socket_priv_t!
    // ... sets to NULL ...
  }
  // ... closes outbound connections ...

  LF_PRINT_DEBUG("Waiting for inbound p2p network abstraction listener threads.");
  // Wait for each inbound network abstraction listener thread to close.
  // ... joins threads AFTER the memory is already freed ...

Meanwhile, listen_to_federates holds a local copy of the pointer:

  net_abstraction_t net = _fed.net_for_inbound_p2p_connections[fed_id];
  // ...
  while (!_lf_termination_executed) {
    // ...
    if (read_from_net_close_on_error(net, 1, buffer)) {

And read_from_net_close_on_error dereferences the (now freed) pointer on error:

int read_from_net_close_on_error(net_abstraction_t net_abs, size_t num_bytes, unsigned char* buffer) {
  LF_ASSERT_NON_NULL(net_abs);
  socket_priv_t* priv = (socket_priv_t*)net_abs;
  int read_failed = read_from_net(net_abs, num_bytes, buffer);
  if (read_failed) {
    shutdown_socket(&priv->socket_descriptor, false);  // priv points to freed memory!
    return -1;
  }
  return 0;
}

The crash sequence:

1. Main thread enters termination() → calls lf_terminate_execution
2. shutdown_net closes the socket (unblocking the listener's read()) and frees the socket_priv_t
3. Listener thread's blocked read() returns with an error
4. read_from_net_close_on_error accesses priv->socket_descriptor through the freed pointer → use-after-free
5. shutdown_socket writes *socket = -1 to the freed memory → heap corruption → SIGSEGV

Before PR #508, this was safe because shutdown_socket only closed the integer file descriptor — no heap memory was involved, so there was nothing for the concurrent thread to dereference into freed memory.

There's Also the Same Race in `handle_tagged_message`

The stop-tag discard path (seen in your trace) also frees the connection while listen_to_federates still holds a local pointer to it:

      _lf_done_using(message_token);
      // Close network abstraction, reading any incoming data and discarding it.
      shutdown_net(_fed.net_for_inbound_p2p_connections[fed_id], false);  // frees the memory
      _fed.net_for_inbound_p2p_connections[fed_id] = NULL;
      LF_MUTEX_UNLOCK(&env->mutex);
      return -1;

In this case, handle_tagged_message is called from listen_to_federates on the same thread, and the listener exits immediately afterward — so this path alone is safe. However, if lf_terminate_execution concurrently reads _fed.net_for_inbound_p2p_connections[fed_id] between the shutdown_net (free) and the = NULL assignment, it would call shutdown_net on a freed (non-NULL) pointer — a double-free.
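One common way to close a window like this is to take the pointer out of the shared slot and set the slot to NULL in a single critical section, so that only one caller ever sees the non-NULL value and frees it. The sketch below illustrates that idea in isolation. The names `slots`, `slots_mutex`, `take_slot`, and `release_slot_twice_demo` are hypothetical and are not part of reactor-c, and this is not necessarily how the PR resolves the issue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_SLOTS 4

// Hypothetical stand-in for _fed.net_for_inbound_p2p_connections.
static void* slots[NUM_SLOTS];
static pthread_mutex_t slots_mutex = PTHREAD_MUTEX_INITIALIZER;

// Take ownership of slot i: return the old pointer and leave NULL behind.
// Read and write happen under one lock, so no caller can observe a
// non-NULL pointer that another caller is about to free.
void* take_slot(size_t i) {
  pthread_mutex_lock(&slots_mutex);
  void* p = slots[i];
  slots[i] = NULL;
  pthread_mutex_unlock(&slots_mutex);
  return p;
}

// Two callers try to release the same slot; returns how many of them
// actually freed memory (or -1 if the allocation failed).
int release_slot_twice_demo(void) {
  slots[0] = malloc(16);
  if (slots[0] == NULL) {
    return -1;
  }
  int frees = 0;
  for (int caller = 0; caller < 2; caller++) {
    void* p = take_slot(0);
    if (p != NULL) {
      free(p);
      frees++;
    }
  }
  return frees;
}
```

Calling `release_slot_twice_demo()` frees the allocation once: the second caller gets NULL from `take_slot` and does nothing.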

Two Pre-Existing (Non-Segfault) Bugs Exposed in the Same Path

These existed before PR #508 but are worth noting:

1. Missing tag barrier decrement (decentralized mode). The tag barrier is incremented at line 546 but the stop-tag discard path returns at line 650 without calling _lf_decrement_tag_barrier_locked(env):

  _lf_increment_tag_barrier(env, intended_tag);

The normal exit and failed-read paths both decrement it, but the stop-tag path does not.
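A single-exit cleanup pattern makes this kind of imbalance harder to introduce, because every return path passes through the same decrement. The following is a minimal sketch with a plain counter standing in for the tag barrier. The functions `increment_barrier`, `decrement_barrier`, `handle_message`, and `barrier_count` are hypothetical simplifications, not the reactor-c functions.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-in for the number of outstanding tag barrier requests.
static int barrier_requestors = 0;

static void increment_barrier(void) { barrier_requestors++; }
static void decrement_barrier(void) { barrier_requestors--; }

// Simplified message handler with three exits: failed read, stop-tag
// discard, and normal completion. All three leave through `done`, so the
// barrier is decremented exactly once per increment.
int handle_message(bool read_ok, bool past_stop_tag) {
  int result = 0;
  increment_barrier();
  if (!read_ok) {
    result = -1;  // failed-read path
    goto done;
  }
  if (past_stop_tag) {
    result = -1;  // stop-tag discard path
    goto done;
  }
  // Normal path: the message would be scheduled here.
done:
  decrement_barrier();
  return result;
}

// Current barrier count; 0 means increments and decrements are balanced.
int barrier_count(void) { return barrier_requestors; }
```

After any mix of calls to `handle_message`, `barrier_count()` returns 0.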

2. Token with ref_count = 0 passed to _lf_done_using. _lf_new_token creates a token with ref_count = 0. The stop-tag path calls _lf_done_using(message_token), which sees ref_count == 0, prints the "Token being freed that has already been freed" warning, and returns without freeing either the token or the message_contents payload — a memory leak.
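The leak described in item 2 can be modeled with a toy token type. In the sketch below, `done_using` mimics the behavior described above (it returns without freeing when `ref_count` is already 0), and `discard_token` shows one possible remedy: free a never-referenced token directly instead of routing it through the reference-counting path. The type and functions are hypothetical simplifications; the actual reactor-c token API and the fix in this PR may differ.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical, heavily simplified token: a reference count and a payload.
typedef struct {
  int ref_count;
  void* value;
} token_t;

static int tokens_freed = 0;

// Like the behavior described for _lf_new_token: starts with ref_count = 0.
token_t* new_token(size_t payload_size) {
  token_t* t = malloc(sizeof(token_t));
  if (t == NULL) {
    return NULL;
  }
  t->ref_count = 0;
  t->value = malloc(payload_size);
  return t;
}

static void free_token(token_t* t) {
  free(t->value);
  free(t);
  tokens_freed++;
}

// Mimics the described behavior: with ref_count == 0 nothing is freed.
// Returns 1 if the token was freed, 0 otherwise.
int done_using(token_t* t) {
  if (t->ref_count == 0) {
    return 0;  // the real code prints a warning here
  }
  if (--t->ref_count == 0) {
    free_token(t);
    return 1;
  }
  return 0;
}

// One possible remedy: a token that was never referenced is freed directly.
int discard_token(token_t* t) {
  if (t->ref_count == 0) {
    free_token(t);
    return 1;
  }
  return done_using(t);
}

// Returns the number of tokens freed when a fresh token is discarded
// (or -1 if the allocation failed).
int discard_demo(void) {
  token_t* t = new_token(8);
  if (t == NULL) {
    return -1;
  }
  int before = tokens_freed;
  int leaked_path = done_using(t);  // frees nothing: ref_count is 0
  assert(leaked_path == 0);
  discard_token(t);                 // frees token and payload
  return tokens_freed - before;
}
```

Here `discard_demo()` returns 1: the call to `done_using` leaves the token allocated, and only `discard_token` releases it.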

Implemented Fix

The core fix is to separate socket shutdown from memory deallocation so that lf_terminate_execution can unblock the listener threads without freeing the memory they reference. Specifically:

Split shutdown_net into two phases: a close_net that only closes the socket (to unblock reads), and a separate free_net that deallocates memory. shutdown_net remains available as close_net followed by free_net. Call close_net before joining threads, and free_net after.
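The ordering (close, then join, then free) can be sketched outside the runtime with a socket pair and one listener thread. Everything in this sketch is illustrative: `conn_t`, `conn_close`, `conn_free`, `listener`, and `two_phase_shutdown_demo` are hypothetical names, and `conn_t` is only a stand-in for socket_priv_t. As a simplification, the close-only phase here calls `shutdown()` and leaves `close()` to the free phase, which is an assumption of the sketch rather than a description of the PR's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for socket_priv_t.
typedef struct {
  int socket_descriptor;
} conn_t;

// Phase 1: unblock any thread blocked in read(), but keep the struct alive.
static void conn_close(conn_t* c) {
  shutdown(c->socket_descriptor, SHUT_RDWR);
}

// Phase 2: release the descriptor and the memory. Only safe once no other
// thread can still dereference c.
static void conn_free(conn_t* c) {
  close(c->socket_descriptor);
  free(c);
}

// Listener: dereferences c on every iteration, including after the read
// that is unblocked by conn_close.
static void* listener(void* arg) {
  conn_t* c = (conn_t*)arg;
  unsigned char byte;
  while (read(c->socket_descriptor, &byte, 1) > 0) {
    // Incoming data would be handled here.
  }
  return NULL;
}

// Returns 0 if close, join, and free ran in that order without error.
int two_phase_shutdown_demo(void) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return -1;
  }
  conn_t* c = malloc(sizeof(conn_t));
  if (c == NULL) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  c->socket_descriptor = fds[0];

  pthread_t thread;
  if (pthread_create(&thread, NULL, listener, c) != 0) {
    conn_free(c);
    close(fds[1]);
    return -1;
  }

  conn_close(c);               // 1. unblock the listener
  pthread_join(thread, NULL);  // 2. wait until it no longer uses c
  conn_free(c);                // 3. only now release the memory
  close(fds[1]);
  return 0;
}
```

Swapping steps 2 and 3 reproduces the ordering described in "The Race": the listener could wake up and read `c->socket_descriptor` after `c` has been freed.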

Copilot

Pull request overview

This PR addresses a federated runtime shutdown crash by separating “close the connection” from “free the network abstraction,” preventing listener threads from dereferencing freed heap memory during termination.

Changes:

Introduces close_net() (close-only) and refactors shutdown_net() to be close_net() + free_net().
Updates federate shutdown logic to close inbound P2P connections before joining listener threads, and free them only after listeners exit; also fixes the stop-tag discard path to properly free tokens and decrement the tag barrier.
Documentation-only update to LF code-fence language tags in initialize_from_file.h.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| util/initialize_from_file.h | Updates documentation code-fence language tags (`lf-c` → `lf`). |
| network/impl/src/lf_socket_support.c | Implements `close_net()` and refactors `shutdown_net()` to close then free. |
| network/api/net_abstraction.h | Documents and exposes `close_net()` and `free_net()` APIs, clarifying threading expectations. |
| core/federated/federate.c | Uses `close_net()` to unblock inbound listener threads before join; frees after join; fixes stop-tag discard token/barrier handling. |

edwardalee added 2 commits April 13, 2026 08:50

Fixes to prevent segfault on shutting down network connections

e71af80

Typos in comments

8a1effa

edwardalee requested review from Jakio815 and Copilot April 13, 2026 15:56

edwardalee added the bugfix label Apr 13, 2026

Copilot started reviewing on behalf of edwardalee April 13, 2026 15:56

Copilot AI reviewed Apr 13, 2026

Apply Copilot suggestions

1416d01

edwardalee wants to merge 3 commits into main from net-fixes

3 participants
