Skip to content

Conversation

zygfrydw
Copy link

Fixed race condition in TransformListener which happens when TransformListener destructor is called before dedicated thread reaches executor->spin()

The issue is reported here

The alternative solution is to add thread-safe executor:

class SingleTreadedThreadSafeExecutor : public rclcpp::executors::SingleThreadedExecutor{
public:
  SingleTreadedThreadSafeExecutor() : is_terminated_(false)
  {

  }

  void spin_thread_safe(){
    if (spinning.exchange(true)) {
      throw std::runtime_error("spin() called while already spinning");
    }
    RCLCPP_SCOPE_EXIT(this->spinning.store(false); );
    while (rclcpp::ok(this->context_) && !is_terminated_ && spinning.load()) {
      rclcpp::AnyExecutable any_executable;
      if (get_next_executable(any_executable)) {
        execute_any_executable(any_executable);
      }
    }
  }
  void shutdown(){
    is_terminated_ = true;
    this->cancel();
  }

protected:
  std::atomic_bool is_terminated_;
};


void TransformListener::initThread(
  rclcpp::node_interfaces::NodeBaseInterface::SharedPtr node_base_interface)
{
  auto executor = std::make_shared<SingleTreadedThreadSafeExecutor>();

  // This lambda is required because `std::thread` cannot infer the correct
  // rclcpp::spin, since there are more than one versions of it (overloaded).
  // see: http://stackoverflow.com/a/27389714/671658
  // I (wjwwood) chose to use the lamda rather than the static cast solution.
  auto run_func =
    [executor](rclcpp::node_interfaces::NodeBaseInterface::SharedPtr node_base_interface) {
      executor->add_node(node_base_interface);
      executor->spin();
      executor->remove_node(node_base_interface);
    };
  dedicated_listener_thread_ = thread_ptr(
    new std::thread(run_func, node_base_interface),
    [executor](std::thread * t) {
      executor->shutdown();
      t->join();
      delete t;
      // TODO(tfoote) reenable callback queue processing
      // tf_message_callback_queue_.callAvailable(ros::WallDuration(0.01));
    });
  // Tell the buffer we have a dedicated thread to enable timeouts
  buffer_.setUsingDedicatedThread(true);
}

@zygfrydw zygfrydw force-pushed the fix-race-in-transfrom-listener-destructor branch from 72e5d8b to 933609a Compare April 14, 2022 15:05
@zygfrydw zygfrydw force-pushed the fix-race-in-transfrom-listener-destructor branch from 39b3eaf to 5326e5a Compare April 15, 2022 07:58
@zygfrydw
Copy link
Author

If you like to include a unit test that checks the path on which this bug occurs, I have implemented an appropriate test here.
However, my test requires extending TransformListener constructor with a thread factory, allowing to inject test code to thread construction.

I have excluded the change from this PR because it may be too intrusive for a public API of TransformListener. If you find it useful, I will include the changes in this PR.

@clalancette clalancette self-requested a review July 28, 2022 13:15
@clalancette clalancette self-assigned this Jul 28, 2022
@CursedRock17
Copy link
Contributor

I found this PR when delving into both the respective issue and the essentially identical #73, but I belive this was fixed in #442 with an updated destructor:

  if (spin_thread_) {
    executor_->cancel();
    dedicated_listener_thread_->join();
  }

which seems to prevent steps 6 and 7

Dedicated thread at the beginning of executor->spin() sets executor.spinning = true
Dedicated thread is spinning waiting for messages or cancel which will not happen because main thread is blocked in t->join();

The current implementation currently works for me as there's no longer a spinning node, nor any "loose" threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants