-
Notifications
You must be signed in to change notification settings - Fork 6.3k
fix(pty): fix child processes not being killed #6553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| impl Drop for ExecCommandSession { | ||
| fn drop(&mut self) { | ||
| #[cfg(unix)] | ||
| if let Ok(mut pid_guard) = self.pid.lock() { | ||
| if let Some(pid) = pid_guard.take() { | ||
| if let Err(e) = kill_child_process_group(pid) { | ||
| trace!("Failed to kill process group for pid {}: {}", pid, e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against killing reused PIDs after child exit
The session destructor kills a process group whenever pid is still set (kill_child_process_group(pid)), but the PID is only cleared in the wait thread after child.wait() returns. Because wait() reaps the process and frees its PID before the mutex is updated (lines ~233‑247), there is a brief window where the child has fully exited and its PID may already be reused by an unrelated process while pid_guard still contains the stale value. If the session is dropped during that window, getpgid/killpg will target the new process group and may SIGKILL an unrelated process tree. Consider checking exit_status before killing or clearing pid before calling wait() so that drop never acts on a PID that might already be recycled.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a look.
|
Thanks for the contribution. Before I ask someone on the codex team to review your PR, could you please fix the lint issues? You can ignore the failed test. That's a flaky test that someone on the team is already working on fixing. |
Closes #5229
Closes #4337
Summary
Users have reported that commands sometimes hang. The root cause is because the timeout handler only kills the direct child process, but grandchildren processes are kept alive.
Note: While this appears similar to recently merged #5258, those changes did not address the same issue that was present in the
unified_execcodepath.Reproduction
If you run the command below, it will hang. Instead, it should yield back to Codex.
Solution
While
portable_ptyalready callssetsid()internally (creating a process group where pgid == pid), it only kills the direct child. This PR adds process group killing: