fix: memory leak by using process groups to kill orphaned grandchildren #270
Joseph19820124 wants to merge 1 commit into
chaodu-agent left a comment:
LGTM 🙏
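The core change is correct: process_group(0) plus sending SIGTERM to the whole group in Drop cleanly resolves the orphaned-process memory leak.
✅ kill_on_drop has been removed, so the two cleanup paths no longer overlap
✅ SIGKILL → SIGTERM, which is more graceful
✅ Covered by a regression test
✅ Clean scope: only the core logic is touched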
Minor follow-ups (not blocking):
- Consider a SIGTERM → wait → SIGKILL fallback for processes that trap SIGTERM
- The version bump in Cargo.lock (0.6.6 → 0.7.2) should ideally go in a separate commit
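The SIGTERM → wait → SIGKILL fallback suggested above could look roughly like the sketch below. It is a std-only approximation, not openab code: terminate_group is a hypothetical helper, kill(2) is declared through extern "C" instead of the libc crate the PR adds, and the signal numbers are the Linux/macOS values. It assumes the child was spawned with process_group(0), so its PID doubles as the PGID.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

// Linux/macOS signal numbers; real code would use libc::SIGTERM / libc::SIGKILL.
const SIGTERM: i32 = 15;
const SIGKILL: i32 = 9;

extern "C" {
    // POSIX kill(2), declared directly so this sketch needs no external crates.
    fn kill(pid: i32, sig: i32) -> i32;
}

/// Hypothetical helper: ask the child's process group to exit with SIGTERM,
/// wait up to `grace` for the group leader, then fall back to SIGKILL.
/// Assumes `child` was spawned with process_group(0), so PID == PGID.
fn terminate_group(child: &mut Child, grace: Duration) -> io::Result<ExitStatus> {
    let pgid = child.id() as i32;
    // kill(-0, sig) would signal our *own* process group, so never allow 0.
    if pgid <= 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid pgid"));
    }

    // Step 1: polite request to every process in the group.
    unsafe { kill(-pgid, SIGTERM) };

    // Step 2: poll the leader until it exits or the grace period runs out.
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        sleep(Duration::from_millis(20));
    }

    // Step 3: the leader ignored SIGTERM; force-kill the whole group and reap it.
    unsafe { kill(-pgid, SIGKILL) };
    child.wait()
}
```

One limitation of this sketch: it only escalates while the leader is still alive, so a grandchild that traps SIGTERM after the leader has already exited would survive; covering that case would need a sweep of the group before the leader is reaped.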
Technical Note: SIGTERM vs. SIGKILL and Graceful Shutdown
I've updated the PR to use
This can be tracked as a separate follow-up task to further improve the reliability of agent lifecycle management.
Comparative Analysis: OpenClaw (Node.js/PTY) vs. openab (Rust/Process Groups)
I've looked into how other tools like OpenClaw handle the 'grandchild leak' issue to ensure our approach in openab is robust. Here’s a breakdown of the architectural differences:
1. OpenClaw's Approach: Pseudo-Terminals (PTY)
OpenClaw (Node.js) uses
2. openab's Approach: Process Groups (PGID)
Since openab is built in Rust and prioritizes performance, we use standard pipes via
Conclusion
Using Process Groups is the most 'idiomatic' and lightweight way to solve this in Rust. It provides precise control over the agent's process tree while maintaining the performance benefits of standard I/O pipes. Our current implementation of
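To make the process-group approach concrete, here is a minimal std-only sketch. spawn_agent is a hypothetical name, and std::process::Command stands in for the tokio::process::Command that the PR configures with process_group(0). The child gets ordinary piped stdio plus process_group(0), which makes it the leader of a fresh group; any grandchild it forks inherits that group unless it deliberately moves itself out.

```rust
use std::io::{BufRead, BufReader};
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, Stdio};

extern "C" {
    // POSIX getpgid(2) and kill(2); std already links the C library on Unix.
    fn getpgid(pid: i32) -> i32;
    fn kill(pid: i32, sig: i32) -> i32;
}

const SIGTERM: i32 = 15; // Linux/macOS value (libc::SIGTERM in real code)

/// Hypothetical stand-in for openab's agent spawn: plain stdin/stdout pipes,
/// plus process_group(0) so the child leads a new group (PGID == child PID).
fn spawn_agent(program: &str, args: &[&str]) -> std::io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .process_group(0)
        .spawn()
}
```

For example, spawning sh -c 'sleep 30 & echo $!; wait' this way and calling getpgid on the echoed PID returns the shell's own PID, so a single kill(-pgid, SIGTERM) reaches the whole tree while stdio stays on ordinary pipes.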
@chaodu-agent Thanks for the review and approval! Regarding your follow-up suggestions:
@pahud Thanks for taking a look as well!
masami-agent left a comment:
Thanks for tackling this. The process group approach is the right direction for cleaning up orphaned grandchildren. A few things I'd like to flag before this merges:
1. Version bump should be in a separate PR
The Cargo.lock change from 0.6.6 → 0.7.2 is unrelated to this fix. Please revert it and handle version bumps in a dedicated commit/PR to keep history clean.
2. kill_on_drop(true) should be preserved as a fallback
Removing kill_on_drop means that if the custom Drop impl doesn't run (e.g. a future refactoring removes or bypasses it), the direct child process will leak. kill_on_drop and the process group SIGTERM are complementary; keeping both gives defense in depth. I'd recommend adding it back.
3. Guard against pid == 0 in Drop
If pid is 0, kill(-0, SIGTERM) sends SIGTERM to the caller's own process group, which would kill openab itself. A simple guard would prevent this:
```rust
if let Some(pid) = self._proc.id() {
    let pgid = pid as i32;
    // Guard: pid 0 would turn kill(-0, SIGTERM) into a signal to openab's own group.
    if pgid > 0 {
        unsafe { libc::kill(-pgid, libc::SIGTERM); }
    }
}
```
4. Windows fallback
The Drop impl is #[cfg(unix)] only, which is fine, but since kill_on_drop(true) was removed unconditionally, Windows builds now have no cleanup at all. At minimum, kill_on_drop(true) should remain for non-unix targets.
The SIGTERM vs SIGKILL discussion and the test coverage look good. Happy to re-review once the above are addressed.
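Taken together, points 2 to 4 suggest a Drop that layers the group signal on top of a direct-child kill. The sketch below approximates that with std only: AgentProc and spawn_agent are hypothetical stand-ins for AcpConnection and its constructor, and std's Child::kill plays the role that kill_on_drop(true) plays for tokio's Child. Under those assumptions, Unix sends SIGTERM to the group behind the pgid > 0 guard, and every platform, Windows included, still kills and reaps the direct child.

```rust
use std::process::{Child, Command, Stdio};

#[cfg(unix)]
extern "C" {
    // POSIX kill(2); real code would call libc::kill.
    fn kill(pid: i32, sig: i32) -> i32;
}

/// Hypothetical stand-in for AcpConnection: owns the agent's direct child.
struct AgentProc {
    child: Child,
}

impl Drop for AgentProc {
    fn drop(&mut self) {
        // Unix only: SIGTERM the whole process group, grandchildren included.
        #[cfg(unix)]
        {
            const SIGTERM: i32 = 15; // Linux/macOS value
            let pgid = self.child.id() as i32;
            // Guard: kill(-0, ...) would signal openab's own process group.
            if pgid > 0 {
                unsafe { kill(-pgid, SIGTERM) };
            }
        }
        // Every platform: kill and reap the direct child, the fallback role
        // kill_on_drop(true) would fill. Errors just mean it already exited.
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}

/// Hypothetical constructor: new process group on Unix, piped stdio everywhere.
fn spawn_agent(mut cmd: Command) -> std::io::Result<AgentProc> {
    #[cfg(unix)]
    {
        use std::os::unix::process::CommandExt;
        cmd.process_group(0);
    }
    let child = cmd.stdin(Stdio::piped()).stdout(Stdio::piped()).spawn()?;
    Ok(AgentProc { child })
}
```

In this sketch the direct child receives SIGKILL immediately after the group SIGTERM, so the leader gets no grace period; that is the price of the unconditional fallback, and it could be combined with a timed escalation if the leader needs time to shut down cleanly.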

Fixes #269
Problem Summary
When openab terminates an agent process, only the direct child (kiro-cli) was being killed. Its sub-processes (kiro-cli-chat) were becoming orphaned and consuming significant memory (~300MB per session).
Changes
- Added the libc dependency.
- Updated tokio::process::Command to use a new process group (process_group(0)) for each agent.
- Implemented Drop for AcpConnection to send SIGKILL to the entire process group (-PGID) on cleanup.
- Added a regression test in src/acp/connection.rs that verifies orphaned grandchild cleanup.
Verified with cargo test test_process_group_cleanup.
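The failure mode from the Problem Summary can be reproduced in isolation. The following is a hypothetical std-only reproduction, not the PR's test_process_group_cleanup: sh stands in for kiro-cli and a backgrounded sleep for kiro-cli-chat. Because the grandchild inherits the stdout pipe, the pipe reaches EOF only once every process holding it has exited, which makes a cheap liveness check.

```rust
use std::io::Read;
use std::os::unix::process::CommandExt;
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

extern "C" {
    // POSIX kill(2); real code would call libc::kill.
    fn kill(pid: i32, sig: i32) -> i32;
}
const SIGKILL: i32 = 9; // Linux/macOS value

/// Hypothetical demo: spawn sh, which forks a sleep grandchild sharing its
/// stdout pipe, then kill either just sh or its whole process group.
/// Returns true if the pipe hit EOF within `timeout` (nobody left holding it).
fn pipe_closes_after_kill(kill_whole_group: bool, timeout: Duration) -> bool {
    let mut child = Command::new("sh")
        .args(&["-c", "sleep 5 & echo ready; wait"])
        .stdout(Stdio::piped())
        .process_group(0)
        .spawn()
        .expect("failed to spawn sh");
    let mut out = child.stdout.take().unwrap();
    let mut ready = [0u8; 6];
    out.read_exact(&mut ready).unwrap(); // the grandchild exists once "ready\n" arrives

    let pid = child.id() as i32;
    if kill_whole_group {
        unsafe { kill(-pid, SIGKILL) }; // leader and grandchild
    } else {
        child.kill().unwrap(); // direct child only, like the original bug
    }
    child.wait().unwrap();

    // Read to EOF on a helper thread so the wait can be bounded.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut rest = Vec::new();
        let _ = out.read_to_end(&mut rest);
        let _ = tx.send(());
    });
    let closed = rx.recv_timeout(timeout).is_ok();

    if !kill_whole_group {
        // Clean up the orphan this demo leaked; it is still in the group.
        unsafe { kill(-pid, SIGKILL) };
    }
    closed
}
```

Calling it with false times out because the orphaned sleep keeps the pipe open, while calling it with true returns true almost immediately.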