Summary
In Clay v2.23.0 single-user mode (no osUsers), each sdk.query() call spawns a claude child process via the Claude Agent SDK. After the query stream completes, the process is never terminated — it remains as a child of the daemon indefinitely. Over normal usage this leads to dozens of orphaned claude processes consuming gigabytes of RAM.
Environment
- Clay v2.23.0, Agent SDK v0.2.92
- Linux (systemd service), single-user mode, no osUsers
- Daemon running as root
Observed Behavior
After ~24 hours of normal usage:
- 35 orphaned claude processes (all children of the daemon PID)
- ~7.8 GB RSS consumed
- All orphans had exactly 2 file descriptors (base stdio sockets only — no active client connections)
- Only the 3 sessions with active WebSocket clients had 3-4 FDs
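The socket counts above can also be gathered from Node rather than the shell. This is a minimal sketch, assuming a Linux /proc filesystem where socket descriptors show up as symlinks whose target starts with "socket:". The helper names countSockets and socketFdCount are hypothetical and are not part of Clay or the Agent SDK.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Count how many fd link targets refer to sockets (e.g. "socket:[12345]").
function countSockets(targets) {
  return targets.filter((t) => t.startsWith('socket:')).length;
}

// Read /proc/<pid>/fd and return the number of socket descriptors.
// Hypothetical helper: Linux only, and it needs permission to inspect the pid.
function socketFdCount(pid) {
  const dir = path.join('/proc', String(pid), 'fd');
  const targets = [];
  for (const fd of fs.readdirSync(dir)) {
    try {
      targets.push(fs.readlinkSync(path.join(dir, fd)));
    } catch (err) {
      // The descriptor was closed between readdir and readlink; skip it.
    }
  }
  return countSockets(targets);
}
```

A process whose count stays at 2 long after its stream has finished matches the orphan pattern described here.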
Root Cause Analysis
In sdk-bridge.js, the in-process query path (startQuery(), used when linuxUser is null):
- sdk.query() is called at line ~1767, spawning a claude child process via the Agent SDK
- processQueryStream() iterates the async iterable (for await ... of session.queryInstance)
- When the stream ends, the finally block (line ~1594) sets session.queryInstance = null but never calls .close() or [Symbol.dispose]() on the query object
Compare this to the worker path (startQueryViaWorker()), which properly calls cleanupSessionWorker() → worker.kill() → sends "shutdown" + SIGKILL after 3s. The in-process path has no equivalent cleanup.
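The difference between the two cleanup behaviours can be illustrated in isolation. The sketch below does not use the real Agent SDK: makeFakeQuery is a hypothetical stand-in for the object returned by sdk.query(), with an alive flag standing in for the child process, and the two drain functions are simplified assumptions about the shape of processQueryStream().

```javascript
// Hypothetical stand-in for an SDK query: an async iterable that owns a
// resource (the "child process") and exposes close() to release it.
function makeFakeQuery(messages) {
  const state = { alive: true };
  return {
    state,
    close() { state.alive = false; },
    async *[Symbol.asyncIterator]() {
      for (const m of messages) yield m;
    },
  };
}

// Mirrors the reported pattern: the reference is dropped, nothing is closed.
async function drainWithoutClose(session) {
  try {
    for await (const msg of session.queryInstance) { /* handle msg */ }
  } finally {
    session.queryInstance = null;
  }
}

// Same loop, but the query is closed before the reference is dropped.
async function drainWithClose(session) {
  const query = session.queryInstance;
  try {
    for await (const msg of query) { /* handle msg */ }
  } finally {
    if (query && typeof query.close === 'function') query.close();
    session.queryInstance = null;
  }
}
```

In the first variant the session no longer references the query, yet the resource it owns is still alive. This is the state the orphaned claude processes appear to be in.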
Suggested Fix
In processQueryStream()'s finally block, before nulling queryInstance, explicitly close/dispose it:
// In the finally block of processQueryStream():
if (session.queryInstance) {
  // Close the SDK query to terminate the underlying claude process
  if (typeof session.queryInstance.close === 'function') {
    try { session.queryInstance.close(); } catch (e) {}
  } else if (typeof session.queryInstance[Symbol.dispose] === 'function') {
    try { session.queryInstance[Symbol.dispose](); } catch (e) {}
  }
}
session.queryInstance = null;
Additionally, deleteSession() and deleteSessionQuiet() in sessions.js should also ensure any active query's process is killed (they currently only call abortController.abort() and messageQueue.end(), which don't terminate the child process).
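One way to cover the deletion paths is a shared helper that both functions call. This is a sketch only: terminateSessionQuery is a hypothetical name, and the session fields (abortController, messageQueue, queryInstance) are assumed to have the shape described in this report. The actual code in sessions.js may differ.

```javascript
// Hypothetical helper: abort, end the queue, then close the query so the
// underlying child process is released. Returns true if a query was closed.
function terminateSessionQuery(session) {
  if (session.abortController) session.abortController.abort();
  if (session.messageQueue && typeof session.messageQueue.end === 'function') {
    session.messageQueue.end();
  }

  const query = session.queryInstance;
  session.queryInstance = null;
  if (!query) return false;

  try {
    if (typeof query.close === 'function') {
      query.close();
    } else if (typeof query[Symbol.dispose] === 'function') {
      query[Symbol.dispose]();
    }
  } catch (err) {
    // Best effort: deletion should proceed even if closing throws.
  }
  return true;
}
```

Calling the same helper from deleteSession() and deleteSessionQuiet() would keep the two paths from drifting apart.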
Workaround
Manually kill orphaned processes:
# Find claude processes with at most 2 socket FDs (orphaned, no active client)
for pid in $(ps -eo pid,cmd --no-headers | grep 'claude$' | awk '{print $1}'); do
  sockets=$(ls -la "/proc/$pid/fd" 2>/dev/null | grep -c socket)
  if [ "$sockets" -le 2 ]; then
    kill -TERM "$pid"
  fi
done
Impact
On a machine with 8 GB RAM, this exhausted available memory within ~24 hours of normal usage, likely degrading performance of all sessions.