Skip to content

Use posix_spawn on modern glibc to avoid NSTask fork deadlocks#572

Merged
rfm merged 1 commit intognustep:masterfrom
johnathan-becker:johnb-patch-debugging-gdnc
Jan 23, 2026
Merged

Use posix_spawn on modern glibc to avoid NSTask fork deadlocks#572
rfm merged 1 commit intognustep:masterfrom
johnathan-becker:johnb-patch-debugging-gdnc

Conversation

@johnathan-becker
Copy link
Contributor

This change updates NSTask on Linux to use posix_spawn() instead of fork() when running on glibc 2.29 or newer. The old fork() path was unsafe in multi‑threaded applications and could deadlock before calling exec(), especially on newer distributions like Ubuntu 22.04. The new implementation uses posix_spawn_file_actions to set up stdin/stdout/stderr, PTY handling, signal defaults, file‑descriptor cleanup, and working‑directory changes through the glibc addchdir_np() extension. This keeps all setup work in the parent and ensures the child immediately executes the target program without touching unsafe APIs. Systems with glibc older than 2.29 continue using the existing fork() implementation. This improves reliability, avoids multi‑threaded deadlocks, and matches modern best practices without changing NSTask’s external behavior.

@johnathan-becker johnathan-becker requested a review from rfm as a code owner January 15, 2026 17:05
Copy link
Contributor

@rfm rfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks.

@johnathan-becker
Copy link
Contributor Author

Looks great, thanks.

I will be adding tests just give me a few hours sorry didn't realize how tests worked in this repo. I have actually found a small issue with detachments so we should talk about it when I get the tests up.

@johnathan-becker johnathan-becker force-pushed the johnb-patch-debugging-gdnc branch from a0c72b1 to 6612b02 Compare January 15, 2026 20:42
@johnathan-becker
Copy link
Contributor Author

Ok I have added tests now. I would also like to mention an issue that I ran across writing tests and I have had copilot summarize the issue:

After switching NSTask from fork() to posix_spawn(), the existing test in launch.m:98 started failing. The test expects launched tasks to detach from the controlling terminal, which used to happen automatically with the old fork() + setsid() path.

The failure wasn’t due to NSTask itself, but due to environment restrictions.
POSIX_SPAWN_SETSID (the spawn equivalent of calling setsid() after fork()) is defined on modern glibc, but containers and certain namespace setups prevent session creation:

The flag is valid (glibc ≥ 2.26, kernel ≥ 3.4), but
Inside containers, posix_spawn() can return EINVAL or EPERM when attempting to start a new session
This occurs even on fully up‑to‑date systems (e.g., glibc 2.39, kernel 6.x) when the environment disallows setsid()

So the code was correct, but the environment blocked the operation.

Two coordinated changes fix the issue:

NSTask fallback logic
On glibc ≥ 2.35, NSTask now attempts to use POSIX_SPAWN_SETSID.
If the call fails with EINVAL or EPERM, it retries immediately using only POSIX_SPAWN_SETPGROUP. This preserves expected process‑group behavior even when a new session cannot be created.

The test no longer assumes that glibc ≥ 2.35 guarantees session creation.
It now checks at runtime:
The process group must always change
On systems where POSIX_SPAWN_SETSID should work, the test checks whether getsid(0) == getpid()
If the environment prevents a new session (containers), the test allows this and only fails if the process is still in the parent’s session and isn’t a session leader

So in summary:
NSTask behaves correctly both in normal Linux environments and inside containers
In unrestricted environments, behavior matches the old fork() + setsid() path
In restricted environments, NSTask still creates a new process group but does not detach from the TTY, which is expected
All tests pass with the new logic
No API changes or behavior differences for callers; fallback is fully transparent

@rfm rfm self-requested a review January 17, 2026 15:23
@rfm
Copy link
Contributor

rfm commented Jan 17, 2026

It used to be best practice to detach long-lived background tasks (daemons) from the controlling terminal, presumably because it's really bad user interface to have odd messages written to a terminal. Loss of that feature doesn't seem hugely harmful but I suppose we should really try to find a way to restore it.
Unfortunately tests are failing on 6 out of 7 platforms: to get the patch to a state where it can be accepted we need zero test failures on any system.

@johnathan-becker
Copy link
Contributor Author

It used to be best practice to detach long-lived background tasks (daemons) from the controlling terminal, presumably because it's really bad user interface to have odd messages written to a terminal. Loss of that feature doesn't seem hugely harmful but I suppose we should really try to find a way to restore it. Unfortunately tests are failing on 6 out of 7 platforms: to get the patch to a state where it can be accepted we need zero test failures on any system.

Yes I agree we should bring this functionality back. The thing is most newer versions of Glibc support POSIX_SPAWN_SETSID so this is really only a backstop for older versions of Glibc that are more than 6 years old. I will figure out the test failures thank you for your response.

@johnathan-becker johnathan-becker force-pushed the johnb-patch-debugging-gdnc branch from 6612b02 to 1ed5140 Compare January 20, 2026 21:24
@johnathan-becker
Copy link
Contributor Author

Ok @rfm I believe this was actually a problem with the tests. I think I have fixed this I tried several different linux platforms. Can you please re-run the CI?

@rfm
Copy link
Contributor

rfm commented Jan 20, 2026

Running

@johnathan-becker johnathan-becker force-pushed the johnb-patch-debugging-gdnc branch from 58b621a to abf2342 Compare January 21, 2026 17:48
@johnathan-becker
Copy link
Contributor Author

I’ve made a set of changes that should address the issue. I ran into significant difficulty getting detachment to work reliably across certain platforms, and it appears this behavior is dependent on the glibc version in use. After reviewing the changes more critically, I wasn’t comfortable with the earlier modification to the existing processgroup.m helper, so I’ve reverted that adjustment.
The resulting approach is a compromise: we primarily attempt to use posix_spawn, since some Linux environments are known to deadlock when using fork(). However, when detachment is required and the posix_spawn path fails, we explicitly fall back to the previous fork()-based implementation. This strikes a balance between reducing the risk of deadlocks and preserving existing behavior. While there is still a residual risk that the fallback path could encounter the same issue, it is significantly less likely and avoids introducing a breaking change.

@johnathan-becker
Copy link
Contributor Author

I believe we can rerun the pipelines now and they all pass for me locally.

@rfm rfm merged commit f9610ec into gnustep:master Jan 23, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants