-
Notifications
You must be signed in to change notification settings - Fork 3.1k
libc/powerpc64: Fix swapcontext(3) #1759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Here's a simple reproducer for those with ppc64 systems. This is largely taken from one of the Go test suites that highlighted the problem, but is straight C and nicely shows the crash. This same code works perfectly on a stock Linux/ppc64le host. |
|
Looks like the CI is failing for some reason completely unrelated to this patch. ⚡ |
|
Yea, something weird is up with CI. Any chance you can turn your 'reproducer' into a kyua test so we don't accidentally break this in the future? |
a583889 to
eda3aaf
Compare
|
@bsdimp Sure! Looks like this was never tested on any arch, should be good to go now. |
eda3aaf to
35d98e9
Compare
|
I think the "correct" route would be to restore the TOC upon return from the context call. Also, r12 should be saved (for ELFv2) in __makecontext(), so it can be restored (recalculate TOC as necessary). |
I had thought so too, but it doesn't work. The compiler generates a prologue on |
|
r1 isn't touched by _ctx_start(), and r2 is loaded by a standard prologue, which requires a sane r12. It looks like makecontext() is not saving off r12, so the context is bogus at the start of _ctx_start(). Something like (we're dying or switching, so all callee-saved registers are available, let's blow away r16 here): |
|
Right, but we know nothing about the previous program that was running when Only after that runs do we start executing inside _ctx_start from If we try to recompute |
|
r12 is the entry point of the target. Since we already know the address of _ctx_start() we can save that off again just to be safe. Actually, my saving _ctx_start() into r16 in makecontext() is silly, since I'm saving off the calculated r2 into r16 in _ctx_start directly, and restoring it, so it's always safe. Now we're not touching r1, and we have our known-good (if ABI is followed...) r2 (saved off in r16). If the ABI isn't followed, then all bets are off anyway. Even Go needs to follow the ELF ABI in order to play well with others. You mentioned in your comment that your change is kind of a hack that works, and I agree that there is a problem to solve, so I'm proposing a solution that's less of a hack, and more of an ABI certainty. A normal function call would result in r2 being restored from the stack upon return, but we don't need to do that in _ctx_start, we can just abuse the "safe" registers instead, and let the linker decide if it needs to do any fixups for r12 in the function calls. We don't know what the target ucontext is, so it's unsafe to switch to it; the TOC might be found later on (think an asm prologue stub to start a lightweight thread), so the only thing we can be certain of is the context that we have, assuming the ABI is followed. |
|
I'm definitely not saying my solution is the absolute correct way to do this, and you do have valid points. I've been thinking, and I wonder if I can just coax the compiler not to emit that prologue on |
|
The prologue is generated by the ENTRY() macro, defined in |
35d98e9 to
4e6cb2c
Compare
|
@chmeeedalf While it wasn't quite as simple as the proposed patch, the basic concepts appear to have worked. Thanks for the feedback, much appreciated. Does this look better? |
4e6cb2c to
1f02533
Compare
|
mc_gpr[2] is never set up in makecontext() (mc_gpr[3 + N] are, as is 1, but others are not). If you set mc->mc_gpr[12] to _ctx_start, then save r2 after the calculation, I think we're good. Creating the stack frame as you did is a good idea, too, for backtracing. |
|
@chmeeedalf Correct me if I'm wrong, but isn't If I try to use the TOC subsequently calculated in the |
|
Ah, I'm wrong, because mc_gpr[2] would've been set up by getcontext(), so it is valid, or valid for libsys, that is. However, %r2 is calculated in _ctx_start() from %r12, so we should make sure that's valid, otherwise it's garbage for any libc calls made from _ctx_start. So I think a hybrid approach between our patches is correct (make sure %r12 is valid going in, set up the stack frame, and push our now-valid %r2 to it, pop it off after the call) |
|
@chmeeedalf Now this is all starting to make more sense. Two bugs in the same general code combining to make a bigger mess. Let me crank on it a bit and update the patch set. 👍 |
|
%r2 is calculated from %r12, which is supposed to point to the function entry point in ELFv2. When making intra-library calls you can skip over that, because %r2 is valid for the entire library, but when passing function pointers around, the function pointer points to that prologue, so it's expecting %r12 to be that entry point. Garbage in, garbage out (mc->mc_gpr[12] is likely pointing to getcontext() at this point, so we end up with a bogus TOC pointer after calculating it in _ctx_start()'s prologue). |
1f02533 to
b3d7a58
Compare
|
@chmeeedalf Indeed! I'm slowly wrapping my head around this obscure part of the C library... Hopefully this looks good. It works fine in testing on my end. |
b3d7a58 to
2edf4fe
Compare
|
Overall looks good, thanks for humoring me on this. One minor nit: Don't restore %r1 before calling _ctx_done, just leave the frame as-is (it's _ctx_start()'s frame, after all). Aside from that, I approve. |
2edf4fe to
54efb8d
Compare
|
@chmeeedalf No worries at all, thanks for the thorough review! I think we now both know this section of the code far better than before. Since we're keeping our stack frame around, I also added the remainder of the ABI required save fields. Technically not needed, but also technically correct. 😉 |
54efb8d to
d2a4d30
Compare
|
You might hate me for this, but I don't think we can create a new stack frame for _ctx_start(), makecontext() has to create the full stack frame, and _ctx_start just use it, in order to allow for overflow arguments. I've just been reading through the makecontext() source, and I think it's broken with regard to how it sets up the stack frame, the 'sizeof(uintptr_t)*(stackargs + 2)' should probably be 'stackargs + 6' instead, in order to maintain alignment for both ELFv1 and ELFv2 (see down below in the 'if (argc > 8)' block). If we size it right, then we just populate the frame as we normally would, if _ctx_start() were making the actual call. So for ELFv2 we need space, allocated in makecontext(), for:
I don't think we're caring about ELFv1 anymore, so we can let it rot for now. If anyone is interested they can do the work to fix that side :) Does the above make sense? We're ~95% there fixing this for all use cases. |
On PowerPC platforms a valid link to the Table of Contents (TOC) is required for PLT lookups to function. This TOC pointer is stored in a dedicated register, and is used along with the stack pointer by both C prologue and PLT lookup code. When calling swapcontext() with uc_link != NULL, a PLT lookup to setcontext(3) is attempted from within the _ctx_done context. The exiting process has usually trashed both r1 and r2 at this point, leading to a crash within the PLT lookup before setcontext(2) is reached to restore the linked context. Save and restore r1 and r2, using r16 as a scratch register to bypass the prologue trampling r2. This ensures the subsequent PLT lookup to setcontext(3) succeeds. Signed-off-by: Timothy Pearson <[email protected]>
|
@chmeeedalf Hey, I'd rather get it right than push something that causes problems down the line... It does make sense, and no, I don't care about ELF v1. If we don't have v1 users we should probably start looking at dropping the code, honestly, but that's for another time. I'll buy the |
d2a4d30 to
33f3d03
Compare
|
Looks like we managed to confuse the style checker -- it thinks the pointer cast is a multiplication. 🤣 |
|
Looks good to me, thanks! I'll push it soon (tonight, probably). |
|
And now this, and PR #1756, are pushed. Thanks for your work! |
On PowerPC platforms a valid link to the Table of Contents (TOC) is required for PLT lookups to function. This TOC pointer is stored in a dedicated register, and is used along with the stack pointer by both C prologue and PLT lookup code. When calling swapcontext() with uc_link != NULL, a PLT lookup to setcontext(3) is attempted from within the _ctx_done context. The exiting process has usually trashed both r1 and r2 at this point, leading to a crash within the PLT lookup before setcontext(2) is reached to restore the linked context. Save and restore r2 as in a regular function. This ensures the subsequent PLT lookup to setcontext(3) succeeds. Signed-off-by: Timothy Pearson <[email protected]> MFC after: 1 week Pull Request: #1759
|
Landed. Thanks |
On PowerPC platforms a valid link to the Table of Contents (TOC) is required for PLT lookups to function. This TOC pointer is stored in a dedicated register, and is used along with the stack pointer by both C prologue and PLT lookup code. When calling swapcontext() with uc_link != NULL, a PLT lookup to setcontext(3) is attempted from within the _ctx_done context. The exiting process has usually trashed both r1 and r2 at this point, leading to a crash within the PLT lookup before setcontext(2) is reached to restore the linked context. Save and restore r2 as in a regular function. This ensures the subsequent PLT lookup to setcontext(3) succeeds. Signed-off-by: Timothy Pearson <[email protected]> MFC after: 1 week Pull Request: #1759 (cherry picked from commit 8efa35f)
On PowerPC platforms a valid link to the Table of Contents (TOC) is required for PLT lookups to function. This TOC pointer is stored in a dedicated register, and is used along with the stack pointer by both C prologue and PLT lookup code.
When calling swapcontext() with uc_link != NULL, a PLT lookup to setcontext(3) is attempted from within the _ctx_done context. The exiting process has usually trashed both r1 and r2 at this point[1], leading to a crash within the PLT lookup before setcontext(2) is reached to restore the linked context.
Restore r1 and r2 from the incoming context to ensure the PLT lookup to setcontext(3) succeeds. As this subsequently calls setcontext(2), which overwrites r1 and r2 from the same context a second time, this should be safe.
[1] For clarity, since the interaction between compiler and context assembly is not obvious, the text of the comment from the patch is reproduced below:
...the current stack and TOC have almost certainly been trashed by the callback function. Since the compiler "knows" (incorrectly) that the callback in _ctx_start() is noreturn, it does not reserve a stack frame in the function prologue, and does not emit an epilogue. Since the prologue always reloads the TOC based on the global entry point, corruption at return is all but guaranteed.
This is not a compiler bug; the compiler is free to use that optimization in this situation (we are literally changing context underneath the compiler here). It is simply something we need to be aware of due to the immediate PLT call before setcontext(2) is reached.