Implement a workaround for ld-prime hitting an assert on large addends #124721

Open
filipnavara wants to merge 5 commits into dotnet:main from filipnavara:ld-prime-addend

@filipnavara

filipnavara commented Feb 22, 2026

Member

We translate the IMAGE_REL_BASED_RELPTR32 relocation into an ARM64_RELOC_SUBTRACTOR and ARM64_RELOC_UNSIGNED pair on ARM64. To emulate the behavior of a PC-relative relocation, we bake the section-relative PC offset into the addend with a negative sign. The ARM64_RELOC_SUBTRACTOR relocation then subtracts the base address of the section, and finally the ARM64_RELOC_UNSIGNED relocation adds the target symbol address.

This works fine for addends that fit into a signed 20-bit integer, but ld-prime hits an assert when the addend is larger. To work around it, we create anchor labels at fixed 2^19-byte offsets in the section and adjust the addend to be relative to the nearest anchor label. This way we can guarantee that the addend for ARM64_RELOC_SUBTRACTOR is always within the signed 20-bit range.

The same logic applies to the X86_64_RELOC_SUBTRACTOR + X86_64_RELOC_UNSIGNED pair for x64 targets.
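
As a rough illustration of the description above, the following Python sketch models how a SUBTRACTOR + UNSIGNED pair resolves and why rebasing onto an anchor leaves the result unchanged while shrinking the stored addend. It is a simplified model, not the object writer's code: the function names, the addresses, and the `ANCHOR_STEP` constant are assumptions chosen for the example (only the 2^19 spacing comes from the description).

```python
# Simplified model (an assumption, not the PR's code): a linker resolves the pair as
#   value = addr(UNSIGNED target) - addr(SUBTRACTOR symbol) + stored addend
ANCHOR_STEP = 1 << 19  # anchor spacing taken from the description (2^19 bytes)


def resolve_pair(target_addr, subtrahend_addr, stored_addend):
    """Value the linker would write for the relocation pair."""
    return target_addr - subtrahend_addr + stored_addend


def encode_section_relative(offset):
    """Original scheme: subtract the section base, store -offset as the addend."""
    return 0, -offset  # (subtrahend offset within the section, stored addend)


def encode_anchor_relative(offset):
    """Workaround: subtract the anchor at or below the relocation site."""
    anchor = offset - (offset % ANCHOR_STEP)
    return anchor, -(offset - anchor)


def fits_signed_20bit(value):
    return -(1 << 19) <= value < (1 << 19)


# Hypothetical addresses, for illustration only.
section_base = 0x1_0000_0000
site_offset = 0x1AFA24
target_addr = 0x1_0030_0000

pc = section_base + site_offset
for encode in (encode_section_relative, encode_anchor_relative):
    sub_offset, addend = encode(site_offset)
    value = resolve_pair(target_addr, section_base + sub_offset, addend)
    assert value == target_addr - pc  # both encodings yield target - PC
    print(encode.__name__, hex(sub_offset), addend, fits_signed_20bit(addend))
```

Both encodings resolve to the same PC-relative distance; only the anchor-relative one keeps the stored addend inside the signed 20-bit range for this site.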

Fixes #119380

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 22, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Feb 22, 2026
@filipnavara filipnavara added os-mac-os-x macOS aka OSX os-ios Apple iOS area-NativeAOT-coreclr and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 22, 2026
@filipnavara filipnavara marked this pull request as ready for review February 22, 2026 09:04
@filipnavara filipnavara marked this pull request as draft February 22, 2026 09:47
@filipnavara

filipnavara commented Feb 22, 2026

Member Author

Needs more work. I'll investigate the unit test failures first.

@filipnavara

filipnavara commented Feb 22, 2026

Member Author

I don't get the System.Linq.Expressions test failure on Xcode 26.2 anymore. According to the log, the compiler/linker version on CI is Xcode 16.4.

@filipnavara

Member Author

Going back to the drawing board now.

@filipnavara

Member Author

@akoeplinger Is there an easy way to switch a CI lane to use a newer Xcode? The newer versions should be available in the runner images, just not as the default.

@akoeplinger

Member

I assume adding xcode-select to the path mentioned in https://github.com/actions/runner-images/blob/main/images/macos/macos-15-Readme.md#xcode somewhere early in the build should suffice?
You could add it to build.sh (if it's just for testing).

@filipnavara

Member Author

> I assume adding xcode-select to the path mentioned in https://github.com/actions/runner-images/blob/main/images/macos/macos-15-Readme.md#xcode somewhere early in the build should suffice?

Thanks, it seems to pass the smoke test with Xcode 26.2. I'll probably open a separate PR to check what the behavior is with the Xcode bump alone.

@dotnet-policy-service

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a workaround in the Mach-O object writer to avoid Apple ld-prime assertions when emitting IMAGE_REL_BASED_RELPTR32 as *_RELOC_SUBTRACTOR + *_RELOC_UNSIGNED with large addends, by introducing reusable per-section temporary labels to keep addends within the linker’s expected signed 20-bit range.

Changes:

  • Track and emit per-section temporary labels to bound relocation addends for IMAGE_REL_BASED_RELPTR32 on ARM64 and x64 .eh_frame.
  • Adjust Mach relocation emission to optionally reference a temporary label symbol (instead of the section base symbol) for SUBTRACTOR relocs.
  • Modify build.sh to attempt switching Xcode versions on macOS.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/coreclr/tools/Common/Compiler/ObjectWriter/MachObjectWriter.cs | Adds temporary-label generation/reuse and uses label symbol indices to avoid ld-prime large-addend assertions. |
| build.sh | Adds macOS-only logic to switch Xcode via sudo xcode-select. |

@filipnavara

Member Author

The version from @agocke resulted in corrupted __unwind_info for several of the tests. In the case of System.Runtime.Tests, the difference was 0x84 stripped bytes. I compiled it on a machine with Xcode 16.4 (which matches the CI) and it used ld-classic for the linking. I'll try to match it to one of the known ld-classic issues next.

@filipnavara

Member Author

So, the approach with labels at fixed offsets (as suggested by @agocke) fails with ld-classic and doesn't fail with ld-prime (Xcode 26.5). The resulting executable contains broken unwinding information, which correlates with the following pattern in the source object file:

000000000012fa20 S _Moq_Moq_Properties_Resources__get_NoMatchingCallsBetweenExclusive
000000000012fa20 s lanchor1_2
...
00000000001afa20 S _System_Linq_Expressions_System_Linq_Expressions_Expression_TypeBinaryExpressionProxy___ctor
00000000001afa20 s lanchor1_3

Both anchors are marked with N_ALT_ENTRY and, at the same time, align precisely with the start of a method. This is known to break ld-classic, which then produces incorrect atoms and, later, incorrect unwinding information. We hit this earlier with some assembly helpers. It is likely fixable by avoiding the N_ALT_ENTRY flag for those labels or by collapsing those labels into the regular symbol. I'm not sure whether that can be done efficiently, though.
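
One way to spot this pattern in an object file is to scan symbol-table output for anchors that share an address with another symbol. The Python sketch below does that for text in the three-column form shown above. The `anchors_sharing_address` helper is hypothetical, and the `lanchor` prefix is taken from the listing rather than from any documented naming scheme.

```python
from collections import defaultdict

# The two collisions quoted above, in "address type name" form.
NM_SAMPLE = """\
000000000012fa20 S _Moq_Moq_Properties_Resources__get_NoMatchingCallsBetweenExclusive
000000000012fa20 s lanchor1_2
...
00000000001afa20 S _System_Linq_Expressions_System_Linq_Expressions_Expression_TypeBinaryExpressionProxy___ctor
00000000001afa20 s lanchor1_3
"""


def anchors_sharing_address(nm_text, anchor_prefix="lanchor"):
    """Map address -> (anchor names, other names) where both kinds coincide."""
    by_addr = defaultdict(list)
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip elisions and entries without an address
        addr, _kind, name = parts
        by_addr[int(addr, 16)].append(name)

    clashes = {}
    for addr, names in by_addr.items():
        anchors = [n for n in names if n.startswith(anchor_prefix)]
        others = [n for n in names if not n.startswith(anchor_prefix)]
        if anchors and others:
            clashes[addr] = (anchors, others)
    return clashes


clashes = anchors_sharing_address(NM_SAMPLE)
for addr, (anchors, others) in sorted(clashes.items()):
    print(hex(addr), anchors, "coincides with", others)
```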

@filipnavara

filipnavara commented Jun 5, 2026

Member Author

Couple more observations:

  • Just stripping N_ALT_ENTRY from anchor symbols that overlap a real symbol is not enough to make ld-classic produce valid output. (Symbol aliases apparently do work to a certain extent, but they are not the root issue responsible for the corruption.)
  • Using L-prefixed symbols (as opposed to l-prefixed) doesn't work either. While ld-classic does ignore those symbols for the purpose of creating atoms, it also has a bug when the symbols are used as relocation targets. It internally resolves them to atom start symbol + addend (i.e. nearest preceding non-temporary symbol + addend) and then incorrectly drops the addend when writing the relocation down.
  • The previous patch incorrectly produced the relocation anchors even for uninitialized sections. We need to add !_sections[sectionIndex].IsInFile condition to SectionNeedsRelocAnchors.
  • Lastly, we are hitting a bug where N_ALT_ENTRY inside a code section breaks the unwind info table mapping because the linker incorrectly treats the address as another function start. This is fixed in recent ld-prime, but apparently it was broken even in ld-classic.

@filipnavara

filipnavara commented Jun 5, 2026

Member Author

I have tried to attack the "relocation anchor grid" approach pretty much from all possible angles, but I think it's impossible to execute due to a combination of ld-classic bugs.

  • We cannot use L-prefixed symbols that don't split atoms because of a bug in relocations. Instead of writing the correct value, ld-classic maps the symbol to the nearest non-temporary symbol below it, computes the distance to it, and then silently throws that distance away and corrupts the output.
  • Normal temporary symbols are represented correctly in relocations, but they break up atoms. This matters because it produces incorrect LC_FUNCTION_STARTS and unwind info. Consider a function F that spans the anchor boundary A. The output file gets two function starts instead of one: F[start] to A and A to F[end]. Correspondingly, the unwind information gets corrupted and we get it only for F[start] to A.
  • N_ALT_ENTRY ensures the split atoms stay next to each other in the final linker layout, but it still corrupts the metadata for that code.

Given these constraints, any artificially added label inside a code section is a non-starter.
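
The atom-splitting effect described in the second bullet can be pictured with a small Python model. It is only a sketch of the behavior as described in this comment, not ld-classic's actual algorithm, and `split_into_atoms` is a hypothetical helper: every anchor that falls strictly inside a function is treated as starting a new atom.

```python
def split_into_atoms(function_ranges, anchor_step, section_size):
    """Return (start, end) atoms, assuming each anchor inside a function starts a new atom."""
    anchors = set(range(anchor_step, section_size, anchor_step))
    atoms = []
    for start, end in function_ranges:
        # Anchors strictly inside the function cut it into several pieces.
        cuts = sorted(a for a in anchors if start < a < end)
        bounds = [start, *cuts, end]
        atoms.extend(zip(bounds, bounds[1:]))
    return atoms


# F spans the 2^19 boundary; G starts after it and is left alone.
functions = [(0x7FF00, 0x80100), (0x80100, 0x80200)]
for start, end in split_into_atoms(functions, 1 << 19, 0x100000):
    print(hex(start), hex(end))
```

In this model F yields two atoms, F[start] to A and A to F[end], which corresponds to the duplicated function start and the truncated unwind coverage described above.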

That said, the current state of the PR is not fully correct either. It generates temporary labels too. While it does so in a different fashion, it works only because, coincidentally, we don't use IMAGE_REL_BASED_RELPTR32 inside code sections. I guess the only reasonable course of action is to locate the nearest label below the address of the relocation and write the relocation relative to it. We can emit temporary labels in non-code sections if necessary.
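
A minimal sketch of the "nearest label below the relocation" idea, assuming the section's existing symbol offsets are available as a sorted list. The helper names are hypothetical and this is not what the PR implements; it only shows the lookup and the addend adjustment.

```python
import bisect


def nearest_symbol_at_or_below(sorted_symbol_offsets, reloc_offset):
    """Largest symbol offset <= reloc_offset, or None if there is none."""
    i = bisect.bisect_right(sorted_symbol_offsets, reloc_offset)
    if i == 0:
        return None
    return sorted_symbol_offsets[i - 1]


def rebase_addend(sorted_symbol_offsets, reloc_offset, addend=0):
    """Return (base symbol offset, addend relative to that symbol)."""
    base = nearest_symbol_at_or_below(sorted_symbol_offsets, reloc_offset)
    if base is None:
        raise ValueError("no symbol at or below the relocation site")
    return base, addend - (reloc_offset - base)


# Offsets borrowed from the symbol listing earlier in the thread.
symbols = [0x0, 0x12FA20, 0x1AFA20]
print(rebase_addend(symbols, 0x1AFA30))
```

The addend stays small only if existing symbols are dense enough; a gap of 2^19 bytes or more between symbols would still need an extra label, which is why the comment allows for temporary labels in non-code sections.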

Copilot AI review requested due to automatic review settings June 5, 2026 14:28

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

Copilot AI review requested due to automatic review settings June 5, 2026 17:32

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

We translate the IMAGE_REL_BASED_RELPTR32 relocation into an ARM64_RELOC_SUBTRACTOR
and ARM64_RELOC_UNSIGNED pair on ARM64. To emulate the behavior of a PC-relative
relocation, we bake the section-relative PC offset into the addend with a negative
sign. The ARM64_RELOC_SUBTRACTOR relocation then subtracts the base address of the
section, and finally the ARM64_RELOC_UNSIGNED relocation adds the target symbol address.

This works fine for addends that fit into a signed 20-bit integer, but ld-prime hits
an assert when the addend is larger. To work around it, we create anchor labels at
fixed 2^19-byte offsets in the section and adjust the addend to be relative to the
nearest anchor label. This way we can guarantee that the addend for
ARM64_RELOC_SUBTRACTOR is always within the signed 20-bit range.

The same logic applies to the X86_64_RELOC_SUBTRACTOR + X86_64_RELOC_UNSIGNED pair
for x64 targets.

Co-authored-by: Andy Gocke <andy@commentout.net>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 5, 2026 19:41

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/tools/Common/Compiler/ObjectWriter/MachObjectWriter.cs
// We cannot emit anchor symbols in executable sections as they may split
// existing functions into multiple atoms, and ld-classic corrupts the
// unwinding information for such functions. In theory we could support this
// by using nearest preceding symbol as an anchor but that requires more
// complex handling and we don't have any such relocs in our current scenarios.
throw new NotSupportedException("Executable sections cannot contain RELPTR32 relocations on Mach-O");

Member Author

This is intentionally broad. It could even be an "assert" if we decide that's enough.

Member

An assert would be fine. We don't use textual exceptions in the compiler outside the object writer, because they introduce a localization burden.

@filipnavara

filipnavara commented Jun 5, 2026

Member Author

@MichalStrehovsky @agocke I iterated on it a bit. This is now a reduced version based on Andy's idea of generating anchor labels at fixed distances. Unlike the previous iteration we only do that for sections where the problematic relocations are actually used. In addition I added a check that would throw an exception if someone tried to use the relocation in a code section. It's easier to trace back the exception to a comment, run "git blame" and find this PR than to stare at randomly corrupted linker output.

Retested with ld-classic (Xcode 16.4), ld-prime (Xcode 26.2) and ld-prime (Xcode 26.4).

@MichalStrehovsky

Member

/azp run runtime-nativeaot-outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara

Member Author

The last Azure Pipelines check for osx-arm64 died (timeout?) without reporting back to GitHub. The rest of the failures seem to be the same across all platforms.

@MichalStrehovsky

Member

> The last Azure Pipelines check for osx-arm64 died (timeout?) without reporting back to GitHub. The rest of the failures seem to be the same across all platforms.

Yeah, #129123 reverted the break. I'll just retrigger.

@MichalStrehovsky

Member

/azp run runtime-nativeaot-outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky

Member

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara

Member Author

The CI for outerloop is still pinned on Xcode 16.4 and ld-classic. The failure in System.Text.Json.SourceGeneration.Roslyn4.4.Tests is thus unrelated to the fix in this PR. I downloaded the Helix payload and double-checked that ld64 955.13 was used for the linking. We will need to do a separate analysis to see why the unwind information seems to be corrupted there.

Copilot AI review requested due to automatic review settings June 9, 2026 06:00

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +518 to +524
long labelOffset = offset & ~(RelocAnchorGranularity - 1);
long stored = addend - (offset - labelOffset);
Debug.Assert(stored >= -(1L << RelocAnchorLog2Granularity) && stored < (1L << RelocAnchorLog2Granularity));
BinaryPrimitives.WriteInt32LittleEndian(
data,
BinaryPrimitives.ReadInt32LittleEndian(data) +
- (int)(addend - offset));
+ (int)stored);
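
To see why the `stored` value in the snippet above stays in range, the masking arithmetic can be replayed in Python. This is a sketch under an assumption: the constant values are not shown in the excerpt, so `RELOC_ANCHOR_LOG2_GRANULARITY = 19` is inferred from the "2^19" in the PR description.

```python
# Assumed value: the excerpt does not show the constants; 19 comes from "2^19".
RELOC_ANCHOR_LOG2_GRANULARITY = 19
RELOC_ANCHOR_GRANULARITY = 1 << RELOC_ANCHOR_LOG2_GRANULARITY


def stored_addend(offset, addend):
    """Mirror of the snippet: round the offset down to its anchor, rebase the addend."""
    label_offset = offset & ~(RELOC_ANCHOR_GRANULARITY - 1)
    stored = addend - (offset - label_offset)
    limit = 1 << RELOC_ANCHOR_LOG2_GRANULARITY
    assert -limit <= stored < limit, "stored addend outside signed 20-bit range"
    return label_offset, stored


G = RELOC_ANCHOR_GRANULARITY
for offset in (0, G - 1, G, 3 * G + 5):
    print(hex(offset), stored_addend(offset, 0))
```

Because `offset - label_offset` is always between 0 and 2^19 - 1, the stored value stays within the signed 20-bit range whenever the original addend is itself small; a large original addend would still trip the assertion.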
Labels

  • area-NativeAOT-coreclr
  • community-contribution: Indicates that the PR has been added by a community member
  • os-ios: Apple iOS
  • os-mac-os-x: macOS aka OSX

Development

Successfully merging this pull request may close these issues.

ld64 crashing while Native AOT compiling Microsoft MCP for .NET 10

5 participants