[AutoBump] Merge with 258a5d49 (Nov 20) (11) #473

jorickert · 2025-02-17T22:13:34Z

No description provided.

This patch updates predicate and backend tests for FEXPA instructions to match [latest spec](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-).

These nodes should appear between CALLSEQ_START / CALLSEQ_END. Previously, they could be scheduled after CALLSEQ_END because the nodes didn't update the chain. The change in a test is due to X86 call frame optimizer pass bailing out for a particular call when CALLSEQ_START / CALLSEQ_END are not in the same basic block. This happens because SEG_ALLOCA is expanded into a sequence of basic blocks early. It didn't bail out before because the closing CALLSEQ_END was scheduled before SEG_ALLOCA, in the same basic block as CALLSEQ_START. While here, simplify creation of these nodes: allocating a virtual register and copying `Size` into it were unnecessary.

…lts to improve further folding (llvm#116419) Currently when creating a SHUFPD immediate mask, any undef shuffle elements are set to 0, which can limit options for further shuffle combining. This patch attempts to canonicalize the mask to improve folding: first by detecting a per-lane broadcast style mask (which can allow us to fold to UNPCK instead), and second ensure any undef elements are set to an 'inplace' value to improve chances of the SHUFPD later folding to a BLENDPD (or be bypassed in a SimplifyMultipleUseDemandedVectorElts call). This is very similar to canonicalization we already attempt in getV4X86ShuffleImm for vXi32/vXf32 SHUFPS/SHUFD shuffles.
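
The undef handling described above can be sketched as follows. This is an illustrative model in Python, not the code from the patch: it assumes a shuffle mask given as a list in which `-1` marks an undef element, and that bit `i` of the immediate selects the low or high f64 within a 128-bit lane. The name `shufpd_imm` and the `inplace_undef` switch are hypothetical.

```python
UNDEF = -1  # assumed marker for an undef shuffle element


def shufpd_imm(mask, inplace_undef=True):
    """Build a SHUFPD-style immediate from a shuffle mask.

    Bit i is 0 to pick the low f64 of a lane and 1 to pick the high f64.
    With inplace_undef=False, undef elements become 0 (the behaviour the
    commit message describes as current). With inplace_undef=True, undef
    element i gets the 'inplace' value i & 1.
    """
    imm = 0
    for i, m in enumerate(mask):
        if m == UNDEF:
            bit = (i & 1) if inplace_undef else 0
        else:
            bit = m & 1
        imm |= bit << i
    return imm
```

For the mask `[0, UNDEF]`, the zero-filled variant yields `0b00`, while the in-place variant yields `0b10`, which leaves the second element looking untouched and so is friendlier to a later blend fold.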

A follow up to PR llvm#116402 to add a regression test. The original change fixed the reproducer but that was not suitable to use as a regression test. This test case will fail with a LLD prior to llvm#116402. The disassembly for the thunk that starts as a short thunk but is later a long thunk isn't quite right. It is missing a $d mapping symbol. I think this can be fixed, but I've not done that in this patch to keep it test only. It is not a regression introduced in llvm#116402. I've also removed a spurious --threads=1 I noticed in the original test aarch64-thunk-bti.s

FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name

…6651) Summary: For Linux systems, we currently use the HSA library to determine the installed GPUs. However, this isn't really necessary and adds a dependency on the HSA runtime as well as a lot of overhead. Instead, this patch uses the `sysfs` interface exposed by `amdkfd` to do this directly.
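
A rough sketch of what reading the `amdkfd` topology from `sysfs` might look like, in Python for illustration only. The directory path, the `gfx_target_version` property name, and its decimal major/minor/stepping encoding are assumptions about the kernel interface rather than details stated in this commit message; the function names are hypothetical.

```python
from pathlib import Path

# Assumed location of the amdkfd topology nodes; not taken from the patch.
KFD_NODES = Path("/sys/devices/virtual/kfd/kfd/topology/nodes")


def gfx_name_from_properties(text):
    """Return a name such as 'gfx90a' from a node's properties text.

    Returns None when the property is missing or zero (assumed to mean
    the node is not a GPU).
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "gfx_target_version":
            version = int(parts[1])
            if version == 0:
                return None
            major = version // 10000
            minor = (version // 100) % 100
            step = version % 100
            # Stepping is printed in hex, so 10 becomes 'a'.
            return f"gfx{major}{minor}{step:x}"
    return None


def installed_gpus(root=KFD_NODES):
    """List GPU names found under the topology directory (empty if absent)."""
    if not root.is_dir():
        return []
    names = []
    for node in sorted(root.iterdir()):
        props = node / "properties"
        if props.is_file():
            name = gfx_name_from_properties(props.read_text())
            if name:
                names.append(name)
    return names
```

Calling `installed_gpus()` on a machine without the `amdkfd` driver simply returns an empty list, which mirrors the goal of avoiding a hard dependency on the HSA runtime.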

…lvm#116195) The entries in the dependency matrix can contain a lot of duplicates, which is unnecessary and results in more checks than needed. This patch makes the entries unique so that those checks are avoided.
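
As a small illustration of the idea, the sketch below removes duplicate rows from a matrix of direction vectors while keeping first-seen order. It is written in Python and is not the LoopInterchange implementation; representing each row as a list of direction characters and the name `unique_rows` are assumptions made for the example.

```python
def unique_rows(dep_matrix):
    """Drop duplicate direction vectors while keeping first-seen order."""
    seen = set()
    unique = []
    for row in dep_matrix:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique
```

Any later legality check that walks the matrix then visits each distinct direction vector once instead of once per duplicate.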

…vm#112345)

…n function signatures. (llvm#116146) In loongarch64 LP64D ABI, `unsigned 32-bit` types, such as unsigned int, are stored in general-purpose registers as proper sign extensions of their 32-bit values. Therefore, Flang also follows it if a function needs to be interoperable with C. Reference: https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc#Fundamental-types
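
To make the ABI rule concrete, here is a minimal Python model of the 64-bit register image of a 32-bit value under the sign-extension convention described above. The helper name `lp64d_register_value` is hypothetical and the function only models the bit pattern; it is not code from Flang.

```python
def lp64d_register_value(u32):
    """Return the 64-bit register image of a 32-bit value.

    Bit 31 is copied into bits 32..63, even for unsigned 32-bit types.
    """
    u32 &= 0xFFFFFFFF
    if u32 & 0x80000000:
        return u32 | 0xFFFFFFFF00000000
    return u32
```

An `unsigned int` holding `0x80000000` is therefore held in the register as `0xFFFFFFFF80000000`, which is why a caller and callee must agree on the extension when interoperating with C.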

Closes llvm#112068.

…#116590) This patch generalizes llvm#81027 to handle pattern `and/or (fcmp ord/uno X, 0), (fcmp pred fabs(X), Y)`. Alive2: https://alive2.llvm.org/ce/z/tsgUrz The correctness is straightforward because `fcmp ord/uno X, 0.0` is equivalent to `fcmp ord/uno fabs(X), 0.0`. We may generalize it to handle fneg as well. Address comment llvm#116065 (review)
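
The equivalence the correctness argument relies on can be spot-checked with a few values. The Python sketch below models `fcmp ord X, 0.0` as "X is not NaN" and checks that taking `fabs` never changes the result; it is a sanity check on sample inputs, not a proof (the Alive2 link above serves that purpose), and the helper names are hypothetical.

```python
import math


def fcmp_ord_zero(x):
    """Model of `fcmp ord X, 0.0`: true exactly when X is not NaN."""
    return not math.isnan(x)


SAMPLES = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan]


def ord_unchanged_by_fabs(values=SAMPLES):
    """Check that fcmp ord gives the same answer for X and fabs(X)."""
    return all(fcmp_ord_zero(v) == fcmp_ord_zero(math.fabs(v)) for v in values)
```

Since `fabs` only clears the sign bit, a NaN stays a NaN and a non-NaN stays a non-NaN, so the `uno` case follows by negation.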

…vm#116247) There are lots of places where we try to estimate the runtime vectorisation factor based on the getVScaleForTuning TTI hook. I've added a new getEstimatedRuntimeVF function and taught several places in the vectoriser to use this new function.
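
A simplified picture of such an estimate, as a Python sketch: for a scalable vectorisation factor, multiply the known minimum lane count by the tuning value of vscale when one is available. The signature below is hypothetical and does not mirror the actual `getEstimatedRuntimeVF` function in the vectoriser.

```python
def estimated_runtime_vf(known_min_lanes, scalable, vscale_for_tuning=None):
    """Estimate the runtime VF from a (possibly scalable) VF.

    Fixed-width VFs are returned unchanged. Scalable VFs are scaled by
    the tuning vscale if the target provides one (assumed optional).
    """
    if scalable and vscale_for_tuning is not None:
        return known_min_lanes * vscale_for_tuning
    return known_min_lanes
```

For example, a scalable VF with a minimum of 4 lanes and a tuning vscale of 2 is estimated as 8 lanes at runtime.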

This was a typo in llvm#112983 that didn't cause build failures but is still wrong.

Part of llvm#51787. Follow up of llvm#116243. This patch adds constexpr support for the built-in reduce mul function.

In ModuleToObject flow, users may want to add some callback functions invoked with LLVM IR/ISA for debugging or other purposes.

I'll be modifying this test in a future PR.

) Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.

…vm#114508) The `rd` operand of AMCAS instructions is both read and written, because of the nature of compare-and-swap operations, but currently it is not declared as such. Fix it for upcoming codegen enablement changes. In order to do that, a piece of LoongArchAsmParser logic that relied on TableGen-erated enum variants being ordered in a specific way needs updating; this will be addressed in a following refactor. No functional change intended. While at it, restore vertical alignment for the definition lines. Suggested-by: tangaac <[email protected]> Link: llvm#114398 (comment)

Add variant with different metadata on all loads, for llvm#115868

…hen there are predicate calls (llvm#116075)
On loongarch64 with lsx extension, we select `VBITREV_W` for `v4i32 (xor X, (shl splat(1), Y))`: https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L1583-L1584
And `vsplat_imm_eq_1` is defined as: https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L77-L87
For the `(bitconvert (v4i32 (build_vector)))` case, the pattern is expected to be:
```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (bitconvert:{ *:[v4i32] } (build_vector:{ *:[v4i32] }))<<P:Predicate_vsplat_imm_eq_1>>, v4i32:{ *:[v4i32] }:$vk))
RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```
However, `simplifyTree` drops the `bitconvert` node and its predicates: https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp#L3036-L3062
Then llvm will match `vsplat_imm_eq_1` for any v4i32 splats and cause a miscompilation:
```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (build_vector:{ *:[v4i32] }), v4i32:{ *:[v4i32] }:$vk))
RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```
This patch adds additional checks for predicates associated with the trivial bitconvert node. Unused patterns in the LoongArch target are also removed. Fixes llvm#116008.
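
To see why losing the predicate is a miscompilation, the Python sketch below models a single 32-bit lane. It assumes `VBITREV_W` flips bit `Y mod 32` of each lane, which is only the same as `xor X, (shl splat(C), Y)` when the splat constant `C` is 1. Both helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def vbitrev_w_lane(x, y):
    """Assumed model of one VBITREV.W lane: flip bit (y mod 32) of x."""
    return (x ^ (1 << (y % 32))) & MASK32


def xor_shl_splat_lane(x, y, splat):
    """One lane of `xor X, (shl splat(C), Y)` for shift amounts below 32."""
    return (x ^ (splat << y)) & MASK32
```

With `x = 0` and `y = 3`, a splat of 1 gives `8` from both functions, while a splat of 2 gives `16` from the generic form, so selecting `VBITREV_W` for an arbitrary splat changes the result.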

Currently, null chunks always follow other aligned chunks, so this patch is NFC. However, it will become observable once support for ARM64X imports is added. The import tables are shared between the native and EC views. They are usually very similar, but in cases where they differ, ARM64X relocations handle the discrepancies. If a DLL is only imported by EC code, the native view will see it as importing zero functions from this DLL (with ARM64X relocations replacing those null chunks with actual imports). In this scenario, the null chunks may appear as the very first chunks, meaning there is nothing else forcing their alignment.

Looks like a few different phrasings got mashed up together.

…vm#116610) This patch improves the code reuse of the actions system and adds several improvements for easier debugging via clang-repl --debug-only=clang-repl. The change improves the consistency of the TUKind when actions are handled within a WrapperFrontendAction. In this case, instead of falling back to the default TU_Complete, we forward to the TUKind of the ASTContext, which presumably was created by the intended action. This enables the incremental infrastructure to reuse code. This patch also clones the first llvm::Module because the first PTU can now come from -include A.h and the presumption of llvm::Module being empty does not hold. The changes are a first step to fix the issues with `clang-repl --cuda`.

…16780) A lot of interchange tests unnecessarily relied on a build with ASSERTS enabled. Instead, simply check the IR output for both negative and positive tests so that we don't rely on debug messages. This increases test coverage as these tests will now also run with non-assert builds. For a couple of files keeping some of the debug tests was useful, so these were separated out and moved to a similarly named *-remarks.ll file.

…vm#116545) see llvm#73359
Declarative assemblyFormat ODS is more concise and requires less boilerplate than filling out cpp interfaces.
Changes:
- updates the AccessChainOp defined in SPIRVMemoryOps.td to use assemblyFormat.
- Removes part print/parse from MemoryOps.cpp which is now generated by assemblyFormat
- Updates tests to updated format

…lvm#116794) Closes llvm#116775.

The (extended) bit width might not fit into the (non-extended) type, resulting in an incorrect truncation of the compared value. Fix this by using m_SpecificInt(), which is both simpler and handles this correctly. Fixes the assertion failure reported in: llvm#114539 (comment)

Release note llvm#110646 and llvm#114507.

Instead of custom selecting a bunch of instructions, we can expand to generic MIR during legalization.

This reworks the free store implementation in libc's malloc to use a dlmalloc-style binary trie of circularly linked FIFO free lists. This data structure can be maintained in logarithmic time, but it still permits a relatively small implementation compared to other logarithmic-time ordered maps. The implementation doesn't do the various bitwise tricks or optimizations used in actual dlmalloc; it instead optimizes for (relative) readability and minimum code size. Specific optimization can be added as necessary given future profiling.

…lvm#117065) Reverts llvm#106259 Unit tests break on AArch64.

This patch upgrades a unit test to MemProf Version 3 while removing those bits that cannot be upgraded to Version 3. The bits being removed expect instrprof_error::hash_mismatch from a broken MemProf profile that references a frame that doesn't actually exist. Now, Version 3 no longer issues instrprof_error::hash_mismatch. Even if it still issued instrprof_error::hash_mismatch, we would have a couple of hurdles:
- InstrProfWriter::addMemProfData will soon require all (or none) of the fields (frames, call stacks, and records) be populated. That is, it won't accept an instance of IndexedMemProfData with frames missing.
- writeMemProfV3 asserts that every frame occurs at least once: assert(MemProfData.Frames.size() == FrameHistogram.size());
This patch gives up on instrprof_error::hash_mismatch and tries to trigger instrprof_error::unknown_function with the empty profile.

The Android clang-r536225 compiler identifies as Clang 19, but it is based on commit fc57f88, which predates the official LLVM 19.0.0 release. Some tests need fixes:
* The sized delete tests fail because clang-r536225 leaves sized deallocation off by default.
* std::array<T[0]> is true when this Android Clang version is used with a trunk libc++, but we expect it to be false in the test. In practice, Clang and libc++ usually come from the same commit on Android.

…vm#116151) Android clang-r536225 identifies as Clang 19 but it predates LLVM 19.0.0. It is based off of fc57f88.

…ames with DWARFTypePrinter (llvm#117071) This is a reland of llvm#112811. Fixed the bot breakage by running ld.lld explicitly.

…alloca and malloc parameters bound" (llvm#117020) Reverts llvm#115522 This caused UBSan errors in multi-stage clang build: https://lab.llvm.org/buildbot/#/builders/25/builds/4241/steps/10/logs/stdio

Similar to the FMLA_VG2_M2Z2Z_H one.

…3626) Now that `-fbasic-block-sections=list` is enabled for Arm, functions may be split across multiple sections, and CFI information must be handled independently for each section. On x86, this is handled in `llvm/lib/CodeGen/CFIInstrInserter.cpp`. However, this pass does not run on Arm, so we must add logic for it to `llvm/lib/CodeGen/CFIFixup.cpp`.

…_LIBC_FUNCTION_ATTR_func macro. (llvm#116160)

…ible with minimal runtime (llvm#114865) We are currently getting: `clang: error: invalid argument '-fsanitize-minimal-runtime' not allowed with '-fsanitize=implicit-conversion'` when running `-fsanitize=implicit-conversion -fsanitize-minimal-runtime` because `implicit-conversion` now includes `implicit-bitfield-conversion` which is not included in the `integer` check. The `integer` check includes the `implicit-integer-conversion` checks and is supported by the trapping option and because of that compatible with the minimal runtime. It is thus reasonable to make `implicit-bitfield-conversion` compatible with the minimal runtime.

…m#117039)

…vm#114537) Summary: Consolidate the logic in a single function. We do an extra pass over Instructions but this is necessary to untangle things and extract metadata cloning in a future diff.
Test Plan:
```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests
Testing Time: 106.06s
Total Discovered Tests: 62601
Skipped : 17 (0.03%)
Unsupported : 2518 (4.02%)
Passed : 59911 (95.70%)
Expectedly Failed: 155 (0.25%)
[212/213] Running lit suite
Testing Time: 12.47s
Total Discovered Tests: 8474
Skipped: 17 (0.20%)
Passed : 8457 (99.80%)
```
Extracted from llvm#109032 (commit 3) (there are more refactors and cleanups in subsequent commits)

When embedding, if `compiler.used` exists, we should re-use its element type instead of blindly assuming it's an unqualified pointer.

This reverts commit c0efcc0.

…(SignedMax,UnsignedMax] (llvm#116733)" This reverts commit b8e1d4d. Causes failures on the `libc` test suite https://lab.llvm.org/buildbot/#/builders/73/builds/8871

In MSVC, when `/d1initall` is enabled, `__declspec(no_init_all)` can be applied to a type to suppress auto-initialization for all instances of that type or to a function to suppress auto-initialization for all locals within that function. This change does the same for Clang, except that it applies to the `-ftrivial-auto-var-init` flag instead. NOTE: I did not add a Clang-specific spelling for this but would be happy to make a followup PR if folks are interested in that.

) ld64.lld would previously allow you to link against dylibs linked with `-allowable_client`, even if the client's name does not match any allowed client. This change fixes that. See llvm#114146 for related discussion. The test binary `liballowable_client.dylib` was created on macOS with:
`echo | clang -xc - -dynamiclib -mmacosx-version-min=10.11 -arch x86_64 -Wl,-allowable_client,allowed -o lib/liballowable_client.dylib`

…vm#114265) (llvm#117089) This reverts commit 6fb7cdf.

…lvm#116934) The dialect conversion driver has three phases:
- **Create** `IRRewrite` objects as the IR is traversed.
- **Finalize** `IRRewrite` objects. During this phase, source materializations for mismatching value types are created. (E.g., when `Value` is replaced with a `Value` of different type, but there is a user of the original value that was not modified because it is already legal.)
- **Commit** `IRRewrite` objects. During this phase, all remaining IR modifications are materialized. In particular, SSA values are actually being replaced during this phase.
This commit removes the "finalize" phase. This simplifies the code base a bit and avoids one traversal over the `IRRewrite` stack. Source materializations are now built during the "commit" phase, right before an SSA value is being replaced.
This commit also removes the "inverse mapping" of the conversion value mapping, which was used to predict if an SSA value will be dead at the end of the conversion. This check is replaced with an approximate check that does not require an inverse mapping. (A false positive for `v` can occur if another value `v2` is mapped to `v` and `v2` turns out to be dead at the end of the conversion. This case is not expected to happen very often.) This reduces the complexity of the driver a bit and removes one potential source of bugs. (There have been bugs in the usage of the inverse mapping in the past.)
`BlockTypeConversionRewrite` no longer stores a pointer to the type converter. This pointer is now stored in `ReplaceBlockArgRewrite`.
This commit is in preparation of merging the 1:1 and 1:N dialect conversion driver. It simplifies the upcoming changes around the conversion value mapping. (API surface of the conversion value mapping is reduced.)

… phase" (llvm#117094) Reverts llvm#116934 This commit broke the build.

…/fromPtr. On arm64e, uses the "wrap" and "unwrap" operations introduced in f14cb49 to sign and strip pointers by default. Signing / stripping can be overridden at the toPtr / fromPtr call site by passing an explicit wrap / unwrap operation.

Lukacma and others added 30 commits November 19, 2024 10:29

[AArch64] Update predicate for FEXPA (llvm#116613)

61726ad

This patch updates predicate and backend tests for FEXPA instructions to match [latest spec](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-).

[X86] Tidyup up AVX512 FPCLASS instruction naming (llvm#116661)

7dcefb3

FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name

[LoopInterchange] Make the entries of the Dependency Matrix unique (l…

cac6f21

…lvm#116195) The entries in the dependency matrix can contain a lot of duplicates, which is unnecessary and results in more checks than needed. This patch makes the entries unique so that those checks are avoided.

[clangd] Let DefineOutline tweak handle member function templates (ll…

8a6a76b

…vm#112345)

[InstCombine] Drop noundef attributes in foldCttzCtlz (llvm#116718)

a59976b

Closes llvm#112068.

[amdgpu-arch] Fix unused StringRef result

55fad5e

[RISCV] Fix FP64 DinX R Regclass (llvm#116688)

c4030c8

This was a typo in llvm#112983 that didn't cause build failures but is still wrong.

[clang] constexpr built-in reduce mul function. (llvm#116626)

b03a747

Part of llvm#51787. Follow up of llvm#116243. This patch adds constexpr support for the built-in reduce mul function.

[MLIR] Add callback functions for ModuleToObject (llvm#116007)

2153672

In ModuleToObject flow, users may want to add some callback functions invoked with LLVM IR/ISA for debugging or other purposes.

[libc++][NFC] Format a pait test

01a1ca7

I'll be modifying this test in a future PR.

[CodeGen][AArch64] Sink splat operands of FMul instructions (llvm#116222

4f0403f

) Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.

[InstCombine] Add extra test for preserving !llvm.access.group

9e0ea8c

Add variant with different metadata on all loads, for llvm#115868

[LAA] Add phi test variant for cross-iteration dependence (NFC)

681939e

[llvm][docs] Correct setence in How To Add A Builder

ee4fb3a

Looks like a few different phrasings got mashed up together.

[XCore] Pattern match LADD/LSUB/LMUL/MACCU/MACCS/CRC8 (llvm#116245)

f69646e

[InstCombine] Handle constant GEP expr in SimplifyDemandedUseBits (l…

03d8831

…lvm#116794) Closes llvm#116775.

JDevlieghere and others added 27 commits November 20, 2024 13:25

Add release note for parallel module creation in LLDB (llvm#116857)

4acf935

Release note llvm#110646 and llvm#114507.

[RISCV][GISel] Move G_BRJT expansion to legalization (llvm#73711)

4087b87

Instead of custom selecting a bunch of instructions, we can expand to generic MIR during legalization.

Revert "[libc] Use best-fit binary trie to make malloc logarithmic" (l…

9be475a

…lvm#117065) Reverts llvm#106259 Unit tests break on AArch64.

[libc++] Include headers in <thread> conditionally (llvm#116539)

9ebc6f5

[libc++][Android] BuildKite CI: update Clang and sysroot versions (ll…

1c8ac4c

…vm#116151) Android clang-r536225 identifies as Clang 19 but it predates LLVM 19.0.0. It is based off of fc57f88.

[lldb][dwarf] Compute fully qualified names on simplified template n…

f06c187

…ames with DWARFTypePrinter (llvm#117071) This is a reland of llvm#112811. Fixed the bot breakage by running ld.lld explicitly.

Revert "[llvm] Improve llvm.objectsize computation by computing GEP, …

a44d60f

…alloca and malloc parameters bound" (llvm#117020) Reverts llvm#115522 This caused UBSan errors in multi-stage clang build: https://lab.llvm.org/buildbot/#/builders/25/builds/4241/steps/10/logs/stdio

[AArch64][SME] Fix naming of FMLS_VG4_M4Z2Z_H -> FMLS_VG4_M4Z4Z_H. NFC.

c58c226

Similar to the FMLA_VG2_M2Z2Z_H one.

[libc] Allow each function can have extra attributes by defining LLVM…

1466711

…_LIBC_FUNCTION_ATTR_func macro. (llvm#116160)

[flang][cuda] Adapt ExternalNameConversion to work in gpu module (llv…

ecda140

…m#117039)

[LLVM][NFC] Use used's element type if available (llvm#116804)

53a6a11

When embedding, if `compiler.used` exists, we should re-use its element type instead of blindly assuming it's an unqualified pointer.

Revert "[libc] support fully OOT build (llvm#101287)"

97e3f62

This reverts commit c0efcc0.

Revert "[AMDGPU] prevent shrinking udiv/urem if either operand is in …

905e831

…(SignedMax,UnsignedMax] (llvm#116733)" This reverts commit b8e1d4d. Causes failures on the `libc` test suite https://lab.llvm.org/buildbot/#/builders/73/builds/8871

[test] Precommit test for llvm#116936

fe33bd0

Revert "[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. (ll…

7b5b019

…vm#114265) (llvm#117089) This reverts commit 6fb7cdf.

Revert "[mlir][Transforms][NFC] Dialect conversion: Remove "finalize"…

4056d93

… phase" (llvm#117094) Reverts llvm#116934 This commit broke the build.

[AutoBump] Merge with 258a5d4 (Nov 20)

1d56e9c

mgehre-amd approved these changes Feb 18, 2025

mgehre-amd merged commit f0a4187 into bump_to_68a39081 Mar 12, 2025
5 checks passed

mgehre-amd deleted the bump_to_258a5d49 branch March 12, 2025 10:10
