forked from llvm/llvm-project
[AutoBump] Merge with 258a5d49 (Nov 20) (11) #473
Open: jorickert wants to merge 367 commits into bump_to_68a39081 from bump_to_258a5d49
This patch updates predicate and backend tests for FEXPA instructions to match [latest spec](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-).
These nodes should appear between CALLSEQ_START / CALLSEQ_END. Previously, they could be scheduled after CALLSEQ_END because the nodes didn't update the chain. The change in a test is due to X86 call frame optimizer pass bailing out for a particular call when CALLSEQ_START / CALLSEQ_END are not in the same basic block. This happens because SEG_ALLOCA is expanded into a sequence of basic blocks early. It didn't bail out before because the closing CALLSEQ_END was scheduled before SEG_ALLOCA, in the same basic block as CALLSEQ_START. While here, simplify creation of these nodes: allocating a virtual register and copying `Size` into it were unnecessary.
…lts to improve further folding (llvm#116419) Currently when creating a SHUFPD immediate mask, any undef shuffle elements are set to 0, which can limit options for further shuffle combining. This patch attempts to canonicalize the mask to improve folding: first by detecting a per-lane broadcast style mask (which can allow us to fold to UNPCK instead), and second ensure any undef elements are set to an 'inplace' value to improve chances of the SHUFPD later folding to a BLENDPD (or be bypassed in a SimplifyMultipleUseDemandedVectorElts call). This is very similar to canonicalization we already attempt in getV4X86ShuffleImm for vXi32/vXf32 SHUFPS/SHUFD shuffles.
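The undef handling described above can be modelled in a few lines. This is a minimal Python sketch, not the LLVM implementation: the name `shufpd_imm`, the `inplace_undef` switch, and the use of negative values for undef elements are assumptions made for illustration, and the per-lane broadcast detection is left out.

```python
def shufpd_imm(mask, inplace_undef=True):
    """Build a SHUFPD-style immediate from a shuffle mask.

    Bit i of the immediate picks the low (0) or high (1) f64 of the
    128-bit lane that feeds result element i. Negative mask entries
    stand for undef elements (an assumption of this sketch).
    """
    imm = 0
    for i, m in enumerate(mask):
        if m < 0:
            # Undef element: the old behaviour always used 0; the
            # canonical form uses the element's own position in its lane.
            bit = (i & 1) if inplace_undef else 0
        else:
            bit = m & 1
        imm |= bit << i
    return imm
```

With in-place bits, the mask `[0, -1]` gives the immediate `0b10`, which keeps every element in its own position, the same selection a blend would make.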
A follow up to PR llvm#116402 to add a regression test. The original change fixed the reproducer but that was not suitable to use as a regression test. This test case will fail with a LLD prior to llvm#116402. The disassembly for the thunk that starts as a short thunk but is later a long thunk isn't quite right. It is missing a $d mapping symbol. I think this can be fixed, but I've not done that in this patch to keep it test only. It is not a regression introduced in llvm#116402. I've also removed a spurious --threads=1 I noticed in the original test aarch64-thunk-bti.s
FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name
…6651) Summary: For Linux systems, we currently use the HSA library to determine the installed GPUs. However, this isn't really necessary and adds a dependency on the HSA runtime as well as a lot of overhead. Instead, this patch uses the `sysfs` interface exposed by `amdkfd` to do this directly.
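As a rough illustration of reading GPU information from `sysfs` instead of HSA, the sketch below parses the text of one topology node's `properties` file. It is not the patch's code: the function name is hypothetical, and the `gfx_target_version` key and its decimal `major * 10000 + minor * 100 + stepping` encoding are stated here as assumptions about the `amdkfd` interface.

```python
def gfx_arch_from_properties(text):
    """Return an architecture name such as 'gfx90a' from the contents of
    a KFD topology node's `properties` file, or None if the node reports
    no GPU (version 0) or the key is missing."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "gfx_target_version":
            version = int(parts[1])
            if version == 0:
                return None  # assumed to be a CPU-only node
            major = version // 10000
            minor = (version // 100) % 100
            step = version % 100
            # The stepping is printed in hex, so 10 becomes 'a'.
            return f"gfx{major}{minor}{step:x}"
    return None
```

A caller would apply this to each node's file under the KFD topology directory (assumed to be `/sys/devices/virtual/kfd/kfd/topology/nodes/`) and collect the non-`None` results.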
…lvm#116195) The entries in the dependency matrix can contain a lot of duplicates. These are unnecessary and result in extra checks that we can avoid, and this patch avoids them.
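The idea of dropping duplicate entries can be sketched as follows. This is an illustrative Python model only: `unique_rows` is a hypothetical helper, and representing each dependency vector as a list of direction characters is an assumption of the sketch.

```python
def unique_rows(dep_matrix):
    """Return the rows of a dependency matrix with duplicates removed,
    keeping the first occurrence of each row in its original order."""
    seen = set()
    result = []
    for row in dep_matrix:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            result.append(list(row))
    return result
```

Any later legality check that iterates over the matrix then visits each distinct direction vector once.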
…n function signatures. (llvm#116146) In loongarch64 LP64D ABI, `unsigned 32-bit` types, such as unsigned int, are stored in general-purpose registers as proper sign extensions of their 32-bit values. Therefore, Flang also follows it if a function needs to be interoperable with C. Reference: https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc#Fundamental-types
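To make the register convention concrete, here is a small Python model of how an unsigned 32-bit value would look in a 64-bit general-purpose register under the rule quoted above. The function name is hypothetical; the sketch only models the bit pattern, not any Flang code.

```python
def to_lp64d_register(u32):
    """Return the 64-bit register image of an unsigned 32-bit value when
    it is stored as the sign extension of its 32-bit pattern."""
    u32 &= 0xFFFFFFFF
    if u32 & 0x80000000:
        # Bit 31 is set, so the upper 32 bits are filled with ones.
        return u32 | 0xFFFFFFFF00000000
    return u32
```

For example, the unsigned value `0xFFFFFFFF` occupies the register as all ones rather than as `0x00000000FFFFFFFF`.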
…#116590) This patch generalizes llvm#81027 to handle pattern `and/or (fcmp ord/uno X, 0), (fcmp pred fabs(X), Y)`. Alive2: https://alive2.llvm.org/ce/z/tsgUrz The correctness is straightforward because `fcmp ord/uno X, 0.0` is equivalent to `fcmp ord/uno fabs(X), 0.0`. We may generalize it to handle fneg as well. Address comment llvm#116065 (review)
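The equivalence that the correctness argument rests on can be spot-checked numerically. The following Python sketch models `fcmp ord` and `fcmp uno` with NaN tests; the helper names and the sample list are assumptions, and a handful of samples is of course not a proof (the Alive2 link above is).

```python
import math

def fcmp_ord(a, b):
    """True when neither operand is NaN (models `fcmp ord`)."""
    return not (math.isnan(a) or math.isnan(b))

def fcmp_uno(a, b):
    """True when at least one operand is NaN (models `fcmp uno`)."""
    return math.isnan(a) or math.isnan(b)

SAMPLES = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan]

def equivalence_holds():
    """Check that taking fabs(X) never changes the ord/uno result."""
    return all(
        fcmp_ord(x, 0.0) == fcmp_ord(math.fabs(x), 0.0)
        and fcmp_uno(x, 0.0) == fcmp_uno(math.fabs(x), 0.0)
        for x in SAMPLES
    )
```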
…vm#116247) There are lots of places where we try to estimate the runtime vectorisation factor based on the getVScaleForTuning TTI hook. I've added a new getEstimatedRuntimeVF function and taught several places in the vectoriser to use this new function.
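A plausible shape for such an estimate is sketched below in Python. It is a simplified model, not the vectoriser's code: the parameter names are hypothetical, and the tuning value is passed in directly rather than queried from a TTI hook.

```python
def estimated_runtime_vf(known_min_vf, scalable, vscale_for_tuning=None):
    """Estimate the runtime vectorisation factor.

    For a fixed-width VF the estimate is the VF itself. For a scalable
    VF the known minimum is multiplied by the tuning value of vscale,
    when the target provides one.
    """
    estimate = known_min_vf
    if scalable and vscale_for_tuning is not None:
        estimate *= vscale_for_tuning
    return estimate
```

Centralising the calculation in one function means every caller treats the missing-tuning-value case the same way.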
This was a typo in llvm#112983 that didn't cause build failures but is still wrong.
Part of llvm#51787. Follow up of llvm#116243. This patch adds constexpr support for the built-in reduce mul function.
In ModuleToObject flow, users may want to add some callback functions invoked with LLVM IR/ISA for debugging or other purposes.
I'll be modifying this test in a future PR.
…vm#114508) The `rd` operand of AMCAS instructions is both read and written, because of the nature of compare-and-swap operations, but currently it is not declared as such. Fix it for upcoming codegen enablement changes. In order to do that, a piece of LoongArchAsmParser logic that relied on TableGen-erated enum variants being ordered in a specific way needs updating; this will be addressed in a following refactor. No functional change intended. While at it, restore vertical alignment for the definition lines. Suggested-by: tangaac <[email protected]> Link: llvm#114398 (comment)
Add variant with different metadata on all loads, for llvm#115868
…hen there are predicate calls (llvm#116075) On loongarch64 with lsx extension, we select `VBITREV_W` for `v4i32 (xor X, (shl splat(1), Y))`:
https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L1583-L1584

And `vsplat_imm_eq_1` is defined as:
https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L77-L87

For the `(bitconvert (v4i32 (build_vector)))` case, the pattern is expected to be:

```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (bitconvert:{ *:[v4i32] } (build_vector:{ *:[v4i32] }))<<P:Predicate_vsplat_imm_eq_1>>, v4i32:{ *:[v4i32] }:$vk))
RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```

However, `simplifyTree` drops the `bitconvert` node and its predicates:
https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp#L3036-L3062

Then llvm will match `vsplat_imm_eq_1` for any v4i32 splats and cause a miscompilation:

```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (build_vector:{ *:[v4i32] }), v4i32:{ *:[v4i32] }:$vk))
RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```

This patch adds additional checks for predicates associated with the trivial bitconvert node. Unused patterns in the LoongArch target are also removed. Fixes llvm#116008.
Currently, null chunks always follow other aligned chunks, so this patch is NFC. However, it will become observable once support for ARM64X imports is added. The import tables are shared between the native and EC views. They are usually very similar, but in cases where they differ, ARM64X relocations handle the discrepancies. If a DLL is only imported by EC code, the native view will see it as importing zero functions from this DLL (with ARM64X relocations replacing those null chunks with actual imports). In this scenario, the null chunks may appear as the very first chunks, meaning there is nothing else forcing their alignment.
Looks like a few different phrasings got mashed up together.
…vm#116610) This patch improves the code reuse of the actions system and adds several improvements for easier debugging via clang-repl --debug-only=clang-repl. The change improves the consistency of the TUKind when actions are handled within a WrapperFrontendAction. In this case, instead of falling back to the default TU_Complete, we forward to the TUKind of the ASTContext, which presumably was created by the intended action. This enables the incremental infrastructure to reuse code. This patch also clones the first llvm::Module because the first PTU can now come from -include A.h, and the presumption of llvm::Module being empty does not hold. The changes are a first step to fix the issues with `clang-repl --cuda`.
…16780) A lot of interchange tests unnecessarily relied on a build with ASSERTS enabled. Instead, simply check the IR output for both negative and positive tests so that we don't rely on debug messages. This increases test coverage as these tests will now also run with non-assert builds. For a couple of files, keeping some of the debug tests was useful, so they were separated out and moved to a similarly named *-remarks.ll file.
…vm#116545) see llvm#73359 Declarative assemblyFormat ODS is more concise and requires less boilerplate than filling out cpp interfaces. Changes:
- Updates the AccessChainOp defined in SPIRVMemoryOps.td to use assemblyFormat.
- Removes part of print/parse from MemoryOps.cpp, which is now generated by assemblyFormat.
- Updates tests to the updated format.
The (extended) bit width might not fit into the (non-extended) type, resulting in an incorrect truncation of the compared value. Fix this by using m_SpecificInt(), which is both simpler and handles this correctly. Fixes the assertion failure reported in: llvm#114539 (comment)
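The truncation hazard can be illustrated with plain integer arithmetic. In this Python sketch the widths are invented for the example (a 4-bit type extended to 16 bits) and `as_constant` is a hypothetical stand-in for building a constant of a given integer type; it is not the code that was fixed.

```python
def as_constant(value, type_bits):
    """Model materialising `value` as a constant of a `type_bits`-wide
    integer type: bits that do not fit are silently dropped."""
    return value & ((1 << type_bits) - 1)

# Hypothetical widths: the original type is i4, the extended type is i16.
NARROW_BITS = 4
EXTENDED_BITS = 16

# The extended bit width (16) does not fit in the 4-bit type, so a
# constant built in the narrow type compares against 0 instead of 16.
truncated = as_constant(EXTENDED_BITS, NARROW_BITS)
```

Comparing against a value at its full width, rather than against a constant first squeezed into the narrow type, avoids this class of mistake.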
We've switched to LineLocation from FieldsAre in MemProfUseTest.cpp. This patch does the same thing in InstrProfTest.cpp. llvm/unittests/Transforms/Instrumentation/MemProfUseTest.cpp
llvm#115627) This fixes a missed optimization caused by the `foldBitcastExtElt` pattern interfering with other combine patterns. In the case I was hitting, we have IR that combines two vectors into a new larger vector by extracting elements and inserting them into the new vector.

```llvm
define <4 x half> @bitcast_extract_insert_to_shuffle(i32 %a, i32 %b) {
  %avec = bitcast i32 %a to <2 x half>
  %a0 = extractelement <2 x half> %avec, i32 0
  %a1 = extractelement <2 x half> %avec, i32 1
  %bvec = bitcast i32 %b to <2 x half>
  %b0 = extractelement <2 x half> %bvec, i32 0
  %b1 = extractelement <2 x half> %bvec, i32 1
  %ins0 = insertelement <4 x half> undef, half %a0, i32 0
  %ins1 = insertelement <4 x half> %ins0, half %a1, i32 1
  %ins2 = insertelement <4 x half> %ins1, half %b0, i32 2
  %ins3 = insertelement <4 x half> %ins2, half %b1, i32 3
  ret <4 x half> %ins3
}
```

With the current behavior, `InstCombine` converts each vector extract sequence to

```llvm
%tmp = trunc i32 %a to i16
%a0 = bitcast i16 %tmp to half
%a1 = extractelement <2 x half> %avec, i32 1
```

where the extraction of `%a0` is now done by truncating the original integer. While on its own this is fairly reasonable, in this case it also blocks the pattern which converts `extractelement` - `insertelement` into shuffles, which gives the overall simpler result:

```llvm
define <4 x half> @bitcast_extract_insert_to_shuffle(i32 %a, i32 %b) {
  %avec = bitcast i32 %a to <2 x half>
  %bvec = bitcast i32 %b to <2 x half>
  %ins3 = shufflevector <2 x half> %avec, <2 x half> %bvec, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x half> %ins3
}
```

In this PR I fix the conflict by obeying the `hasOneUse` check even if there is no shift instruction required. In these cases we can't remove the vector completely, so the pattern has less benefit anyway. Also fwiw, I think dropping the `hasOneUse` check for the 0th element might have been a mistake in the first place.
Looking at llvm@535c5d5 the commit message only mentions loosening the `isDesirableIntType` requirement and doesn't mention changing the `hasOneUse` check at all.
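The rewrite of element 0 into a `trunc` plus `bitcast` relies on both forms reading the same bits. The Python sketch below checks that for one value using the `struct` module; it assumes a little-endian layout, where element 0 occupies the low 16 bits, and the helper names are made up for the example.

```python
import struct

def halves_of(i32):
    """Model `bitcast i32 to <2 x half>` on a little-endian target:
    returns (element 0, element 1)."""
    return struct.unpack("<2e", struct.pack("<I", i32))

def element0_via_trunc(i32):
    """Model `trunc i32 to i16` followed by `bitcast i16 to half`."""
    return struct.unpack("<e", struct.pack("<H", i32 & 0xFFFF))[0]
```

For the input `0x3C004000`, the low half-word `0x4000` is `2.0` and the high half-word `0x3C00` is `1.0`, so both routes give `2.0` for element 0.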
Instead of custom selecting a bunch of instructions, we can expand to generic MIR during legalization.
This reworks the free store implementation in libc's malloc to use a dlmalloc-style binary trie of circularly linked FIFO free lists. This data structure can be maintained in logarithmic time, but it still permits a relatively small implementation compared to other logarithmic-time ordered maps. The implementation doesn't do the various bitwise tricks or optimizations used in actual dlmalloc; it instead optimizes for (relative) readability and minimum code size. Specific optimization can be added as necessary given future profiling.
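To give a feel for the data structure, here is a much simplified Python model of a size-keyed binary trie whose leaves hold FIFO lists of free blocks. It is a sketch under stated assumptions, not the libc code: the class names, the fixed 16-bit key width, the per-node `count` field, and the use of `deque` in place of circularly linked lists are all choices made for this example.

```python
from collections import deque

KEY_BITS = 16  # assumption: block sizes fit in 16 bits

class Node:
    def __init__(self):
        self.kids = [None, None]  # children for bit value 0 and 1
        self.count = 0            # free blocks stored in this subtree
        self.blocks = deque()     # FIFO of equal-sized blocks (leaves only)

class FreeTrie:
    def __init__(self):
        self.root = Node()

    def push(self, size, block):
        """Insert a free block; walks one trie level per key bit."""
        node = self.root
        node.count += 1
        for bit in range(KEY_BITS - 1, -1, -1):
            b = (size >> bit) & 1
            if node.kids[b] is None:
                node.kids[b] = Node()
            node = node.kids[b]
            node.count += 1
        node.blocks.append(block)

    def _find_ge(self, node, bit, size, tight, prefix):
        """Return the smallest stored size >= `size`, or None."""
        if node is None or node.count == 0:
            return None
        if bit < 0:
            return prefix
        # While still matching the request's prefix we may not go below it.
        first = (size >> bit) & 1 if tight else 0
        for b in range(first, 2):
            found = self._find_ge(node.kids[b], bit - 1, size,
                                  tight and b == first, (prefix << 1) | b)
            if found is not None:
                return found
        return None

    def pop_best_fit(self, size):
        """Remove and return (size, block) for the best fit, or None."""
        key = self._find_ge(self.root, KEY_BITS - 1, size, True, 0)
        if key is None:
            return None
        node = self.root
        node.count -= 1
        for bit in range(KEY_BITS - 1, -1, -1):
            node = node.kids[(key >> bit) & 1]
            node.count -= 1
        return key, node.blocks.popleft()
```

Because empty subtrees are skipped through their `count`, both operations touch a number of nodes proportional to the key width, and equal-sized blocks are reused in the order they were freed.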
…lvm#117065) Reverts llvm#106259 Unit tests break on AArch64.
This patch upgrades a unit test to MemProf Version 3 while removing those bits that cannot be upgraded to Version 3. The bits being removed expect instrprof_error::hash_mismatch from a broken MemProf profile that references a frame that doesn't actually exist. Now, Version 3 no longer issues instrprof_error::hash_mismatch. Even if it still issued instrprof_error::hash_mismatch, we would have a couple of hurdles:
- InstrProfWriter::addMemProfData will soon require all (or none) of the fields (frames, call stacks, and records) be populated. That is, it won't accept an instance of IndexedMemProfData with frames missing.
- writeMemProfV3 asserts that every frame occurs at least once:
  assert(MemProfData.Frames.size() == FrameHistogram.size());

This patch gives up on instrprof_error::hash_mismatch and tries to trigger instrprof_error::unknown_function with the empty profile.
The Android clang-r536225 compiler identifies as Clang 19, but it is based on commit fc57f88, which predates the official LLVM 19.0.0 release. Some tests need fixes:
* The sized delete tests fail because clang-r536225 leaves sized deallocation off by default.
* std::array<T[0]> is true when this Android Clang version is used with a trunk libc++, but we expect it to be false in the test. In practice, Clang and libc++ usually come from the same commit on Android.
…vm#116151) Android clang-r536225 identifies as Clang 19 but it predates LLVM 19.0.0. It is based off of fc57f88.
…ames with DWARFTypePrinter (llvm#117071) This is a reland of llvm#112811. Fixed the bot breakage by running ld.lld explicitly.
…alloca and malloc parameters bound" (llvm#117020) Reverts llvm#115522 This caused UBSan errors in multi-stage clang build: https://lab.llvm.org/buildbot/#/builders/25/builds/4241/steps/10/logs/stdio
Similar to the FMLA_VG2_M2Z2Z_H one.
…3626) Now that `-fbasic-block-sections=list` is enabled for Arm, functions may be split across multiple sections, and CFI information must be handled independently for each section. On x86, this is handled in `llvm/lib/CodeGen/CFIInstrInserter.cpp`. However, this pass does not run on Arm, so we must add logic for it to `llvm/lib/CodeGen/CFIFixup.cpp`.
…_LIBC_FUNCTION_ATTR_func macro. (llvm#116160)
…ible with minimal runtime (llvm#114865) We are currently getting: `clang: error: invalid argument '-fsanitize-minimal-runtime' not allowed with '-fsanitize=implicit-conversion'` when running `-fsanitize=implicit-conversion -fsanitize-minimal-runtime` because `implicit-conversion` now includes `implicit-bitfield-conversion` which is not included in the `integer` check. The `integer` check includes the `implicit-integer-conversion` checks and is supported by the trapping option and because of that compatible with the minimal runtime. It is thus reasonable to make `implicit-bitfield-conversion` compatible with the minimal runtime.
…vm#114537) Summary: Consolidate the logic in a single function. We do an extra pass over Instructions but this is necessary to untangle things and extract metadata cloning in a future diff.

Test Plan:

```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests
Testing Time: 106.06s
Total Discovered Tests: 62601
Skipped : 17 (0.03%)
Unsupported : 2518 (4.02%)
Passed : 59911 (95.70%)
Expectedly Failed: 155 (0.25%)
[212/213] Running lit suite
Testing Time: 12.47s
Total Discovered Tests: 8474
Skipped: 17 (0.20%)
Passed : 8457 (99.80%)
```

Extracted from llvm#109032 (commit 3) (there are more refactors and cleanups in subsequent commits)
When embedding, if `compiler.used` exists, we should re-use its element type instead of blindly assuming it's an unqualified pointer.
This reverts commit c0efcc0.
…(SignedMax,UnsignedMax] (llvm#116733)" This reverts commit b8e1d4d. Causes failures on the `libc` test suite https://lab.llvm.org/buildbot/#/builders/73/builds/8871
In MSVC, when `/d1initall` is enabled, `__declspec(no_init_all)` can be applied to a type to suppress auto-initialization for all instances of that type or to a function to suppress auto-initialization for all locals within that function. This change does the same for Clang, except that it applies to the `-ftrivial-auto-var-init` flag instead. NOTE: I did not add a Clang-specific spelling for this but would be happy to make a followup PR if folks are interested in that.
) ld64.lld would previously allow you to link against dylibs linked with `-allowable_client`, even if the client's name does not match any allowed client. This change fixes that. See llvm#114146 for related discussion. The test binary `liballowable_client.dylib` was created on macOS with: echo | clang -xc - -dynamiclib -mmacosx-version-min=10.11 -arch x86_64 -Wl,-allowable_client,allowed -o lib/liballowable_client.dylib
…vm#114265) (llvm#117089) This reverts commit 6fb7cdf.
…lvm#116934) The dialect conversion driver has three phases:
- **Create** `IRRewrite` objects as the IR is traversed.
- **Finalize** `IRRewrite` objects. During this phase, source materializations for mismatching value types are created. (E.g., when `Value` is replaced with a `Value` of different type, but there is a user of the original value that was not modified because it is already legal.)
- **Commit** `IRRewrite` objects. During this phase, all remaining IR modifications are materialized. In particular, SSA values are actually being replaced during this phase.

This commit removes the "finalize" phase. This simplifies the code base a bit and avoids one traversal over the `IRRewrite` stack. Source materializations are now built during the "commit" phase, right before an SSA value is being replaced.

This commit also removes the "inverse mapping" of the conversion value mapping, which was used to predict if an SSA value will be dead at the end of the conversion. This check is replaced with an approximate check that does not require an inverse mapping. (A false positive for `v` can occur if another value `v2` is mapped to `v` and `v2` turns out to be dead at the end of the conversion. This case is not expected to happen very often.) This reduces the complexity of the driver a bit and removes one potential source of bugs. (There have been bugs in the usage of the inverse mapping in the past.)

`BlockTypeConversionRewrite` no longer stores a pointer to the type converter. This pointer is now stored in `ReplaceBlockArgRewrite`.

This commit is in preparation of merging the 1:1 and 1:N dialect conversion driver. It simplifies the upcoming changes around the conversion value mapping. (API surface of the conversion value mapping is reduced.)
… phase" (llvm#117094) Reverts llvm#116934 This commit broke the build.
…/fromPtr. On arm64e, uses the "wrap" and "unwrap" operations introduced in f14cb49 to sign and strip pointers by default. Signing / stripping can be overridden at the toPtr / fromPtr call site by passing an explicit wrap / unwrap operation.
mgehre-amd approved these changes Feb 18, 2025
No description provided.