[AutoBump] Merge with 776476c2 (Nov 22) (13) #477

jorickert · 2025-02-20T02:11:53Z

No description provided.

…zero-call-used-regs (llvm#116995) Previously, with `-fzero-call-used-regs` clang/LLVM would incorrectly emit Neon instructions in streaming functions, and streaming-compatible functions without SVE. With this change:
* In streaming functions, Z/p registers will be zeroed
* In streaming compatible functions w/o SVE, D registers will be zeroed
  - (As Neon vector instructions are illegal including `movi v..`)

) Starting with 41e3919 DiagnosticsEngine creation might perform IO. It was implicitly defaulting to getRealFileSystem. This patch makes it explicit by pushing the decision making to callers. It uses the ambient VFS if one is available, and keeps using `getRealFileSystem` if there is no VFS.

Motivating case from https://github.com/torvalds/linux/blob/9852d85ec9d492ebef56dc5f229416c925758edc/drivers/gpu/drm/drm_edid.c#L5238-L5240:
```
define i1 @src(i8 noundef %v13) {
entry:
  %conv1 = zext i8 %v13 to i32
  %add = add nsw i32 %conv1, -4
  %cmp = icmp ult i32 %add, 3
  %cmp4 = icmp slt i8 %v13, 4
  %cond = select i1 %cmp4, i1 true, i1 %cmp
  ret i1 %cond
}

define i1 @tgt(i8 noundef %v13) {
entry:
  %cmp4 = icmp slt i8 %v13, 7
  ret i1 %cmp4
}
```
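The equivalence of `@src` and `@tgt` can be checked by brute force, since an `i8` input has only 256 bit patterns. The following is a minimal Python sketch of that check, not part of the patch; the helper `to_signed8` and the Python function names are hypothetical, and the model assumes the `add nsw` never overflows (it cannot here, because the zero-extended input is in 0..255).

```python
def to_signed8(v):
    """Interpret an 8-bit pattern (0..255) as a signed i8 value."""
    return v - 256 if v >= 128 else v

def src(v):
    conv1 = v                         # zext i8 %v13 to i32
    add = (conv1 - 4) & 0xFFFFFFFF    # add nsw i32 %conv1, -4, kept as 32-bit pattern
    cmp = add < 3                     # icmp ult i32 %add, 3
    cmp4 = to_signed8(v) < 4          # icmp slt i8 %v13, 4
    return True if cmp4 else cmp      # select i1 %cmp4, i1 true, i1 %cmp

def tgt(v):
    return to_signed8(v) < 7          # icmp slt i8 %v13, 7

# Collect every 8-bit input on which the two functions disagree.
mismatches = [v for v in range(256) if src(v) != tgt(v)]
print(mismatches)
```

An empty list means the two functions agree on every input, which is what the fold relies on.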

If the size is larger than the index width, truncate it instead of asserting. Longer-term we should consider rejecting types larger than the index size in the verifier, though this is probably tricky in practice (it's address space dependent, and types are owned by the context, not the module). Fixes llvm#116960.
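As an illustration of what truncating to the index width means, here is a small Python sketch. It is only a model of the arithmetic, not the LLVM code; the function name and the example values are assumptions.

```python
def truncate_to_index_width(size: int, index_width_bits: int) -> int:
    """Keep only the low index_width_bits bits of size (hypothetical helper)."""
    mask = (1 << index_width_bits) - 1
    return size & mask

# Assumed example: a size of 2**40 + 8 with a 32-bit index type.
truncated = truncate_to_index_width(2**40 + 8, 32)
print(truncated)
```

A size that already fits in the index width passes through unchanged.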

The previous fix llvm@c641497 failed to consider the fact that the call graph update doesn't make any sense if the caller node hasn't been populated in the LazyCallGraph yet. This patch skips the CG update step in that case.

) When we create a thunk we don't know whether it will be short or long. Move the emission of the long thunk mapping symbol to when we transition to a long thunk. This improves disassembly and binary analysis as tools like BOLT identify thunks by disassembly. This removes a FIXME added in llvm#108989 to aarch64-thunk-bti-multipass.s, which had a corrupt disassembly due to missing mapping symbols.

…ual_stacks) (llvm#117069) Following the example of tsan, where we took the name. This would allow users to determine if they want to see ALL output from rtsan. Additionally, remove the UNLIKELY hint, as it is now up to the flag whether or not it is likely that we go through this conditional.

…CallBI function (llvm#115496) This commit adds an assert statement to the CallBI function to ensure that the interpreter state (S.Current) is correctly reset to the previous frame (FrameBefore) after InterpretBuiltin returns true. This helps catch any potential issues during development and debugging.

This commit addresses several Static Analyzer issues related to potential null dereference by replacing dyn_cast<> with cast<> and getAs<> with castAs<> in various parts of the code. The cast function asserts that the cast is valid, ensuring that the pointer is not null and preventing null dereference errors. The changes are made in the following files:
* CGBuiltin.cpp: Ensure vector types have exactly 3 elements.
* CGExpr.cpp: Ensure member declarations are field declarations.
* AnalysisBasedWarnings.cpp: Ensure operations are member expressions.
* SemaExprMember.cpp: Ensure base types are extended vector types.

These changes ensure that the types are correctly cast and prevent potential null dereference issues, improving the robustness and safety of the code.

…analysis (llvm#117324) Doug implemented quite literally all of it and has been continuously improving the implementation by handling more language constructs we had initially missed. I spent a lot of time reviewing the implementation of the attributes as well as the analysis pass, so in other words, the two of us are probably best equipped to answer any questions that might arise wrt this part of Clang.

The patch allows vectorizing scalar instructions + poison values as if the poisons were instructions with the same opcode. It allows better vectorization of the repeated values, reduces the number of insertelement instructions and serves as a base ground for copyable elements vectorization.

AVX512, -O3 + LTO
JM/ldecod - better vector code
Applications/oggenc - better vectorization
CINT2017speed/625.x264_s CINT2017rate/525.x264_r - better vector code
CFP2017rate/526.blender_r - better vector code
CFP2006/447.dealII - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/510.parest_r - better vectorization
CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - extra vector code
Benchmarks/tramp3d-v4 - small variations
CFP2006/453.povray - extra vector code
JM/lencod - better vector code
CFP2017rate/511.povray_r - extra vector code
MemFunctions/MemFunctions - extra vector code
LoopVectorization/LoopVectorizationBenchmarks - extra vector code
XRay/FDRMode - extra vector code
XRay/ReturnReference - extra vector code
LCALS/SubsetCLambdaLoops - extra vector code
LCALS/SubsetCRawLoops - extra vector code
LCALS/SubsetARawLoops - extra vector code
LCALS/SubsetALambdaLoops - extra vector code
DOE-ProxyApps-C++/miniFE - extra vector code
LoopVectorization/LoopInterleavingBenchmarks - extra vector code
LCALS/SubsetBLambdaLoops - extra vector code
MicroBenchmarks/harris - extra vector code
ImageProcessing/Dither - extra vector code
MicroBenchmarks/SLPVectorization - extra vector code
ImageProcessing/Blur - extra vector code
ImageProcessing/Dilate - extra vector code
Builtins/Int128 - extra vector code
ImageProcessing/Interpolation - extra vector code
ImageProcessing/BilateralFiltering - extra vector code
ImageProcessing/AnisotropicDiffusion - extra vector code
MicroBenchmarks/LoopInterchange - extra code vectorized
LCALS/SubsetBRawLoops - extra code vectorized
CINT2006/464.h264ref - extra vectorization with wider vectors
CFP2017rate/508.namd_r - small variations, extra phis vectorized
CFP2006/444.namd - 2 2 x phi replaced by 4 x phi
DOE-ProxyApps-C/SimpleMOC - extra code vectorized
CINT2017rate/541.leela_r CINT2017speed/641.leela_s - the function better vectorized and inlined
Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code
FreeBench/fourinarow - better vectorization

Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: llvm#115946

Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation.

Reviewers: RKSimon, preames
Reviewed By: preames
Pull Request: llvm#117350

Extend the existing store widening pass to widen load instructions. This patch also borrows the alias check algorithm from AMDGPU's load store widening pass. The widened load instruction is inserted before the first candidate load instruction. The widened store instruction is inserted after the last candidate store instruction. This method helps avoid moving uses/defs when replacing load/store instructions with their widened equivalents. The pass has also been extended to
* Generate 64-bit widened stores
* Handle 32-bit post increment load/store
* Handle stores of non-immediate values
* Handle stores where the offset is a GlobalValue
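To illustrate why widening is sound, the Python sketch below shows that two adjacent 32-bit loads read the same bytes as one 64-bit load from the lower address. It is only a model under stated assumptions (little-endian byte order, a made-up 16-byte memory image, hypothetical helper names) and does not reflect how the pass itself is written.

```python
import struct

# Assumed memory image: 16 bytes with values 0..15.
memory = bytes(range(16))

def load32(offset):
    """Little-endian 32-bit load (hypothetical helper)."""
    return struct.unpack_from("<I", memory, offset)[0]

def load64(offset):
    """Little-endian 64-bit load (hypothetical helper)."""
    return struct.unpack_from("<Q", memory, offset)[0]

# Two narrow loads from adjacent addresses...
lo, hi = load32(0), load32(4)
# ...carry the same data as one widened load from the lower address.
wide = load64(0)
print(hex(lo), hex(hi), hex(wide))
```

The low half of the wide value corresponds to the load at the lower address and the high half to the load at the higher address.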

A recent commit (23d7a6c) introduced a dependency on libLLVMMC.so. This is to handle the `-print-supported-cpus` option which uses `llvm/MC/SubtargetInfo`. It requires libLLVMMC to be linked into the flang-driver which the previous commit did not do. This fixes that issue.

Summary: Previous patches have made the `rpc.h` header independent of the `libc` internals. This allows us to include it directly rather than providing an indirect C API. This patch only does the work to move the header. A future patch will pull out the `rpc_server` interface and simply replace it with a single function that handles the opcodes.

Unfortunately there's no upstream frontend for Metal, but since the IDs are now assigned by the DWARF standard I think it makes sense to have the enums upstream to enable tools like llvm-dwarfdump. This patch therefore uses an AArch64 test with artificially modified debug info to verify that the Metal language ID can be used. https://dwarfstd.org/issues/241111.1.html

This disables `readability-identifier-naming` for the source files, since names don't have to be _Uglified in the source files. We currently don't enforce clang-tidy in the source files, so this is only useful to avoid a bunch of warnings when using an editor that shows the results of clang-tidy.

…ad values removed (llvm#116519) This change is related to discussion: https://discourse.llvm.org/t/question-on-criteria-for-acceptable-ir-in-removedeadvaluespass/83131 I do not know the original reason to disallow the optimization on modules with global private constant. Please let me know what I am missing; I will be happy to make it better. Thank you! CC: @Wheest

Co-authored-by: Renat Idrisov

…m#117066) Leverage the support added to represent allocation contexts in a more compact way via a radix tree in the indexed profile to similarly reduce sizes of the bitcode summaries. For a large target, this reduced the size of the per-module summaries by about 18% and of the distributed combined index files by 28%.
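The intuition behind the size reduction is that allocation contexts (call stacks) share long common prefixes, so a prefix tree stores each shared frame once. The Python sketch below counts stored frames for a flat list versus a plain trie. It is a generic illustration only: the frame names are made up, the helper names are hypothetical, and it does not reproduce the actual radix-tree encoding used in the bitcode summaries.

```python
def flat_size(contexts):
    """Frames stored when every context is written out in full."""
    return sum(len(ctx) for ctx in contexts)

def trie_size(contexts):
    """Frames stored when shared prefixes are merged into one trie."""
    root = {}
    nodes = 0
    for ctx in contexts:
        node = root
        for frame in ctx:
            if frame not in node:
                node[frame] = {}
                nodes += 1
            node = node[frame]
    return nodes

# Assumed example contexts, outermost frame first.
contexts = [
    ("main", "run", "alloc_a"),
    ("main", "run", "alloc_b"),
    ("main", "init", "alloc_a"),
]
print(flat_size(contexts), trie_size(contexts))
```

In this toy input the shared `main` and `run` frames are stored once in the trie, so fewer frames are kept than in the flat form.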

…ries" (llvm#117395) (llvm#117404) This reverts commit fdb050a, and restores ccb4702, with a fix for build bot failures. Specifically, add ProfileData to the dependences of the BitWriter library, which was causing shared library builds of LLVM to fail. Reproduced the failure with a shared library build and confirmed this change fixes that build failure.

metaflow and others added 30 commits November 21, 2024 11:41

[bazel] format utils/bazel/llvm-project-overlay/libc/libc_build_rules…

5bdee35

….bzl

[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (l…

56c091e

…lvm#116856) This brings the printing of scalable vector constant splats inline with their fixed length counterparts.

[X86] Fix shuffle comment decoding for vinsertps immediate operand (l…

0b06301

…lvm#117009) The relevant bit from the Intel SDM for vinsertps semantics:
```
IF (SRC = REG) THEN COUNT_S := imm8[7:6]
ELSE COUNT_S := 0
```
This is now taken into account.
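The rule quoted above can be sketched as a small immediate decoder in Python. This is an illustrative model, not the LLVM shuffle-comment decoder: the function name is hypothetical, and the COUNT_D (`imm8[5:4]`) and ZMASK (`imm8[3:0]`) fields are not quoted in the commit message; they are taken from the SDM's INSERTPS description.

```python
def decode_insertps_imm(imm8, src_is_reg):
    """Split a vinsertps immediate into (COUNT_S, COUNT_D, ZMASK)."""
    # COUNT_S selects the source element only for a register source;
    # for a memory source it is forced to 0, as in the SDM excerpt.
    count_s = (imm8 >> 6) & 0x3 if src_is_reg else 0
    count_d = (imm8 >> 4) & 0x3   # destination element index
    zmask = imm8 & 0xF            # elements to zero in the result
    return count_s, count_d, zmask

# Assumed example immediate 0xB5 = 0b10_11_0101.
reg_form = decode_insertps_imm(0xB5, src_is_reg=True)
mem_form = decode_insertps_imm(0xB5, src_is_reg=False)
print(reg_form, mem_form)
```

The same immediate therefore decodes differently depending on whether the source operand is a register or memory, which is the case the fix accounts for.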

Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (l…

a1153cd

…lvm#115852)" Reverted for causing: llvm#117145 This reverts commit bdd10d9.

[LLVM][IR] Refactor ConstantFold:FoldBitCast to fully support vector …

af641ff

…ConstantInt/FP. (llvm#116787) This fixes the code quality issue reported in llvm#111149.

[clang][bytecode] Check FromPtr in BitCastPtr (llvm#117142)

1425fa9

[mlir][linalg][nfc] Update pack-dynamic-inner-tile.mlir (llvm#116788)

d7d6fb1

Following on from llvm#116373, updates "pack-dynamic-inner-tile.mlir" to use TD Ops for all transformations except for lowering to LLVM. This is an intermediate step before introducing vectorization.

[LLVM][IR] Teach extractelement folds about constant ConstantInt/FP. (l…

4872ecf

…lvm#116793)

[clang] constexpr built-in reduce or and xor function. (llvm#116976)

ddb62d2

Part of llvm#51787. Follow up of llvm#116822. This patch adds constexpr support for the built-in reduce `or` and `xor` functions.
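For readers unfamiliar with the reduce builtins, the Python sketch below models what an `or` or `xor` reduction over a vector of integers computes: the operator folded across all elements. It only illustrates the semantics on non-negative Python integers; the function names are hypothetical and it says nothing about how the constant evaluator implements them.

```python
from functools import reduce
from operator import or_, xor

def reduce_or(elements):
    """Bitwise OR of all elements (model of an `or` reduction)."""
    return reduce(or_, elements)

def reduce_xor(elements):
    """Bitwise XOR of all elements (model of an `xor` reduction)."""
    return reduce(xor, elements)

# Assumed example vectors.
or_result = reduce_or([1, 2, 4, 8])
xor_result = reduce_xor([1, 3, 5, 7])
print(or_result, xor_result)
```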

[InstCombine] Remove unused variable in InstCombineCompares.cpp (NFC)

aa74649

/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:3190:14: error: unused variable 'CmpBW' [-Werror,-Wunused-variable]
    unsigned CmpBW = Ty->getScalarSizeInBits();
             ^
1 error generated.

[SystemZ][z/OS] Add back removed AutoConvert.h headers that were inco…

9cada10

…rrectly identified as unused to fix a build error on z/OS

[Clang] Eliminate shadowing warnings for parameters of explicit objec…

d23449d

…t member functions (llvm#114813) Fixes llvm#95707.

Adjust MSVC disabled optimization pragmas to be _MSC_VER only (llvm#1…

d800ea7

…16704) Alter the #ifdef values from llvm#110986 and llvm#115292 to use _MSC_VER instead of _WIN32 to stop the pragmas being used on gcc/mingw builds. Noticed by @mstorsjo.

Reapply "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (l…

df9a14d

…lvm#115852)" This reverts commit a1153cd with fixes to lldb breakages. Fixes llvm#117145.

[rtsan] Add sched_yield interceptor (llvm#117084)

963b8e3

This calls the system calls switch_pri and sys_ulock_wait. It is also one of the more straightforwardly rt-unsafe functions, in that it gives up this thread's timeslice.

[rtsan] NFC: Update docs with customizable functions (llvm#117086)

a12e79a

Fix typo "intead"

d6fc7d3

[clang][ExprConst] Reject field access with nullptr base (llvm#113885)

685e41e

Reject them if the base is null, not only if the entire pointer is null. Fixes llvm#113821

Release note lldb completion improvements (llvm#117058)

8bfa87c

[mlir][spirv]: Add Broadcom Vendor (llvm#116600)

e8b5c00

This PR simply adds the Broadcom vendor ID to the SPIRV list, in order to enable the use of this vendor ID in a SPIRV pipeline for the Videocore GPUs.

[XCore] Use getSignedConstant()

0cb1cca

kazutakahirata and others added 30 commits November 22, 2024 11:53

[memprof] Remove MemProf format Version 1 (llvm#117357)

ad2bdd8

This patch removes MemProf format Version 1 now that Version 2 and 3 are working well.

[RISCV] Add explicit VLS test line for vector spill/fill

6da8ff8

I got asked about this offline and realized we didn't really have tests specific to the VLS frame lowering.

AMDGPU: Add v_smfmac_f32_32x32x64_bf8_bf8 for gfx950 (llvm#117256)

8a5c241

AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (llvm#117257)

8d3435f

AMDGPU: Add v_smfmac_f32_32x32x64_fp8_bf8 for gfx950 (llvm#117258)

90dc644

AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (llvm#117259)

7d544c6

AMDGPU: Add basic verification for mfma scale intrinsics (llvm#117048)

a05a1d6

Verify the format is valid and the type is one of the expected i32 vectors. Verify the used vector types at least cover the requirements of the corresponding format operand.

[libc] Fix unpoisoning for recvfrom (llvm#117366)

182f9aa

Turns out there were also errors in the recvfrom unpoisoning logic. This patch fixes those.

[SelectionDAG] Fix some SDNode type mismatches between *.td files and…

e131b0d

… ISel (llvm#117375) This removes operands/results either in SDNode description or in ISel code so that they match each other.

[lldb] Fix ELF core debugging (llvm#117070)

1290e95

DynamicLoader does not use ProcessElfCore NT_FILE entries to get UUID. Use GetModuleSpec to get UUID from Process.

Revert "Fix up MCPlusBuilder.cpp to account for W0_HI on AArch64"

2704647

This reverts commit 576865a. Depends on llvm#114827 that was reverted.

[gn build] Port 028d41d

094ef38

[gn build] Port 1434d2a

0ffdaf4

Revert "[MemProf] Use radix tree for alloc contexts in bitcode summar…

fdb050a

…ies" (llvm#117395) Reverts llvm#117066 This is causing some build bot failures that need investigation.

[LLD][MachO] Enable plugin support for LTO (llvm#115690)

b4e000e

Add new CLI options for feature parity with ELF w.r.t pass plugins. Most of the changes are ported directly from llvm@0c86198. With this change, it is now possible to load and run external pass plugins during the LTO phase.

[libc][NFC] Remove template arguments from Block (llvm#117386)

d121d71

[RISCV] Move rvv-cfi-info.ll to rvv directory. NFC

9edbe56

[X86] Fix the type of X86ISD::UMUL (llvm#117377)

aa5dc53

Same as SMUL, UMUL produces one result + flags, not two results + flags.
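The "one result + flags" shape can be modelled in Python as a function returning a single truncated product together with an overflow indication. This is a sketch of the shape only, with a hypothetical function name and an assumed 32-bit default width; it is not the SelectionDAG node definition.

```python
def umul(a, b, bits=32):
    """Unsigned multiply returning (result, overflow_flag)."""
    mask = (1 << bits) - 1
    full = (a & mask) * (b & mask)   # exact product of the unsigned operands
    result = full & mask             # the single value result
    overflow = full > mask           # flag: product did not fit in `bits` bits
    return result, overflow

# Assumed example operands.
small = umul(3, 4)
large = umul(0xFFFFFFFF, 2)
print(small, large)
```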

[AutoBump] Merge with 776476c (Nov 22)

6addf07
