forked from llvm/llvm-project
[AutoBump] Merge with 776476c2 (Nov 22) (13) #477
Open: jorickert wants to merge 284 commits into bump_to_cbc78022 from bump_to_776476c2
…zero-call-used-regs (llvm#116995) Previously, with `-fzero-call-used-regs` clang/LLVM would incorrectly emit Neon instructions in streaming functions, and streaming-compatible functions without SVE. With this change:
* In streaming functions, Z/p registers will be zeroed
* In streaming compatible functions w/o SVE, D registers will be zeroed
  - (As Neon vector instructions are illegal including `movi v..`)
) Starting with 41e3919, DiagnosticsEngine creation might perform IO. It was implicitly defaulting to getRealFileSystem. This patch makes the choice explicit by pushing the decision to callers. It uses the ambient VFS if one is available, and keeps using `getRealFileSystem` otherwise.
…lvm#116856) This brings the printing of scalable vector constant splats in line with their fixed-length counterparts.
…lvm#117009) The relevant bit from the Intel SDM for vinsertps semantics:
```
IF (SRC = REG) THEN COUNT_S := imm8[7:6]
ELSE COUNT_S := 0
```
This is now taken into account.
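The SDM rule quoted above can be sketched as a small helper. This is only an illustration of the quoted pseudocode; the function name `insertpsCountS` and its parameters are hypothetical and do not come from the patch.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM code): selects the source element index
// COUNT_S for INSERTPS/VINSERTPS as described by the quoted SDM pseudocode.
// With a register source, COUNT_S comes from imm8[7:6]; with a memory
// source, COUNT_S is forced to 0 and imm8[7:6] is ignored.
constexpr unsigned insertpsCountS(bool srcIsRegister, std::uint8_t imm8) {
  return srcIsRegister ? (imm8 >> 6) & 0x3u : 0u;
}

// Compile-time checks of the two branches.
static_assert(insertpsCountS(true, 0xC0) == 3, "register source uses imm8[7:6]");
static_assert(insertpsCountS(false, 0xC0) == 0, "memory source always reads element 0");
```

Any code that models the instruction and reads imm8[7:6] unconditionally would pick the wrong element for the memory form, which appears to be the case the commit addresses.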
…lvm#115852)" Reverted for causing: llvm#117145 This reverts commit bdd10d9.
…ConstantInt/FP. (llvm#116787) This fixes the code quality issue reported in llvm#111149.
Following on from llvm#116373, updates "pack-dynamic-inner-tile.mlir" to use TD Ops for all transformations except for lowering to LLVM. This is an intermediate step before introducing vectorization.
Motivating case from https://github.com/torvalds/linux/blob/9852d85ec9d492ebef56dc5f229416c925758edc/drivers/gpu/drm/drm_edid.c#L5238-L5240:
```
define i1 @src(i8 noundef %v13) {
entry:
  %conv1 = zext i8 %v13 to i32
  %add = add nsw i32 %conv1, -4
  %cmp = icmp ult i32 %add, 3
  %cmp4 = icmp slt i8 %v13, 4
  %cond = select i1 %cmp4, i1 true, i1 %cmp
  ret i1 %cond
}

define i1 @tgt(i8 noundef %v13) {
entry:
  %cmp4 = icmp slt i8 %v13, 7
  ret i1 %cmp4
}
```
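As a sanity check on the fold, the two functions can be transcribed into C++ and compared over all 256 inputs. This is a brute-force sketch, not how the transform is verified upstream; `equivalentForAllInputs` is a hypothetical helper, and the `int8_t` casts assume two's-complement conversion (guaranteed since C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Transcription of @src: (sext-compare v13 < 4) || (zext(v13) - 4 <u 3).
bool src(std::uint8_t v13) {
  std::uint32_t conv1 = v13;                      // zext i8 -> i32
  std::uint32_t add = conv1 - 4u;                 // add i32 %conv1, -4 (wraps like the IR bits)
  bool cmp = add < 3u;                            // icmp ult i32 %add, 3
  bool cmp4 = static_cast<std::int8_t>(v13) < 4;  // icmp slt i8 %v13, 4
  return cmp4 ? true : cmp;                       // select i1 %cmp4, true, %cmp
}

// Transcription of @tgt: a single signed compare.
bool tgt(std::uint8_t v13) {
  return static_cast<std::int8_t>(v13) < 7;       // icmp slt i8 %v13, 7
}

// Exhaustively compares both functions over every i8 bit pattern.
bool equivalentForAllInputs() {
  for (unsigned v = 0; v < 256; ++v) {
    std::uint8_t x = static_cast<std::uint8_t>(v);
    if (src(x) != tgt(x))
      return false;
  }
  return true;
}
```

The unsigned range check covers exactly the values 4, 5 and 6, so combining it with the signed `< 4` test gives the signed `< 7` test.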
Part of llvm#51787. Follow up of llvm#116822. This patch adds constexpr support for the built-in reduce `or` and `xor` functions.
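For readers unfamiliar with these reductions, the following portable sketch shows the semantics being made available in constant expressions: fold all vector elements with `|` or `^`. It deliberately does not call the Clang builtins; `reduceOr` and `reduceXor` are hypothetical stand-ins operating on `std::array`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in: bitwise OR of all elements.
template <typename T, std::size_t N>
constexpr T reduceOr(const std::array<T, N> &v) {
  T acc = 0;
  for (std::size_t i = 0; i < N; ++i)
    acc |= v[i];
  return acc;
}

// Hypothetical stand-in: bitwise XOR of all elements.
template <typename T, std::size_t N>
constexpr T reduceXor(const std::array<T, N> &v) {
  T acc = 0;
  for (std::size_t i = 0; i < N; ++i)
    acc ^= v[i];
  return acc;
}

// Both reductions are usable in constant expressions (C++14 or later).
static_assert(reduceOr(std::array<unsigned, 4>{{1u, 2u, 4u, 8u}}) == 15u, "");
static_assert(reduceXor(std::array<unsigned, 4>{{1u, 2u, 4u, 12u}}) == 11u, "");
```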
/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:3190:14: error: unused variable 'CmpBW' [-Werror,-Wunused-variable]
unsigned CmpBW = Ty->getScalarSizeInBits();
^
1 error generated.
…rrectly identified as unused to fix a build error on z/OS
…t member functions (llvm#114813) Fixes llvm#95707.
…16704) Alter the #ifdef values from llvm#110986 and llvm#115292 to use _MSC_VER instead of _WIN32, so that the pragmas are no longer used on gcc/mingw builds. Noticed by @mstorsjo.
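A minimal sketch of the guard pattern, assuming the pragmas in question are MSVC-style `#pragma warning` directives. The warning number, the `Example` struct and `usesMsvcPragmas` are hypothetical and only illustrate the difference: `_WIN32` is defined by every Windows toolchain including gcc/mingw, while `_MSC_VER` is defined only by MSVC and MSVC-compatible compilers such as clang-cl.

```cpp
// Guard MSVC-only pragmas on the compiler, not on the target OS.
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 4251) // hypothetical warning number, for illustration
#endif

struct Example {
  int value = 0;
};

#if defined(_MSC_VER)
#pragma warning(pop)
#endif

// Reports which branch of the guard was taken for this compiler.
constexpr bool usesMsvcPragmas() {
#if defined(_MSC_VER)
  return true;
#else
  return false;
#endif
}
```

With a `_WIN32` guard instead, a mingw build would see the pragmas and typically warn about unknown pragmas.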
…lvm#115852)" This reverts commit a1153cd with fixes to lldb breakages. Fixes llvm#117145.
If the size is larger than the index width, truncate it instead of asserting. Longer-term we should consider rejecting types larger than the index size in the verifier, though this is probably tricky in practice (it's address space dependent, and types are owned by the context, not the module). Fixes llvm#116960.
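The truncation described above can be pictured with plain integers. This is a sketch under the assumption that the size fits in 64 bits; `truncateToIndexWidth` is a hypothetical helper, not the LLVM code touched by the patch.

```cpp
#include <cstdint>

// Hypothetical helper: keeps only the low `indexWidth` bits of `size`,
// instead of asserting when the value does not fit in the index width.
constexpr std::uint64_t truncateToIndexWidth(std::uint64_t size,
                                             unsigned indexWidth) {
  return indexWidth >= 64
             ? size
             : size & ((std::uint64_t{1} << indexWidth) - 1);
}

// A size just above 2^32 wraps to its low 32 bits with a 32-bit index.
static_assert(truncateToIndexWidth(0x100000005ULL, 32) == 5, "");
```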
The previous fix llvm@c641497 failed to consider the fact that the call graph update doesn't make any sense if the caller node hasn't been populated in the LazyCallGraph yet. This patch changes to skip this CG update step when that happens.
) When we create a thunk we don't know whether it will be short or long. Move the emission of the long thunk mapping symbol to when we transition to a long thunk. This improves disassembly and binary analysis as tools like BOLT identify thunks by disassembly. This removes a FIXME added in llvm#108989 aarch64-thunk-bti-multipass.s which had a corrupt disassembly due to missing mapping symbols.
This calls the system calls switch_pri and sys_ulock_wait. It also is one of the more straightforwardly rt-unsafe, in that it gives up this thread's timeslice.
…ual_stacks) (llvm#117069) Following the example of tsan, where we took the name. This would allow users to determine if they want to see ALL output from rtsan. Additionally, remove the UNLIKELY hint, as it is now up to the flag whether or not it is likely that we go through this conditional.
…CallBI function (llvm#115496) This commit adds an assert statement to the CallBI function to ensure that the interpreter state (S.Current) is correctly reset to the previous frame (FrameBefore) after InterpretBuiltin returns true. This helps catch any potential issues during development and debugging.
Reject them if the base is null, not only if the entire pointer is null. Fixes llvm#113821
This PR simply adds the Broadcom vendor ID to the SPIRV list, in order to enable the use of this vendor ID in a SPIRV pipeline for the Videocore GPUs.
This commit addresses several Static Analyzer issues related to potential null dereference by replacing dyn_cast<> with cast<> and getAs<> with castAs<> in various parts of the code. The cast function asserts that the cast is valid, ensuring that the pointer is not null and preventing null dereference errors. The changes are made in the following files:
CGBuiltin.cpp: Ensure vector types have exactly 3 elements.
CGExpr.cpp: Ensure member declarations are field declarations.
AnalysisBasedWarnings.cpp: Ensure operations are member expressions.
SemaExprMember.cpp: Ensure base types are extended vector types.
These changes ensure that the types are correctly cast and prevent potential null dereference issues, improving the robustness and safety of the code.
This patch removes MemProf format Version 1 now that Version 2 and 3 are working well.
I got asked about this offline and realized we didn't really have tests specific to the VLS frame lowering.
Verify the format is valid and the type is one of the expected i32 vectors. Verify the used vector types at least cover the requirements of the corresponding format operand.
…analysis (llvm#117324) Doug implemented quite literally all of it and has been continuously improving the implementation by handling more language constructs we had initially missed. I spent a lot of time reviewing the implementation of the attributes as well as the analysis pass, so in other words, the two of us are probably best equipped to answer any questions that might arise wrt this part of Clang.
The patch allows vectorizing scalar instructions + poison values as if the poisons were instructions with the same opcode. It allows better vectorization of repeated values, reduces the number of insertelement instructions, and serves as a basis for copyable elements vectorization.
AVX512, -O3 + LTO
JM/ldecod - better vector code
Applications/oggenc - better vectorization
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - better vector code
CFP2017rate/526.blender_r - better vector code
CFP2006/447.dealII - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/510.parest_r - better vectorization
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
Benchmarks/tramp3d-v4 - small variations
CFP2006/453.povray - extra vector code
JM/lencod - better vector code
CFP2017rate/511.povray_r - extra vector code
MemFunctions/MemFunctions - extra vector code
LoopVectorization/LoopVectorizationBenchmarks - extra vector code
XRay/FDRMode - extra vector code
XRay/ReturnReference - extra vector code
LCALS/SubsetCLambdaLoops - extra vector code
LCALS/SubsetCRawLoops - extra vector code
LCALS/SubsetARawLoops - extra vector code
LCALS/SubsetALambdaLoops - extra vector code
DOE-ProxyApps-C++/miniFE - extra vector code
LoopVectorization/LoopInterleavingBenchmarks - extra vector code
LCALS/SubsetBLambdaLoops - extra vector code
MicroBenchmarks/harris - extra vector code
ImageProcessing/Dither - extra vector code
MicroBenchmarks/SLPVectorization - extra vector code
ImageProcessing/Blur - extra vector code
ImageProcessing/Dilate - extra vector code
Builtins/Int128 - extra vector code
ImageProcessing/Interpolation - extra vector code
ImageProcessing/BilateralFiltering - extra vector code
ImageProcessing/AnisotropicDiffusion - extra vector code
MicroBenchmarks/LoopInterchange - extra code vectorized
LCALS/SubsetBRawLoops - extra code vectorized
CINT2006/464.h264ref - extra vectorization with wider vectors
CFP2017rate/508.namd_r - small variations, extra phis vectorized
CFP2006/444.namd - 2 2 x phi replaced by 4 x phi
DOE-ProxyApps-C/SimpleMOC - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - the function better vectorized and inlined
Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: llvm#115946
Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: llvm#117350
Extend existing store widening pass to widen load instructions. This patch also borrows the alias check algorithm from AMDGPU's load store widening pass. Widened load instruction is inserted before the first candidate load instruction. Widened store instruction is inserted after the last candidate store instruction. This method helps avoid moving uses/defs when replacing load/store instructions with their widened equivalents. The pass has also been extended to
* Generate 64-bit widened stores
* Handle 32-bit post increment load/store
* Handle stores of non-immediate values
* Handle stores where the offset is a GlobalValue
A recent commit (23d7a6c) introduced a dependency on libLLVMMC.so. This is to handle the `-print-supported-cpus` option which uses `llvm/MC/SubtargetInfo`. It requires libLLVMMC to be linked into the flang-driver which the previous commit did not do. This fixes that issue.
Summary: Previous patches have made the `rpc.h` header independent of the `libc` internals. This allows us to include it directly rather than providing an indirect C API. This patch only does the work to move the header. A future patch will pull out the `rpc_server` interface and simply replace it with a single function that handles the opcodes.
Turns out there were also errors in the recvfrom unpoisoning logic. This patch fixes those.
Unfortunately there's no upstream frontend for Metal, but since the IDs are now assigned by the DWARF standard I think it makes sense to have the enums upstream to enable tools like llvm-dwarfdump. This patch therefore uses an AArch64 test with artificially modified debug info to verify that the Metal language ID can be used. https://dwarfstd.org/issues/241111.1.html
… ISel (llvm#117375) This removes operands/results either in SDNode description or in ISel code so that they match each other.
DynamicLoader does not use ProcessElfCore NT_FILE entries to get UUID. Use GetModuleSpec to get UUID from Process.
This reverts commit 576865a. Depends on llvm#114827 that was reverted.
This disables `readability-identifier-naming` for the source files, since names don't have to be _Uglified in the source files. We currently don't enforce clang-tidy in the source files, so this is only useful to avoid a bunch of warnings when using an editor that shows the results of clang-tidy.
…ad values removed (llvm#116519) This change is related to the discussion: https://discourse.llvm.org/t/question-on-criteria-for-acceptable-ir-in-removedeadvaluespass/83131 I do not know the original reason to disallow the optimization on modules with global private constants. Please let me know what I am missing; I will be happy to make it better. Thank you! CC: @Wheest --------- Co-authored-by: Renat Idrisov <[email protected]>
…m#117066) Leverage the support added to represent allocation contexts in a more compact way via a radix tree in the indexed profile to similarly reduce sizes of the bitcode summaries. For a large target, this reduced the size of the per-module summaries by about 18% and in the distributed combined index files by 28%.
…ies" (llvm#117395) Reverts llvm#117066 This is causing some build bot failures that need investigation.
Add new CLI options for feature parity with ELF w.r.t pass plugins. Most of the changes are ported directly from llvm@0c86198. With this change, it is now possible to load and run external pass plugins during the LTO phase.
Same as SMUL, UMUL produces one result + flags, not two results + flags.
…ries" (llvm#117395) (llvm#117404) This reverts commit fdb050a, and restores ccb4702, with a fix for build bot failures. Specifically, add ProfileData to the dependences of the BitWriter library, which was causing shared library builds of LLVM to fail. Reproduced the failure with a shared library build and confirmed this change fixes that build failure.
No description provided.