Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 776476c2 (Nov 22) (13) #477

Open
wants to merge 284 commits into
base: bump_to_cbc78022
Choose a base branch
from

Conversation

jorickert
Copy link

No description provided.

metaflow and others added 30 commits November 21, 2024 11:41
…zero-call-used-regs (llvm#116995)

Previously, with `-fzero-call-used-regs` clang/LLVM would incorrectly
emit Neon instructions in streaming functions, and streaming-compatible
functions without SVE.

With this change:

* In streaming functions, Z/p registers will be zeroed
* In streaming compatible functions w/o SVE, D registers will be zeroed
  - (As Neon vector instructions are illegal including `movi v..`)
)

Starting with 41e3919 DiagnosticsEngine
creation might perform IO. It was implicitly defaulting to
getRealFileSystem. This patch makes it explicit by pushing the decision
making to callers.

It uses ambient VFS if one is available, and keeps using
`getRealFileSystem` if there aren't any VFS.
…lvm#116856)

This brings the printing of scalable vector constant splats inline with
their fixed length counterparts.
…lvm#117009)

The relevant bit from the Intel SDM for vinsertps semantics:
```
IF (SRC = REG) THEN COUNT_S := imm8[7:6] ELSE COUNT_S := 0
```

This is now taken into account.
…ConstantInt/FP. (llvm#116787)

This fixes the code quality issue reported in
llvm#111149.
Following on from llvm#116373, updates "pack-dynamic-inner-tile.mlir" to use
TD Ops for all transformations except for lowering to LLVM.

This is an intermediate step before introducing vectorization.
Motivating case from
https://github.com/torvalds/linux/blob/9852d85ec9d492ebef56dc5f229416c925758edc/drivers/gpu/drm/drm_edid.c#L5238-L5240:
```
define i1 @src(i8 noundef %v13) {
entry:
  %conv1 = zext i8 %v13 to i32
  %add = add nsw i32 %conv1, -4
  %cmp = icmp ult i32 %add, 3
  %cmp4 = icmp slt i8 %v13, 4
  %cond = select i1 %cmp4, i1 true, i1 %cmp
  ret i1 %cond
}

define i1 @tgt(i8 noundef %v13) {
entry:
  %cmp4 = icmp slt i8 %v13, 7
  ret i1 %cmp4
}
```
Part of llvm#51787.
Follow up of llvm#116822.

This patch adds constexpr support for the built-in reduce `or` and `xor`
functions.
/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:3190:14:
 error: unused variable 'CmpBW' [-Werror,-Wunused-variable]
    unsigned CmpBW = Ty->getScalarSizeInBits();
             ^
1 error generated.
…rrectly identified as unused to fix a build error on z/OS
…16704)

Alter the #ifdef values from llvm#110986 and llvm#115292 to use _MSC_VER instead of _WIN32 to stop the pragmas being used on gcc/mingw builds

Noticed by @mstorsjo
If the size is larger than the index width, truncate it instead
of asserting.

Longer-term we should consider rejecting types larger than the
index size in the verifier, though this is probably tricky in
practice (it's address space dependent, and types are owned by
the context, not the module).

Fixes llvm#116960.
The previous fix
llvm@c641497
failed to consider the fact that the call graph update doesn't make any
sense if the caller node hasn't been populated in the LazyCallGraph yet.
This patch changes to skip this CG update step when that happens.
)

When we create a thunk we don't know whether it will be short or long.
Move the emission of the long thunk mapping symbol to when we transition
to a long thunk. This improves disassembly and binary analysis as tools
like BOLT identify thunks by disassembly.

This removes a FIXME added in llvm#108989 aarch64-thunk-bti-multipass.s
which had a corrupt disassembly due to missing mapping symbols.
This calls the system calls switch_pri and sys_ulock_wait. It also is
one of the more straightforwardly rt-unsafe, in that it gives up this
thread's timeslice.
…ual_stacks) (llvm#117069)

Following the example of tsan, where we took the name

This would allow users to determine if they want to see ALL output from
rtsan.

Additionally, remove the UNLIKELY hint, as it is now up to the flag whether or
not it is likely that we go through this conditional.
…CallBI function (llvm#115496)

This commit adds an assert statement to the CallBI function to ensure
that the interpreter state (S.Current) is correctly reset to the
previous frame (FrameBefore) after InterpretBuiltin returns true. This
helps catch any potential issues during development and debugging.
Reject them if the base is null, not only if the entire pointer is null.

Fixes llvm#113821
This PR is simply adding the Broadcom vendor ID to the SPIRV list. In
order to enable the use of this vendor ID in a SPIRV pipeline for the
Videocore GPUs.
This commit addresses several Static Analyzer issues related to
potential null dereference by replacing dyn_cast<> with cast<> and
getAs<> with castAs<> in various parts of the codes.

The cast function asserts that the cast is valid, ensuring that the
pointer is not null and preventing null dereference errors.

The changes are made in the following files:
CGBuiltin.cpp: Ensure vector types have exactly 3 elements.
CGExpr.cpp: Ensure member declarations are field declarations.
AnalysisBasedWarnings.cpp: Ensure operations are member expressions.
SemaExprMember.cpp: Ensure base types are extended vector types.

These changes ensure that the types are correctly cast and prevent
potential null dereference issues, improving the robustness and safety
of the code.
kazutakahirata and others added 30 commits November 22, 2024 11:53
This patch removes MemProf format Version 1 now that Version 2 and 3
are working well.
I got asked about this offline and realized we didn't really have
tests specific to the VLS frame lowering.
Verify the format is valid and the type is one of the expected
i32 vectors. Verify the used vector types at least cover the
requirements of the corresponding format operand.
…analysis (llvm#117324)

Doug implemented quite literally all of it and has been continuously
improving the implementation by handling more language constructs we had
initially missed. I spent a lot of time reviewing the implementation of
the attributes as well as the analysis pass, so in other words, the two
of us are probably best equipped to answer any questions that might
arise wrt this part of Clang.
Patch allows to vector scalar instruction + poison values as if poisons
are instructions with the same opcode. It allows better vectorization of
the repeated values, reduces number of insertelement instructions and
serves as a base ground for copyable elements vectorization

AVX512, -O3 + LTO

JM/ldecod - better vector code
Applications/oggenc - better vectorization
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - better vector code
CFP2017rate/526.blender_r - better vector code
CFP2006/447.dealII - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/510.parest_r - better vectorization
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
Benchmarks/tramp3d-v4 - small variations
CFP2006/453.povray - extra vector code
JM/lencod - better vector code
CFP2017rate/511.povray_r - extra vector code
MemFunctions/MemFunctions - extra vector code
LoopVectorization/LoopVectorizationBenchmarks - extra vector code
XRay/FDRMode - extra vector code
XRay/ReturnReference - extra vector code
LCALS/SubsetCLambdaLoops - extra vector code
LCALS/SubsetCRawLoops - extra vector code
LCALS/SubsetARawLoops - extra vector code
LCALS/SubsetALambdaLoops - extra vector code
DOE-ProxyApps-C++/miniFE - extra vector code
LoopVectorization/LoopInterleavingBenchmarks - extra vector code
LCALS/SubsetBLambdaLoops - extra vector code
MicroBenchmarks/harris - extra vector code
ImageProcessing/Dither - extra vector code
MicroBenchmarks/SLPVectorization - extra vector code
ImageProcessing/Blur - extra vector code
ImageProcessing/Dilate - extra vector code
Builtins/Int128 - extra vector code
ImageProcessing/Interpolation - extra vector code
ImageProcessing/BilateralFiltering - extra vector code
ImageProcessing/AnisotropicDiffusion - extra vector code
MicroBenchmarks/LoopInterchange - extra code vectorized
LCALS/SubsetBRawLoops - extra code vectorized
CINT2006/464.h264ref - extra vectorization with wider vectors
CFP2017rate/508.namd_r - small variations, extra phis vectorized
CFP2006/444.namd - 2 2 x phi replaced by 4 x phi
DOE-ProxyApps-C/SimpleMOC - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - the function better vectorized and inlined
Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code
FreeBench/fourinarow - better vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#115946
Patch uses getExtendedReduction for reductions of ext-based nodes + adds
cost estimation for ctpop-kind reductions into basic implementation and
RISCV-V specific vcpop cost estimation.

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: llvm#117350
Extend existing store widening pass to widen load instructions.

This patch also borrows the alias check algorithm from AMDGPU's load
store widening pass.

Widened load instruction is inserted before the first candidate load
instruction.
Widened store instruction is inserted after the last candidate store
instruction.
This method helps avoid moving uses/defs when replacing load/store
instructions with their widened equivalents.

The pass has also been extended to
* Generate 64-bit widened stores
* Handle 32-bit post increment load/store
* Handle stores of non-immediate values
* Handle stores where the offset is a GlobalValue
A recent commit (23d7a6c) introduced a dependency on
libLLVMMC.so. This is to handle the `-print-supported-cpus` option which
uses `llvm/MC/SubtargetInfo`. It requires libLLVMMC to be linked into
the flang-driver which the previous commit did not do. This fixes that
issue.
Summary:
Previous patches have made the `rpc.h` header independent of the `libc`
internals. This allows us to include it directly rather than providing
an indirect C API. This patch only does the work to move the header. A
future patch will pull out the `rpc_server` interface and simply replace
it with a single function that handles the opcodes.
Turns out there were also errors in the recvfrom unpoisoning logic. This
patch fixes those.
Unfortunately there's no upstream frontend for Metal but since the id's
are now assigned by the DWARF standard I think it makes sense to have
the enums upstream to enable tools like llvm-dwarfdump. This patch
therefore uses an AArch64 test with artificially modified debug info to
verify that the Metal language id can be used.

https://dwarfstd.org/issues/241111.1.html
… ISel (llvm#117375)

This removes operands/results either in SDNode description or in ISel
code so that they match each other.
DynamicLoader does not use ProcessElfCore NT_FILE entries to get
UUID. Use GetModuleSpec to get UUID from Process.
This disables `readability-identifier-naming` for the source files,
since names don't have to by _Uglified in the source files. We currently
don't enforce clang-tidy in the source files, so this is only useful to
avoid a bunch of warnings when using an editor that shows the results of
clang-tidy.
…ad values removed (llvm#116519)

This change is related to discussion:

https://discourse.llvm.org/t/question-on-criteria-for-acceptable-ir-in-removedeadvaluespass/83131

I do not know the original reason to disallow the optimization on
modules with global private constant. Please let me know what am I
missing, I will be happy to make it better. Thank you!

CC: @Wheest

---------

Co-authored-by: Renat Idrisov <[email protected]>
…m#117066)

Leverage the support added to represent allocation contexts in a more
compact way via a radix tree in the indexed profile to similarly reduce
sizes of the bitcode summaries.

For a large target, this reduced the size of the per-module summaries by
about 18% and in the distributed combined index files by 28%.
…ies" (llvm#117395)

Reverts llvm#117066

This is causing some build bot failures that need investigation.
Add new CLI options for feature parity with ELF w.r.t pass plugins.
Most of the changes are ported directly from
llvm@0c86198.
With this change, it is now possible to load and run external pass
plugins during the LTO phase.
Same as SMUL, UMUL produces one result + flags, not two results + flags.
…ries" (llvm#117395) (llvm#117404)

This reverts commit fdb050a, and
restores ccb4702, with a fix for build
bot failures.

Specifically, add ProfileData to the dependences of the BitWriter
library, which was causing shared library builds of LLVM to fail.
Reproduced the failure with a shared library build and confirmed this
change fixes that build failure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.