Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR] Add inline interface to CIR dialect #1164

Open
wants to merge 2,097 commits into
base: main
Choose a base branch
from

Conversation

keryell
Copy link
Collaborator

@keryell keryell commented Nov 25, 2024

This allows the inliner to work with the CIR dialect.

gitoleg and others added 30 commits November 22, 2024 18:05
This PR adds aarch64 big endian support.

Basically the support for aarch64_be itself is expressed only in two
extra cases for the switch statement and changes in the `CIRDataLayout`
are needed to prove that we really support big endian. Hence the idea
for the test - I think the best way for proof is something connected
with bit-fields, so we compare the results of the original codegen and
ours.
This PR splits the old `cir-simplify` pass into two new passes, namely
`cir-canonicalize` and `cir-simplify` (the new `cir-simplify`). The
`cir-canonicalize` pass runs transformations that do not affect
CIR-to-source fidelity much, such as operation folding and redundant
operation elimination. On the other hand, the new `cir-simplify` pass
runs transformations that may significantly change the code and break
high-level code analysis passes, such as more aggresive code
optimizations.

This PR also updates the CIR-to-CIR pipeline to fit these two new
passes. The `cir-canonicalize` pass is moved to the very front of the
pipeline, while the new `cir-simplify` pass is moved to the back of the
pipeline (but still before lowering prepare of course). Additionally,
the new `cir-simplify` now only runs when the user specifies a non-zero
optimization level on the frontend.

Also fixed some typos and resolved some `clang-tidy` complaints along
the way.

Resolves llvm#827 .
Currently the C style cast is not implemented/supported for unions.

This PR adds support for union casts as done in `CGExprAgg.cpp`. I have
also added an extra test in `union-init.c`.
Mistakenly closed llvm#850

llvm#850 (review)
 
This PR fixes array initialization for expression arguments. 

Consider the following code snippet `test.c`: 
```
typedef struct {
  int a;
  int b[2];
} A;

int bar() {
  return 42;
}

void foo() {
  A a = {bar(), {}};
}
```
When ran with `bin/clang test.c -Xclang -fclangir -Xclang -emit-cir -S
-o -`, It produces the following error:
```
~/clangir/clang/lib/CIR/CodeGen/CIRGenExprAgg.cpp:483: void {anonymous}::AggExprEmitter::buildArrayInit(cir::Address, mlir::cir::ArrayType, clang::QualType, clang::Expr*, llvm::ArrayRef<clang::Expr*>, clang::Expr*): Assertion `NumInitElements != 0' failed.
```
The error can be traced back to `CIRGenExprAgg.cpp`, and the fix is
simple. It is possible to have an empty array initialization as an
expression argument!
As title, if element type of vector type is sized, then the vector type
should be deemed sized.
This would enable us generate code for neon without triggering assertion
…eon_vrndaq_v (llvm#871)

as title. 
This also added NeonType support for Float32

Co-authored-by: Guojin He <[email protected]>
It will hit another assert when calling initFullExprCleanup.
This PR fixes the case, when a temporary var is used, and `alloca`
operation is inserted in the block start before the `label` operation.
Implementation: when we search for the `alloca` place in a block, we
take label operations into account as well.

Fix llvm#870

---------

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
__attribute__((annotate()) was only accepting integer literals,
preventing some meta-programming usage for example.
This should be extended to some other kinds of types.

---------

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
Just as the title says, but only covers non-exception path, that's
coming next.
Nothing unblocked yet, just hit next assert in the same path.
… exceptions

Code path still hits an assert sooner, incremental NFC step.
…lvm#878)

Close llvm#876

We've already considered the case that there are random stmt after a
switch case:

```
for (auto *c : compoundStmt->body()) {
      if (auto *switchCase = dyn_cast<SwitchCase>(c)) {
        res = buildSwitchCase(*switchCase, condType, caseAttrs);
      } else if (lastCaseBlock) {
        // This means it's a random stmt following up a case, just
        // emit it as part of previous known case.
        mlir::OpBuilder::InsertionGuard guardCase(builder);
        builder.setInsertionPointToEnd(lastCaseBlock);
        res = buildStmt(c, /*useCurrentScope=*/!isa<CompoundStmt>(c));
      } else {
        llvm_unreachable("statement doesn't belong to any case region, NYI");
      }

      lastCaseBlock = builder.getBlock();

      if (res.failed())
        break;
}
```

However, maybe this is an oversight, in the branch of ` if
(lastCaseBlock)`, the insertion point will be updated automatically when
the RAII object `guardCase` destroys, then we can assign the correct
value for `lastCaseBlock` later. So we will see the weird code pattern
in the issue side.

BTW, I found the codes in CIRGenStmt.cpp are far more less similar with
the ones other code gen places. Is this intentional? And what is the
motivation and guide lines here?
…lvm#882)

As title. 
Notice that for those intrinsics, just like OG, we do not lower to llvm
intrinsics, instead, do vector insert.
The test case is partially from OG
[aarch64-neon-vget.c](https://github.com/llvm/clangir/blob/85bc6407f559221afebe08a60ed2b50bf1edf7fa/clang/test/CodeGen/aarch64-neon-vget.c)
But, I did not do all signed and unsigned int tests because unsigned and
signed of the same width essentially just use the same intrinsic ID thus
exactly same code path as far as this PR concerns.

---------

Co-authored-by: Guojin He <[email protected]>
Reviewers: bcardosolopes

Reviewed By: bcardosolopes

Pull Request: llvm#881
…#877)

We need a target-independent way to distinguish OpenCL kernels in
ClangIR. This PR adds a unit attribute `OpenCLKernelAttr` similar to the
one in Clang AST.

This attribute is attached to the extra attribute dictionary of
`cir.func` operations only. (Not for `cir.call`.)
…'case' (llvm#879)

Motivation example:

```
extern "C" void action1();
extern "C" void action2();

extern "C" void case_follow_label(int v) {
  switch (v) {
    case 1:
    label:
    case 2:
      action1();
      break;
    default:
      action2();
      goto label;
  }
}
```

When we compile it, we will meet:

```
  case Stmt::CaseStmtClass:
  case Stmt::DefaultStmtClass:
    assert(0 &&
           "Should not get here, currently handled directly from SwitchStmt");
    break;
```

in `buildStmt`. The cause is clear. We call `buildStmt` when we build
the label stmt.

To solve this, I think we should be able to build case stmt in
buildStmt. But the new problem is, we need to pass the information like
caseAttr and condType. So I tried to add such informations in
CIRGenFunction as data member.
llvm#884)

as title. 
This PR has simliar test case organization as to
[PR882](llvm#882)
Notice that comparing to OG, this PR combines cases for some pairs of
intrinsics such as
BI__builtin_neon_vget_lane_f32 and BI__builtin_neon_vdups_lane_f32. 
They have the same code generated in OG and CIRGen
OG separate them into different case handling because it passes
mnemonics which are different. CIRGen doesn't pass that so why not
combine them.

Co-authored-by: Guojin He <[email protected]>
as title, this would complete solution to fix issue [LLVM lowering
missing comdat and constant
attributes](llvm#801)
as title. 
Also add function buildCommonNeonBuiltinExpr just like OG's
emitCommonNeonBuiltinExpr. This might help consolidate neon cases and
share common code.
Notice:

- I pretty much keep the skeleton of OG's emitCommonNeonBuiltinExpr at
the cost of that we didn't use a few variables they calculate. They
might help in the future.
- The purpose of having CommonNeonBuiltinExpr is to reduce
implementation code duplication. So far, we only have one type
implemented, and it's hard for CIR to be more generic. But we should see
if in future we can have different types of intrinsics share more
generic code path.

---------

Co-authored-by: Guojin He <[email protected]>
…no override (llvm#893)

As title. 
The test case used is abort(), but it is from the real code. 
Notice: Since CIR implementation for NoReturn Call is pending to
implement, the generated llvm code is like:
`define dso_local void @test() llvm#1  {
  call void @abort(), !dbg !8
  ret void
}`
which is not right, right code should be like, 
`
`define dso_local void @test() llvm#1  {
  call void @abort(), !dbg !8
  unreachable
}`
`
Still send this PR as Noreturn implementation is a separate issue.
orbiri and others added 22 commits November 22, 2024 19:10
It was always the intention for `cir.cmp` operations to return bool
result. Due
to missing constraints, a bug in codegen has slipped in which created
`cir.cmp`
operations with result type that matches the original AST expression
type. In
C, as opposed to C++, boolean expression types are "int". This resulted
with
extra operations being codegened around boolean expressions and their
usage.

This commit both enforces `cir.cmp` in the op definition and fixes the
mentioned bug.
This is the first patch to support TBAA, following the discussion at
llvm#1076 (comment)

- add skeleton for CIRGen, utilizing `decorateOperationWithTBAA`
- add empty implementation in `CIRGenTBAA`
- introduce `CIR_TBAAAttr` with empty body
- attach `CIR_TBAAAttr` to `LoadOp` and `StoreOp`
- no handling of vtable pointer
- no LLVM lowering
)

The title describes the purpose of the PR. It adds initial support for
structures with padding to the call convention lowering for AArch64.

I have also _initial support_ for the missing feature
[FinishLayout](https://github.com/llvm/clangir/blob/5c5d58402bebdb1e851fb055f746662d4e7eb586/clang/lib/AST/RecordLayoutBuilder.cpp#L786)
for records, and the logic is gotten from the original codegen.

Finally, I added a test for verification.
…m#1152)

The function `populateCIRToLLVMConversionPatterns` contains a spaghetti
of LLVM dialect conversion patterns, which results in merge conflicts
very easily. Besides, a few patterns are even registered for more than
once, possibly due to careless resolution of merge conflicts.

This PR attempts to mitigate this problem. Pattern names now are sorted
in alphabetical order, and each source code line now only lists exactly
one pattern name to reduce potential merge conflicts.
…ltinExpr (llvm#1133)

This PR is a NFC as we just NYI every builtID of neon SISD. We will
implement them in subsequent PRs.
…CFG (llvm#1147)

This PR implements NYI in CIRScopeOpFlattening. It seems to me the best
way is to let results of ScopeOp forwarded as block arguments of the
last block split from the cir.scope block.
…der (llvm#1157)

With [llvm-project#116090](llvm/llvm-project#116090)
merged, we can get rid of `#include "../../../../Basic/Targets.h"` now.
This PR moves the lowering code for data member pointers from the
conversion patterns to the implementation of CXXABI because this part
should be ABI-specific.
…ant arrays or pointers (llvm#1136)

This PR adds support for function arguments with structs that contain
constant arrays or pointers for AArch64.

For example, 
```
typedef struct {
  int a[42];
} CAT;

void pass_cat(CAT a) {}
```

As usual, the main ideas are gotten from the original
[CodeGen](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/lib/AST/ASTContext.cpp#L1823),
and I have added a couple of tests.
…ge (64, 128) (llvm#1141)

This PR adds support for the lowering of AArch64 calls with structs
having sizes greater than 64 and less than 128.

The idea is from the original
[CodeGen](https://github.com/llvm/clangir/blob/da601b374deea6665f710f7e432dfa82f457059e/clang/lib/CodeGen/CGCall.cpp#L1329),
where we perform a coercion through memory for these type of calls.

I have added a test for this.
Recommit llvm#1101

I am not sure what happened. But that merged PR doesn't show in the git
log. Maybe the stacked PR may not get successed? But after all, we need
to land it again.

Following off are original commit messages:

---

This is the following of llvm#1100.

After llvm#1100, when we want to use
LongDouble for VAArg, we will be in trouble due to details in X86_64's
ABI and this patch tries to address this.

The practical impact the patch is, after this patch, with
llvm#1088 and a small following up fix,
we can build and run all C's benchmark in SpecCPU 2017. I think it is a
milestone.
…vm#1151)

They are rounding shift of vectors, and shift amount is from the least
significant byte of the corresponding element of the second input
vector. Thus, it is implemented in [its own ASM
](https://godbolt.org/z/v65sbeKaW). These make them not suitable to be
lowered to CIR ShiftOp though it supports vector type now.
Caught by ASAN. We were creating a reference to an object going out of
scope. Incidentally, this feels like the sort of issue the lifetime
checker will be great for detecting :)
…ut` (llvm#1156)

Since LLVM specific data layout string is not proper in ClangIR, this PR
replaces it with existing MLIR DLTI equivalent and eliminate the
redundancy.

Although the constructor of `LowerModule` of TargetLowering library
requires a llvm data layout string, it is not used currently. (I believe
it would also not matter in the future.) Therefore, this PR has no
functional change.
This allows the inliner to work with the CIR dialect.
@keryell
Copy link
Collaborator Author

keryell commented Nov 25, 2024

A motivating use case: Xilinx/mlir-aie@b5f21f8 (#1913)

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this getting starter, one question below

@@ -125,7 +160,7 @@ void cir::CIRDialect::initialize() {
#define GET_OP_LIST
#include "clang/CIR/Dialect/IR/CIROps.cpp.inc"
>();
addInterfaces<CIROpAsmDialectInterface>();
addInterfaces<CIRInlinerInterface, CIROpAsmDialectInterface>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental patches are way to go, I'm just a bit confused here: is this supposed to change any behavior at this point or is this just a skeleton that has no observed behavior? If this is changing behavior I'd expect a testcase and/or implementations for isLegalToInline to be returning false (for now).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not change the behavior of CIR in this project since it is not used but allows other CIR users to use the inliner.
I was worried someone would ask for a test. 😉 An easy one would have been to just use the inliner pass (instead of the function) in opt except that a lot of other stuff is not yet implemented in CIR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking two types of tests:

  • A test similar to clang/test/CIR/Transforms/mem2reg.cir.
  • Better yet if we add a flag to cc1 as well, like we do for mem2reg in clang/test/CIR/Transforms/mem2reg.c
  • Thoughts for the future: looks like we need a flag that enable all MLIR passes altogether, something like a -fclangir-mlir-opt of sorts, have you thought about that already?

Copy link
Collaborator Author

@keryell keryell Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, having a test with --inline in the same way as you suggest with --mem2reg would be nice but even with #1203 there are still some crashes with --inline because it requires an analysis to work to compute the benefit of doing the inlining.
I need to investigate this.
In my project I do not have the problem because I call the low-level API which does not compute any cost model and just do the job. 😄

Copy link
Member

@bcardosolopes bcardosolopes Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, it's fine if you initially add one small test that passes while other wild attempts to use might crash/ fail, as long as we don't enable this by default we should be good to incrementally improve it! Basically hook up the driver flags and other setup and inline an empty function, for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.