Conversation

@amemov commented Mar 19, 2025

An attempt to resolve #4093

Initial implementation:

  • Defined the op in Linear.cpp

@amemov (Author) commented Mar 19, 2025

Hi, this is my first time contributing to the project - if you have any feedback or suggestions, I would really appreciate that.

@zjgarvey (Collaborator)
Thanks for picking this up.

There isn't any reason to include quantization logic for this op since it doesn't have any qdq fusion implemented in FuseQuantizedOps.cpp.

It would also be a bit better to implement this directly as a linalg.generic op, rather than unsqueezes and a matmul with a reduction dim size of 1. If you were to do the unsqueeze/matmul approach, it would be more appropriate to put this logic in DecomposeComplexOps.cpp.
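For illustration, a direct linalg.generic lowering of the outer product could look roughly like the sketch below. This is not the PR's actual code: the pattern boilerplate and the getSelf/getVec2 accessor names are assumptions, and it presumes torch-mlir's usual TorchToLinalg setup (type converter and MLIR namespaces in scope).

```cpp
// Sketch (not this PR's code): lower torch.outer straight to a
// linalg.generic with two parallel loops, instead of unsqueezes plus a
// matmul with a unit-sized reduction dim.
class ConvertAtenOuterOp : public OpConversionPattern<AtenOuterOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(AtenOuterOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    Location loc = op.getLoc();
    Value lhs = adaptor.getSelf();  // accessor names are assumptions
    Value rhs = adaptor.getVec2();
    auto lhsTy = dyn_cast<RankedTensorType>(lhs.getType());
    auto rhsTy = dyn_cast<RankedTensorType>(rhs.getType());
    if (!lhsTy || !rhsTy || lhsTy.getRank() != 1 || rhsTy.getRank() != 1)
      return rewriter.notifyMatchFailure(op, "expected 1-d tensor operands");

    // Init tensor of shape (M, N) for the generic to write into.
    Value m = rewriter.create<tensor::DimOp>(loc, lhs, 0);
    Value n = rewriter.create<tensor::DimOp>(loc, rhs, 0);
    Value init = rewriter.create<tensor::EmptyOp>(
        loc, ArrayRef<OpFoldResult>{m, n}, lhsTy.getElementType());

    // lhs indexes the first loop, rhs the second, the output both.
    AffineMap lhsMap = AffineMap::get(
        2, 0, {rewriter.getAffineDimExpr(0)}, rewriter.getContext());
    AffineMap rhsMap = AffineMap::get(
        2, 0, {rewriter.getAffineDimExpr(1)}, rewriter.getContext());
    AffineMap outMap = rewriter.getMultiDimIdentityMap(2);
    SmallVector<utils::IteratorType> iters(2, utils::IteratorType::parallel);

    Value result =
        rewriter
            .create<linalg::GenericOp>(
                loc, init.getType(), ValueRange{lhs, rhs}, ValueRange{init},
                ArrayRef<AffineMap>{lhsMap, rhsMap, outMap}, iters,
                [](OpBuilder &b, Location loc, ValueRange args) {
                  // out[i, j] = lhs[i] * rhs[j] (float case; integer
                  // element types would need arith::MulIOp instead).
                  Value mul = b.create<arith::MulFOp>(loc, args[0], args[1]);
                  b.create<linalg::YieldOp>(loc, mul);
                })
            .getResult(0);

    auto newResultTy = cast<RankedTensorType>(
        getTypeConverter()->convertType(op.getType()));
    rewriter.replaceOpWithNewOp<tensor::CastOp>(op, newResultTy, result);
    return success();
  }
};
```

The point of the generic form is that both loops are parallel and no unit-sized reduction dimension is ever materialized.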

Also, please do add e2e tests somewhere in ./projects/pt1/python/torch_mlir_e2e_test/test_suite/.

@amemov amemov marked this pull request as ready for review March 22, 2025 23:31
@amemov amemov force-pushed the AtenOuterOp-Lowering branch from cda896e to 2348344 on March 24, 2025 13:48
@zjgarvey (Collaborator) left a comment

After a change to the init tensor for the generic, I think this looks good!

Thanks for the changes.

@zjgarvey (Collaborator) commented Apr 2, 2025

Also, be sure to run either pre-commit run --all (you will need to install it with pip install pre-commit) or git clang-format to auto-format the files.

@amemov (Author) commented Apr 3, 2025

I changed it to the init tensor and ran pre-commit - everything looks good on my end.

@amemov amemov requested a review from zjgarvey April 3, 2025 14:43
@vivekkhandelwal1 (Collaborator) left a comment

Hi @amemov, can you please take a look at the CI failure?

@amemov (Author) commented Apr 8, 2025

> Hi @amemov, can you please take a look at the CI failure?

Hi @vivekkhandelwal1, I skimmed it briefly before - I didn't see any failures specifically related to the torch.outer() lowering I wrote or to my test case.

I will take a closer look at it today, but so far I'm not sure what exactly I need to modify or add here.

@vivekkhandelwal1 (Collaborator)

> Hi @amemov, can you please take a look at the CI failure?
>
> Hi @vivekkhandelwal1, I skimmed it briefly before - I didn't see any failures specifically related to the torch.outer() lowering I wrote or to my test case.
>
> I will take a closer look at it today, but so far I'm not sure what exactly I need to modify or add here.

Hi @amemov, some test(s) are crashing for the fx_importer config - most probably one that you added. To find out which test is crashing, you need to run the tests serially. You may use the following command:

python -m projects.pt1.e2e_testing.main --config=fx_importer -s

The above command runs all the tests one by one; the last test run will be the one that's crashing. Then you can figure out the fix for that.

@amemov (Author) commented Apr 12, 2025

@vivekkhandelwal1
The problem was raised by the test I wrote:

torch-mlir/externals/llvm-project/llvm/include/llvm/Support/Casting.h:566: decltype(auto) llvm::cast(const From&) [with To = mlir::RankedTensorType; From = mlir::Type]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

I resolved it by changing the casting and the dimensions of the operands. On my machine, the AtenOuter test now passes.
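For context, this assertion fires whenever an unchecked cast<RankedTensorType> is applied to a type that is not actually a ranked tensor. A minimal sketch of the usual guard in a conversion pattern (not necessarily the exact fix applied here; the surrounding names are assumptions):

```cpp
// An unchecked cast<> aborts at runtime on a type mismatch, while
// dyn_cast<> lets the pattern bail out gracefully with a diagnostic.
Type converted = getTypeConverter()->convertType(op.getType());
auto resultTy = dyn_cast<RankedTensorType>(converted);
if (!resultTy)
  return rewriter.notifyMatchFailure(
      op, "expected result type to convert to a ranked tensor");
```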

@amemov amemov requested a review from vivekkhandelwal1 April 12, 2025 15:19
@vivekkhandelwal1 (Collaborator)

@amemov, a contributor has added the lowering for this same op through decomposition here: #4138.

Although your PR is older, so in case of conflict you should get the chance to complete it, their approach (via decomposition) is the better one. Can you and @ivanamitreski work out a solution to this?

@amemov amemov requested a review from vivekkhandelwal1 June 5, 2025 22:46
@vivekkhandelwal1 (Collaborator)

@amemov Please resolve the remaining comments.

@amemov (Author) commented Jun 16, 2025

@zjgarvey could you take a look at the changes? Thank you!

@amemov (Author) commented Jun 25, 2025

@zjgarvey

@zjgarvey (Collaborator)

The tests are failing, so I'd recommend running your new tests locally with each of the different configs.

@michizhou

@amemov I am part of a group that has adapted this conversion pattern as part of our lowering process for an MoE model down to the Linalg MLIR dialect. To fully integrate this pattern into our codebase, we need to pull it from the production Torch-MLIR. If you can address the test failures as suggested by the reviewer so it can be merged, we would greatly appreciate it. Thanks!

@amemov (Author) commented Sep 3, 2025

> The tests are failing, so I'd recommend running your new tests locally with each of the different configs.

@zjgarvey , I looked into the test failures and they're all unrelated to my AtenOuterOp changes. The 6 failing tests are all FX importer issues - nothing to do with the decomposition patterns I added.

I also verified that my implementation works correctly by testing the decomposition manually, so I'm a little confused. I tried release and debug builds - both work on my end. And if you look at the failures in the CI, it doesn't look like the problem is due to the proposed decomposition either.

@zjgarvey (Collaborator) commented Sep 3, 2025

> The tests are failing, so I'd recommend running your new tests locally with each of the different configs.
>
> @zjgarvey , I looked into the test failures and they're all unrelated to my AtenOuterOp changes. The 6 failing tests are all FX importer issues - nothing to do with the decomposition patterns I added.
>
> I also verified that my implementation works correctly by testing the decomposition manually, so I'm a little confused. I tried release and debug builds - both work on my end. And if you look at the failures in the CI, it doesn't look like the problem is due to the proposed decomposition either.

Can you sync with main? Then we can rerun the CI to make sure it's not a failure specific to your changes.

We won't be able to merge this unless the CI passes.

root and others added 3 commits September 3, 2025 15:35
- Defined the op in Linear.cpp

  TODO:
  - Testing, and perhaps add some test(s) inside torch-mlir?

- Rewrote ConvertAtenOuterOp without unsqueezing
- Replaced linalg::MatmulOp with linalg::GenericOp for building the result of the op
- Added error messages for
- Added test case in e2e tests - placed in matmul.py
@amemov amemov force-pushed the AtenOuterOp-Lowering branch from 8759641 to f153429 on September 3, 2025 22:37
@amemov (Author) commented Sep 4, 2025

@zjgarvey , I re-ran the tests on my machine after syncing my branch with main (and fetching these changes) - the errors I see are not due to the AtenOuterOp lowering:

(mlir_venv) [ashepelev@thinkpadx1-carbon build]$ ninja check-torch-mlir
[0/1] Running the torch-mlir regression tests
Enabling sparsity propagation tests
Enabling Torch v2.3+ tests
Skipping onnx tests.. no onnx
FAIL: TORCH_MLIR :: python/fx_importer/sparsity/sparse_test.py (43 of 117)
******************** TEST 'TORCH_MLIR :: python/fx_importer/sparsity/sparse_test.py' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
/home/ashepelev/LLVM-Stuff/torch-mlir/mlir_venv/bin/python3.13 /home/ashepelev/LLVM-Stuff/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py | FileCheck /home/ashepelev/LLVM-Stuff/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py # RUN: at line 6
+ /home/ashepelev/LLVM-Stuff/torch-mlir/mlir_venv/bin/python3.13 /home/ashepelev/LLVM-Stuff/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py
+ FileCheck /home/ashepelev/LLVM-Stuff/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py
Traceback (most recent call last):
  File "/home/ashepelev/LLVM-Stuff/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py", line 19, in <module>
    from torch_mlir_e2e_test.linalg_on_tensors_backends.refbackend import (
        RefBackendLinalgOnTensorsBackend,
    )
ModuleNotFoundError: No module named 'torch_mlir_e2e_test.linalg_on_tensors_backends.refbackend'
FileCheck error: '<stdin>' is empty.
FileCheck command line:  FileCheck /home/ashepelev/LLVM-Stuff/torch-mlir/test/python/fx_importer/sparsity/sparse_test.py

--

********************
********************
Failed Tests (1):

  TORCH_MLIR :: python/fx_importer/sparsity/sparse_test.py


Testing Time: 14.54s

Total Discovered Tests: 117
  Unsupported:   9 (7.69%)
  Passed     : 107 (91.45%)
  Failed     :   1 (0.85%)
FAILED: [code=1] tools/torch-mlir/test/CMakeFiles/check-torch-mlir /home/ashepelev/LLVM-Stuff/torch-mlir/build/tools/torch-mlir/test/CMakeFiles/check-torch-mlir 
cd /home/ashepelev/LLVM-Stuff/torch-mlir/build/tools/torch-mlir/test && /home/ashepelev/LLVM-Stuff/torch-mlir/mlir_venv/bin/python3.13 /home/ashepelev/LLVM-Stuff/torch-mlir/build/./bin/llvm-lit -sv /home/ashepelev/LLVM-Stuff/torch-mlir/build/tools/torch-mlir/test
ninja: build stopped: subcommand failed.

@zjgarvey (Collaborator) commented Sep 4, 2025

I think this is due to some missing build flags - you need to enable the JIT IR importer and something else. The development.md doc should have some info about the CMake config for local testing.

Then I'd run projects/pt1/tools/e2e_test.sh -s -c <failing config> -v to see what is failing.

@amemov (Author) commented Sep 5, 2025

@zjgarvey , I triple-checked - all of my files are now in sync with the most recent changes. When I run
ninja check-torch-mlir-all
I get 0 errors. In the last CI build, all the errors I've seen are related to ONNX bf16. I'm not sure if that's because, when the last CI ran, only my branch of my fork was synced with the upstream changes (but the main branch of the fork was not) - but in any case, at least with the command above I don't see any errors.

Also, for reference, here is the build command I used - it includes everything needed to enable e2e testing:

cmake -GNinja -Bbuild \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DPython3_FIND_VIRTUALENV=ONLY \
-DPython_FIND_VIRTUALENV=ONLY \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_EXTERNAL_PROJECTS="torch-mlir" \
-DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$PWD" \
-DTORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS=ON \
-DTORCH_MLIR_ENABLE_JIT_IR_IMPORTER=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_EXE_LINKER_FLAGS_INIT="--ld-path=ld.lld" \
-DCMAKE_MODULE_LINKER_FLAGS_INIT="--ld-path=ld.lld" \
-DCMAKE_SHARED_LINKER_FLAGS_INIT="--ld-path=ld.lld" \
externals/llvm-project/llvm

Successfully merging this pull request may close these issues:

Missing torch-to-linalg lowering of AtenOuterOp