-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make llvm a submodule? #254
Comments
I have some concerns going down this road: for example, assuming we'd then treat llvm like any other submodule, this would mean that rebuilding project from scratch (i.e., However, I'm thinking whether giving the option of downloading & installing llvm (with the right commit hash) automatically though CMake would address the problem equally well? Personally I think it would even make sense to make this the default build behavior. And for developers and CI infra, we can still build our own llvm & reuse them across various onnx-mlir builds. |
Rebuilding llvm takes an hour on my machine so I agree that's considerably longer than the 3 minutes OTOH, whenever I use make clean, things are pretty broken, I could probably do with making sure my llvm is up to date. I would point out that trying to "reuse llvm across various builds" describes what led to the problems which led to my suggestion. Even different checkouts of onnx-mlir might want different LLVM build flags/configs/hashes, and it can get tricky to recall which externally-managed LLVM one is using. |
That sounds reasonable, I can take a crack at it fairly soon.
The combination of cmake & tablegen confuses me (among other folks); there're also other cmake updates we need to make fairly soon because there wasn't any official guidance on how to integrate with mlir through CMake in the early days. Hopefully once we change to adopt the official mlir cmake best practices we'll unbreak
I don't see any ways around it - developers have to amortize the cost of building llvm. The problems exist as long as we cache (and consequently amortize the cost of building) llvm because dealing with (and know when to invalidate) cache often pose hairy issues. I guess providing users with the option to not involve any caching facilities (through ExternalProject) will alleviate this issue to some extent.
I'm confused why recalling this information is necessary; for what it's worth, we do programmatically ensure (via DocCheck) that the LLVM commit hash specified in our README always accurately describes the one that is known to work with the current state of onnx-mlir as tested in our CI infra. So to know which LLVM commit hash to use, one can inspect the README file from the onnx-mlir source directory one intends to build. Alternatively, if you are looking to mechanized this process of cloning the right version of LLVM, you can re-use the script here (https://github.com/onnx/onnx-mlir/blob/master/utils/clone-mlir.sh), which is also guaranteed (i.e., tested in our CI infra) to be correct & up-to-date. |
Thanks for this super-detailed reply! The use case I was thinking of was where LLVM needs to be built with e.g. clang, or with exceptions enabled, because the downstream project is built with clang/exceptions. Then it's not just the commit ID, but also the cmake arguments that need to match. Just to understand: if |
Hmm, my impression is that the CMake arguments we pass to LLVM barely changes; also I believe onnx-mlir is fairly forgiving with respect to various LLVM build configurations. If that is not the case for you, can you give me a specific scenario where the config required by your downstream project conflicts with that of onnx-mlir? We may be able to fix it.
Yes, but I feel like this is less of a concern given how infrequently we change LLVM. Since the software dependency goes exclusively unidirectionally from onnx-mlir to LLVM and not the other way around, only changes directly made in LLVM would necessitate rebuilding LLVM. In such cases, it should be obvious that LLVM needs a separate rebuild. Also, I hope that you're okay with using ExternalProject instead of git submodule to package LLVM. This helps avoid cloning LLVM by default when we invoke |
Yes, that seems like a good thing to implement, if only to gain experience.
So, here's my dev folder: dev
- llvm-project
- build-with-exceptions
- build
- onnx-mlir
- build # uses llvm-project/build
- my-other-mlir-project
- build # uses llvm-project/build-with-exceptions Now, when I configure dev
- llvm-project-onnx-mlir
- build
- llvm-project-my-other-mlir-project
- build
- onnx-mlir
- build # uses llvm-project-onnx-mlir/build
- my-other-mlir-project
- build # uses llvm-project-my-other-mlir-project/build at which point, why don't I just do this: dev
- onnx-mlir
- llvm-project # submodule
- build
- build # uses llvm-project/build
- my-other-mlir-project
- llvm-project # submodule
- build
- build # uses llvm-project-my-other-mlir-project/build And I think you're saying that in your workflow, the original folder should have looked like this: dev
- llvm-project-onnx-mlir
- build
- llvm-project-my-other-mlir-project
- build
- onnx-mlir-checkout1
- build # uses llvm-project-onnx-mlir/build
- onnx-mlir-checkout2
- build # uses llvm-project-onnx-mlir/build
- onnx-mlir-checkout3
- build # uses llvm-project-onnx-mlir/build
- onnx-mlir-checkout4
- build # uses llvm-project-onnx-mlir/build
- my-other-mlir-project
- build # uses llvm-project-my-other-mlir-project/build Is that right? If so, I don't know how common this workflow is -- I normally just have one checkout and flit between branches, (and almost never |
Hey guys, So I've added llvm-project as an external project to my CMake. But to avoid needing to recompile LLVM if I nuke the build folder, I've set the So my project tree looks like this:
this then allows me to compile llvm-project with In this way, I link my project to the I've also added llvm-project as a submodule so that switching commits will automatically update llvm-project to the appropriate commit. However, if you want to keep repo cloning times low, setting up on a side note, while doing this, it may be good to do the same treatment to protobuf, because recently while building a new docker container, the latest protobuf release results in compilation errors, I've git bisected the last working tag down to |
Apologies for my belated response.
@awf I totally agree with your comments, and the latter organization clearly makes more sense.
Yes, I do sometimes juggle between multiple repos at the same time, mostly because I like to try out various people's fork referenced in their submitted PRs. OTOH, a perhaps more significant factor of consideration for me is to be able to reuse the same @lszinv thanks for your suggestion! Can you share your setup (especially the CMake file)? That looks like a very nice solution to this problem and can probably provide some concrete basis for further discussion.
You mean download protobuf as an external project as well? I can imagine providing this as a CMake option, although given that the protobuf dependency is not nearly as brittle as the llvm one and the time it takes to compile protobuf is also fairly slow (last time I did it, it was tens of minutes on a reasonably beefy machine?), I'm reluctant to make it the default behavior. Would such an arrangement work for your situation? |
Sorry for the delay on the response. I've included a gist of an example cmake on how I use LLVM as an external project: |
Any updated on this issue? @tjingrant |
FYI, in project Verona, we're using We're storing the pre-build binaries in a blob storage and the cached directory remains in our project's build directory to avoid downloading every time. Only adds download + unpack times to our CI builds, and nothing to recurring local builds (only the first build has download + unpack). |
As we depend on a specific checkout of llvm, and on building llvm with specific cmake settings, should we not just submodule llvm?
The text was updated successfully, but these errors were encountered: