Skip to content

Enable link-time optimization when building Rust library for production #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Calinou
Copy link

@Calinou Calinou commented Mar 27, 2021

This decreases the compiled library size and improves performance.

codegen-units is also set to 1 to further optimize the compiled library.

There's more advice here: https://likebike.com/posts/How_To_Write_Fast_Rust_Code.html

Note: I haven't tried to run the Godot project as a whole with these changes. It's likely that it will work as-is, but I recommend cloning this branch locally to test before merging.

Preview

Linux x86_64 (rustc nightly from 2021-03-26). Both libraries had their debug symbols removed by running strip on them.

Before (LTO disabled, codegen-units = 16)

2,672,984 target/release/libbitmapflow_rust.so

After (LTO enabled, codegen-units = 16)

2,132,240 target/release/libbitmapflow_rust.so

After (LTO enabled, codegen-units = 1) – this PR

2,066,704 target/release/libbitmapflow_rust.so

@Calinou Calinou force-pushed the rust-release-enable-lto branch 2 times, most recently from 61ae68d to 9c5e197 Compare March 27, 2021 01:52
@Bauxitedev
Copy link
Owner

Bauxitedev commented Mar 27, 2021

Cool, thanks. Before I merge this I have a question. Have you noticed any regressions in compile times? A little increase is fine, but I'd rather not have compile times that increase significantly...

Also, do these flags work on Windows as well?

@Calinou
Copy link
Author

Calinou commented Mar 27, 2021

Have you noticed any regressions in compile times?

I haven't done thorough benchmarking, but I'd expect a release build to take 30 seconds to 1 minute longer. This doesn't impact debug builds where you do most of the development anyway.

Also, do these flags work on Windows as well?

They should, but I haven't tested them there.

@Bauxitedev
Copy link
Owner

Bauxitedev commented Mar 28, 2021

Hmm just tested it, compile times increase significantly, going from ~15 sec to ~1 minute on my 4th gen i5 CPU. I'd love to use this for the final binary, but having this enabled during development really hinders my development speed. Do you think there is any way to somehow only enable this flag when publishing the final binary? E.g. I have a publish_windows.py script in the repo that exports an EXE file with all required DLLs into a zip file, do you think it would be somehow possible to compile bitmapflow-rust using these flags only when you run the script?

@Calinou
Copy link
Author

Calinou commented Mar 28, 2021

Hmm just tested it, compile times increase significantly, going from ~15 sec to ~1 minute on my 4th gen i5 CPU. I'd love to use this for the final binary, but having this enabled during development really hinders my development speed. Do you think there is any way to somehow only enable this flag when publishing the final binary?

Are you sure this affects build times when using cargo build instead of cargo build --release? The default is to build in debug mode.

It's possible to set build flags in the build script, but I'd prefer if this was always done for release builds.
In the build script, set the CARGO_PROFILE_RELEASE_LTO environment variable to fat and RUSTFLAGS to -C codegen-units=1 then build with cargo build --release.

@Bauxitedev
Copy link
Owner

Bauxitedev commented Mar 29, 2021

I always develop in release mode, because 1. it gives me a better picture of the real world performance of my program during development (profiling should always be done in release mode), and 2. I don't have to worry about accidentally copying the wrong DLL and shipping debug builds. If I need to debug something I can use the trusty godot_dbg!() macro 😄

I think I'll go for the environment variable for now, to avoid slowing down development.

Update: just tested running in debug mode, it doesn't even work on my end 😄 Getting a bazillion linker errors everywhere. Not sure if that's a configuration problem, I'll have to look into it.

This decreases the compiled library size and improves performance.

codegen-units is also set to 1 to further optimize the compiled library.
@Calinou Calinou force-pushed the rust-release-enable-lto branch from 9c5e197 to 7743366 Compare March 29, 2021 12:26
@Calinou Calinou changed the title Enable link-time optimization when building Rust library in release mode Enable link-time optimization when building Rust library for production Mar 29, 2021
@Calinou
Copy link
Author

Calinou commented Mar 29, 2021

@Bauxitedev Could you test the publish_windows.py script included in this PR? I also modified the Docker file, but it appears to only build for Linux.

@Bauxitedev
Copy link
Owner

I'll check it out soon. You can ignore the dockerfile for now, I used it for testing purposes only, it doesn't serve any purpose anymore.

@Bauxitedev
Copy link
Owner

Bauxitedev commented Mar 30, 2021

Just tested, the python build script doesn't seem to work for me. The cargo build step fails with:

   Compiling autocfg v1.0.1
   Compiling libc v0.2.88
   Compiling proc-macro2 v1.0.24
   Compiling unicode-xid v0.2.1
error: couldn't create a temp dir: Permission denied. (os error 5) at path "C:\\Windows\\rustcBRVLvu"

error: aborting due to previous error

error: could not compile `unicode-xid`

It seems to be trying to create a temp directory in the Windows folder, even though it doesn't have permission to do so (the script shouldn't run with admin privileges). Any ideas?

Also it would be nice if the script stopped entirely if this step fails (easy fix, just add check=True to the argument list of subprocess.run)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants