cmake: use -flto=auto compiler flag when supported, rework fast-math disablement #80

Open: wants to merge 4 commits into base: master
1 change: 0 additions & 1 deletion .appveyor.yml
@@ -53,7 +53,6 @@ build_script:

cmake -Wdev -G"%generator%" -A"%platform%" -S. -Bbuild -DCMAKE_CONFIGURATION_TYPES=Release
-DBUILD_CRUNCH=ON -DBUILD_EXAMPLES=ON
-DUSE_FAST_MATH=OFF

cmake --build build --config Release

1 change: 0 additions & 1 deletion .azure-pipelines.yml
@@ -125,7 +125,6 @@ steps:
if [ -z "${SOURCE_DIR:-}" ]; then
cmake_args+=(-DBUILD_CRUNCH=ON -DBUILD_EXAMPLES=ON -DBUILD_SHARED_LIBS=ON)
fi
cmake_args+=(-DUSE_FAST_MATH=OFF)
cmake -S"${SOURCE_DIR:-.}" -Bbuild "${cmake_args[@]}"
cmake --build build --config Release
displayName: 'Build'
54 changes: 45 additions & 9 deletions CMakeLists.txt
@@ -2,6 +2,8 @@ cmake_minimum_required(VERSION 3.5)

set(CMAKE_CXX_STANDARD 11)

include(CheckCXXCompilerFlag)

set(CRUNCH_PROJECT_NAME crunch)
set(CRUNCH_LIBRARY_NAME crn)
set(CRUNCH_EXE_NAME crunch)
@@ -50,10 +52,18 @@ macro(set_linker_flag FLAG)
endif()
endmacro()

macro(try_cxx_flag PROP FLAG)
check_CXX_compiler_flag(${FLAG} FLAG_${PROP})

if (FLAG_${PROP})
set_cxx_flag(${FLAG})
endif()
endmacro()

# This option decides whether crunch is dynamically linked against libcrn.so or
# statically linked against libcrn.o; enabling it always builds libcrn.so.
# This option is a builtin CMake one, the name means “build executables
# against shader libraries”, not “build the shared libraries”.
# against shared libraries”, not “build the shared libraries”.
option(BUILD_SHARED_LIBS "Link executables against shared library" OFF)
# Always build libcrn.so even if crunch is linked to libcrn statically.
option(BUILD_SHARED_LIBCRN "Build shared libcrn" OFF)
@@ -85,6 +95,9 @@ if (MSVC)
# and https://devblogs.microsoft.com/cppblog/the-fpcontract-flag-and-changes-to-fp-modes-in-vs2022/
# By default, MSVC doesn't enable the /fp:fast option.
set_cxx_flag("/fp:fast")
else()
# Precise model (/fp:precise) should do safe contractions, but we should not trust that (see below).
set_cxx_flag("/fp:strict")
Member
/fp:precise would be a better default for MSVC. The documentation says /fp:strict is mostly needed if you want floating point exceptions. I did a little test with 57 PNG files and got 246.162s (self-reported) with master and 259.846s with this branch.
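
As a rough sketch of the suggestion above, the MSVC branch might look like the following. It assumes the enclosing condition is `if (USE_FAST_MATH)` (not visible in this hunk) and that it sits inside the existing `if (MSVC)` block, where the project's `set_cxx_flag` macro is already defined. It is an illustration only, not the change proposed in this PR.

```cmake
# Sketch only: fragment for the MSVC block of CMakeLists.txt.
# Assumes the project's set_cxx_flag macro is defined earlier in the file
# and that USE_FAST_MATH is the condition guarding this branch.
if (USE_FAST_MATH)
	# By default, MSVC doesn't enable the /fp:fast option.
	set_cxx_flag("/fp:fast")
else()
	# /fp:precise is MSVC's documented default model; setting it explicitly
	# records the intent without the extra cost of /fp:strict.
	set_cxx_flag("/fp:precise")
endif()
```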

endif()

if (USE_LTO)
@@ -112,22 +125,45 @@ else()
set_cxx_flag("-O3" RELWITHDEBINFO)
endif()

try_cxx_flag(FNO_MATH_ERRNO "-fno-math-errno")

if (USE_FAST_MATH)
# By default, GCC uses -ffp-contract=fast with -std=gnu* and uses -ffp-contract=off with -std=c*.
# See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
# By default, GCC doesn't enable the -ffast-math option.
set_cxx_flag("-ffast-math -fno-math-errno -ffp-contract=fast")
try_cxx_flag(FFAST_MATH "-ffast-math")

# GCC.
try_cxx_flag(FFP_CONTRACT_FAST "-ffp-contract=fast")
# Clang.
try_cxx_flag(FFP_MODEL_FAST "-ffp-model=aggressive")
# ICC.
try_cxx_flag(FP_MODEL_FAST_2 "-fp-model=fast=2")
try_cxx_flag(QSIMD_HONOR_FP_MODEL "-qsimd-honor-fp-model")
else()
try_cxx_flag(FNO_FAST_MATH "-fno-fast-math")

# By default, GCC uses -ffp-contract=fast with -std=gnu* and uses -ffp-contract=off with -std=c*.
# By default, GCC uses -std=gnu* and then enables -ffp-contract=fast even if -ffast-math is not enabled.
set_cxx_flag("-ffp-contract=off")
# See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
# GCC fast contractions (-ffp-contract=fast) should be safe, but aren't on arm64 with GCC 12.
Member
What do you mean by "aren't safe"?

Member Author
I should have been more verbose at the time because I forgot the exact issue.

I guess it meant that it doesn't produce the same result as compiling without any fast-math options.

Member Author
Given that the purpose of my patch is to maximize the chance that the files are reproducible, I probably noticed that on ARM such an option broke reproducibility. It's probably a problem similar to using x87 instead of SSE on x86; maybe some ARM fused operations break IEEE compliance.

Member Author
For sure, since I mentioned a specific GCC version, that was the result of me testing that specific compiler on the said hardware, and I was testing the reproducibility of converted files. GCC 12 is the Debian Bookworm GCC, and I use Debian Bookworm on my Arm boards.

Member @slipher, Jul 26, 2025

That's not obvious, so it should be explained in the comment.

As we've discussed in the past, I don't think floating point reproducibility is a good goal -- the language and compilers don't make any attempt to provide such guarantees. Platforms using x87 floating point are an easy example where you're not going to achieve it. But for GCC/Clang those options to disable fast math are good in any case, since fast math makes the software too unreliable.
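
One possible wording for such a comment is sketched below. It is based only on what the author recalls in this thread (the exact failure was not recorded, so the description is hedged accordingly), and it assumes the `try_cxx_flag` macro added by this PR.

```cmake
# Sketch only: a more explicit comment for the non-fast-math branch.
# GCC enables -ffp-contract=fast by default with -std=gnu*, even when
# -ffast-math is not enabled.
# When tested with GCC 12 (Debian Bookworm) on arm64, -ffp-contract=fast
# appeared to make converted files differ from those produced by a build
# without any fast-math option, so contraction is disabled explicitly.
try_cxx_flag(FFP_CONTRACT_OFF "-ffp-contract=off")
```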

# Clang precise contractions (-ffp-contract=precise) should be safe, but aren't on arm64 with Clang 14.

# GCC.
try_cxx_flag(FFP_CONTRACT_OFF "-ffp-contract=off")
# Clang
try_cxx_flag(FFP_MODEL_STRICT "-ffp-model=strict")
# ICC.
try_cxx_flag(FP_MODEL_STRICT "-fp-model=strict")
try_cxx_flag(QSIMD_HONOR_FP_MODEL "-qsimd-honor-fp-model")
endif()

# It should be done at the very end because it copies all compiler flags
# to the linker flags.
if (USE_LTO)
set_cxx_flag("-flto" RELEASE)
set_cxx_flag("-flto" RELWITHDEBINFO)
set_cxx_flag("-flto" MINSIZEREL)
try_cxx_flag(FLTO_AUTO "-flto=auto")

if (NOT FLAG_FLTO_AUTO)
try_cxx_flag(FLTO "-flto")
endif()

set_linker_flag("${CMAKE_CXX_FLAGS}" RELEASE)
set_linker_flag("${CMAKE_CXX_FLAGS}" RELWITHDEBINFO)
set_linker_flag("${CMAKE_CXX_FLAGS}" MINSIZEREL)