Description
Setting wp.set_module_options({"fast_math": True}) compiles CUDA kernels with NVRTC's --use_fast_math compiler flag, but it currently has no effect on CPU kernels. The same is true of the fuse_fp option.
Both options should be implemented for CPU kernels so that they behave as closely as possible to their CUDA counterparts.
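As a rough illustration of what "as closely as possible" could mean, the sketch below maps the two module options to compiler flags for each backend. The helper names (cuda_flags, cpu_flags) are hypothetical and are not part of Warp. It also assumes that the CPU path accepts Clang-style flags and that fuse_fp defaults to True. The flags themselves (--use_fast_math and --fmad for NVRTC, -ffast-math and -ffp-contract for Clang) are real, but whether they are the right choice here is open for discussion.

```python
# Hypothetical sketch, not Warp code: translate module options into
# backend compiler flags so that CPU and CUDA builds behave similarly.

def cuda_flags(options):
    """Return NVRTC flags for the given module options (assumed defaults)."""
    flags = []
    if options.get("fast_math", False):
        flags.append("--use_fast_math")
    # --fmad controls contraction of multiply-add into FMA instructions.
    fuse = options.get("fuse_fp", True)  # assumption: fusing is on by default
    flags.append("--fmad=true" if fuse else "--fmad=false")
    return flags


def cpu_flags(options):
    """Return Clang-style flags intended to mirror cuda_flags()."""
    flags = []
    if options.get("fast_math", False):
        flags.append("-ffast-math")
    # -ffp-contract is placed after -ffast-math so an explicit
    # fuse_fp=False still disables contraction.
    fuse = options.get("fuse_fp", True)  # assumption: fusing is on by default
    flags.append("-ffp-contract=fast" if fuse else "-ffp-contract=off")
    return flags


if __name__ == "__main__":
    opts = {"fast_math": True, "fuse_fp": False}
    print("CUDA:", cuda_flags(opts))
    print("CPU: ", cpu_flags(opts))
```

Note that the match is only approximate: --use_fast_math also enables flush-to-zero and less precise division and square root, while -ffast-math additionally permits assumptions such as no NaNs or infinities, so the exact CPU flag set would need to be chosen with care.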
These options should also be added to config.py. Since #1307, the config options are handled more centrally and uniformly.
Context
Implementing these options is expected to speed up compute-bound CPU workloads.