[REQ] Support compiling CPU kernels with fast_math and fuse_fp optimizations #1321

@c0d1f1ed

Description

CUDA kernels can be compiled with NVRTC's --use_fast_math compiler flag by calling wp.set_module_options({"fast_math": True}), but for CPU kernels this option currently has no effect. The same is true of the fuse_fp option.

Both options should be implemented for CPU kernels so that they behave as closely as possible to their CUDA counterparts.
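
To illustrate why matching the CUDA behavior matters, here is a minimal pure-Python sketch of how a fused multiply-add can differ from a separate multiply and add. It assumes that fuse_fp governs this kind of floating-point contraction. The helper names are hypothetical and are not part of Warp; the fused result is emulated with exact rational arithmetic followed by a single rounding.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """Compute a*b + c with two roundings (product first, then sum)."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact arithmetic, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0 in double precision.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = mul_add_unfused(a, b, c)  # the rounded product cancels exactly with c
fused = mul_add_fused(a, b, c)      # the small residual survives the single rounding

print("unfused:", unfused)
print("fused:  ", fused)
```

The two results differ, so enabling or disabling fusion on only one backend could make CPU and CUDA outputs diverge for the same kernel.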

These options should also be added to config.py. Since #1307, the config options are handled more centrally and uniformly.

Context

This is expected to provide a speedup for compute-limited CPU workloads.

Metadata

Labels: feature request
