Add on-the-fly bfloat16->float16 conversion pass #740

Open · wants to merge 2 commits into base: ovep-develop

Conversation

@mklimenk mklimenk commented Jul 11, 2025

This PR adds functionality to convert bfloat16 models to float16 models. It reuses much of the functionality from the recently introduced QDQ scales fix.

To be added:

  • Tests
  • (Potentially) refactoring to better separate QDQ stripping from bfloat16 conversion

https://jira.devtools.intel.com/browse/CVS-170592
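The numerical core of such a conversion can be sketched independently of the onnxruntime internals: bfloat16 is just the upper 16 bits of an IEEE-754 float32, so widening is exact, and narrowing to float16 is an ordinary float32-to-float16 round. The stdlib-only sketch below is hypothetical and is not the PR's actual implementation.

```python
import struct

def bf16_bits_to_f32(bits: int) -> float:
    # bfloat16 is the top 16 bits of an IEEE-754 float32,
    # so widening is a lossless 16-bit left shift of the raw bits.
    return struct.unpack('<f', struct.pack('<I', (bits & 0xFFFF) << 16))[0]

def f32_to_fp16_bits(value: float) -> int:
    # float16 packing via struct's 'e' format; values outside the
    # float16 range (|x| > 65504) raise OverflowError, which a real
    # pass would have to saturate or flush instead of propagating.
    return struct.unpack('<H', struct.pack('<e', value))[0]

def bf16_bits_to_fp16_bits(bits: int) -> int:
    # The on-the-fly pass conceptually applies this per weight element.
    return f32_to_fp16_bits(bf16_bits_to_f32(bits))

# bfloat16 1.0 is 0x3F80; float16 1.0 is 0x3C00.
print(hex(bf16_bits_to_fp16_bits(0x3F80)))  # 0x3c00
```

Note that bfloat16 has an 8-bit exponent while float16 has only 5, so a production pass also needs a policy for values representable in bfloat16 but not in float16.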

@@ -110,7 +111,7 @@ struct ProviderInfo {
   const ConfigOptions* config_options{NULL};
   const std::unordered_set<std::string> valid_provider_keys = {"device_type", "device_id", "device_luid", "cache_dir", "precision",
       "load_config", "context", "num_of_threads", "model_priority", "num_streams", "enable_opencl_throttling", "enable_qdq_optimizer",
-      "enable_causallm", "disable_dynamic_shapes", "reshape_input"};
+      "enable_bfloat16_optimizer", "enable_causallm", "disable_dynamic_shapes", "reshape_input"};


Why is a separate provider option needed? Is it possible to detect that the model has the bfloat16 datatype and enable the optimization intrinsically?

@mklimenk (Author):

@sfatimar, because at some point OpenVINO might enable native execution of bfloat16 models. This is a workaround until that functionality is enabled. Let's discuss it with Mayuresh and act accordingly.
Regarding the graph optimizations link you've shared: strictly speaking, this is not the same kind of optimization as those in the list. It's more like the QDQ scales fix we implemented earlier.


We cannot add an external provider option as a workaround because it impacts external users and apps and requires a deprecation notice two releases in advance. I would prefer this to be handled internally.


@sfatimar sfatimar left a comment


Please check with Mayuresh whether the ProviderOption can be avoided by intrinsically detecting BFloat16, or by adding an EP-specific optimization pass: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#extended-graph-optimizations
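The intrinsic detection suggested above could look roughly like the following sketch. It is stdlib-only and mocks an ONNX graph as plain dicts rather than real protobuf messages; the only hard fact used is that BFLOAT16 is enum value 16 in the ONNX TensorProto data-type enum. The dict shape and function name are illustrative assumptions.

```python
# ONNX TensorProto data-type enum value for BFLOAT16.
BFLOAT16 = 16

def model_uses_bfloat16(graph: dict) -> bool:
    """Return True if any graph input or initializer is bfloat16.

    `graph` mocks an ONNX GraphProto as
    {"inputs": [...], "initializers": [...]}, where each entry is a
    dict with an "elem_type" field (hypothetical layout, for
    illustration only).
    """
    tensors = graph.get("inputs", []) + graph.get("initializers", [])
    return any(t.get("elem_type") == BFLOAT16 for t in tensors)

# An EP could then enable the conversion pass automatically,
# with no user-facing provider option:
graph = {"inputs": [{"name": "x", "elem_type": BFLOAT16}],
         "initializers": []}
enable_bfloat16_optimizer = model_uses_bfloat16(graph)
print(enable_bfloat16_optimizer)  # True
```

With this approach the pass is a no-op for models without bfloat16 tensors, which avoids the deprecation-policy concern raised for external provider options.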

@sfatimar sfatimar requested a review from MayureshV1 July 14, 2025 04:46
@mklimenk (Author):

Tests are prepared and pushed to a separate branch until we have confirmation on whether we want them in onnxruntime or in the internal testing repo.
