Add on-the-fly bfloat16->float16 conversion pass #740

Open · wants to merge 2 commits into base: ovep-develop

Conversation

@mklimenk mklimenk commented Jul 11, 2025

This PR adds functionality to convert bfloat16 models to float16 models. It reuses much of the functionality from the recently introduced QDQ scales fix.

To be added:

  • Tests
  • (Potentially) refactoring to better separate QDQ stripping from bfloat16 conversion

https://jira.devtools.intel.com/browse/CVS-170592
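The numerical core of such a conversion can be sketched independently of the onnxruntime internals: bfloat16 is just the upper 16 bits of an IEEE-754 float32, so widening is exact, and narrowing to float16 is an ordinary float32-to-float16 round. The stdlib-only sketch below is hypothetical and is not the PR's actual implementation.

```python
import struct

def bf16_bits_to_f32(bits: int) -> float:
    # bfloat16 is the top 16 bits of an IEEE-754 float32,
    # so widening is a lossless 16-bit left shift of the raw bits.
    return struct.unpack('<f', struct.pack('<I', (bits & 0xFFFF) << 16))[0]

def f32_to_fp16_bits(value: float) -> int:
    # float16 packing via struct's 'e' format; values outside the
    # float16 range (|x| > 65504) raise OverflowError, which a real
    # pass would have to saturate or flush instead of propagating.
    return struct.unpack('<H', struct.pack('<e', value))[0]

def bf16_bits_to_fp16_bits(bits: int) -> int:
    # The on-the-fly pass conceptually applies this per weight element.
    return f32_to_fp16_bits(bf16_bits_to_f32(bits))

# bfloat16 1.0 is 0x3F80; float16 1.0 is 0x3C00.
print(hex(bf16_bits_to_fp16_bits(0x3F80)))  # 0x3c00
```

Note that bfloat16 has an 8-bit exponent while float16 has only 5, so a production pass also needs a policy for values representable in bfloat16 but not in float16.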

@@ -110,7 +111,7 @@ struct ProviderInfo {
   const ConfigOptions* config_options{NULL};
   const std::unordered_set<std::string> valid_provider_keys = {"device_type", "device_id", "device_luid", "cache_dir", "precision",
       "load_config", "context", "num_of_threads", "model_priority", "num_streams", "enable_opencl_throttling", "enable_qdq_optimizer",
-      "enable_causallm", "disable_dynamic_shapes", "reshape_input"};
+      "enable_bfloat16_optimizer", "enable_causallm", "disable_dynamic_shapes", "reshape_input"};


Why is a separate provider option needed? Is it possible to detect that the model has the bfloat16 datatype and enable the optimization intrinsically?

@mklimenk (Author):

@sfatimar, because at some point OpenVINO might enable native execution of bfloat16 models. This is a workaround until that functionality is enabled. Let's discuss it with Mayuresh and act accordingly.
Regarding the graph optimizations link you've shared: strictly speaking, this is not the same kind of optimization as those in the list. It's more like the QDQ scales fix we implemented earlier.


We cannot add an external provider option as a workaround because it impacts external users and apps and requires a deprecation notice two releases in advance. I would prefer this to be handled internally.


@sfatimar sfatimar left a comment


Please check with Mayuresh whether the ProviderOption can be avoided by intrinsically detecting BFloat16, or by adding an EP-specific optimization pass: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#extended-graph-optimizations
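The intrinsic detection suggested above could look roughly like the following sketch. It is stdlib-only and mocks an ONNX graph as plain dicts rather than real protobuf messages; the only hard fact used is that BFLOAT16 is enum value 16 in the ONNX TensorProto data-type enum. The dict shape and function name are illustrative assumptions.

```python
# ONNX TensorProto data-type enum value for BFLOAT16.
BFLOAT16 = 16

def model_uses_bfloat16(graph: dict) -> bool:
    """Return True if any graph input or initializer is bfloat16.

    `graph` mocks an ONNX GraphProto as
    {"inputs": [...], "initializers": [...]}, where each entry is a
    dict with an "elem_type" field (hypothetical layout, for
    illustration only).
    """
    tensors = graph.get("inputs", []) + graph.get("initializers", [])
    return any(t.get("elem_type") == BFLOAT16 for t in tensors)

# An EP could then enable the conversion pass automatically,
# with no user-facing provider option:
graph = {"inputs": [{"name": "x", "elem_type": BFLOAT16}],
         "initializers": []}
enable_bfloat16_optimizer = model_uses_bfloat16(graph)
print(enable_bfloat16_optimizer)  # True
```

With this approach the pass is a no-op for models without bfloat16 tensors, which avoids the deprecation-policy concern raised for external provider options.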

@sfatimar sfatimar requested a review from MayureshV1 July 14, 2025 04:46
@mklimenk (Author):

Tests are prepared and pushed to a separate branch until we have confirmation on whether we want them in onnxruntime or in the internal testing repo.
