Add on-the-fly bfloat16->float16 conversion pass #740
base: ovep-develop
Conversation
@@ -110,7 +111,7 @@ struct ProviderInfo {
   const ConfigOptions* config_options{NULL};
   const std::unordered_set<std::string> valid_provider_keys = {"device_type", "device_id", "device_luid", "cache_dir", "precision",
       "load_config", "context", "num_of_threads", "model_priority", "num_streams", "enable_opencl_throttling", "enable_qdq_optimizer",
-      "enable_causallm", "disable_dynamic_shapes", "reshape_input"};
+      "enable_bfloat16_optimizer", "enable_causallm", "disable_dynamic_shapes", "reshape_input"};
Why is a separate provider option needed? Is it possible to detect that the model uses the bfloat16 datatype and enable the optimization intrinsically?
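For illustration, intrinsic detection could look roughly like the sketch below, scanning the raw ONNX protobuf for bfloat16 tensors (inside the EP one would presumably use the GraphViewer API instead; `ModelUsesBf16` is a hypothetical helper name):

```cpp
#include <onnx/onnx_pb.h>

// Hypothetical helper: returns true if any initializer or typed value in the
// top-level graph is BFLOAT16. A real pass would also recurse into subgraphs.
bool ModelUsesBf16(const onnx::ModelProto& model) {
  const auto& graph = model.graph();
  for (const auto& init : graph.initializer())
    if (init.data_type() == onnx::TensorProto::BFLOAT16) return true;
  auto is_bf16 = [](const onnx::ValueInfoProto& vi) {
    return vi.type().has_tensor_type() &&
           vi.type().tensor_type().elem_type() == onnx::TensorProto::BFLOAT16;
  };
  for (const auto& vi : graph.input())      if (is_bf16(vi)) return true;
  for (const auto& vi : graph.output())     if (is_bf16(vi)) return true;
  for (const auto& vi : graph.value_info()) if (is_bf16(vi)) return true;
  return false;
}
```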
@sfatimar, because at some point OpenVINO might enable native execution of bfloat16 models. This is a workaround until that functionality is available. Let's discuss it with Mayuresh and act accordingly.
Regarding the graph-optimizations link you've shared: strictly speaking, this is not the same kind of optimization as those in that list. It's closer to the QDQ scales fix we implemented earlier.
We cannot introduce an external provider option as a workaround, because it affects external users and apps and would need a deprecation notice two releases in advance. I would prefer this to be handled internally.
Please check with Mayuresh whether the provider option can be avoided, either by detecting BFloat16 intrinsically or by adding an EP-specific optimization pass: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#extended-graph-optimizations
Tests are prepared and pushed to a separate branch until we have confirmation on whether we want them in onnxruntime or in the internal testing repo.
This PR adds functionality to convert bfloat16 models to float16 models on the fly. It reuses much of the machinery from the recently introduced QDQ scales fix. A sketch of the underlying numeric conversion follows below.
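The PR's actual implementation isn't reproduced here; as a rough illustration of the arithmetic such a pass performs per element, here is a standalone widen-then-narrow sketch (helper names are mine). Note that a bfloat16 mantissa (7 bits) fits exactly in float16's 10 bits, so the interesting cases are exponent range: bf16 covers roughly ±3e38, while f16 tops out at ±65504, so large magnitudes saturate to infinity and tiny ones flush to zero.

```cpp
#include <cstdint>
#include <cstring>

// bfloat16 -> float32 is exact: bf16 is the upper 16 bits of an IEEE-754 float32.
inline float Bf16ToF32(uint16_t bf16) {
  uint32_t bits = static_cast<uint32_t>(bf16) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// float32 -> float16. Truncating the mantissa is lossless for values that
// originated as bfloat16 (their low 13 mantissa bits are already zero).
inline uint16_t F32ToF16(float f) {
  uint32_t x;
  std::memcpy(&x, &f, sizeof(x));
  const uint16_t sign = static_cast<uint16_t>((x >> 16) & 0x8000u);
  const uint32_t exp32 = (x >> 23) & 0xFFu;
  const uint32_t man32 = x & 0x7FFFFFu;
  if (exp32 == 0xFFu)  // inf or NaN propagate
    return static_cast<uint16_t>(sign | 0x7C00u | (man32 ? 0x200u : 0u));
  const int32_t exp16 = static_cast<int32_t>(exp32) - 127 + 15;  // rebias exponent
  if (exp16 >= 31) return static_cast<uint16_t>(sign | 0x7C00u);  // overflow -> inf
  if (exp16 <= 0) return sign;  // underflow -> signed zero (f16 denormals omitted for brevity)
  return static_cast<uint16_t>(sign | (static_cast<uint32_t>(exp16) << 10) | (man32 >> 13));
}

inline uint16_t Bf16ToF16(uint16_t bf16) { return F32ToF16(Bf16ToF32(bf16)); }
```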
To be added:
https://jira.devtools.intel.com/browse/CVS-170592