# [DML] Olive generated adapters not working with OrtGenAi
I am using the method of creating adapters described here, which I have gotten working with the CPU EP. However, when using DML I get the following error when calling `adapters.LoadAdapter`:
```
Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator
```
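For reference, here is a minimal sketch of how I load and activate the adapter via the C# API (model/adapter paths and the adapter name are placeholders):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Load the Olive-generated model; genai_config.json selects the EP (CPU or DML).
using var model = new Model(@"path\to\model");
using var tokenizer = new Tokenizer(model);

// This is the call that throws under DML but succeeds under the CPU EP.
using var adapters = new Adapters(model);
adapters.LoadAdapter(@"path\to\adapter.onnx_adapter", "my_adapter");

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 256);

using var generator = new Generator(model, generatorParams);
generator.SetActiveAdapter(adapters, "my_adapter");
```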
I have tested the `olive auto-opt` call both with and without the `--use_model_builder` option, but both give the same result. I have also tried the `olive convert-adapters` call instead, but the resulting adapters do not work with the CPU EP either (see aside below).
Running the model without the adapter on the CPU EP also works fine, whereas running it without the adapter on DML gives the following error when calling `AppendTokenSequences`:
```
Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.
```
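For completeness, the no-adapter run that hits this is just the standard generation loop, roughly (prompt text is a placeholder):

```csharp
var sequences = tokenizer.Encode("What is the capital of France?");

// Fails here on DML with "The parameter is incorrect"; works fine on the CPU EP.
generator.AppendTokenSequences(sequences);

while (!generator.IsDone())
{
    generator.GenerateNextToken();
}

Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
```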
This does not happen when using ORT GenAI's `model_builder.py` and passing in an adapter path, but then you cannot use multiple LoRA weights, as the adapter is baked into the ONNX model permanently.

- OS: Windows 11 x64
- GPU: RTX 4090
- API: C#
- MODEL: Qwen/Qwen2.5-1.5B
(Aside) The adapters, when used via the CPU EP, show significant quality degradation. I can see that `convert-adapters` applies LoRA scaling (alpha/rank), but I cannot find whether the `auto-opt` call does the same. Adapters created via `convert-adapters` also do not work with the CPU EP, because the keys are not renamed appropriately, resulting in an invalid key/name/parameter error (`.layers.0.self_attn.` rather than `.layers.0.attn.`).
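For clarity, by "lora scaling" I mean the standard LoRA formulation, where the low-rank update is scaled by alpha over the rank before being added to the base weight:

```math
W' = W + \frac{\alpha}{r}\,BA
```

If one export path applies this factor and the other does not, the adapter's contribution would be off by a factor of alpha/r, which would be consistent with the quality degradation I am seeing.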