[DML] Using Olive generated adapters throws "Data transfer is not available for the specified device allocator" error #1181
Sorry for double posting; I've already opened an issue on the Olive repo, but it may be better to post here as well.
Describe the bug
I am using the method of creating adapters described here and here, which works when using the CPU EP; however, when using DML I get the following error when calling adapters.LoadAdapter:
Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator
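For context, the loading code looks roughly like this (the model path, adapter path, and adapter name are placeholders, not my exact values):

```csharp
using Microsoft.ML.OnnxRuntimeGenAI;

// Load the Olive-optimized model (placeholder path)
using var model = new Model(@"C:\models\qwen2.5-1.5b-dml");
using var tokenizer = new Tokenizer(model);

// Load the exported LoRA adapter; on DML this call throws the
// "Data transfer is not available" exception above
using var adapters = new Adapters(model);
adapters.LoadAdapter(@"C:\adapters\my_adapter.onnx_adapter", "my_adapter");
```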
I have tested the olive auto-opt call both with and without the --use_model_builder option, but the result is the same in both cases. I have also tried the olive convert-adapters call instead, but the resulting adapters do not work with the CPU EP either (see aside below).
If I run the model without the adapter on the CPU EP, it runs fine as well, whereas running the model without the adapter on DML gives the following error when calling AppendTokenSequences:
Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.
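And the generation side, roughly (the prompt and search options are illustrative):

```csharp
using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 256);

using var generator = new Generator(model, generatorParams);
// Activate the adapter loaded above (skipped entirely in the adapter-less run)
generator.SetActiveAdapter(adapters, "my_adapter");

// Tokenize the prompt and append it; on DML this is where the
// DmlFusedNode_0_5 error above occurs even without an adapter
using var sequences = tokenizer.Encode("Hello");
generator.AppendTokenSequences(sequences);

while (!generator.IsDone())
{
    generator.GenerateNextToken();
}
```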
The same does not happen when using ORT GenAI's model_builder.py and passing in an adapter path, but then you cannot swap between multiple sets of LoRA weights because the adapter is baked into the ONNX model permanently.
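For comparison, the model_builder.py call I mean looks roughly like this (the module path and the adapter_path extra option are my best recollection; paths are placeholders):

```sh
python -m onnxruntime_genai.models.builder \
    -m Qwen/Qwen2.5-1.5B \
    -o ./qwen2.5-1.5b-dml \
    -p fp16 \
    -e dml \
    --extra_options adapter_path=./my_peft_adapter
```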
OS: Windows 11 x64
GPU: RTX 4090
API: C#
MODEL: Qwen/Qwen2.5-1.5B
(Aside) The adapters (when used via the CPU EP) appear to suffer significant quality degradation. I can see that convert-adapters applies LoRA scaling (alpha/rank) to the weights, but I cannot find whether the auto-opt call does the same. Adapters created via convert-adapters do not work with the CPU EP either, because the keys are not renamed appropriately, giving an invalid key/name/parameter error (.layers.0.self_attn. rather than .layers.0.attn.).
To Reproduce
Steps to reproduce the behavior:
1. Use Qwen/Qwen2.5-1.5B to train a set of LoRA weights with peft.
2. Run olive auto-opt with the device set to gpu and the provider to DmlExecutionProvider. Here is the exact call I use:
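In outline, that call looks like this (paths are placeholders, and apart from --provider DmlExecutionProvider and --use_model_builder the flags are from memory, so treat them as approximate):

```sh
olive auto-opt \
    --model_name_or_path Qwen/Qwen2.5-1.5B \
    --adapter_path ./my_peft_adapter \
    --device gpu \
    --provider DmlExecutionProvider \
    --use_model_builder \
    --output_path ./qwen2.5-1.5b-dml
```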
3. Use convert-adapters to export more adapters. Here is my exact command:
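Again in outline (paths are placeholders and the flag names are my best recollection of the Olive CLI):

```sh
olive convert-adapters \
    --adapter_path ./my_peft_adapter \
    --output_path ./my_adapter.onnx_adapter \
    --dtype float32
```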
Expected behavior
The adapters are loaded when passed the .onnx_adapter path and can be used interchangeably for inference with the DML EP.