
[DML] Using Olive generated adapters throws Data transfer is not available for the specified device allocator error #1181

Open
ambroser53 opened this issue Jan 13, 2025 · 1 comment

@ambroser53

Sorry for the double post; I've already opened an issue on the Olive repo, but it may be better suited here.

Describe the bug

I am using the method of creating adapters described here and here, which works when using the CPU EP. However, when using DML I get the following error when calling adapters.LoadAdapter:

Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator

I have tested the olive auto-opt call both with and without the --use_model_builder option, but both give the same result. I have also tried the olive convert-adapters call instead, but the resulting adapters do not work with the CPU EP either (see the aside below).

If I run the model without the adapter on the CPU EP it runs fine as well, whereas running the model without the adapter on DML gives the following error when calling AppendTokenSequences:

Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.

The same does not happen when using ORT GenAI's model_builder.py and passing in an adapter path, but then you cannot switch between multiple LoRA weights as the adapter is baked into the ONNX model permanently.

OS: Windows 11 x64
GPU: RTX 4090
API: C#
MODEL: Qwen/Qwen2.5-1.5B

(Aside) The adapters (when used via the CPU EP) show significant quality degradation. I can see that convert-adapters applies LoRA scaling (alpha/rank), but I cannot find whether the auto-opt call does the same. Adapters created via convert-adapters do not work with the CPU EP either, because the keys are not renamed appropriately, which raises an invalid key/name/parameter error (.layers.0.self_attn. rather than .layers.0.attn.).
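
For reference, the scaling I mean is the standard LoRA formula delta_W = (alpha / r) * B @ A. Here is a minimal sketch of what I would expect the export step to fold into the adapter weights (illustrative values taken from my r64_a128 run, not Olive's actual code):

import numpy as np

# Illustrative only: the scaling I would expect convert-adapters to apply.
r, lora_alpha = 64, 128                 # values from my training run (r64_a128)
hidden = 1536                           # Qwen2.5-1.5B hidden size

A = np.random.randn(r, hidden).astype(np.float16)   # stand-in for a lora_A weight
B = np.random.randn(hidden, r).astype(np.float16)   # stand-in for a lora_B weight

scaling = lora_alpha / r                # = 2.0 here
delta_W = scaling * (B @ A)             # effective update: delta_W = (alpha / r) * B A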

To Reproduce
Steps to reproduce the behavior:

  1. Use Qwen/Qwen2.5-1.5B to train a set of LoRA weights with peft (a minimal sketch of the peft config I use follows the reproduction code below).
  2. Use the auto-opt call from here, setting device to gpu and provider to DmlExecutionProvider.
    Here is the exact call I use:
olive auto-opt \
   --model_name_or_path Qwen/Qwen2.5-1.5B \
   --adapter_path ./output/pony_base_0.9_lilqwen_r64_a128_b128_fp16-v0 \
   --device gpu \
   --provider DmlExecutionProvider \
   --use_ort_genai \
   --output_path ./onnx/lilqwen-dml-fp16-adaptable-modelbuilder \
   --log_level 1 --precision fp16 --use_model_builder
  3. (Optional) Attempt to use convert-adapters to export additional adapters. Here is my exact command:
olive convert-adapters -a ./output/pony_base_0.9_lilqwen_r64_a128_b128_fp16-v0/checkpoint-98 \
  --dtype float16 -o ./onnx/lilqwen-dml-fp16-adaptable-modelbuilder/model/base16.onnx_adapter
  4. Use the following code and observe the error:
import onnxruntime_genai as og

model_checkpoint = "./onnx/lilqwen-dml-fp16-adaptable-modelbuilder/model"

ort_model = og.Model(model_checkpoint)

tokenizer = og.Tokenizer(ort_model)
tokenizer_stream = tokenizer.create_stream()

sample_input = "Test input: "
tokens = tokenizer.encode(sample_input)

gen_params = og.GeneratorParams(ort_model)

gen_params.set_search_options(max_length=200)
generator = og.Generator(ort_model, gen_params)
adapters = og.Adapters(ort_model)

adapters.load(f"{model_checkpoint}/adapter_weights.onnx_adapter", "default")

generator.append_tokens(tokens)
generator.set_active_adapter(adapters, "default")
try:
    while not generator.is_done():
        generator.generate_next_token()

        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end='', flush=True)
except KeyboardInterrupt:
    print("  --control+c pressed, aborting generation--")

del generator
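
For completeness, here is a minimal sketch of the peft configuration used in step 1 (r/alpha mirror the r64_a128 run; the target modules and dropout are illustrative, and the training loop itself is omitted):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Minimal sketch of the adapter setup in step 1; r/alpha match the "r64_a128"
# run, target modules and dropout are illustrative.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# ...standard fine-tuning loop here, then model.save_pretrained(output_dir)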

Expected behavior
The adapters load successfully when passed the .onnx_adapter path and can be used interchangeably for inference with the DML EP.
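
For example, I would expect to be able to load two adapters once and switch between them per generation run, along these lines (the second adapter file name here is hypothetical):

# Sketch of the expected usage; "other.onnx_adapter" is a hypothetical second
# adapter exported the same way as the first.
adapters = og.Adapters(ort_model)
adapters.load(f"{model_checkpoint}/adapter_weights.onnx_adapter", "default")
adapters.load(f"{model_checkpoint}/other.onnx_adapter", "other")

for name in ("default", "other"):
    generator = og.Generator(ort_model, gen_params)
    generator.set_active_adapter(adapters, name)
    generator.append_tokens(tokens)
    while not generator.is_done():
        generator.generate_next_token()
    print(name, "->", tokenizer.decode(generator.get_sequence(0)))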

@ambroser53
Author

When compiling Olive from source (I missed that part of this), I get a different error when using adapters.load with the DML EP:
Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\lora\adapter_format_utils.cc:54 onnxruntime::adapters::utils::MemoryMapAdapterFile [ONNXRuntimeError] : 1 : FAIL : open file adapter_weights.onnx_adapter fail, errcode = 2 - unknown error
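
Errcode 2 on Windows is ERROR_FILE_NOT_FOUND, so this may just be the relative path not resolving from the working directory. This is the minimal check I am using (same variables as the reproduction script above):

import os

adapter_path = f"{model_checkpoint}/adapter_weights.onnx_adapter"
# errcode 2 = ERROR_FILE_NOT_FOUND, so confirm the relative path resolves
# from the current working directory before calling load.
print(os.path.abspath(adapter_path), os.path.exists(adapter_path))
adapters.load(adapter_path, "default")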
