
[DML] Using Olive generated adapters throws Data transfer is not available for the specified device allocator error #1181

Open
ambroser53 opened this issue Jan 13, 2025 · 1 comment

@ambroser53

Sorry for the double post; I've already opened an issue on the Olive repo, but it may be better suited here.

Describe the bug

I am using the method of creating adapters described here and here, which works when using the CPU EP. However, when using DML I get the following error when calling adapters.LoadAdapter:

Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\core\session\lora_adapters.cc:94 onnxruntime::lora::LoraAdapter::InitializeParamsValues Data transfer is not available for the specified device allocator, it also must not be a CPU allocator

I have tested the olive auto-opt call both with and without the --use_model_builder option, but both give the same result. I have also tried the olive convert-adapters call instead, but the resulting adapters do not work with the CPU EP either (see the aside below).

If I run the model without the adapter on the CPU EP it runs fine as well, whereas running the model without the adapter on DML gives the following error when calling AppendTokenSequences:

Unhandled exception. System.Exception: Non-zero status code returned while running DmlFusedNode_0_5 node. Name:'DmlFusedNode_0_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2839)\onnxruntime.dll!00007FFE495DF44C: (caller: 00007FFE495EEEC9) Exception(1) tid(2bb4) 80070057 The parameter is incorrect.

The same does not happen when using ORT GenAI's model_builder.py and passing in an adapter path, but then you cannot switch between multiple LoRA weights as the adapter is baked into the ONNX model permanently.

OS: Windows 11 x64
GPU: RTX 4090
API: C#
MODEL: Qwen/Qwen2.5-1.5B

(Aside) The adapters (when used via the CPU EP) show significant quality degradation. I can see that convert-adapters applies LoRA scaling (alpha/rank), but I cannot find whether the auto-opt call does the same. Adapters created via convert-adapters do not work with the CPU EP either, because the keys are not renamed appropriately, which raises an invalid key/name/parameter error (.layers.0.self_attn. rather than .layers.0.attn.).
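
For reference, the scaling I mean is the standard LoRA formula delta_W = (alpha / r) * B @ A. Here is a minimal sketch of what I would expect the export step to fold into the adapter weights (illustrative values taken from my r64_a128 run, not Olive's actual code):

import numpy as np

# Illustrative only: the scaling I would expect convert-adapters to apply.
r, lora_alpha = 64, 128                 # values from my training run (r64_a128)
hidden = 1536                           # Qwen2.5-1.5B hidden size

A = np.random.randn(r, hidden).astype(np.float16)   # stand-in for a lora_A weight
B = np.random.randn(hidden, r).astype(np.float16)   # stand-in for a lora_B weight

scaling = lora_alpha / r                # = 2.0 here
delta_W = scaling * (B @ A)             # effective update: delta_W = (alpha / r) * B A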

To Reproduce
Steps to reproduce the behavior:

  1. Use Qwen/Qwen2.5-1.5B to train a set of LoRA weights with peft (a minimal sketch of the peft config I use follows the reproduction code below).
  2. Use the auto-opt call from here, setting device to gpu and provider to DmlExecutionProvider.
    Here is the exact call I use:
olive auto-opt \
   --model_name_or_path Qwen/Qwen2.5-1.5B \
   --adapter_path ./output/pony_base_0.9_lilqwen_r64_a128_b128_fp16-v0 \
   --device gpu \
   --provider DmlExecutionProvider \
   --use_ort_genai \
   --output_path ./onnx/lilqwen-dml-fp16-adaptable-modelbuilder \
   --log_level 1 --precision fp16 --use_model_builder
  3. (Optional) Attempt to use convert-adapters to export additional adapters. Here is my exact command:
olive convert-adapters -a ./output/pony_base_0.9_lilqwen_r64_a128_b128_fp16-v0/checkpoint-98 \
  --dtype float16 -o ./onnx/lilqwen-dml-fp16-adaptable-modelbuilder/model/base16.onnx_adapter
  4. Use the following code and observe the error:
import onnxruntime_genai as og

model_checkpoint = "./onnx/lilqwen-dml-fp16-adaptable-modelbuilder/model"

ort_model = og.Model(model_checkpoint)

tokenizer = og.Tokenizer(ort_model)
tokenizer_stream = tokenizer.create_stream()

sample_input = "Test input: "
tokens = tokenizer.encode(sample_input)

gen_params = og.GeneratorParams(ort_model)

gen_params.set_search_options(max_length=200)
generator = og.Generator(ort_model, gen_params)
adapters = og.Adapters(ort_model)

adapters.load(f"{model_checkpoint}/adapter_weights.onnx_adapter", "default")

generator.append_tokens(tokens)
generator.set_active_adapter(adapters, "default")
try:
    while not generator.is_done():
        generator.generate_next_token()

        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end='', flush=True)
except KeyboardInterrupt:
    print("  --control+c pressed, aborting generation--")

del generator
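
For completeness, here is a minimal sketch of the peft configuration used in step 1 (r/alpha mirror the r64_a128 run; the target modules and dropout are illustrative, and the training loop itself is omitted):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Minimal sketch of the adapter setup in step 1; r/alpha match the "r64_a128"
# run, target modules and dropout are illustrative.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# ...standard fine-tuning loop here, then model.save_pretrained(output_dir)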

Expected behavior
The adapters load successfully when passed the .onnx_adapter path and can be used interchangeably for inference with the DML EP.
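
For example, I would expect to be able to load two adapters once and switch between them per generation run, along these lines (the second adapter file name here is hypothetical):

# Sketch of the expected usage; "other.onnx_adapter" is a hypothetical second
# adapter exported the same way as the first.
adapters = og.Adapters(ort_model)
adapters.load(f"{model_checkpoint}/adapter_weights.onnx_adapter", "default")
adapters.load(f"{model_checkpoint}/other.onnx_adapter", "other")

for name in ("default", "other"):
    generator = og.Generator(ort_model, gen_params)
    generator.set_active_adapter(adapters, name)
    generator.append_tokens(tokens)
    while not generator.is_done():
        generator.generate_next_token()
    print(name, "->", tokenizer.decode(generator.get_sequence(0)))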

@ambroser53
Author

When compiling Olive from source (I missed that part of this), I get a different error when using adapters.load with the DML EP:
Unhandled exception. System.Exception: D:\a\_work\1\s\onnxruntime\lora\adapter_format_utils.cc:54 onnxruntime::adapters::utils::MemoryMapAdapterFile [ONNXRuntimeError] : 1 : FAIL : open file adapter_weights.onnx_adapter fail, errcode = 2 - unknown error
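
Errcode 2 on Windows is ERROR_FILE_NOT_FOUND, so this may just be the relative path not resolving from the working directory. This is the minimal check I am using (same variables as the reproduction script above):

import os

adapter_path = f"{model_checkpoint}/adapter_weights.onnx_adapter"
# errcode 2 = ERROR_FILE_NOT_FOUND, so confirm the relative path resolves
# from the current working directory before calling load.
print(os.path.abspath(adapter_path), os.path.exists(adapter_path))
adapters.load(adapter_path, "default")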
