Loading a LoRA on a quantized model? TorchaoLoraLinear.__init__() missing 1 required keyword-only argument: 'get_apply_tensor_subclass' #10621
Comments
I think you have to use a quantization config during pipeline init, but then there's #10578
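For reference, a minimal sketch of that approach, assuming diffusers' TorchAoConfig and a float8 quant type (the exact quant_type strings and supported dtypes depend on your diffusers/torchao versions):

import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

# Assumed quant_type name; check the TorchAO quantization docs for your version.
quant_config = TorchAoConfig("float8dq")

# Quantize at load time instead of calling quantize_ afterwards.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)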
That works, thanks!
But what is the equivalent of float8dq or float8_dynamic_activation_float8_weight in the bitsandbytes quantization config for diffusers?
And per-row quantization?
@christopher5106 TorchAO supports more quantization options and algorithms than BnB. They have separate kernel implementations for what they do, so there is no real way to compare them or to find features of one in the other.
So I have to use TorchAO, and my initial issue was about loading a LoRA on a model quantized with TorchAO. So the issue remains open.
Well, it's already being tracked, but I couldn't find it 🤷. For TorchAO you'll have to quantize after loading, and I think the issue there is possibly in PEFT.
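A minimal sketch of that order (attach the LoRA adapter first, then quantize), reusing the model and config from the snippet further down; whether the resulting modules stay trainable or serializable is not something this thread confirms:

import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="transformer"
).to("cuda", torch.bfloat16)

# Attach the LoRA while the linears are still regular nn.Linear modules.
transformer.add_adapter(
    LoraConfig(r=16, lora_alpha=16, target_modules=["attn.to_q", "attn.to_k", "attn.to_v"])
)

# Quantize afterwards, so the adapter never has to wrap an already-quantized layer.
# quantize_ also accepts a filter_fn if you need to skip the lora_A/lora_B layers.
quantize_(transformer, float8_dynamic_activation_float8_weight())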
That means unloading/loading is no longer possible after quantization, right? Couldn't we quantize the LoRA separately before merging it?
Oh, you could try that using get_peft_model-style workarounds.
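For example, a hedged sketch of a merge-then-quantize workaround using get_peft_model and merge_and_unload, assuming you only need the merged weights for inference rather than a detachable adapter:

import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig, get_peft_model
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

base = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="transformer"
).to("cuda", torch.bfloat16)

peft_model = get_peft_model(
    base,
    LoraConfig(r=16, lora_alpha=16, target_modules=["attn.to_q", "attn.to_k", "attn.to_v"]),
)
# ... load or train the adapter weights here ...

merged = peft_model.merge_and_unload()  # fold the LoRA into the base weights
quantize_(merged, float8_dynamic_activation_float8_weight())  # then quantize the merged model

The trade-off is exactly the one raised above: once the LoRA is merged and the model quantized, you can no longer unload or swap adapters.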
In the PEFT docs it says that torch.compile with quantization (bitsandbytes) is not supported. What about other quantization backends, such as TorchAO?
@bghira I get the same error with:
import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig
torch.set_float32_matmul_precision("high")
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.benchmark_limit = 20
flux_transformer = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-schnell", subfolder="transformer"
).to("cuda", torch.float16)
from torchao.quantization import quantize_
from torchao.quantization import float8_dynamic_activation_float8_weight
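# quantize the transformer weights in place with torchao float8 dynamic activation/weight quantization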
quantize_(flux_transformer, float8_dynamic_activation_float8_weight())
target_modules = [
"x_embedder",
"attn.to_k",
"attn.to_q",
"attn.to_v",
"attn.to_out.0",
"attn.add_k_proj",
"attn.add_q_proj",
"attn.add_v_proj",
"attn.to_add_out",
"ff.net.0.proj",
"ff.net.2",
"ff_context.net.0.proj",
"ff_context.net.2",
]
transformer_lora_config = LoraConfig(
r=16,
lora_alpha=16,
init_lora_weights=True,
target_modules=target_modules,
lora_bias=False,
)
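# adding the adapter to the already-quantized model raises the TorchaoLoraLinear error from the title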
flux_transformer.add_adapter(transformer_lora_config)
Same error if I write it
gives the following error: