Can't load SD 3.5 checkpoint // sd3-sd3.5-flux branch #3017
Comments
Small addition: the same thing happens with the sd3-flux.1 branch on a completely fresh Windows 11 install where only the necessary drivers and the Windows pre-requirements are installed. LoRA training works with SDXL, but the SD3.5 model can't be loaded.
The SD3 checkbox doesn't do anything. The train button defaults to train_network.py, and it also defaults to network_module = "networks.lora". This is not a proper fix, but you can edit kohya_ss\kohya_gui\lora_gui.py so that it defaults to the SD3 scripts instead. You will then need to add the clips to the Additional Parameters field, since there are no GUI boxes to put them in:
--clip_l "./models/clip/clip_l.safetensors" --clip_g "./models/clip/clip_g.safetensors" --t5xxl "./models/clip/t5xxl_fp16.safetensors"
No idea why neither branch that "supports" SD3.5 actually supports it. Technically the scripts do, but the GUI does not.
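If editing the GUI is not practical, the same workaround can be driven from the command line, bypassing the GUI defaults entirely. This is only a sketch: it assumes sd3_train_network.py is the SD3-aware training script on this branch, and the clip/T5 paths are the example paths from above; adjust both to your install.

accelerate launch ./kohya_ss/sd-scripts/sd3_train_network.py ^
  --config_file ./Output/config_lora-20241216-151851.toml ^
  --network_module networks.lora_sd3 ^
  --clip_l "./models/clip/clip_l.safetensors" ^
  --clip_g "./models/clip/clip_g.safetensors" ^
  --t5xxl "./models/clip/t5xxl_fp16.safetensors"

The generic train_network.py only knows the SD1.x/SD2.x checkpoint layout, which is consistent with the KeyError: 'time_embed.0.weight' in the traceback below; the SD3 script and the networks.lora_sd3 module are what actually handle the SD3.5 format.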
Thanks for your help. Unfortunately, it still does not work. Is it a problem that I run it within an Anaconda environment?
import network module: networks.lora_sd3
Hi,
I am using the sd3-sd3.5-flux branch. It seems to me that, for some reason, the correct Python files are not being used. Please note that everything works with SDXL.
SDXL -> working:
INFO loading model for process 0/1 sdxl_train_util.py:32
INFO load StableDiffusion checkpoint: ./sd_xl_base_1.0.safetensors sdxl_train_util.py:73
INFO building U-Net sdxl_model_util.py:198
INFO loading U-Net from checkpoint sdxl_model_util.py:202
SD 3.5 -> not working:
INFO loading model for process 0/1 train_util.py:5359
INFO load StableDiffusion checkpoint: ./sd3.5_large_fp8_scaled.safetensors train_util.py:5315
Traceback (most recent call last):
File ".\kohya_ss\sd-scripts\train_network.py", line 1513, in <module>
trainer.train(args)
File ".\kohya_ss\sd-scripts\train_network.py", line 413, in train
model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
File ".\kohya_ss\sd-scripts\train_network.py", line 128, in load_target_model
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
File ".\kohya_ss\sd-scripts\library\train_util.py", line 5361, in load_target_model
text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
File ".\kohya_ss\sd-scripts\library\train_util.py", line 5316, in _load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
File ".\kohya_ss\sd-scripts\library\model_util.py", line 1005, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File ".\kohya_ss\sd-scripts\library\model_util.py", line 267, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File ".\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File ".\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File ".\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
sys.exit(main())
File ".\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File ".\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File ".\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['.\python.exe', './kohya_ss/sd-scripts/train_network.py', '--config_file', './Output/config_lora-20241216-151851.toml']' returned non-zero exit status 1.
15:19:02-995047 INFO Training has ended.
Training commands:
bucket_no_upscale = true
bucket_reso_steps = 64
cache_latents = true
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
epoch = 1
gradient_accumulation_steps = 1
huber_c = 0.1
huber_schedule = "snr"
loss_type = "l2"
lr_scheduler = "constant"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_grad_norm = 1
max_timestep = 1000
max_token_length = 225
max_train_steps = 1600
min_bucket_reso = 256
mixed_precision = "fp16"
network_alpha = 1
network_args = []
network_dim = 8
network_module = "networks.lora"
network_train_unet_only = true
noise_offset_type = "Original"
optimizer_args = []
optimizer_type = "AdamW8bit"
output_dir = "./Output"
output_name = "Last_01"
pretrained_model_name_or_path = "./sd3.5_large_fp8_scaled.safetensors"
prior_loss_weight = 1
resolution = "2048,2048"
sample_prompts = ".\sample/prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 1
save_model_as = "safetensors"
save_precision = "fp16"
text_encoder_lr = []
train_batch_size = 1
train_data_dir = "./img"
unet_lr = 0.0001
wandb_run_name = "Last_01"
xformers = true
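Note that the generated config above still carries the defaults that the lora_gui.py comment points out (network_module = "networks.lora" and no clip/T5 paths). A corrected fragment might look like the following; the clip_l/clip_g/t5xxl keys are an assumption that the config file accepts the same names as the command-line flags, otherwise keep them in Additional Parameters:

network_module = "networks.lora_sd3"
pretrained_model_name_or_path = "./sd3.5_large_fp8_scaled.safetensors"
# assumed to mirror --clip_l/--clip_g/--t5xxl; paths are the examples from the comment above
clip_l = "./models/clip/clip_l.safetensors"
clip_g = "./models/clip/clip_g.safetensors"
t5xxl = "./models/clip/t5xxl_fp16.safetensors"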