Can't load SD 3.5 checkpoint // sd3-sd3.5-flux branch #3017

Open
ZfE0QQ6ds92W opened this issue Dec 16, 2024 · 3 comments

@ZfE0QQ6ds92W
Hi,

I am using the sd3-sd3.5-flux branch. It seems to me that (for some reason) the correct Python files are not used. Please note that everything works with SDXL.

SDXL -> working
INFO loading model for process 0/1 sdxl_train_util.py:32
INFO load StableDiffusion checkpoint: ./sd_xl_base_1.0.safetensors sdxl_train_util.py:73
INFO building U-Net sdxl_model_util.py:198
INFO loading U-Net from checkpoint sdxl_model_util.py:202

SD 3.5 -> not working:

INFO loading model for process 0/1 train_util.py:5359
INFO load StableDiffusion checkpoint: ./sd3.5_large_fp8_scaled.safetensors train_util.py:5315
Traceback (most recent call last):
File ".\kohya_ss\sd-scripts\train_network.py", line 1513, in
trainer.train(args)
File ".\kohya_ss\sd-scripts\train_network.py", line 413, in train
model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
File ".\kohya_ss\sd-scripts\train_network.py", line 128, in load_target_model
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
File ".\kohya_ss\sd-scripts\library\train_util.py", line 5361, in load_target_model
text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
File ".\kohya_ss\sd-scripts\library\train_util.py", line 5316, in _load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
File ".\kohya_ss\sd-scripts\library\model_util.py", line 1005, in load_models_from_stable_diffusion_checkpoint
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config)
File ".\kohya_ss\sd-scripts\library\model_util.py", line 267, in convert_ldm_unet_checkpoint
new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
KeyError: 'time_embed.0.weight'
Traceback (most recent call last):
File ".\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File ".\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File ".\Scripts\accelerate.EXE_main
.py", line 7, in
sys.exit(main())
File ".\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File ".\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File ".\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['.\python.exe', './kohya_ss/sd-scripts/train_network.py', '--config_file', './Output/config_lora-20241216-151851.toml']' returned non-zero exit status 1.
15:19:02-995047 INFO Training has ended.

Training config:
bucket_no_upscale = true
bucket_reso_steps = 64
cache_latents = true
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
epoch = 1
gradient_accumulation_steps = 1
huber_c = 0.1
huber_schedule = "snr"
loss_type = "l2"
lr_scheduler = "constant"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_grad_norm = 1
max_timestep = 1000
max_token_length = 225
max_train_steps = 1600
min_bucket_reso = 256
mixed_precision = "fp16"
network_alpha = 1
network_args = []
network_dim = 8
network_module = "networks.lora"
network_train_unet_only = true
noise_offset_type = "Original"
optimizer_args = []
optimizer_type = "AdamW8bit"
output_dir = "./Output"
output_name = "Last_01"
pretrained_model_name_or_path = "./sd3.5_large_fp8_scaled.safetensors"
prior_loss_weight = 1
resolution = "2048,2048"
sample_prompts = ".\sample/prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 1
save_model_as = "safetensors"
save_precision = "fp16"
text_encoder_lr = []
train_batch_size = 1
train_data_dir = "./img"
unet_lr = 0.0001
wandb_run_name = "Last_01"
xformers = true

@ZfE0QQ6ds92W
Author

Small addition: the same thing happens with the sd3-flux.1 branch on a completely new Windows 11 install where only the necessary drivers and the Windows prerequisites are installed. LoRA training works with SDXL, but the SD3.5 model can't be loaded.

@Impudence12

The SD3 checkbox doesn't do anything. The train button defaults to train_network.py. It also defaults to network_module = "networks.lora". This is not a proper fix, but you can edit lora_gui.py to default to the SD3 scripts.

kohya_ss\kohya_gui\lora_gui.py
Line 1150: change run_cmd.append(rf"{scriptdir}/sd-scripts/train_network.py") to run_cmd.append(rf"{scriptdir}/sd-scripts/sd3_train_network.py")
Line 1267: change network_module = "networks.lora" to network_module = "networks.lora_sd3"
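
For reference, a minimal sketch of what those two edits amount to (simplified stand-in code, not the GUI's actual implementation; scriptdir is a placeholder here):

# Hypothetical, simplified reconstruction of the relevant lora_gui.py logic
scriptdir = "./kohya_ss"  # placeholder path
run_cmd = ["accelerate", "launch"]
run_cmd.append(rf"{scriptdir}/sd-scripts/sd3_train_network.py")  # was .../train_network.py
network_module = "networks.lora_sd3"  # was "networks.lora"
print(" ".join(run_cmd), "--network_module", network_module)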

Then you will need to add the CLIP/T5 paths to the Additional Parameters field, since there won't be any GUI boxes to put them in.

--clip_l "./models/clip/clip_l.safetensors" --clip_g "./models/clip/clip_g.safetensors" --t5xxl "./models/clip/t5xxl_fp16.safetensors"

No idea why neither branch that "supports" SD3.5 actually supports it. Technically the scripts do, but the GUI does not.
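
For what it's worth, calling the SD3 script directly (bypassing the GUI) should be equivalent once those arguments are in place; roughly, with example paths:

accelerate launch ./sd-scripts/sd3_train_network.py --config_file ./Output/config_lora.toml --clip_l ./models/clip/clip_l.safetensors --clip_g ./models/clip/clip_g.safetensors --t5xxl ./models/clip/t5xxl_fp16.safetensors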

@ZfE0QQ6ds92W
Author

Thanks for your help. Unfortunately, it still does not work. Is it a problem that I run it within an Anaconda environment?

INFO Building VAE sd3_utils.py:258
INFO Loading state dict... sd3_utils.py:260
INFO Loaded VAE: <All keys matched successfully> sd3_utils.py:262

import network module: networks.lora_sd3
INFO [Dataset 0] train_util.py:2495
INFO caching latents with caching strategy. train_util.py:1048
INFO caching latents... train_util.py:1097
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|
Traceback (most recent call last):
File ".\sd-scripts\sd3_train_network.py", line 480, in
trainer.train(args)
File ".\sd-scripts\train_network.py", line 461, in train
self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
File ".\sd-scripts\sd3_train_network.py", line 264, in cache_text_encoder_outputs_if_needed
text_encoders[1].to(accelerator.device, dtype=weight_dtype)
File "C:\Users\thoma.conda\envs\Kohya_sd35_large\lib\site-packages\transformers\modeling_utils.py", line 2905, in to
return super().to(*args, **kwargs)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
return self._apply(convert)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
param_applied = fn(param)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\torch\nn\modules\module.py", line 1333, in convert
raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Traceback (most recent call last):
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users[...].conda\envs\Kohya_sd35_large\Scripts\accelerate.EXE_main
.py", line 7, in
sys.exit(main())
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
simple_launcher(args)
File "C:\Users[...].conda\envs\Kohya_sd35_large\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\[...]\.conda\envs\Kohya_sd35_large\python.exe', './sd-scripts/sd3_train_network.py', '--config_file', './config_lora-20241222-214855.toml', '--clip_l', './Models/clip/clip_l.safetensors', '--clip_g', './Models/clip/clip_g.safetensors', '--t5xxl', './Models/clip/t5xxl_fp16.safetensors']' returned non-zero exit status 1.
