RuntimeError: cuDNN Frontend error #3049

Open
techidsk opened this issue Jan 14, 2025 · 1 comment
2025-01-14 16:41:25 INFO     Checking the state dict: Diffusers or BFL, dev or schnell                                                                                           flux_utils.py:43
                    INFO     Building Flux model dev from BFL checkpoint                                                                                                        flux_utils.py:101
2025-01-14 16:41:26 INFO     Loading state dict from /home/techidsk/code/kohya_ss/models/flux1-dev.safetensors                                                                  flux_utils.py:118
                    INFO     Loaded Flux: <All keys matched successfully>                                                                                                       flux_utils.py:137
                    INFO     Cast FLUX model to fp8. This may take a while. You can reduce the time by using fp8 checkpoint. /                                          flux_train_network.py:101
                             FLUXモデルをfp8に変換しています。これには時間がかかる場合があります。fp8チェックポイントを使用することで時間を短縮できます。
2025-01-14 16:41:43 INFO     Building CLIP-L                                                                                                                                    flux_utils.py:179
                    INFO     Loading state dict from /home/techidsk/code/ComfyUI/models/clip/clip_l.safetensors                                                                 flux_utils.py:275
                    INFO     Loaded CLIP-L: <All keys matched successfully>                                                                                                     flux_utils.py:278
                    INFO     Loading state dict from /home/techidsk/code/ComfyUI/models/clip/t5xxl_fp16.safetensors                                                             flux_utils.py:330
2025-01-14 16:41:44 INFO     Loaded T5xxl: <All keys matched successfully>                                                                                                      flux_utils.py:333
                    INFO     Building AutoEncoder                                                                                                                               flux_utils.py:144
                    INFO     Loading state dict from /home/techidsk/code/kohya_ss/models/ae.safetensors                                                                         flux_utils.py:149
                    INFO     Loaded AE: <All keys matched successfully>                                                                                                         flux_utils.py:152
import network module: networks.lora_flux
                    INFO     [Dataset 0]                                                                                                                                       train_util.py:2495
                    INFO     caching latents with caching strategy.                                                                                                            train_util.py:1048
                    INFO     caching latents...                                                                                                                                train_util.py:1097
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 2214.25it/s]
                    INFO     move vae and unet to cpu to save memory                                                                                                    flux_train_network.py:203
                    INFO     move text encoders to gpu                                                                                                                  flux_train_network.py:211
2025-01-14 16:41:55 INFO     [Dataset 0]                                                                                                                                       train_util.py:2517
                    INFO     caching Text Encoder outputs with caching strategy.                                                                                               train_util.py:1231
                    INFO     checking cache validity...                                                                                                                        train_util.py:1242
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 182232.21it/s]
                    INFO     caching Text Encoder outputs...                                                                                                                   train_util.py:1273
  0%|                                                                                                                                                                    | 0/123 [00:00<?, ?it/s]Could not load library libcuda.so. Error: libcuda.so: cannot open shared object file: No such file or directory
Could not load library libcuda.so. Error: libcuda.so: cannot open shared object file: No such file or directory
Could not load library libcuda.so. Error: libcuda.so: cannot open shared object file: No such file or directory
  0%|                                                                                                                                                                    | 0/123 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/techidsk/code/kohya_ss/sd-scripts/flux_train_network.py", line 583, in <module>
    trainer.train(args)
  File "/home/techidsk/code/kohya_ss/sd-scripts/train_network.py", line 461, in train
    self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
  File "/home/techidsk/code/kohya_ss/sd-scripts/flux_train_network.py", line 223, in cache_text_encoder_outputs_if_needed
    dataset.new_cache_text_encoder_outputs(text_encoders, accelerator)
  File "/home/techidsk/code/kohya_ss/sd-scripts/library/train_util.py", line 2518, in new_cache_text_encoder_outputs
    dataset.new_cache_text_encoder_outputs(models, accelerator)
  File "/home/techidsk/code/kohya_ss/sd-scripts/library/train_util.py", line 1276, in new_cache_text_encoder_outputs
    caching_strategy.cache_batch_outputs(tokenize_strategy, models, text_encoding_strategy, batch)
  File "/home/techidsk/code/kohya_ss/sd-scripts/library/strategy_flux.py", line 162, in cache_batch_outputs
    l_pooled, t5_out, txt_ids, _ = flux_text_encoding_strategy.encode_tokens(tokenize_strategy, models, tokens_and_masks)
  File "/home/techidsk/code/kohya_ss/sd-scripts/library/strategy_flux.py", line 68, in encode_tokens
    l_pooled = clip_l(l_tokens.to(clip_l.device))["pooler_output"]
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 986, in forward
    return self.text_model(
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 890, in forward
    encoder_outputs = self.encoder(
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 813, in forward
    layer_outputs = encoder_layer(
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 548, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 480, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.
                    WARNING  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLEOFError(8, '[SSL:               connectionpool.py:870
                             UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')': /api/4504800232407040/envelope/
2025-01-14 16:41:56 WARNING  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLEOFError(8, '[SSL:               connectionpool.py:870
                             UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')': /api/4504800232407040/envelope/
                    WARNING  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLEOFError(8, '[SSL:               connectionpool.py:870
                             UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')': /api/4504800232407040/envelope/
2025-01-14 16:41:57 WARNING  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLEOFError(8, '[SSL:               connectionpool.py:870
                             UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')': /api/4504800232407040/envelope/
                    WARNING  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLEOFError(8, '[SSL:               connectionpool.py:870
                             UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')': /api/4504800232407040/envelope/
Traceback (most recent call last):
  File "/home/techidsk/miniconda3/envs/kohya/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/techidsk/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/techidsk/miniconda3/envs/kohya/bin/python3.10', '/home/techidsk/code/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', '/home/techidsk/code/kohya_ss/outputs/0114_shaoxing/config_lora-20250114-164115.toml']' returned non-zero exit status 1.

I used the sd3.1 + flux branch and found the error may be caused by PyTorch 2.5.

When I pip install PyTorch 2.4 and restart the GUI, it automatically reinstalls PyTorch 2.5.
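The report pins the failure to the PyTorch version rather than the model or dataset. A minimal sketch of that version gate (the `is_affected_torch` helper and the 2.5 cutoff are my assumptions from this thread, not an official compatibility table):

```python
def is_affected_torch(version: str) -> bool:
    """Return True for torch releases (>= 2.5) where, per this report,
    the cuDNN SDPA path can raise 'No execution plans support the graph'."""
    # Strip any local build tag like "+cu124" before parsing major.minor.
    major, minor = (int(p) for p in version.split("+")[0].split(".")[:2])
    return (major, minor) >= (2, 5)
```

In practice you would feed it `torch.__version__` before launching training and warn the user to downgrade.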

@techidsk (Author)

Downgrading PyTorch to 2.4 works:
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1

Restarting with python kohya_gui.py --noverify skips the requirements check, so the pinned versions are not reinstalled.
