2026-04-17 10:24:25,709 - main - INFO - ============================================================
2026-04-17 10:24:25,709 - main - INFO - voicebox-server starting up...
2026-04-17 10:24:25,710 - main - INFO - Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr 8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]
2026-04-17 10:24:25,710 - main - INFO - Executable: C:\Program Files\Voicebox\voicebox-server.exe
2026-04-17 10:24:25,710 - main - INFO - Arguments: ['\\?\C:\Program Files\Voicebox\voicebox-server.exe', '--data-dir', 'C:\Users\cosmi\AppData\Roaming\sh.voicebox.app', '--port', '17493', '--parent-pid', '8908']
2026-04-17 10:24:25,710 - main - INFO - ============================================================
2026-04-17 10:24:25,710 - main - INFO - Importing argparse...
2026-04-17 10:24:25,711 - main - INFO - Importing uvicorn...
2026-04-17 10:24:25,790 - main - INFO - Standard library imports successful
2026-04-17 10:24:25,790 - main - INFO - Importing backend.config...
2026-04-17 10:24:25,791 - main - INFO - Importing backend.database...
2026-04-17 10:24:25,977 - main - INFO - Importing backend.main (this may take a while due to torch/transformers)...
2026-04-17 10:24:28,213 - main - INFO - Backend imports successful
2026-04-17 10:24:28,214 - main - INFO - Backend variant: CPU
server.py:278: DeprecationWarning: on_event is deprecated, use lifespan event handlers instead. Read more about it in the FastAPI docs for Lifespan Events.
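The deprecation above refers to FastAPI's `@app.on_event("startup")`/`("shutdown")` hooks; the replacement is a single async context manager passed as `FastAPI(lifespan=...)`. A minimal sketch of the pattern using plain asyncio (no FastAPI dependency; the `app` argument and `events` list are illustrative only):

```python
import asyncio
from contextlib import asynccontextmanager

events = []

# Shape of a FastAPI lifespan handler: code before `yield` runs at
# startup, code after it runs at shutdown. In a real app this would be
# passed as FastAPI(lifespan=lifespan) instead of using @app.on_event.
@asynccontextmanager
async def lifespan(app):
    events.append("startup")   # replaces @app.on_event("startup")
    yield
    events.append("shutdown")  # replaces @app.on_event("shutdown")

async def main():
    # Drive the context manager directly to show the ordering.
    async with lifespan(app=None):
        events.append("serving")

asyncio.run(main())
print(events)  # ['startup', 'serving', 'shutdown']
```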
2026-04-17 10:24:28,214 - main - INFO - Parsed arguments: host=127.0.0.1, port=17493, data_dir=C:\Users\cosmi\AppData\Roaming\sh.voicebox.app
2026-04-17 10:24:28,214 - main - INFO - Setting data directory to: C:\Users\cosmi\AppData\Roaming\sh.voicebox.app
2026-04-17 10:24:28,215 - backend.config - INFO - Data directory set to: C:\Users\cosmi\AppData\Roaming\sh.voicebox.app
2026-04-17 10:24:28,215 - main - INFO - Initializing database...
2026-04-17 10:24:28,252 - backend.database.migrations - INFO - Added voice_type column to profiles
2026-04-17 10:24:28,259 - backend.database.migrations - INFO - Added preset_engine column to profiles
2026-04-17 10:24:28,266 - backend.database.migrations - INFO - Added preset_voice_id column to profiles
2026-04-17 10:24:28,274 - backend.database.migrations - INFO - Added design_prompt column to profiles
2026-04-17 10:24:28,282 - backend.database.migrations - INFO - Added default_engine column to profiles
2026-04-17 10:24:28,294 - backend.database.migrations - INFO - Normalized 1 stored file paths
2026-04-17 10:24:28,325 - main - INFO - Database initialized successfully
2026-04-17 10:24:28,325 - main - INFO - Starting uvicorn server on 127.0.0.1:17493...
INFO: Started server process [44612]
INFO: Waiting for application startup.
2026-04-17 10:24:28,351 - backend.app - INFO - Voicebox v0.4.0 starting up
2026-04-17 10:24:28,351 - backend.app - INFO - Python 3.12.10 on Windows 11 (AMD64)
2026-04-17 10:24:28,362 - backend.app - INFO - Database: C:\Users\cosmi\AppData\Roaming\sh.voicebox.app\voicebox.db
2026-04-17 10:24:28,363 - backend.app - INFO - Data directory: C:\Users\cosmi\AppData\Roaming\sh.voicebox.app
2026-04-17 10:24:28,366 - backend.app - INFO - Profiles: 1, Generations: 2
2026-04-17 10:24:28,368 - backend.app - INFO - Backend: PYTORCH
2026-04-17 10:24:28,368 - backend.app - INFO - GPU: None (CPU only)
2026-04-17 10:24:28,372 - backend.app - INFO - Model cache: C:\Users\cosmi\.cache\huggingface\hub
2026-04-17 10:24:28,372 - backend.app - INFO - Ready
2026-04-17 10:24:28,373 - watchdog - INFO - Parent watchdog started, monitoring PID 8908, server PID 44612
2026-04-17 10:24:28,373 - watchdog - INFO - Parent PID 8908 initial check: alive=True
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:17493 (Press CTRL+C to quit)
INFO: 127.0.0.1:51894 - "OPTIONS /profiles HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "OPTIONS /history?limit=20 HTTP/1.1" 200 OK
INFO: 127.0.0.1:56233 - "OPTIONS /effects/presets HTTP/1.1" 200 OK
INFO: 127.0.0.1:65402 - "OPTIONS /tasks/active HTTP/1.1" 200 OK
INFO: 127.0.0.1:51894 - "GET /tasks/active HTTP/1.1" 200 OK
INFO: 127.0.0.1:65402 - "GET /profiles HTTP/1.1" 200 OK
INFO: 127.0.0.1:56233 - "GET /history?limit=20 HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "GET /effects/presets HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "OPTIONS /profiles/c66eb3ab-b68e-44c3-90a2-b5dbcda6c44e HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "GET /profiles/c66eb3ab-b68e-44c3-90a2-b5dbcda6c44e HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "OPTIONS /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "GET /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "OPTIONS /models/status HTTP/1.1" 200 OK
INFO: 127.0.0.1:52655 - "OPTIONS /models/cache-dir HTTP/1.1" 200 OK
INFO: 127.0.0.1:51968 - "GET /tasks/active HTTP/1.1" 200 OK
INFO: 127.0.0.1:58406 - "GET /models/status HTTP/1.1" 200 OK
INFO: 127.0.0.1:52655 - "GET /models/cache-dir HTTP/1.1" 200 OK
INFO: 127.0.0.1:52655 - "GET /tasks/active HTTP/1.1" 200 OK
INFO: 127.0.0.1:52655 - "GET /models/status HTTP/1.1" 200 OK
INFO: 127.0.0.1:51874 - "OPTIONS /generate/ef6ab366-29db-4f51-868a-5a1b33db9c83/retry HTTP/1.1" 200 OK
INFO: 127.0.0.1:51874 - "POST /generate/ef6ab366-29db-4f51-868a-5a1b33db9c83/retry HTTP/1.1" 200 OK
INFO: 127.0.0.1:51874 - "GET /history?limit=20 HTTP/1.1" 200 OK
INFO: 127.0.0.1:53015 - "GET /generate/ef6ab366-29db-4f51-868a-5a1b33db9c83/status HTTP/1.1" 200 OK
"sox" is not recognized as an internal or external command,
operable program or batch file.
2026-04-17 10:24:53,916 - sox - WARNING - SoX could not be found!
If you do not have SoX, proceed here:
- - - http://sox.sourceforge.net/ - - -
If you do (or think that you should) have SoX, double-check your
path variables.
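The warning above comes from a library that shells out to the `sox` binary, which must be on PATH (the Windows shell message above confirms it is not). A quick way to check what the subprocess call will see:

```python
import shutil

# shutil.which resolves a command name against PATH the same way a
# subprocess invocation of "sox" would.
sox_path = shutil.which("sox")
if sox_path is None:
    print("sox not found on PATH - install SoX or add its folder to PATH")
else:
    print(f"sox resolved to {sox_path}")
```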
2026-04-17 10:24:54,498 - backend.backends.pytorch_backend - INFO - Loading TTS model 1.7B on cpu...
2026-04-17 10:24:54,498 - backend.utils.hf_offline_patch - INFO - [offline-guard] qwen-tts-1.7B is cached - forcing HF_HUB_OFFLINE=1
2026-04-17 10:24:55,173 - backend.utils.progress - ERROR - Marked qwen-tts-1.7B as error: Unrecognized model in Qwen/Qwen3-TTS-12Hz-1.7B-Base. Should have a model_type key in its config.json, or contain one of the following strings in its name: aimv2, aimv2_vision_model, albert, align, altclip, apertus, arcee, aria, aria_text, audio-spectrogram-transformer, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, bitnet, blenderbot, blenderbot-small, blip, blip-2, blip_2_qformer, bloom, blt, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, cohere2_vision, colpali, colqwen2, conditional_detr, convbert, convnext, convnextv2, cpmant, csm, ctrl, cvt, d_fine, dab-detr, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deepseek_v2, deepseek_v3, deepseek_vl, deepseek_vl_hybrid, deformable_detr, deit, depth_anything, depth_pro, deta, detr, dia, diffllama, dinat, dinov2, dinov2_with_registers, dinov3_convnext, dinov3_vit, distilbert, doge, donut-swin, dots1, dpr, dpt, edgetam, edgetam_video, edgetam_vision_model, efficientformer, efficientloftr, efficientnet, electra, emu3, encodec, encoder-decoder, eomt, ernie, ernie4_5, ernie4_5_moe, ernie_m, esm, evolla, exaone4, falcon, falcon_h1, falcon_mamba, fastspeech2_conformer, fastspeech2_conformer_with_hifigan, flaubert, flava, flex_olmo, florence2, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, gemma3, gemma3_text, gemma3n, gemma3n_audio, gemma3n_text, gemma3n_vision, git, glm, glm4, glm4_moe, glm4v, glm4v_moe, glm4v_moe_text, glm4v_text, glpn, got_ocr2, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gpt_oss, gptj, gptsan-japanese, granite, granite_speech, granitemoe, granitemoehybrid, granitemoeshared, granitevision, graphormer, grounding-dino, groupvit, 
helium, hgnet_v2, hiera, hubert, hunyuan_v1_dense, hunyuan_v1_moe, ibert, idefics, idefics2, idefics3, idefics3_vision, ijepa, imagegpt, informer, instructblip, instructblipvideo, internvl, internvl_vision, jamba, janus, jetmoe, jukebox, kosmos-2, kosmos-2.5, kyutai_speech_to_text, layoutlm, layoutlmv2, layoutlmv3, led, levit, lfm2, lfm2_vl, lightglue, lilt, llama, llama4, llama4_text, llava, llava_next, llava_next_video, llava_onevision, longcat_flash, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, metaclip_2, mgp-str, mimi, minimax, ministral, mistral, mistral3, mixtral, mlcd, mllama, mm-grounding-dino, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, modernbert, modernbert-decoder, moonshine, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmo2, olmo3, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, ovis2, owlv2, owlvit, paligemma, parakeet_ctc, parakeet_encoder, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, perception_encoder, perception_lm, persimmon, phi, phi3, phi4_multimodal, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prompt_depth_anything, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_5_omni, qwen2_5_vl, qwen2_5_vl_text, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, qwen2_vl_text, qwen3, qwen3_moe, qwen3_next, qwen3_omni_moe, qwen3_vl, qwen3_vl_moe, qwen3_vl_moe_text, qwen3_vl_text, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rt_detr_v2, rwkv, sam, sam2, sam2_hiera_det_model, sam2_video, sam2_vision_model, sam_hq, sam_hq_vision_model, sam_vision_model, seamless_m4t, seamless_m4t_v2, seed_oss, segformer, seggpt, sew, sew-d, shieldgemma2, siglip, siglip2, siglip2_vision_model, siglip_vision_model, smollm3, 
smolvlm, smolvlm_vision, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, t5gemma, table-transformer, tapas, textnet, time_series_transformer, timesfm, timesformer, timm_backbone, timm_wrapper, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, vaultgemma, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vitpose, vitpose_backbone, vits, vivit, vjepa2, voxtral, voxtral_encoder, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xcodec, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xlstm, xmod, yolos, yoso, zamba, zamba2, zoedepth, qwen3_tts
Traceback (most recent call last):
File "backend\services\generation.py", line 63, in run_generation
File "backend\backends\__init__.py", line 400, in load_engine_model
File "backend\backends\pytorch_backend.py", line 86, in load_model_async
File "asyncio\threads.py", line 25, in to_thread
File "concurrent\futures\thread.py", line 59, in run
File "backend\backends\pytorch_backend.py", line 111, in _load_model_sync
File "qwen_tts\inference\qwen3_tts_model.py", line 112, in from_pretrained
model = AutoModel.from_pretrained(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "transformers\models\auto\auto_factory.py", line 549, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "transformers\models\auto\configuration_auto.py", line 1380, in from_pretrained
raise ValueError(
ValueError: Unrecognized model in Qwen/Qwen3-TTS-12Hz-1.7B-Base. Should have a model_type key in its config.json, or contain one of the following strings in its name: aimv2, aimv2_vision_model, albert, align, altclip, apertus, arcee, aria, aria_text, audio-spectrogram-transformer, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, bitnet, blenderbot, blenderbot-small, blip, blip-2, blip_2_qformer, bloom, blt, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, cohere2_vision, colpali, colqwen2, conditional_detr, convbert, convnext, convnextv2, cpmant, csm, ctrl, cvt, d_fine, dab-detr, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deepseek_v2, deepseek_v3, deepseek_vl, deepseek_vl_hybrid, deformable_detr, deit, depth_anything, depth_pro, deta, detr, dia, diffllama, dinat, dinov2, dinov2_with_registers, dinov3_convnext, dinov3_vit, distilbert, doge, donut-swin, dots1, dpr, dpt, edgetam, edgetam_video, edgetam_vision_model, efficientformer, efficientloftr, efficientnet, electra, emu3, encodec, encoder-decoder, eomt, ernie, ernie4_5, ernie4_5_moe, ernie_m, esm, evolla, exaone4, falcon, falcon_h1, falcon_mamba, fastspeech2_conformer, fastspeech2_conformer_with_hifigan, flaubert, flava, flex_olmo, florence2, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, gemma3, gemma3_text, gemma3n, gemma3n_audio, gemma3n_text, gemma3n_vision, git, glm, glm4, glm4_moe, glm4v, glm4v_moe, glm4v_moe_text, glm4v_text, glpn, got_ocr2, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gpt_oss, gptj, gptsan-japanese, granite, granite_speech, granitemoe, granitemoehybrid, granitemoeshared, granitevision, graphormer, grounding-dino, groupvit, helium, hgnet_v2, hiera, hubert, hunyuan_v1_dense, hunyuan_v1_moe, ibert, idefics, 
idefics2, idefics3, idefics3_vision, ijepa, imagegpt, informer, instructblip, instructblipvideo, internvl, internvl_vision, jamba, janus, jetmoe, jukebox, kosmos-2, kosmos-2.5, kyutai_speech_to_text, layoutlm, layoutlmv2, layoutlmv3, led, levit, lfm2, lfm2_vl, lightglue, lilt, llama, llama4, llama4_text, llava, llava_next, llava_next_video, llava_onevision, longcat_flash, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, metaclip_2, mgp-str, mimi, minimax, ministral, mistral, mistral3, mixtral, mlcd, mllama, mm-grounding-dino, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, modernbert, modernbert-decoder, moonshine, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmo2, olmo3, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, ovis2, owlv2, owlvit, paligemma, parakeet_ctc, parakeet_encoder, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, perception_encoder, perception_lm, persimmon, phi, phi3, phi4_multimodal, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prompt_depth_anything, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_5_omni, qwen2_5_vl, qwen2_5_vl_text, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, qwen2_vl_text, qwen3, qwen3_moe, qwen3_next, qwen3_omni_moe, qwen3_vl, qwen3_vl_moe, qwen3_vl_moe_text, qwen3_vl_text, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rt_detr_v2, rwkv, sam, sam2, sam2_hiera_det_model, sam2_video, sam2_vision_model, sam_hq, sam_hq_vision_model, sam_vision_model, seamless_m4t, seamless_m4t_v2, seed_oss, segformer, seggpt, sew, sew-d, shieldgemma2, siglip, siglip2, siglip2_vision_model, siglip_vision_model, smollm3, smolvlm, smolvlm_vision, speech-encoder-decoder, speech_to_text, speech_to_text_2, 
speecht5, splinter, squeezebert, stablelm, starcoder2, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, t5gemma, table-transformer, tapas, textnet, time_series_transformer, timesfm, timesformer, timm_backbone, timm_wrapper, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, vaultgemma, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vitpose, vitpose_backbone, vits, vivit, vjepa2, voxtral, voxtral_encoder, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xcodec, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xlstm, xmod, yolos, yoso, zamba, zamba2, zoedepth, qwen3_tts
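Note that `qwen3_tts` does appear at the end of the supported list, so the installed transformers knows the architecture; a likely culprit (an assumption, not confirmed by the log) is a stale cached `config.json` that lacks the `model_type` key, which cannot be refreshed because the offline-guard forced HF_HUB_OFFLINE=1. A quick way to check for the key (the JSON contents here are illustrative):

```python
import json

# AutoConfig raises exactly this ValueError when config.json has no
# "model_type" key; current checkpoints ship e.g. "model_type": "qwen3_tts".
def has_model_type(config_text: str) -> bool:
    return "model_type" in json.loads(config_text)

good = '{"model_type": "qwen3_tts", "hidden_size": 1024}'
bad = '{"hidden_size": 1024}'
print(has_model_type(good), has_model_type(bad))  # True False
```

In practice one would read the cached file from the hub cache directory logged at startup and, if the key is missing, clear that snapshot and re-download with the offline guard disabled.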
Warning: flash-attn is not installed. Will only run the manual PyTorch version. Please install flash-attn for faster inference.
INFO: 127.0.0.1:53015 - "GET /history?limit=20 HTTP/1.1" 200 OK
INFO: 127.0.0.1:53015 - "GET /tasks/active HTTP/1.1" 200 OK
INFO: 127.0.0.1:56957 - "GET /tasks/active HTTP/1.1" 200 OK