Organizations

@nilc-nlp @Sales-Holding-Equipe-voz

Starred repositories

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python 4,756 446 Updated Feb 18, 2025

The open source code for SimpleSpeech series

Python 127 7 Updated Oct 8, 2024

[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

Python 48 6 Updated Feb 5, 2025

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…

Rust 763 58 Updated Feb 9, 2025

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 127 8 Updated Feb 12, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,144 272 Updated Nov 5, 2024

Repository for the paper "Combining audio control and style transfer using latent diffusion", accepted at ISMIR 2024

Jupyter Notebook 42 3 Updated Dec 18, 2024

Companion code for ISMIR 2017 paper "Deep Salience Representations for $F_0$ Estimation in Polyphonic Music"

Jupyter Notebook 86 20 Updated Nov 22, 2019

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 3,834 415 Updated Feb 17, 2025

🗣️🇧🇷 Transcribed audio datasets in Brazilian Portuguese

Shell 55 8 Updated Mar 30, 2023

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 10,735 1,047 Updated Feb 16, 2025

Audio Large Language Models

Python 376 21 Updated Jan 15, 2025

Awesome music generation model: MG²

Python 137 10 Updated Feb 5, 2025

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch

Python 3,227 261 Updated Sep 6, 2023

Text-to-Audio/Music Generation

Python 2,370 186 Updated Sep 29, 2024

Interface for OuteTTS models.

Python 923 79 Updated Feb 14, 2025

Local SRT/LLM/TTS Voicechat

Python 618 67 Updated Oct 12, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 9,683 1,303 Updated Feb 18, 2025

Implementation of Liquid Nets in Pytorch

Python 57 9 Updated Jan 27, 2025

21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

Jupyter Notebook 70,789 36,766 Updated Feb 17, 2025

A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, …

Python 324 41 Updated Sep 24, 2022

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,505 600 Updated Feb 9, 2025

A HuggingFace compatible Small Language Model trainer.

Python 74 7 Updated Feb 2, 2025

Speech To Speech: an effort for an open-sourced and modular GPT-4o

Python 3,744 405 Updated Dec 4, 2024

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

C 143 9 Updated Mar 6, 2024

End-to-end Automatic Guitar Transcription

Python 7 2 Updated Oct 3, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 1,025 73 Updated Jan 2, 2025

Python 67 8 Updated Sep 3, 2024

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

Python 23 1 Updated Aug 11, 2024