Wav2vec mapping code #12

Open
k0ngolab opened this issue Sep 21, 2023 · 5 comments

Comments

@k0ngolab

Hello,

Could you please add the code to train wav2vec mapping in deepspeech?

Thank you.

@Elsaam2y
Owner

Hi,

I am currently in the process of replacing wav2vec with a better solution that supports other languages. If it works, I will soon add a new model with the new mapping, along with the training code. Otherwise, I will update the repo with the wav2vec mapping code.

@Elsaam2y
Owner

I tried retraining the model and syncnet with the latest version of deepspeech, but this didn't lead to good results compared to using the originally trained model. The generalization and the expressivity of the lip motion were not very convincing. An alternative solution would be to train a mapping model from the latest version of deepspeech to the original version used with DINet. This would keep the same trained DINet model while keeping inference fast, as the latest version of deepspeech supports GPU and ONNX. I haven't had time to test it yet, but feel free to give it a try and open a PR.
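
The mapping idea described here can be illustrated as a regression from new-version features to original-version features. This is only a minimal sketch, not the approach used in this repo: it assumes paired per-frame features from both deepspeech versions are available, stands in random synthetic arrays for them, and uses a closed-form ridge regression rather than a trained network. The function names and the dimensions (64 and 29) are hypothetical placeholders.

```python
import numpy as np

def fit_linear_mapping(new_feats, old_feats, ridge=1e-3):
    """Fit W, b so that new_feats @ W + b approximates old_feats.

    Closed-form ridge regression; the bias is learned via an appended
    constant column (and is therefore lightly regularized as well).
    """
    X = np.hstack([new_feats, np.ones((new_feats.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    Wb = np.linalg.solve(A, X.T @ old_feats)
    return Wb[:-1], Wb[-1]

def apply_mapping(new_feats, W, b):
    """Project new-version features into the original feature space."""
    return new_feats @ W + b

# Synthetic stand-ins for paired features (hypothetical shapes):
# T frames, NEW_DIM from the newer model, OLD_DIM expected downstream.
rng = np.random.default_rng(0)
T, NEW_DIM, OLD_DIM = 2000, 64, 29
new_feats = rng.normal(size=(T, NEW_DIM))
true_W = rng.normal(size=(NEW_DIM, OLD_DIM))
old_feats = new_feats @ true_W + 0.01 * rng.normal(size=(T, OLD_DIM))

W, b = fit_linear_mapping(new_feats, old_feats)
mapped = apply_mapping(new_feats, W, b)
mse = float(np.mean((mapped - old_feats) ** 2))
print(mapped.shape, mse)
```

With real features, the two arrays would come from running both deepspeech versions on the same audio with aligned frames. A linear map is only a baseline; if the relationship between the two feature spaces is nonlinear, a small neural network would likely be needed instead.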

@tailangjun

May I ask which version of deepspeech you ended up using, and how you resolved the dimension mismatch problem during training? Thank you.

@Elsaam2y
Owner

I was using 0.9.1, and the dimension issue arises mainly with other languages, such as Chinese. I tried learning a mapping from the obtained features to the expected dimensions, but this didn't always work well. Furthermore, deepspeech seems to cause problems with many different languages, which is why I am trying to rely mainly on mel spectrograms at the moment.
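
For readers unfamiliar with the mel spectrogram features mentioned here, the following is a rough sketch of how a log-mel spectrogram can be computed from raw audio with NumPy alone. It is not the feature pipeline of this repo: the sample rate, FFT size, hop length, number of mel bands, and the function names are all assumptions chosen for illustration, and a synthetic sine wave stands in for speech.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the power spectrum, pool into mel bands, take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack(
        [wav[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-6)  # small offset avoids log(0)

# One second of a 440 Hz tone as a stand-in for real audio.
sr = 16000
t = np.arange(sr) / sr
wav = np.sin(2 * np.pi * 440.0 * t)
feats = log_mel_spectrogram(wav, sr=sr)
print(feats.shape)
```

Because these features are computed directly from the signal rather than from a speech recognizer's character outputs, their dimensionality does not depend on the language's alphabet, which is presumably part of their appeal for multilingual use.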

@PengYicong

I'm curious what the difference is between the original DS model used in DINet and version 0.9.1. Do they produce the same output given the same input audio? If so, since the later version of the DS model supports GPU and ONNX, it already benefits from that speed improvement. Otherwise, maybe it's better to train end-to-end using a language-agnostic feature such as HuBERT?
