[Request] Specify dataset language #18

Open
jpgallegoar opened this issue Jan 8, 2025 · 2 comments

Comments

@jpgallegoar

Hello, thank you for this project. Would it be possible to add a parameter that forces the dataset language?

@davidmartinrius
Owner

davidmartinrius commented Jan 8, 2025

Hi @jpgallegoar
Yes, it is possible, but I don't have enough time to implement and validate it.
Here is what you can do:

  1. Add a new argument to the main function, defaulting to None.
  2. Add a new parameter 'language' to process_audio_files.
  3. Add a new parameter to process.
  4. Add a new parameter 'language' to get_transcription.
  5. If language is not None, pass it to result = model.transcribe(audio, batch_size=batch_size).
  6. Use result["language"] when the language argument is None; otherwise use the language argument.

You can create a pull request if you want.
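
The steps listed in this comment could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function bodies, the `--language` flag name, the `audio_items` parameter, and the default batch size are assumptions, and it assumes that `model.transcribe` accepts a `language` keyword argument and returns a dict with a `"language"` key. The intermediate `process` function from step 3 is left out because its signature is not shown in this thread.

```python
import argparse
from typing import Any, List, Optional


def get_transcription(model: Any, audio: Any, batch_size: int,
                      language: Optional[str] = None) -> dict:
    """Transcribe one audio item, forcing `language` when it is given."""
    # Step 5: only pass `language` when the caller supplied one.
    # Assumption: model.transcribe accepts a `language` keyword.
    if language is not None:
        result = model.transcribe(audio, batch_size=batch_size, language=language)
    else:
        result = model.transcribe(audio, batch_size=batch_size)
    # Step 6: keep the detected language unless one was forced.
    result["language"] = language if language is not None else result["language"]
    return result


def process_audio_files(model: Any, audio_items: List[Any], batch_size: int = 16,
                        language: Optional[str] = None) -> List[dict]:
    """Steps 2-4: thread the optional language down to get_transcription."""
    return [get_transcription(model, audio, batch_size, language)
            for audio in audio_items]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Step 1: a hypothetical --language option that defaults to None."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--language", default=None,
                        help="force this language code instead of auto-detecting")
    return parser.parse_args(argv)
```

With this shape, omitting `--language` keeps the current auto-detection behaviour, so existing invocations would not change.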

@davidmartinrius
Owner

davidmartinrius commented Jan 8, 2025

If you want to make it simpler, just pass the language argument to model.transcribe

and replace language = result["language"] with your desired language.

But if you do it this way, better not to create a pull request; this is just a workaround for you.
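
As a rough sketch of that workaround (again an illustration, not the project's code): the constant `FORCED_LANGUAGE`, the wrapper function `transcribe_forced`, and the default batch size are hypothetical, and it assumes that `model.transcribe` accepts a `language` keyword argument.

```python
FORCED_LANGUAGE = "es"  # hypothetical value: put your desired language code here


def transcribe_forced(model, audio, batch_size=16):
    """Local workaround: always transcribe in FORCED_LANGUAGE."""
    # Assumption: model.transcribe accepts a `language` keyword.
    result = model.transcribe(audio, batch_size=batch_size,
                              language=FORCED_LANGUAGE)
    # Replaces the original line: language = result["language"]
    language = FORCED_LANGUAGE
    return result, language
```

Because the language is hardcoded, this only makes sense for a local copy of the script.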
