Some translations are not possible #95

Open
leandroalbero opened this issue Sep 28, 2023 · 1 comment

Comments

leandroalbero commented Sep 28, 2023

Issue description

Running the latest image, easynmt/api:2.0-cpu, with the model set to m2m_100_418M and English as the target language fails for some translations. Here are some examples:

  • 'imagina a mi'
  • 'imagina un sol'
  • 'imagina a un vikingo'

In these cases, setting source_lang to 'es' fixed the issue, so the problem may be in the language detection step, or there may be no translation direction from the detected language to English.
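A minimal sketch of the workaround, assuming the Docker container exposes its GET /translate endpoint on localhost:24080 (the host and port are assumptions; adjust them to your setup). The helper name build_translate_url is hypothetical. It only builds the request URL with source_lang set, so automatic language detection is skipped:

```python
from typing import Optional
from urllib.parse import urlencode


def build_translate_url(
    text: str,
    target_lang: str,
    source_lang: Optional[str] = None,
    base_url: str = "http://localhost:24080",  # assumed host and port
) -> str:
    """Build a GET URL for the /translate endpoint of the EasyNMT Docker API."""
    params = [("target_lang", target_lang), ("text", text)]
    if source_lang:
        # Passing source_lang explicitly bypasses automatic language detection.
        params.append(("source_lang", source_lang))
    return f"{base_url}/translate?{urlencode(params)}"


url = build_translate_url("imagina un sol", target_lang="en", source_lang="es")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, sends the request with the source language fixed to Spanish.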

Docker logs output:

[2023-09-28 08:38:08 +0000] [60] [INFO] Waiting for application startup.
[2023-09-28 08:38:08 +0000] [60] [INFO] Application startup complete.
Exception: 'jbo'

The text of the exception varies with every prompt; I guess it is the code of the detected language ('jbo' is the ISO 639 code for Lojban).


leandroalbero commented Sep 28, 2023

Updating the model that fastText uses for language identification helps solve the issue, at least for the translations that failed in my tests.
https://fasttext.cc/docs/en/language-identification.html
This repo uses the compressed lid.176.ftz model; switching to lid.176.bin helps because it is slightly more accurate.
The lines to change are here:

EasyNMT/easynmt/EasyNMT.py

Lines 415 to 430 in 7c11ae8

def language_detection_fasttext(self, text: str) -> str:
    """
    Given a text, detects the language code and returns the ISO language code. It supports 176 languages. Uses
    the fasttext model for language detection:
    https://fasttext.cc/blog/2017/10/02/blog-post.html
    https://fasttext.cc/docs/en/language-identification.html
    """
    if self._fasttext_lang_id is None:
        import fasttext
        fasttext.FastText.eprint = lambda x: None  # Silence useless warning: https://github.com/facebookresearch/fastText/issues/1067
        model_path = os.path.join(self._cache_folder, 'lid.176.ftz')
        if not os.path.exists(model_path):
            http_get('https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz', model_path)
        self._fasttext_lang_id = fasttext.load_model(model_path)
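One way the change could be made configurable, as a rough sketch rather than the repo's actual code: a small helper (lid_model_location is a hypothetical name) that returns the cache path and download URL for whichever of the two fastText models is selected, defaulting to lid.176.bin. The cache folder name in the example is also an assumption.

```python
import os

# Base URL taken from the snippet above; both lid.176 models are published under it.
FASTTEXT_BASE_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models"
KNOWN_MODELS = ("lid.176.bin", "lid.176.ftz")


def lid_model_location(cache_folder: str, model_name: str = "lid.176.bin"):
    """Return (local_path, download_url) for a fastText language-ID model."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(f"Unknown language-ID model: {model_name!r}")
    local_path = os.path.join(cache_folder, model_name)
    download_url = f"{FASTTEXT_BASE_URL}/{model_name}"
    return local_path, download_url


# Hypothetical cache folder, for illustration only.
model_path, model_url = lid_model_location("easynmt_cache")
print(model_path, model_url)
```

The returned pair could then replace the hard-coded 'lid.176.ftz' path and URL in the method above. Note that lid.176.bin is a much larger download than the compressed .ftz file.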

Yet some translations still fail; falling back to a slower model in those cases might help.
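A rough sketch of what such a fallback could look like, not something EasyNMT currently does. It assumes the primary detector returns fastText-style output (labels such as '__label__es' with probabilities, best first) and that a second, slower detector is available as a plain function. All names here (detect_with_fallback, the stub detectors, the 0.5 threshold, the supported-language set) are hypothetical:

```python
from typing import Callable, List, Sequence, Set, Tuple

Prediction = Tuple[str, float]  # (language code, probability)


def parse_fasttext_labels(labels: Sequence[str], probs: Sequence[float]) -> List[Prediction]:
    """Turn fastText-style labels ('__label__es') into (code, probability) pairs."""
    return [(label.replace("__label__", ""), float(p)) for label, p in zip(labels, probs)]


def detect_with_fallback(
    text: str,
    primary: Callable[[str], List[Prediction]],
    fallback: Callable[[str], str],
    supported: Set[str],
    min_prob: float = 0.5,  # assumed threshold, would need tuning
) -> str:
    """Use the primary detector if it yields a supported, confident language; otherwise fall back."""
    for lang, prob in primary(text):
        if lang in supported and prob >= min_prob:
            return lang
    return fallback(text)


# Stub detectors standing in for real models, for illustration only.
def stub_primary(text: str) -> List[Prediction]:
    # Mimics an uncertain prediction on a short prompt.
    return parse_fasttext_labels(["__label__jbo", "__label__es"], [0.41, 0.38])


def stub_fallback(text: str) -> str:
    return "es"


detected = detect_with_fallback("imagina un sol", stub_primary, stub_fallback, supported={"es", "en"})
print(detected)
```

With a real fastText model, the primary function would wrap model.predict(text, k=...) and pass its labels and probabilities through parse_fasttext_labels.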
