Some translations are not possible #95

leandroalbero · 2023-09-28T09:01:12Z

Issue description

Running the latest image, easynmt/api:2.0-cpu, with the model set to m2m_100_418M and English as the target language fails for some translations. Here are some examples:

'imagina a mi'
'imagina un sol'
'imagina a un vikingo'

In these cases, setting source_lang to 'es' fixed the issue, so the problem is probably either in the language detection step or in a missing translation direction from the detected language to English.
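For reference, a request that pins the source language could be built as in the sketch below. It assumes the container is reachable on localhost port 24080 and that the /translate endpoint accepts target_lang, text, and source_lang query parameters, as the EasyNMT Docker README describes; build_translate_url is a hypothetical helper and not part of EasyNMT.

```python
from urllib.parse import urlencode


def build_translate_url(text, target_lang="en", source_lang=None,
                        base_url="http://localhost:24080"):
    """Build a GET URL for the translation API (hypothetical helper).

    base_url is an assumption; adjust it to wherever the container is exposed.
    """
    params = [("target_lang", target_lang), ("text", text)]
    # Only send source_lang when the caller wants to skip auto-detection.
    if source_lang is not None:
        params.append(("source_lang", source_lang))
    return f"{base_url}/translate?{urlencode(params)}"


if __name__ == "__main__":
    # Pinning source_lang='es' avoids the misdetection described above.
    print(build_translate_url("imagina un sol", source_lang="es"))
```

Leaving source_lang unset reproduces the auto-detection path that fails in this report.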

Docker logs output:

[2023-09-28 08:38:08 +0000] [60] [INFO] Waiting for application startup.
[2023-09-28 08:38:08 +0000] [60] [INFO] Application startup complete.
Exception: 'jbo'

The text of the exception varies with every prompt; I guess it is the code of the detected language.


leandroalbero · 2023-09-28T10:20:19Z

Updating the model that fasttext uses for language identification helps; it fixed the translations that failed in my tests.
https://fasttext.cc/docs/en/language-identification.html
This repo uses lid.176.ftz, which is the compressed version of the model. Switching to lid.176.bin helps because it is slightly more accurate, at the cost of a much larger download.
The lines to change are here:

EasyNMT/easynmt/EasyNMT.py

Lines 415 to 430 in 7c11ae8

    def language_detection_fasttext(self, text: str) -> str:
        """
        Given a text, detects the language code and returns the ISO language code. It supports 176 languages. Uses
        the fasttext model for language detection:
        https://fasttext.cc/blog/2017/10/02/blog-post.html
        https://fasttext.cc/docs/en/language-identification.html
        """
        if self._fasttext_lang_id is None:
            import fasttext
            fasttext.FastText.eprint = lambda x: None   # Silence useless warning: https://github.com/facebookresearch/fastText/issues/1067
            model_path = os.path.join(self._cache_folder, 'lid.176.ftz')
            if not os.path.exists(model_path):
                http_get('https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz', model_path)
            self._fasttext_lang_id = fasttext.load_model(model_path)
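
The change amounts to swapping the file name in both the cache path and the download URL. A minimal sketch of that, assuming the .bin file is served from the same fbaipublicfiles directory as the .ftz one (as listed on the fastText language identification page); lid_model_location is a hypothetical helper, not something EasyNMT defines:

```python
import os

# Directory that hosts both language identification models (see the fastText docs).
FASTTEXT_LID_BASE_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models"


def lid_model_location(cache_folder, filename="lid.176.bin"):
    """Return (local_path, download_url) for a fastText language ID model.

    Hypothetical helper: pass filename='lid.176.ftz' to get the compressed model.
    """
    local_path = os.path.join(cache_folder, filename)
    download_url = f"{FASTTEXT_LID_BASE_URL}/{filename}"
    return local_path, download_url


if __name__ == "__main__":
    path, url = lid_model_location("/tmp/easynmt_cache")
    print(path)
    print(url)
```

The two returned values would replace the hard-coded 'lid.176.ftz' path and URL in the method above.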

There are still some translations that fail, though. Enabling a fallback to a slower model in those cases might help.
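
Another cheap fallback, which I have not tested against the repo, would be to ask fasttext for several candidates and take the first one the translation model supports. A rough sketch, assuming predictions arrive as '__label__xx' strings with probabilities, which is what fasttext's model.predict(text, k=5) returns; pick_supported_language and the supported set are hypothetical names for illustration:

```python
from typing import Iterable, Optional, Sequence

LABEL_PREFIX = "__label__"  # prefix fasttext puts on predicted labels


def pick_supported_language(labels: Sequence[str],
                            probs: Sequence[float],
                            supported: Iterable[str],
                            min_prob: float = 0.0) -> Optional[str]:
    """Return the most likely detected language that the model can translate.

    labels/probs are expected in descending probability order, as returned by
    fasttext's predict(). Returns None if no candidate qualifies.
    """
    supported_set = set(supported)
    for label, prob in zip(labels, probs):
        # Strip the fasttext prefix to get the bare language code.
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        if code in supported_set and prob >= min_prob:
            return code
    return None


if __name__ == "__main__":
    # Illustrative values only: 'jbo' ranked first but is not translatable.
    labels = ["__label__jbo", "__label__es", "__label__pt"]
    probs = [0.41, 0.33, 0.12]
    print(pick_supported_language(labels, probs, supported={"es", "en", "pt"}))
```

When this returns None, the caller could fall back to the slower model or report a clearer error than the bare language code.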
