Translators complain about big chunks of SC description text #4135

Open
10110111 opened this issue Feb 16, 2025 · 7 comments
Labels
infrastructure Infrastructure related issues subsystem: skycultures The issue is related to skycultures of planetarium...

Comments

@10110111
Contributor

As I expected, we got the first complaint about big chunks of text in stellarium-skycultures-descriptions, link:

These huge text junks are neither translatable nor maintainable. Please separate them into smaller units.

So I think something has to be done about this. It indeed looks like a nightmare for a translator to have to

  • Navigate in the rather tight UI of Transifex with rather big structures (e.g. tables can be both high and wide, especially in HTML);
  • Re-translate everything on minute changes or try to rely on Transifex' Translation Memory, which I'm not sure will help with highlighting differences between the old source English text and the new one.

And, as discussed in #3751, we can't simply switch all languages to machine translation. So it looks like either the msgids in the .po files of the SCs being imported should be split by the import script (not sure how feasible that is), or the original .po files in the stellarium-skycultures repo should keep the strings split into smaller parts.

Oh, and we need to find a way to keep the version in stellarium-skycultures up to date, otherwise it will be problematic to synchronize them.
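
To make the "split by the import script" option a bit more concrete, here is a minimal sketch, assuming descriptions are Markdown-like text whose paragraphs are separated by blank lines. The function names `split_description` and `join_description` are hypothetical and not part of the existing importer:

```python
import re

def split_description(text: str) -> list[str]:
    """Split a long description into paragraph-sized units at blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [b.strip() for b in blocks if b.strip()]

def join_description(blocks: list[str]) -> str:
    """Reassemble (translated) units into a single description."""
    return "\n\n".join(blocks)

# Made-up sample text, only for illustration.
sample = "# Intro\n\nFirst paragraph.\n\n\nSecond paragraph\nwith two lines.\n"
units = split_description(sample)
for unit in units:
    print(repr(unit))
```

Each unit would then become its own msgid, so a small change in the English source would invalidate only the affected paragraph rather than the whole description.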

@gzotti
Member

gzotti commented Feb 16, 2025

Is it possible to split tables line-wise? It will still be awkward, and without context this approach is also terrible for translators, but it may still be somewhat easier to handle in the Transifex GUI.
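
A rough sketch of what line-wise splitting could look like for a simple Markdown pipe table. This is only an illustration under assumptions: the helper name `split_table_rows` and the sample table are made up, and escaped pipes or HTML tables are not handled:

```python
def split_table_rows(table: str) -> list[list[str]]:
    """Return the cells of each row of a simple pipe table, skipping the separator row."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Separator rows consist only of dashes, colons, and spaces.
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows

# Made-up sample table, only for illustration.
table = """
| Name | Meaning |
|------|---------|
| Orion | The Hunter |
"""
rows = split_table_rows(table)
for row in rows:
    print(row)
```

Each row (or even each cell) could then be offered as a separate translatable string, at the cost of losing the surrounding context.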

We also said SC translation is something that in some cases only experts can do properly. Could those few be instructed how to work with the po files offline? In earlier times per-language versions of the description files could be provided (which, being not watched, fell out of sync, so this is to be avoided this time...).

If stellarium-skycultures is the actual common repo to maintain for all Stellarium projects from now on, there should be an import mechanism during building (as stupid as cloning nested git repos with skycultures listed in .gitignore of stellarium?), and the SCs should then no longer be stored in the stellarium repo. But then we need to be clear about format variations, optional components or new features aimed at scientific use that s-w and s-m will happily ignore for efficiency, etc. This also again opens the question of making SCs available as optional downloads (with translations...).

@10110111
Contributor Author

10110111 commented Feb 16, 2025

Is it possible to split tables line-wise?

It's problematic in general. The table in the entry linked to above is in Markdown, but we also have complicated tables in e.g. Babylonian SCs, which we'd finally have to somehow split into small entries as we used to do with util/skycultures/extract.sh.

We also said SC translation is something that in some cases only experts can do properly. Could those few be instructed how to work with the po files offline?

I'm afraid with this approach we'll never get the SC descriptions translated. First, there are not many translators in general, and even fewer of them are actually experts in the subject.

Another problem is that current translators often mess up formatting (e.g. Lithuanian texts often contain <a href=„ instead of <a href="; BTW I forgot to fix this while converting the SCs...), which means that we need some more elaborate system for this. Maybe make a special tool for translating SC descriptions, that would only allow editing the text but not the formatting? Or does such a tool already exist? I thought gettext would do this separation (as the practice used to be with util/skycultures/extract.sh), but somehow Lithuanian translations appeared rather recently, and still contain this broken formatting.
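
Short of a dedicated tool, broken formatting like this could at least be detected automatically. A minimal sketch, assuming the check runs on msgid/msgstr pairs already read from the .po files; the function names and the example strings are hypothetical:

```python
import re

# Anything that looks like an HTML tag.
TAG_RE = re.compile(r"<[^<>]+>")
# Typographic quotes that must not appear inside a tag.
BAD_QUOTES_RE = re.compile(r"[„“”‚‘’«»]")

def markup_tokens(text: str) -> list[str]:
    """Return all tag-like tokens in order of appearance."""
    return TAG_RE.findall(text)

def formatting_problems(msgid: str, msgstr: str) -> list[str]:
    """List formatting differences between a source string and its translation."""
    problems = []
    if markup_tokens(msgid) != markup_tokens(msgstr):
        problems.append("tags differ from the source")
    for tag in markup_tokens(msgstr):
        if BAD_QUOTES_RE.search(tag):
            problems.append(f"typographic quote inside tag: {tag}")
    return problems

source = 'See <a href="https://example.org">this page</a>.'
broken = 'Siehe <a href=„https://example.org">diese Seite</a>.'
print(formatting_problems(source, broken))
```

Comparing tags verbatim is deliberately strict: it would also flag a translator who intentionally points a link to a localized URL, so the result is better treated as a warning than as an error.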

there should be an import mechanism during building

There is an import script, but it takes half an hour on my machine to convert all the separate .po files into a single file for each locale (which may actually be a fault of the importer, as I suddenly realized). The importer also makes some changes to the extracted comments to disambiguate them, so a simple msgmerge/msgcat isn't sufficient. I guess we could edit the original comments in the .po files of individual SCs to be unambiguous from the beginning, so that a simple msgcat would work well.
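
To illustrate what "disambiguating the comments" means, here is a simplified sketch that works on in-memory dictionaries instead of real .po files. It is not the actual importer: the data layout, the function name `merge_catalogs`, the SC ids, and the rule of prefixing each comment with the SC id are all assumptions made for the example:

```python
# Per-SC catalog: msgid -> (extracted comment, msgstr).
Catalog = dict[str, tuple[str, str]]

def merge_catalogs(per_sc: dict[str, Catalog]) -> Catalog:
    """Merge per-skyculture catalogs into one, tagging each comment with its SC id."""
    merged: Catalog = {}
    for sc_id in sorted(per_sc):
        for msgid, (comment, msgstr) in per_sc[sc_id].items():
            tagged = f"{sc_id}: {comment}"
            if msgid in merged:
                old_comment, old_msgstr = merged[msgid]
                # Same source text in several SCs: keep all comments,
                # and keep the first non-empty translation.
                merged[msgid] = (old_comment + "\n" + tagged, old_msgstr or msgstr)
            else:
                merged[msgid] = (tagged, msgstr)
    return merged

# Made-up sample data, only for illustration.
per_sc = {
    "sc_b": {"Introduction": ("section name", "Einleitung")},
    "sc_a": {"Introduction": ("section name", ""), "Sky": ("section name", "")},
}
merged = merge_catalogs(per_sc)
print(merged["Introduction"])
```

If the comments in the individual .po files already carried the SC id, this tagging step would be unnecessary, which is the point of making them unambiguous from the beginning.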

@gzotti
Member

gzotti commented Feb 16, 2025

Yes, some formatting is weird. I recently had LaTeX fail on a source written by someone on a Mac which occasionally (not consistently!) split German umlauts into "base vowel plus diaeresis diacritics". It also always messes up uncommon hyphens. UTF-8 would have been fine with umlauts...
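
For the decomposed umlauts, Unicode normalization to NFC recombines "base vowel plus combining diaeresis" into the single precomposed character. A small sketch using Python's standard `unicodedata` module; running it as a pre-processing step on the texts is only a suggestion, and the helper name `to_nfc` is made up:

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Recombine decomposed characters into their precomposed (NFC) form."""
    return unicodedata.normalize("NFC", text)

decomposed = "u\u0308ber"      # "u" followed by U+0308 COMBINING DIAERESIS
composed = to_nfc(decomposed)  # U+00FC followed by "ber"
print(len(decomposed), len(composed))
```

This does not help with the uncommon hyphens, which are distinct characters rather than decomposed forms.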

Then the "import SC" step should not be auto-run on every compilation, of course. During feature development it should not run. Just any "release", "install" or "package build" workflow, or the occasional manual update call, would have to call it to re-import the current state of SCs.

Are these separate .pos for each skyculture? Would that already allow per-SC downloads/installations?

@10110111
Contributor Author

Are these separate .pos for each skyculture? Would that already allow per-SC downloads/installations?

They are separate, but Stellarium keeps all translations for skycultures in a single .po per locale. So the import step has to merge the different SCs' .po files into this one.

@alex-w alex-w added the infrastructure Infrastructure related issues label Feb 16, 2025
@10110111
Contributor Author

10110111 commented Feb 18, 2025

One additional problem is that <notr> tags are not respected by human translators. This used to be handled by po4a, which was instructed not to extract items from inside this tag. Now I see that the Swedish translation was completed with lots of such items translated...
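
A possible automated check for this, sketched under the assumption that the <notr> tags are still present in both msgid and msgstr. The function names and the example strings are hypothetical:

```python
import re

NOTR_RE = re.compile(r"<notr>(.*?)</notr>", re.DOTALL)

def notr_spans(text: str) -> list[str]:
    """Return the contents of all <notr>...</notr> spans."""
    return NOTR_RE.findall(text)

def notr_violations(msgid: str, msgstr: str) -> list[str]:
    """Return <notr> contents from the source that are missing or altered in the translation."""
    translated = notr_spans(msgstr)
    return [span for span in notr_spans(msgid) if span not in translated]

source = "The star <notr>Sirius</notr> is bright."
wrong = "Der Stern <notr>Hundsstern</notr> ist hell."
print(notr_violations(source, wrong))
```

A check like this could run in the import step and report the offending entries, instead of relying on translators to notice the tag.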

@10110111
Contributor Author

And it looks like Markdown is exacerbating the problem: here the Markdown for hyperlinks, [text](URL), is split into [text] (URL), which will of course break the hyperlink into two plain text pieces with brackets.
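
This particular breakage is easy to detect with a regular expression. A minimal sketch; the function names are made up, and the pattern is a heuristic that can give false positives for text such as a bracketed note followed by a one-word parenthesis:

```python
import re

# "[text]" followed by whitespace and then "(something-without-spaces)".
BROKEN_LINK_RE = re.compile(r"(\[[^\]\n]+\])\s+(\(\S+\))")

def broken_links(text: str) -> list[str]:
    """Return all occurrences of a Markdown link split into '[text] (URL)'."""
    return [m.group(0) for m in BROKEN_LINK_RE.finditer(text)]

def repair_links(text: str) -> str:
    """Remove the whitespace between the link text and the URL."""
    return BROKEN_LINK_RE.sub(r"\1\2", text)

translated = "See [the paper] (https://example.org/p) here."
print(broken_links(translated))
print(repair_links(translated))
```

Because of the possible false positives, reporting the matches for manual review is probably safer than repairing them automatically.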

@gzotti
Member

gzotti commented Feb 18, 2025

Hehe, for years I had wondered why Firefox offered, in German, "Link teilen" (I only later saw it means "share link"). German "teilen"="split (into pieces)", and only "share" as secondary translation. Why would I ever split a URL to have two broken parts? Now this perfectly illustrates the problem :-)

@alex-w alex-w added the subsystem: skycultures The issue is related to skycultures of planetarium... label Feb 19, 2025

3 participants