[Homepage] discussion - unnecessary end-of-line characters in strings to translate #275

Open
kaktusus opened this issue Sep 11, 2023 · 15 comments

Comments

@kaktusus
Contributor

The issue relates to \n characters appearing in some strings exported to Crowdin.

I will illustrate the problem with the following string: Thank you for supporting FreeCAD! Whether you donated a little or a lot, all your efforts contribute to further and faster development of FreeCAD.

This is how the string was imported into Crowdin:
https://crowdin.com/translate/freecad/27908/en-pl#6625455
[image]

After extracting from the contributor.php source file, we get:

[image]
The way it is written is intriguing: the \n character is represented differently here.

homepage.pot.txt

The view in the source file:

[image]
https://github.com/FreeCAD/FreeCAD-Homepage/blob/54e47134da26d82d253184ebb4661c3911d4b583/contributor.php#L15

[image]


It is worth mentioning that there are many long multi-line strings, but not all of them have extra line breaks inside.
Different source files are written in different ways (e.g. donation.php), so the issue does not always occur.


kaktus' note
https://manpages.debian.org/unstable/gettext/xgettext.1.en.html

@kaktusus
Contributor Author

kaktusus commented Sep 12, 2023

Solutions that come to mind:

  1. A compromise for developers and translators 😉
    Post-process the file with the extracted strings to remove everything that gets in the way (a little bit of sed magic),
    something like: sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified.pot

  2. Rewriting the source files so that they look consistent
    Well-formatted code can already be seen in some files,
    for example:
    downloads.php
    donation.php

  3. Waiting for suggestions from staff ... and other users 😉
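
To make the effect of the sed expression from point 1 concrete, here is a minimal sketch. The `sample` fragment is a hand-written approximation of what xgettext might emit for the string above (the real homepage.pot may differ), and the variable names are only illustrative.

```shell
# Hypothetical .pot fragment; the real xgettext output may differ.
sample='msgid ""
"Thank you for supporting FreeCAD! Whether you donated a little or a lot,\n"
"    all your efforts contribute to further and faster development of FreeCAD."'

# 1st expression: drop a literal \n that sits right before a closing quote.
# 2nd expression: drop spaces/tabs that directly follow a double quote.
cleaned=$(printf '%s\n' "$sample" | sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g')

printf '%s\n' "$cleaned"
```

In this sample the `\n` and the indentation both disappear, while the `msgid ""` line is left unchanged.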

@yorikvanhavre
Member

yorikvanhavre commented Sep 12, 2023

I would maybe write a script that combines your sed command with the xgettext command, and use it instead of the xgettext line... Maybe it could even be integrated into the updateCrowdin script?

But we first need to test whether that "sanitized" .po file can still be recognized and used by PHP.
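
A rough sketch of what such a combined script could look like. The function names `sanitize_pot` and `update_pot` and the intermediate file `homepage.raw.pot` are assumptions made for illustration; how this would hook into the updateCrowdin script is not shown in this thread.

```shell
#!/bin/sh
# Sketch only: function names and homepage.raw.pot are illustrative.

# Remove the trailing \n escape and any blanks that directly follow a quote.
sanitize_pot() {
    sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g'
}

# Extract the strings from the PHP files, then sanitize the template.
# Defined here but not called; run `update_pot` in the homepage directory.
update_pot() {
    xgettext --from-code=UTF-8 -o homepage.raw.pot ./*.php &&
        sanitize_pot < homepage.raw.pot > homepage.pot
}
```

Note that the expressions are applied to every line of the file, including the header entry that xgettext writes, whose fields also end in `\n`. That is probably worth checking as part of the testing mentioned above.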

@kaktusus
Contributor Author

I see some special cases that can probably be solved.
I am still analyzing them.

[image]

  • each line of a multi-line string should end with a space, otherwise the words run together
  • I need to build a rule that allows a single blank space to remain after the " character.

What happens next with the generated homepage.pot file?
I understand that this is not the end of the processing ...

For testing I use: xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified3.pot
Afterwards, it can be shortened to: xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -i -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot
The changes are then applied directly to homepage.pot.
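
A small caveat on in-place editing: when the options are written together as `-ie`, GNU sed treats the `e` as a backup suffix, so a copy named `homepage.pote` is left behind. The sketch below uses an explicit suffix instead; the temporary directory and `demo.pot` are throwaway names for the demonstration only.

```shell
# Throwaway directory and file name, used only for this demonstration.
workdir=$(mktemp -d)
printf '%s\n' '"    a lot,\n"' > "$workdir/demo.pot"

# -i.bak edits demo.pot in place and keeps the original as demo.pot.bak.
sed -i.bak -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' "$workdir/demo.pot"

edited=$(cat "$workdir/demo.pot")
backup=$(cat "$workdir/demo.pot.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -r "$workdir"
```

Keeping a backup makes it easy to diff the sanitized template against the raw xgettext output while the rules are still being tuned.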

@yorikvanhavre
Member

Hmm so every line needs to be ended by a space? That's annoying (many code editors will remove that trailing space automatically) but it's doable. Maybe that's the best solution here...

@kaktusus
Contributor Author

kaktusus commented Sep 12, 2023

Part of the code (the ugly part) is already written this way 😜


We can customize the sed processing any way we want so that everyone is satisfied;
we just need to choose the right rules.
At the moment we have:
search for \n" and replace it with "
and
search for " followed by any number of spaces or tabs and replace it with ".
By changing or adding search patterns, everything can be customized.


However, we must remember that every change to a source string generates a lot of work for translators, so everything must be well thought out (and tested), and the changes should be introduced into the production environment only once.


> Hmm so every line needs to be ended by a space?

Whether we choose a space as the last character of a line or as the first character of a line, the entire code will still need to be adjusted to the chosen rule.

@kaktusus
Contributor Author

I modified the search pattern and thus solved the first problem shown in the picture above:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified4.pot

[image]
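
For readers without the screenshot, the difference between the earlier rule and the modified one can be sketched on a small hand-written fragment. The two lines in `fragment` are hypothetical, not taken from the real homepage.pot.

```shell
# Hypothetical continuation lines from a .pot file.
fragment='"a little or a lot,\n"
"all your efforts   \n"'

# Earlier rule: \n" becomes ", so nothing separates the joined lines.
old=$(printf '%s\n' "$fragment" | sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g')

# Modified rule: optional whitespace plus \n" becomes exactly one space and ".
new=$(printf '%s\n' "$fragment" | sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g')

printf 'old rule:\n%s\nnew rule:\n%s\n' "$old" "$new"
```

With the earlier rule the first line ends in `lot,"` and the second keeps its three trailing spaces; with the modified rule both lines end in exactly one space before the closing quote.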

The second special case requires more of my attention; I need to read the documentation to solve it.

Yorik, I consider this solution a temporary workaround; if you would like to apply it in the production environment, I have no objection.
However, for me the best solution is the one from point 2.

@chennes, @luzpaz and others, what do you think about this topic?

@kaktusus
Contributor Author

Everything works as I expected.
Too bad only Yorik picked up the gauntlet ...
As a rule, the more opinions, the better the result.

[image]

I have tested two different variants, and both suit my planned task perfectly:

15:59 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"\s\{2,\}/"/g' homepage.pot > homepage_modified5.pot
17:02 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]\{2,\}/"/g' homepage.pot > homepage_modified5.pot
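
The point of the `\{2,\}` quantifier, as far as can be told from the thread, is to leave a single intentional space after the opening quote while still stripping indentation. A minimal sketch on hypothetical lines follows; `v4` and `v5` merely mirror the homepage_modified4.pot and homepage_modified5.pot file names.

```shell
# Hypothetical lines: one intentional leading space, one run of indentation.
fragment='" of FreeCAD.\n"
"        all your efforts\n"'

# Previous rule: every blank after a quote is removed, including a single one.
v4=$(printf '%s\n' "$fragment" | sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g')

# New rule: only runs of two or more blanks after a quote are removed.
v5=$(printf '%s\n' "$fragment" | sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]\{2,\}/"/g')

printf 'v4:\n%s\nv5:\n%s\n' "$v4" "$v5"
```

Of the two tested variants, the `[[:blank:]]` form is the more portable one, since `\s` is a GNU sed extension.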

😃

@kaktusus
Contributor Author

Any ideas?

@yorikvanhavre
Member

This needs to be tested first, because if the .pot file contains a string that differs from the one in the HTML file, the gettext system might not be able to match it and apply the translation.

@kaktusus
Contributor Author

kaktusus commented Oct 3, 2023

To make testing easier and faster, I suggest looking at the files with the Polish translations. They may contain many answers.
The translations contain neither unnecessary \n characters nor runs of blank spaces.

@luzpaz
Collaborator

luzpaz commented Sep 9, 2024

Hey @Reqrefusion, thanks for all your work on the FC homepage. Could you weigh in on this translation ticket? TIA

@Reqrefusion
Member

> Hey @Reqrefusion, thanks for all your work on the FC homepage. Could you weigh in on this translation ticket? TIA

A truly Herculean task. The concern is not so much that the change itself will cause problems as that the current translations will be broken. It will take a lot of experimenting to find the right way. I have a few things in mind right now, but I don't know which of them won't cause problems.

@kaktusus
Contributor Author

kaktusus commented Sep 9, 2024

Any change in the source string will generate work for the translators. The changed string will be recognized by Crowdin as a new translation unit.

There is no getting around it.

@Reqrefusion
Member

> Any change in the source string will generate work for the translators. The changed string will be recognized by Crowdin as a new translation unit.
>
> There is no getting around it.

I have been searching the internet since this morning and unfortunately you are right. There are some workarounds, such as manually editing the project files, but they are very complicated. However, since there will be an exact match, I think translators will just need to approve it. I have even seen that the project manager can skip translator approval for exact matches. It would be best to discuss this with him once I create a PR for it.

@kaktusus
Contributor Author

Crowdin helps translators in such cases (minor string corrections): based on the translation history, it suggests existing translations of strings that almost match the new source string.

This makes the translators' work much easier and faster. 😉
