I'm trying to get rid of the letter ij #9244

Open
fraternl opened this issue Jan 20, 2025 · 4 comments

Comments

@fraternl

I noticed a line about replacing the Dutch letter ĳ with the letters i and j in this file.
That file only does this when a word doesn't exist in the dictionary, but I would prefer to have it done always.

https://github.com/SubtitleEdit/subtitleedit/blob/main/Dictionaries/nld_OCRFixReplaceList.xml

I thought this would work, but it doesn't.
What am I doing wrong, or is it just not working?

No examples are given for PartialWordsAlways, so I had to do some guessing.

<ReplaceList>
  <PartialWordsAlways>
    <WordPart from="ĳ" to="ij" />
  </PartialWordsAlways>
<ReplaceList>

<ReplaceList>
  <PartialWordsAlways>
    <WordPartAlways from="ĳ" to="ij" />
  </PartialWordsAlways>
<ReplaceList>

@GrampaWildWilly

This is a pretty wild guess from somebody who is far from an expert in these things. Your closing tag looks like this:

<ReplaceList>

I think it should look like this:

</ReplaceList>

It looks to me like you're missing the closing slash in the closing tag. Maybe. Like I said, I don't know much about these things so I could be totally off base.
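
A guess like this can be checked by running the snippet through any XML parser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` is made up for illustration, and the `from` value is assumed to be the one-character ligature U+0133. It only tests whether the XML is well-formed, not whether Subtitle Edit accepts these element names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Snippet as posted: the last line is a second opening tag, not a closing tag.
# "\u0133" is the single-character ij ligature (assumed to be the "from" value).
posted = """<ReplaceList>
  <PartialWordsAlways>
    <WordPart from="\u0133" to="ij" />
  </PartialWordsAlways>
<ReplaceList>"""

# Same snippet with the final tag turned into a proper closing tag.
fixed = posted[: posted.rfind("<ReplaceList>")] + "</ReplaceList>"

print(is_well_formed(posted))  # False
print(is_well_formed(fixed))   # True
```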

@niksedk
Member

niksedk commented Jan 20, 2025

Please add some more details + attached subtitle.

@fraternl
Author

fraternl commented Jan 20, 2025

Even though the Dutch were given a special character ĳ, no one uses it.
It's not on any keyboard, nor is there an uppercase one for it.

We all use the two letters i and j.

Some software chokes on this special character, so I don't want to use it in my subtitles and would prefer to have it replaced by i and j.

It seems you've already implemented some code to get rid of it, but it doesn't appear to work.

se.srt.txt

The bottom group (PartialWords) is just a copy/paste from your code.
If it should do what I think, it doesn't work.

I think it should convert any ĳ to ij if the word doesn't exist in the dictionary.
That one doesn't work either.

I have an srt attached with this content:

1
00:00:01,000 --> 00:00:03,000
Deze zĳn heeft 3 letters.

2
00:00:03,032 --> 00:00:05,032
Deze zijn heeft 4 letters.

3
00:00:05,064 --> 00:00:07,064
Dit is een onbestaand woord: sĳzpoor.

The word "sĳzpoor" doesn't exist.
Here it is written with the one-letter ĳ.


<ReplaceList>
  <PartialWordsAlways>
    <WordPartAlways from="ĳ" to="ij" />
  </PartialWordsAlways>

  <PartialWords>
    <!-- Will be used to check words not in dictionary -->
    <!-- If new word(s) exists in spelling dictionary, it is (they are) accepted -->
    <WordPart from="ĳ" to="ij" />
  </PartialWords>
</ReplaceList>
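
To make the difference between the two spellings in the sample subtitle visible, here is a small Python sketch. It is only an illustration outside Subtitle Edit, not how the OCR replace list works internally. The function name `expand_ij` is hypothetical, and the ligature is written as the escape `\u0133` so it cannot be confused with the two plain letters.

```python
IJ_LIGATURE = "\u0133"  # LATIN SMALL LIGATURE IJ, a single code point


def expand_ij(text: str) -> str:
    """Replace every one-character ij ligature with the two letters i and j."""
    return text.replace(IJ_LIGATURE, "ij")


three = "z" + IJ_LIGATURE + "n"  # ligature spelling: 3 characters
four = "zijn"                    # plain spelling: 4 characters

print(len(three), len(four))                    # 3 4
print(expand_ij(three) == four)                 # True
print(expand_ij("s" + IJ_LIGATURE + "zpoor"))   # sijzpoor
```

Text that already uses the two plain letters passes through unchanged, so running this over a whole subtitle file would only touch the ligature.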

@fraternl
Author

fraternl commented Jan 20, 2025

> This is a pretty wild guess from somebody who is far from an expert in these things. Your closing tag looks like this:
>
> <ReplaceList>
>
> I think it should look like this:
>
> </ReplaceList>
>
> It looks to me like you're missing the closing slash in the closing tag. Maybe. Like I said, I don't know much about these things so I could be totally off base.

You're right, but my file is much longer and has much more in it.

So when I copy/pasted the snippet from my file, I typed <ReplaceList> to complete it and forgot the slash.
In the actual file it was there.
