
feat(arabic): add tashkeel support and shadda+vowel ligatures #343

Merged
oneplus1000 merged 2 commits into signintech:master from AmmrFX:master on Feb 8, 2026

AmmrFX (Contributor) commented Feb 3, 2026

Summary

This PR enhances Arabic text support with proper handling of tashkeel (diacritical marks) and automatic ligature generation for better PDF rendering.

New Features

  • Tashkeel Constants: FATHA, DAMMA, KASRA, SUKUN, SHADDA, TANWEEN_*, and Quranic marks (SUPERSCRIPT_ALEF, MADDAH_ABOVE, etc.)

  • Shadda+Vowel Ligatures: Automatic combination using the precomposed forms in Arabic Presentation Forms-A (U+FC5E-U+FC63)

  • Allah Ligature: Automatic conversion of the word "الله" (Alef + Lam + Lam + Heh) to the Allah ligature character U+FDF2 (ﷲ), so that the word renders as a single ligature in PDF output, the conventional form in Arabic typography.

  • IsTashkeel(r rune) bool: Public function to detect Arabic diacritical marks

  • GetShaddaLigature(vowel rune) rune: Get combined shadda+vowel form
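
To illustrate the detection and lookup described above, here is a minimal, self-contained sketch. The names `isTashkeel` and `shaddaLigature` and the constant names are hypothetical stand-ins, not the PR's actual code; the code points come from the Unicode Arabic and Arabic Presentation Forms-A blocks. Returning 0 for a vowel without a precomposed form is an assumption of this sketch.

```go
package main

import "fmt"

// Hypothetical constant names for illustration; the PR's identifiers may differ.
const (
	fathatan        = '\u064B' // tanween fath
	dammatan        = '\u064C' // tanween damm
	kasratan        = '\u064D' // tanween kasr
	fatha           = '\u064E'
	damma           = '\u064F'
	kasra           = '\u0650'
	shadda          = '\u0651'
	sukun           = '\u0652'
	maddahAbove     = '\u0653'
	superscriptAlef = '\u0670'
)

// shaddaLigatures maps a vowel mark to the precomposed shadda+vowel
// code point (U+FC5E-U+FC63).
var shaddaLigatures = map[rune]rune{
	dammatan:        '\uFC5E',
	kasratan:        '\uFC5F',
	fatha:           '\uFC60',
	damma:           '\uFC61',
	kasra:           '\uFC62',
	superscriptAlef: '\uFC63',
}

// isTashkeel reports whether r is one of the marks listed above.
// The contiguous range U+064B-U+0653 covers fathatan through maddah above.
func isTashkeel(r rune) bool {
	return (r >= fathatan && r <= maddahAbove) || r == superscriptAlef
}

// shaddaLigature returns the combined shadda+vowel form, or 0 when the
// vowel has no precomposed form (an assumption of this sketch).
func shaddaLigature(vowel rune) rune {
	return shaddaLigatures[vowel]
}

func main() {
	fmt.Println(isTashkeel(fatha), isTashkeel('\u0628')) // true false
	fmt.Printf("%U\n", shaddaLigature(fatha))            // U+FC60
}
```

A map keeps the lookup explicit and makes the "no ligature available" case easy to detect, at the cost of a zero-value sentinel that callers must check.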

Improvements

  • Tashkeel-aware text reversal keeps diacritics attached to base characters
  • Character shaping now skips tashkeel when determining letter position (initial/medial/final)
  • Lam-Alef ligature preserves tashkeel between the letters
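
The tashkeel-aware reversal can be sketched as follows. This is only an illustration of the idea, assuming a simplified mark set; `isMark` and `reverseKeepingMarks` are hypothetical names and this is not the PR's `reverseWithTashkeel()` implementation. Each base character and the marks that follow it are grouped into a cluster, and the clusters are then emitted in reverse order.

```go
package main

import "fmt"

// isMark is a simplified stand-in for tashkeel detection:
// U+064B-U+0653 plus superscript alef (U+0670).
func isMark(r rune) bool {
	return (r >= 0x064B && r <= 0x0653) || r == 0x0670
}

// reverseKeepingMarks reverses the order of base characters while
// keeping each run of marks directly after the base it belongs to.
func reverseKeepingMarks(s string) string {
	runes := []rune(s)

	// Group the input into clusters: one base rune plus trailing marks.
	var clusters [][]rune
	for _, r := range runes {
		if isMark(r) && len(clusters) > 0 {
			last := len(clusters) - 1
			clusters[last] = append(clusters[last], r)
			continue
		}
		// A base rune (or a leading mark with no base) starts a new cluster.
		clusters = append(clusters, []rune{r})
	}

	// Emit clusters back to front; the order inside each cluster is unchanged.
	out := make([]rune, 0, len(runes))
	for i := len(clusters) - 1; i >= 0; i-- {
		out = append(out, clusters[i]...)
	}
	return string(out)
}

func main() {
	// Beh + Fatha + Teh: a naive rune reversal would move the fatha onto the teh.
	in := "\u0628\u064E\u062A"
	fmt.Printf("%+q\n", reverseKeepingMarks(in)) // "\u062a\u0628\u064e"
}
```

Reversing rune by rune would yield Teh + Fatha + Beh, attaching the vowel to the wrong letter; reversing by cluster keeps the fatha on the beh.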

Test plan

  • All existing tests pass
  • New tests for IsTashkeel(), GetShaddaLigature(), and tashkeel handling
  • Arabic example updated with Surah Al-Fatiha

- Add tashkeel (diacritical marks) constants: FATHA, DAMMA, KASRA, SUKUN, SHADDA, TANWEEN_*, and Quranic marks
- Add Shadda+Vowel ligature constants (Arabic Presentation Forms-A, U+FC5E-U+FC63)
- Add GetShaddaLigature() function for vowel-to-ligature lookup
- Add IsTashkeel() function to detect Arabic diacritical marks
- Add Allah ligature (U+FDF2) automatic conversion
- Implement reverseWithTashkeel() to keep tashkeel attached during RTL reversal
- Update ToArabic() to handle tashkeel-aware character shaping
- Add comprehensive tests for tashkeel handling and ligatures
- Update Arabic example with Surah Al-Fatiha and Adobe-Arabic font
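
As a rough illustration of the Allah ligature conversion listed above, the sketch below replaces the unvocalized four-letter sequence with U+FDF2. The function name `applyAllahLigature` is hypothetical, and this is not the PR's code: the actual implementation may also handle tashkeel inside the word or check word boundaries, which this sketch does not.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Alef (U+0627) + Lam (U+0644) + Lam (U+0644) + Heh (U+0647).
	allahWord = "\u0627\u0644\u0644\u0647"
	// ARABIC LIGATURE ALLAH ISOLATED FORM.
	allahLigature = "\uFDF2"
)

// applyAllahLigature replaces every occurrence of the plain four-letter
// sequence with the single ligature code point. It does not check word
// boundaries or skip diacritics (a simplification of this sketch).
func applyAllahLigature(s string) string {
	return strings.ReplaceAll(s, allahWord, allahLigature)
}

func main() {
	in := "\u0628\u0633\u0645 " + allahWord // "bism" followed by the word
	fmt.Printf("%+q\n", applyAllahLigature(in)) // "\u0628\u0633\u0645 \ufdf2"
}
```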

AmmrFX (Contributor, Author) commented Feb 5, 2026

@oneplus1000 please review

oneplus1000 (Collaborator) commented

Okay, I'll do it during the weekend. :-)

oneplus1000 (Collaborator) commented

Everything looks good overall. My only concern is that the Adobe TTF font should be replaced with an open-source font to avoid licensing issues.

AmmrFX (Contributor, Author) commented Feb 7, 2026

Done, I replaced it with Amiri-regular.

oneplus1000 merged commit 2dcf2ba into signintech:master on Feb 8, 2026
2 checks passed
oneplus1000 (Collaborator) commented

thank you
