This project provides language-specific mapping tables for normalizing typographic and special characters in AI-generated or AI-processed text.
Mappings cover characters that are regionally uncommon, discouraged, or problematic for further text processing.
- Many LLMs and online AIs output Unicode characters that are not desirable or conventional for end users in specific countries (e.g. curly quotes, em dashes, narrow spaces, ß).
- This repo defines clear rules for converting these characters to standard, widely supported, or country-preferred equivalents (see the illustrative excerpt below).
- Main use case: LLM or AI text pre/post-processing in privacy-focused, in-browser, or server-side tools.
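
For illustration, a mapping file could pair each unwanted character with its preferred replacement. The entries below are hypothetical examples based on the characters mentioned above (left/right curly quotes, an em dash, a narrow no-break space, and ß); they are not the actual contents of any shipped file.

```json
{
  "\u201c": "\"",
  "\u201d": "\"",
  "\u2014": "-",
  "\u202f": " ",
  "ß": "ss"
}
```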
Load the JSON mapping for your target language/region and apply the character replacements in your text pipeline.
Usage example (pseudocode):

```python
for char, replacement in mapping.items():
    text = text.replace(char, replacement)
```
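
A minimal runnable sketch in Python, assuming the mapping file (here `german.json`, one of the files listed below) is available in the working directory:

```python
import json

# Load the character mapping for the target language/region.
# "german.json" is used as an example; see the list of mappings below.
with open("german.json", encoding="utf-8") as f:
    mapping = json.load(f)

def normalize(text: str) -> str:
    """Apply every character replacement defined in the mapping."""
    for char, replacement in mapping.items():
        text = text.replace(char, replacement)
    return text

print(normalize("\u201cHello\u201d\u2014world"))
```

For long texts and mappings whose keys are all single characters, `str.translate` with a precomputed table avoids repeated passes over the string; the simple loop above keeps support for multi-character keys.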
Available mappings:

- Swiss German (`swiss-german.json`)
- German (`german.json`)
- French (`french.json`)
- Italian (`italian.json`)
- English (International) (`english-international.json`)
- English (US) (`english-us.json`)
Feel free to open issues or PRs for new mappings, edge cases, or country-specific improvements!