Language Typography & Character Mapping for AI Text Cleanup

This project provides language-specific mapping tables for normalizing typographic and special characters in AI-generated or AI-processed text.
The mappings cover characters that are uncommon or discouraged in a given region, or that cause problems in downstream text processing.

Why?

Many LLMs and online AI services output Unicode characters that are unconventional or undesirable for end users in specific countries, e.g. curly quotes, em dashes, narrow spaces, or ß (which is not used in Swiss Standard German).
This repo defines explicit rules for converting these characters to standard, widely supported, or country-preferred equivalents.
Main use case: pre- and post-processing of LLM or AI text in privacy-focused, in-browser, or server-side tools.

Usage

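Load the JSON mapping for your target language/region and apply its character replacements in your text pipeline.

Usage example (Python-style sketch; mapping is the dictionary loaded from the JSON file, text is the input string):

# Replacements run one after another, so a replacement should not itself contain a mapped character.
for char, replacement in mapping.items():
    text = text.replace(char, replacement)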
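
The replacement loop above can be expanded into a small, self-contained sketch. It assumes each mapping file is a flat JSON object of character-to-replacement pairs; that layout, the sample entries, and the helper names load_mapping and normalize are illustrative assumptions, not something this repository specifies.

```python
import json

# Hypothetical sample standing in for a mapping file such as swiss-german.json.
# The flat {"character": "replacement"} layout and these entries are assumptions.
SAMPLE_MAPPING_JSON = r'{"\u201c": "\"", "\u201d": "\"", "\u2014": "-", "\u202f": " ", "\u00df": "ss"}'


def load_mapping(raw_json: str) -> dict[str, str]:
    """Parse a JSON object of character -> replacement pairs."""
    mapping = json.loads(raw_json)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object of character -> replacement pairs")
    return mapping


def normalize(text: str, mapping: dict[str, str]) -> str:
    """Apply all replacements; single characters are handled in one pass."""
    single = {k: v for k, v in mapping.items() if len(k) == 1}
    multi = {k: v for k, v in mapping.items() if len(k) > 1}
    # Multi-character keys (if a mapping has any) are replaced first, longest first.
    for key in sorted(multi, key=len, reverse=True):
        text = text.replace(key, multi[key])
    # str.translate looks up each input character once, so the output of one
    # replacement is never re-processed by another rule.
    return text.translate(str.maketrans(single))


mapping = load_mapping(SAMPLE_MAPPING_JSON)
cleaned = normalize("\u201cGr\u00fc\u00dfe\u201d \u2014 ok", mapping)
print(cleaned)  # "Grüsse" - ok
```

Using str.translate for single-character keys avoids the chaining effect of repeated str.replace calls, where the result of one rule could be rewritten by a later rule.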

Supported Languages

Swiss German (swiss-german.json)
German (german.json)
French (french.json)
Italian (italian.json)
English (International) (english-international.json)
English (US) (english-us.json)
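
A pipeline that supports several of these languages needs some way to pick the right file. The following is a minimal sketch of such a lookup: the file names are taken from the list above, while the mappings directory as the default location and the helper name mapping_path are assumptions made for illustration.

```python
from pathlib import Path

# File names as listed above; the lookup keys are just the display names.
MAPPING_FILES = {
    "Swiss German": "swiss-german.json",
    "German": "german.json",
    "French": "french.json",
    "Italian": "italian.json",
    "English (International)": "english-international.json",
    "English (US)": "english-us.json",
}


def mapping_path(language: str, base_dir: Path = Path("mappings")) -> Path:
    """Return the path of the mapping file for a supported language.

    base_dir defaults to a "mappings" directory, which is an assumption
    about where the JSON files are stored.
    """
    try:
        return base_dir / MAPPING_FILES[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None
```

Calling mapping_path("Swiss German") yields a path ending in swiss-german.json, and an unknown language raises a ValueError instead of silently falling back to another mapping.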

Contributing

Feel free to open issues or PRs for new mappings, edge cases, or country-specific improvements!

patrickdobler/llm-text-normalizer-mappings
