feat: benchmark /translate skill — 200 keys regenerated across 9 languages #12028
premiumjibles wants to merge 9 commits into develop from
Refactor translation pipeline so each per-language sub-agent owns its full lifecycle (translate → validate → retry → review → refine → merge → verify) instead of the orchestrator managing all steps across 9 languages. Reduces the orchestrator to a lightweight coordinator that spawns agents and reads status files.
- Extract shared script-detection utilities into script-utils.js
- Refactor validate.js to import from script-utils.js (no behavior change)
- Add validate-file.js for post-merge full-file validation (JSON validity, key completeness, aggregate script ratio, regression detection)
- Simplify merge.js: remove duplicate script validation, add pre-merge backup for rollback support
- Rewrite SKILL.md Steps 5-8 for the self-contained language-agent architecture
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
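validate-file.js checks an aggregate script ratio over the merged locale file. The real script-utils.js helpers are not shown on this page, so the following is only a minimal sketch under stated assumptions: the ratio is taken as letters in the locale's expected script divided by all letters in the file's string values, after stripping %{placeholders} and HTML tags. The names `EXPECTED_SCRIPT`, `collectStrings` and `scriptRatio` are hypothetical.

```javascript
// Hypothetical sketch of an aggregate script-ratio check (not the real script-utils.js).

// Expected script per locale; locales not listed are assumed to use Latin script.
const EXPECTED_SCRIPT = {
  ru: /\p{Script=Cyrillic}/u,
  uk: /\p{Script=Cyrillic}/u,
  ja: /[\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]/u,
  zh: /\p{Script=Han}/u,
};
const LATIN = /\p{Script=Latin}/u;

// Recursively collect every string value from a nested locale object.
function collectStrings(node, out = []) {
  if (typeof node === 'string') {
    out.push(node);
  } else if (node !== null && typeof node === 'object') {
    for (const value of Object.values(node)) collectStrings(value, out);
  }
  return out;
}

// Fraction of letters (across all values) written in the locale's expected script.
function scriptRatio(localeJson, locale) {
  const expected = EXPECTED_SCRIPT[locale] ?? LATIN;
  let letters = 0;
  let matching = 0;
  for (const raw of collectStrings(localeJson)) {
    // %{placeholders} and <tags> are ASCII, so drop them before counting.
    const text = raw.replace(/%\{[^}]*\}/g, '').replace(/<[^>]*>/g, '');
    for (const ch of text) {
      if (!/\p{L}/u.test(ch)) continue; // only letters count toward the ratio
      letters += 1;
      if (expected.test(ch)) matching += 1;
    }
  }
  return letters === 0 ? 1 : matching / letters;
}
```

Measuring the ratio over the whole file tolerates individual values that legitimately keep Latin brand names such as "Snap", while still flagging a file in which many values were left in English.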
Translate 11 missing English strings into de, es, fr, ja, pt, ru, tr, uk, zh using the new /translate Claude Code skill. Covers RFOX FAQ entries, action center failure messages, and yield cooldown notices.
Also fixes merge.js to only add new keys by default, never overwriting existing translations. A --force flag is available for intentional re-translation of changed English strings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
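The "add new keys only, unless --force" policy can be sketched as a recursive merge over the nested locale JSON. This is an illustration under assumptions, not merge.js itself: `mergeTranslations` and its report shape are hypothetical, and the `force` parameter stands in for the --force flag.

```javascript
// Hypothetical sketch of merge.js's default policy: add missing keys, never overwrite.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function mergeTranslations(
  existing,
  incoming,
  force = false,
  prefix = '',
  report = { added: [], overwritten: [], skipped: [] },
) {
  // Shallow copy; namespaces touched by `incoming` are copied again on recursion,
  // so the caller's `existing` object is never mutated.
  const merged = { ...existing };
  for (const [key, value] of Object.entries(incoming)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const exists = Object.prototype.hasOwnProperty.call(merged, key);
    if (exists && isPlainObject(value) && isPlainObject(merged[key])) {
      // Both sides are namespaces: recurse so nested keys follow the same policy.
      merged[key] = mergeTranslations(merged[key], value, force, path, report).merged;
    } else if (!exists) {
      merged[key] = value; // new key (or whole new namespace): always added
      report.added.push(path);
    } else if (force) {
      merged[key] = value; // existing key: replaced only when forced
      report.overwritten.push(path);
    } else {
      report.skipped.push(path); // existing translation is kept
    }
  }
  return { merged, report };
}
```

Returning the skipped paths makes it easy to tell the operator which changed English strings would need a forced re-translation.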
- Fix glossary key mismatch in compile-report.js (disambiguated keys
didn't match actual glossary.json keys, silently skipping 4 checks)
- Fix mixed Latin/Cyrillic in ru.md locale guide (vы → вы)
- Fix fragile file-path detection in merge.js (use fs.existsSync instead
of includes('/'), add missing-arg guard and JSON.parse try/catch)
- Add try/catch in missing-keys.js for corrupt/missing locale files
- Add French elision rule to fr.md: use "de" when a numeric %{amount}
buffers the symbol, and "en" when the symbol placeholder directly
follows the preposition (avoids runtime elision ambiguity)
- Retranslate French yield/unstake strings applying the new rule:
"déstaking de %{symbol}" → "déstake en %{symbol}"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
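The merge.js argument fix (check `fs.existsSync` instead of `includes('/')`, guard a missing argument, wrap `JSON.parse` in try/catch) might look roughly like this. `loadTranslationsArg` is a hypothetical name, and accepting either a file path or an inline JSON string is an assumption inferred from the commit message.

```javascript
const fs = require('fs');

// Hypothetical sketch: accept either a path to a JSON file or an inline JSON string.
function loadTranslationsArg(arg) {
  // Missing-argument guard.
  if (arg === undefined || arg === '') {
    throw new Error('Missing translations argument (expected a file path or a JSON string)');
  }
  // Treat the argument as a path only if that file really exists; inline JSON
  // such as {"a":"1/2"} can contain '/' without being a path.
  const raw = fs.existsSync(arg) ? fs.readFileSync(arg, 'utf8') : arg;
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new Error(`Invalid JSON in translations argument: ${err.message}`);
  }
}
```

Checking for an existing file avoids misclassifying inline JSON whose values contain a slash (a fraction, a URL) as a path, which is exactly the failure mode of an `includes('/')` test.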
Ukrainian: в/у and з/із/зі preposition alternation rules for dynamic placeholders where runtime values are unknown at translation time.
Turkish: vowel harmony rules for dynamic placeholders; prefer postpositions over direct suffixes on placeholders, since crypto symbols span all vowel classes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add register examples to all 9 locale files (de, es, fr, ja, pt, ru, tr, uk, zh) with correct/incorrect pairs for non-pronoun register markers
- Add register consistency as 6th reviewer focus in SKILL.md
- Add "Multichain Snap" and "Snap" to glossary never-translate list
- Fix 2 broken community translations across all 9 locales (stale multiChain.body, missing %{symbol} in getAssets.about)
- Update compile-report.js to use stemMatch instead of raw .includes() for glossary metrics
- Improve stemMatch with language-aware morphological matching (suffix stripping, Levenshtein distance, CJK character overlap)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
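The improved stemMatch is not shown on this page. The following is a minimal sketch of the three techniques named above (suffix stripping, Levenshtein distance, CJK character overlap) for single-word glossary terms; the suffix lists, the distance limits, and the 50% overlap threshold are all hypothetical choices, not the values used in compile-report.js.

```javascript
// Hypothetical sketch of language-aware glossary matching (not the real stemMatch).

// Toy suffix lists, longest first; real morphology needs far more than this.
const SUFFIXES = {
  de: ['en', 'er', 'es', 'e', 'n', 's'],
  ru: ['ами', 'ями', 'ов', 'ей', 'ом', 'ах', 'ях', 'ия', 'ии', 'ию', 'ы', 'и', 'а', 'я', 'у', 'ю', 'е'],
};
const CJK = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// Classic edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // D[i-1][j-1], starting at j = 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // D[i-1][j]
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Strip the first matching suffix, keeping a stem of at least 3 characters.
function stem(word, locale) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES[locale] ?? []) {
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

// Share of the term's CJK characters that also appear in the text.
function cjkOverlap(term, text) {
  const chars = [...term].filter((ch) => CJK.test(ch));
  if (chars.length === 0) return 0;
  return chars.filter((ch) => text.includes(ch)).length / chars.length;
}

function stemMatch(term, text, locale) {
  if (text.toLowerCase().includes(term.toLowerCase())) return true; // exact substring
  if (locale === 'ja' || locale === 'zh') return cjkOverlap(term, text) >= 0.5;
  const termStem = stem(term, locale);
  const maxDistance = termStem.length >= 6 ? 2 : 1; // stricter for short stems
  return text
    .split(/[^\p{L}\p{N}]+/u)
    .filter(Boolean)
    .some((word) => {
      const wordStem = stem(word, locale);
      return wordStem.startsWith(termStem) || levenshtein(wordStem, termStem) <= maxDistance;
    });
}
```

This is why a raw `.includes()` undercounts glossary hits: an inflected form such as "комиссии" never contains the dictionary form "комиссия" as a substring.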
Remove 200 existing translated keys from all 9 non-English locales and regenerate them using the /translate skill pipeline. This creates a diff where language experts can compare original vs. AI-generated translations for quality benchmarking.
Key selection: 40 short, 50 multi-placeholder, 7 tagged, 30 long, 30 crypto-domain, 43 general, distributed across 31 namespaces.
All 1800 translations (200 × 9 locales) passed validation with 0 rejections and 0 manual review items.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
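The key selection uses fixed per-category quotas (40 + 50 + 7 + 30 + 30 + 43 = 200). A minimal sketch of drawing such a stratified sample follows, assuming keys have already been bucketed into disjoint categories; the bucketing itself, the `sampleKeys` name, and the seeded mulberry32 PRNG (used so a seed reproduces the same selection) are hypothetical, and only the quotas come from this commit.

```javascript
// Quotas from the commit message; they sum to 200.
const QUOTAS = { short: 40, multiPlaceholder: 50, tagged: 7, long: 30, crypto: 30, general: 43 };

// Small deterministic PRNG (mulberry32): the same seed yields the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical sampler: shuffle each category's pool, then take its quota.
function sampleKeys(buckets, quotas, seed) {
  const rng = mulberry32(seed);
  const selected = {};
  for (const [category, quota] of Object.entries(quotas)) {
    const pool = [...(buckets[category] ?? [])];
    if (pool.length < quota) {
      throw new Error(`Category "${category}" has ${pool.length} keys, needs ${quota}`);
    }
    // Fisher-Yates shuffle driven by the seeded PRNG.
    for (let i = pool.length - 1; i > 0; i--) {
      const j = Math.floor(rng() * (i + 1));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    selected[category] = pool.slice(0, quota);
  }
  return selected;
}
```

Failing loudly when a pool is smaller than its quota matters for the tagged category, where only 7 candidate keys exist.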
Important: Review skipped. Draft detected.
NeOMakinG
left a comment
📋 Code Review — Translation Benchmark Skill
Status: Code reviewed, CI pending
Overview
This PR adds a comprehensive translation quality benchmark skill (/benchmark-translate) that:
- Selects stratified test keys across 6 categories
- Captures ground truth translations
- Runs translations through the translate skill
- Uses sub-agent LLM judges to rate quality
- Produces regression reports comparing against baselines
Architecture
- ✅ Well-documented SKILL.md with clear pipeline steps
- ✅ Smart fixed/rotating key split for stable regression tracking
- ✅ 9-locale support with wave-based sub-agent parallelization
- ✅ Scripts in scripts/translations/benchmark/ properly gitignored
This is developer tooling, not a user-facing feature. No browser or transaction testing is required.
Awaiting CI completion before final approval.
🤖 Reviewed by Claude Code
Description
Benchmark of the /translate skill pipeline's output quality. Removes 200 existing translated keys from all 9 non-English locales and regenerates them using the automated translate-review-refine pipeline. The diff shows the original (human) translations vs. the AI-generated translations for expert comparison.
Key Selection (200 keys, stratified sampling)
- Short: 40
- Multi-placeholder (%{variable} placeholders): 50
- Tagged (<span>, <strong>, <link> etc.; only 7 exist): 7
- Long: 30
- Crypto-domain: 30
- General: 43
Keys are distributed across 31 top-level namespaces for realistic coverage.
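The multi-placeholder and tagged categories exist to test whether %{variables} and HTML tags survive translation. A minimal sketch of such a preservation check follows; it compares the multiset of placeholders and tag names between source and translation, ignoring tag attributes and order. `checkTokens` and its helpers are hypothetical names, not the skill's validate.js.

```javascript
// Hypothetical sketch of a placeholder/tag preservation check (not the skill's validate.js).
const PLACEHOLDER = /%\{[^}]+\}/g;
const TAG = /<\/?([A-Za-z][\w-]*)[^>]*>/g;

// Sorted list of placeholders plus normalized tags (<name> or </name>).
function tokenList(text) {
  const tokens = [...text.matchAll(PLACEHOLDER)].map((m) => m[0]);
  for (const m of text.matchAll(TAG)) {
    // Compare tags by name and open/close only, ignoring attributes.
    tokens.push(m[0].startsWith('</') ? `</${m[1]}>` : `<${m[1]}>`);
  }
  return tokens.sort();
}

// Multiset difference: items of `a` not matched by an item of `b`.
function multisetDiff(a, b) {
  const counts = new Map();
  for (const t of b) counts.set(t, (counts.get(t) ?? 0) + 1);
  const out = [];
  for (const t of a) {
    const n = counts.get(t) ?? 0;
    if (n > 0) counts.set(t, n - 1);
    else out.push(t);
  }
  return out;
}

function checkTokens(source, translation) {
  const expected = tokenList(source);
  const actual = tokenList(translation);
  const missing = multisetDiff(expected, actual);
  const extra = multisetDiff(actual, expected);
  return { ok: missing.length === 0 && extra.length === 0, missing, extra };
}
```

Comparing multisets rather than sets also catches a placeholder that was accidentally duplicated, not just one that was dropped.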
Pipeline Results
All 1800 translations (200 × 9 locales) passed validation with 0 rejections and 0 manual review items.
How to Review
Each locale's diff shows the removed original translations (red) vs. regenerated translations (green). Language experts should evaluate:
- Are %{variables} and HTML tags preserved?
Ground truth (original translations) is saved at /tmp/benchmark-ground-truth.json for automated comparison.
Issue (if applicable)
N/A — internal benchmark
Risk
Zero risk. This is a draft PR for evaluation only and is not intended for merge. The locale changes are translation-only; the code changes are limited to the /translate skill's own scripts and docs.
Testing
Engineering
node .claude/skills/translate/scripts/validate-file.js {locale} for each of de, es, fr, ja, pt, ru, tr, uk, zh
Operations
This is a translation-only benchmark PR. No functional changes to test.
Screenshots (if applicable)
N/A