feat: benchmark /translate skill — 200 keys regenerated across 9 languages by premiumjibles · Pull Request #12028 · shapeshift/web

premiumjibles · 2026-02-25T00:57:56Z

Description

Benchmarks the quality of the /translate skill pipeline. This PR removes 200 existing translated keys from all 9 non-English locales and regenerates them using the automated translate-review-refine pipeline. The diff shows the original (human) translations against the AI-generated translations for expert comparison.

Key Selection (200 keys, stratified sampling)

| Category | Count | Criteria |
| --- | --- | --- |
| Short strings | 40 | 1-3 words, single UI labels |
| Multiple placeholders | 50 | 2+ `%{variable}` placeholders |
| Tagged (HTML markup) | 7 | Contains `<span>`, `<strong>`, `<link>` etc. (only 7 exist) |
| Long/complex | 30 | 15+ words, full sentences |
| Crypto domain | 30 | Staking, liquidity, vault, swap, slippage, yield, etc. |
| General (random) | 43 | Random selection for broad coverage |

Keys distributed across 31 top-level namespaces for realistic coverage.
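The stratified selection could be sketched roughly as follows. The quotas come from the table above, but the predicates, the function names (`selectStratified`, `seededRandom`, `shuffle`), the seeded shuffle, and the order in which categories are filled are all assumptions for illustration; the PR does not include the actual selection script.

```javascript
// Hypothetical sketch of stratified key selection (not the PR's script).
// Scarce categories are filled first so broader ones cannot consume them.
const QUOTAS = [
  ['tagged', 7],
  ['multiPlaceholder', 50],
  ['long', 30],
  ['crypto', 30],
  ['short', 40],
  ['general', 43],
];

const CRYPTO_TERMS = /\b(staking|liquidity|vault|swap|slippage|yield)\b/i;
const wordCount = s => s.trim().split(/\s+/).filter(Boolean).length;

// Assumed category predicates, applied to the English source string.
const PREDICATES = {
  tagged: s => /<\/?[a-z][^>]*>/i.test(s),
  multiPlaceholder: s => (s.match(/%\{\w+\}/g) || []).length >= 2,
  long: s => wordCount(s) >= 15,
  crypto: s => CRYPTO_TERMS.test(s),
  short: s => wordCount(s) <= 3,
  general: () => true,
};

// Small seeded PRNG (mulberry32-style) so a selection can be reproduced.
function seededRandom(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy of the input.
function shuffle(items, rand) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// entries: array of [key, englishString]; returns { category: [keys] }.
function selectStratified(entries, seed = 1) {
  const pool = shuffle(entries, seededRandom(seed));
  const taken = new Set();
  const result = {};
  for (const [category, quota] of QUOTAS) {
    result[category] = pool
      .filter(([key, text]) => !taken.has(key) && PREDICATES[category](text))
      .slice(0, quota)
      .map(([key]) => key);
    result[category].forEach(key => taken.add(key));
  }
  return result;
}
```

Each key lands in at most one category, so the per-category counts add up to the total number of keys selected.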

Pipeline Results

All 1800 translations (200 × 9 locales) passed validation:

| Locale | Status | Translated |
| --- | --- | --- |
| de (German) | Success | 200 |
| es (Spanish) | Success | 200 |
| fr (French) | Success | 200 |
| ja (Japanese) | Success | 200 |
| pt (Portuguese BR) | Success | 200 |
| ru (Russian) | Success | 200 |
| tr (Turkish) | Success | 200 |
| uk (Ukrainian) | Success | 200 |
| zh (Chinese Simplified) | Success | 200 |

How to Review

Each locale's diff shows the removed original translations (red) vs. regenerated translations (green). Language experts should evaluate:

- Accuracy: Does the translation convey the same meaning?
- Naturalness: Does it sound native?
- Terminology: Are crypto terms translated consistently?
- Register: Is formal/informal address used correctly?
- Placeholders/Tags: Are `%{variables}` and HTML tags preserved?
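The placeholder and tag criterion is the one item in this list that can be checked mechanically. A minimal sketch, assuming placeholders always take the `%{name}` form and tags are plain angle-bracket markup; `markupTokens` and `preservesMarkup` are hypothetical names, not functions from the pipeline:

```javascript
// Hypothetical helper: extract placeholders and tags, sorted so that
// reordering inside a translation is not treated as an error.
const markupTokens = s => (s.match(/%\{\w+\}|<\/?[a-zA-Z][^>]*>/g) || []).sort();

// True when the translation carries exactly the same placeholders and tags
// (same items, same counts) as the English source.
function preservesMarkup(source, translation) {
  const expected = markupTokens(source);
  const actual = markupTokens(translation);
  return expected.length === actual.length && expected.every((t, i) => t === actual[i]);
}
```

Comparing sorted tokens rather than positions is deliberate: word order often changes between languages, so a placeholder that moves is fine, while one that is missing or duplicated is not.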

Ground truth (the original translations) is saved at /tmp/benchmark-ground-truth.json for automated comparison.
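One crude form of automated comparison is an exact-match rate per locale. The sketch below assumes the ground-truth data has the shape `{ locale: { key: translation } }`, which the PR does not confirm, and `exactMatchRate` is a hypothetical name. Exact match says nothing about quality, since a different wording may be equally good or better; it only shows how often the pipeline reproduced the human translation verbatim.

```javascript
// Hypothetical sketch: fraction of keys per locale whose regenerated
// translation is identical to the original. The data shape is an assumption.
function exactMatchRate(groundTruth, regenerated) {
  const rates = {};
  for (const [locale, originals] of Object.entries(groundTruth)) {
    const keys = Object.keys(originals);
    const candidate = regenerated[locale] || {};
    const same = keys.filter(key => candidate[key] === originals[key]).length;
    rates[locale] = keys.length === 0 ? 0 : same / keys.length;
  }
  return rates;
}
```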

Issue (if applicable)

N/A — internal benchmark

Risk

Zero risk. This is a draft PR for evaluation only, not intended for merge. Translation-only changes, no code modifications.

No protocols, transactions, wallets or contracts affected.

Testing

Engineering

- Verify all locale JSON files are valid: `node .claude/skills/translate/scripts/validate-file.js {locale}` for each of de, es, fr, ja, pt, ru, tr, uk, zh
- Compare diff line counts: should be ~1235 insertions and ~1235 deletions (1:1 replacement)
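For reviewers who want to see what the validity check involves, here is a minimal sketch of JSON validity plus key completeness against the English file. It is not the contents of validate-file.js, which this PR page does not show; `flattenKeys` and `checkLocale` are hypothetical names, and the nested-object layout of the locale files is an assumption.

```javascript
// Hypothetical sketch: flatten nested translation objects into dotted keys.
function flattenKeys(obj, prefix = '') {
  return Object.entries(obj).flatMap(([key, value]) =>
    value !== null && typeof value === 'object' && !Array.isArray(value)
      ? flattenKeys(value, `${prefix}${key}.`)
      : [`${prefix}${key}`],
  );
}

// Takes the raw file contents as strings so parse errors can be reported
// instead of thrown. Returns the English keys absent from the locale.
function checkLocale(englishJson, localeJson) {
  let english;
  let locale;
  try {
    english = JSON.parse(englishJson);
    locale = JSON.parse(localeJson);
  } catch (err) {
    return { valid: false, error: err.message, missing: [] };
  }
  const present = new Set(flattenKeys(locale));
  const missing = flattenKeys(english).filter(key => !present.has(key));
  return { valid: true, error: null, missing };
}
```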

Operations

🏁 My feature is behind a flag and doesn't require operations testing (yet)

This is a translation-only benchmark PR. No functional changes to test.

Screenshots (if applicable)

N/A

Refactor translation pipeline so each per-language sub-agent owns its full lifecycle (translate → validate → retry → review → refine → merge → verify) instead of the orchestrator managing all steps across 9 languages. Reduces orchestrator to a lightweight coordinator that spawns agents and reads status files.

- Extract shared script-detection utilities into script-utils.js
- Refactor validate.js to import from script-utils.js (no behavior change)
- Add validate-file.js for post-merge full-file validation (JSON validity, key completeness, aggregate script ratio, regression detection)
- Simplify merge.js: remove duplicate script-validation, add pre-merge backup for rollback support
- Rewrite SKILL.md Steps 5-8 for self-contained language agent architecture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
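To illustrate what a script-ratio check might look like, the sketch below computes the fraction of letters in a string that belong to the script expected for a locale. It is an assumption-laden illustration, not the code in script-utils.js: the `EXPECTED_SCRIPT` table, the `scriptRatio` name, and the choice to skip Latin-script locales are all hypothetical.

```javascript
// Hypothetical sketch: expected writing system per non-Latin locale.
const EXPECTED_SCRIPT = {
  ru: /\p{Script=Cyrillic}/u,
  uk: /\p{Script=Cyrillic}/u,
  ja: /[\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]/u,
  zh: /\p{Script=Han}/u,
};

// Returns the share of letters written in the expected script (0..1),
// or null for locales this sketch does not check.
function scriptRatio(text, locale) {
  const expected = EXPECTED_SCRIPT[locale];
  if (!expected) return null;
  // Drop %{...} placeholders so their Latin names do not skew the ratio.
  const letters = [...text.replace(/%\{\w+\}/g, '')].filter(ch => /\p{L}/u.test(ch));
  if (letters.length === 0) return 1;
  return letters.filter(ch => expected.test(ch)).length / letters.length;
}
```

A ratio well below 1 for a Cyrillic or CJK locale would suggest untranslated English or mixed-script typos, although untranslated brand names and ticker symbols lower the ratio legitimately, so any threshold would need tuning.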

Translate 11 missing English strings into de, es, fr, ja, pt, ru, tr, uk, zh using the new /translate Claude Code skill. Covers RFOX FAQ entries, action center failure messages, and yield cooldown notices.

Also fixes merge.js to only add new keys by default, never overwriting existing translations. A --force flag is available for intentional re-translation of changed English strings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
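The add-only merge behavior described in this commit can be sketched as below. This is an illustration under assumptions, not the merge.js source: `mergeTranslations` is a hypothetical name, keys are treated as flat dotted strings, and the `force` option stands in for the `--force` flag.

```javascript
// Hypothetical sketch of add-only merging with an explicit override.
function mergeTranslations(existing, incoming, { force = false } = {}) {
  const merged = { ...existing };
  const added = [];
  const overwritten = [];
  const skipped = [];
  for (const [key, value] of Object.entries(incoming)) {
    const alreadyTranslated = Object.prototype.hasOwnProperty.call(existing, key);
    if (!alreadyTranslated) {
      merged[key] = value; // new key: always added
      added.push(key);
    } else if (force) {
      merged[key] = value; // existing key: replaced only when forced
      overwritten.push(key);
    } else {
      skipped.push(key); // existing translation is left untouched
    }
  }
  return { merged, added, overwritten, skipped };
}
```

Returning the `skipped` list makes it visible when a re-translation was requested for a key that already exists, which is the case the flag is meant for.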

- Fix glossary key mismatch in compile-report.js (disambiguated keys didn't match actual glossary.json keys, silently skipping 4 checks)
- Fix mixed Latin/Cyrillic in ru.md locale guide (vы → вы)
- Fix fragile file-path detection in merge.js (use fs.existsSync instead of includes('/'), add missing-arg guard and JSON.parse try/catch)
- Add try/catch in missing-keys.js for corrupt/missing locale files
- Add French elision rule to fr.md: use "de" when numeric %{amount} buffers the symbol, use "en" when symbol placeholder is directly after the preposition (avoids runtime elision ambiguity)
- Retranslate French yield/unstake strings applying the new rule: "déstaking de %{symbol}" → "déstake en %{symbol}"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ukrainian: в/у and з/із/зі preposition alternation rules for dynamic placeholders where runtime values are unknown at translation time.

Turkish: vowel harmony rules for dynamic placeholders. Prefer postpositions over direct suffixes on placeholders since crypto symbols span all vowel classes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add register examples to all 9 locale files (de, es, fr, ja, pt, ru, tr, uk, zh) with correct/incorrect pairs for non-pronoun register markers
- Add register consistency as 6th reviewer focus in SKILL.md
- Add "Multichain Snap" and "Snap" to glossary never-translate list
- Fix 2 broken community translations across all 9 locales (stale multiChain.body, missing %{symbol} in getAssets.about)
- Update compile-report.js to use stemMatch instead of raw .includes() for glossary metrics
- Improve stemMatch with language-aware morphological matching (suffix stripping, Levenshtein distance, CJK character overlap)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
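To show why stem-aware matching helps where a raw `.includes()` would miss inflected forms, here is a simplified sketch combining crude suffix stripping with Levenshtein distance. It is not the stemMatch implementation from this PR: the truncation heuristic and the distance threshold are assumptions, and the CJK character-overlap branch is omitted.

```javascript
// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical sketch: does any word in `text` look like a form of `term`?
function stemMatch(text, term, maxDistance = 2) {
  const needle = term.toLowerCase();
  // Shorter terms tolerate fewer edits (assumed heuristic).
  const allowed = Math.min(maxDistance, Math.floor(needle.length / 4));
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return words.some(word => {
    // Truncating to the term's length ignores inflectional endings.
    const stem = word.slice(0, needle.length);
    return word.startsWith(needle) || levenshtein(stem, needle) <= allowed;
  });
}
```

With this sketch, a glossary term such as "стейкинг" matches the genitive "стейкинга", which a plain substring check would also catch, while the distance check additionally tolerates small stem changes or typos.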

Remove 200 existing translated keys from all 9 non-English locales and regenerate them using the /translate skill pipeline. This creates a diff where language experts can compare original vs. AI-generated translations for quality benchmarking.

Key selection: 40 short, 50 multi-placeholder, 7 tagged, 30 long, 30 crypto-domain, 43 general, distributed across 31 namespaces.

All 1800 translations (200 × 9 locales) passed validation with 0 rejections and 0 manual review items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-25T00:58:03Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

NeOMakinG

📋 Code Review — Translation Benchmark Skill

Status: Code reviewed, CI pending

Overview

This PR adds a comprehensive translation quality benchmark skill (/benchmark-translate) that:

- Selects stratified test keys across 6 categories
- Captures ground truth translations
- Runs translations through the translate skill
- Uses sub-agent LLM judges to rate quality
- Produces regression reports comparing against baselines

Architecture

✅ Well-documented SKILL.md with clear pipeline steps
✅ Smart fixed/rotating key split for stable regression tracking
✅ 9-locale support with wave-based sub-agent parallelization
✅ Scripts in scripts/translations/benchmark/ properly gitignored

This is developer tooling, not a user-facing feature

No browser or transaction testing required.

Awaiting CI completion before final approval.

🤖 Reviewed by Claude Code

premiumjibles and others added 9 commits February 24, 2026 07:34

feat: implement translations skill

4c42cd3

small cleanup fixes

66d5758

hardening based on review

6e87928

Base automatically changed from agent-translations to develop February 25, 2026 22:29

firebomb1 mentioned this pull request Mar 2, 2026

feat: claude code skill for automated i18n translations #11985

Merged

1 task

NeOMakinG reviewed Mar 13, 2026

premiumjibles wants to merge 9 commits into develop from benchmark/translate-200-keys
