11 changes: 4 additions & 7 deletions .beads/pr-context.jsonl
@@ -1,7 +1,4 @@
{"id":"shapeshiftweb-33d","title":"bugfix pass: wrapped assets, popular assets, chain icons for second-class EVM chains","description":"Sequential bugfix pass from Mantle to Mode. For each PR: checkout, fix issues, merge previous, build/regen/lint/typecheck, commit and push. NEVER force push. NEVER merge PRs. gh api read-only (PR body + comments only). Key fixes: (1) Generalize wrapped native asset detection from Berachain to all chains via chainId-\u003ewrappedNativeAddress mapping, (2) Fix popular assets availability (Cronos only has 2, Linea too), (3) Fix Linea perma-loading asset icon, (4) Sanity check brand chain icons for each chain, (5) Update second-class-evm-chain contract with learnings, (6) Create append-only check skill to prevent closing brace issues in future PRs.","status":"open","priority":2,"issue_type":"epic","owner":"[email protected]","created_at":"2026-02-20T15:05:00Z","created_by":"gomes","updated_at":"2026-02-20T15:05:00Z"}
{"id":"shapeshiftweb-33d.1","title":"Mantle (#11905): generalize wrapped native detection, fix popular assets, fix Linea icon, update contract, create append-only skill","description":"gh pr checkout 11905. (1) Generalize Berachain's WBERA burn detection to all second-class chains via WRAPPED_NATIVE_BY_CHAIN_ID mapping in SecondClassEvmAdapter.ts - add WMNT address for Mantle. (2) Investigate + fix popular assets issue (compare against happy chains like Scroll/Ink). (3) Fix Linea perma-loading asset icon. (4) Update .claude/contracts/second-class-evm-chain.md with wrapped native + popular assets learnings. (5) Create append-only check skill to prevent closing brace issues. (6) Sanity check Mantle chain icon. Build/regen/lint/typecheck/commit/push. NEVER force push. NEVER merge PR.","status":"in-progress","priority":2,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-20T15:05:20Z","created_by":"gomes","updated_at":"2026-02-20T15:05:51Z","dependencies":[{"issue_id":"shapeshiftweb-33d.1","depends_on_id":"shapeshiftweb-33d","type":"parent-child","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"}]}
{"id":"shapeshiftweb-33d.2","title":"Cronos (#11910): ensure wrapped asset fix for WCRO, fix popular assets","description":"gh pr checkout 11910. Merge Mantle branch. Ensure WCRO address in WRAPPED_NATIVE_BY_CHAIN_ID. Fix popular assets (only 2 currently). Sanity check chain icon. Build/regen/lint/typecheck/commit/push. NEVER force push. NEVER merge PR.","status":"open","priority":2,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-20T15:05:20Z","created_by":"gomes","updated_at":"2026-02-20T15:05:20Z","dependencies":[{"issue_id":"shapeshiftweb-33d.2","depends_on_id":"shapeshiftweb-33d","type":"parent-child","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"},{"issue_id":"shapeshiftweb-33d.2","depends_on_id":"shapeshiftweb-33d.1","type":"blocks","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"}]}
{"id":"shapeshiftweb-33d.3","title":"Sonic (#11923): merge Cronos, ensure no wrapped/popular bugs, sanity check icon","description":"gh pr checkout 11923. Merge Cronos branch. Check if Sonic has wrapped native pattern (likely WSONIC). Sanity check popular assets + chain icon. Build/regen/lint/typecheck/commit/push. NEVER force push. NEVER merge PR.","status":"open","priority":2,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-20T15:05:21Z","created_by":"gomes","updated_at":"2026-02-20T15:05:21Z","dependencies":[{"issue_id":"shapeshiftweb-33d.3","depends_on_id":"shapeshiftweb-33d","type":"parent-child","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"},{"issue_id":"shapeshiftweb-33d.3","depends_on_id":"shapeshiftweb-33d.2","type":"blocks","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"}]}
{"id":"shapeshiftweb-33d.4","title":"Unichain (#11924): merge Sonic, ensure no wrapped/popular bugs, sanity check icon","description":"gh pr checkout 11924. Merge Sonic branch. Check wrapped native pattern (WETH on Unichain). Sanity check popular assets + chain icon. Build/regen/lint/typecheck/commit/push. NEVER force push. NEVER merge PR.","status":"open","priority":2,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-20T15:05:21Z","created_by":"gomes","updated_at":"2026-02-20T15:05:21Z","dependencies":[{"issue_id":"shapeshiftweb-33d.4","depends_on_id":"shapeshiftweb-33d","type":"parent-child","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"},{"issue_id":"shapeshiftweb-33d.4","depends_on_id":"shapeshiftweb-33d.3","type":"blocks","created_at":"2026-02-20T16:05:20Z","created_by":"gomes","metadata":"{}"}]}
{"id":"shapeshiftweb-33d.5","title":"BOB (#11925): merge Unichain, ensure no wrapped/popular bugs, sanity check icon","description":"gh pr checkout 11925. Merge Unichain branch. Check wrapped native pattern (WETH on BOB). Sanity check popular assets + chain icon. Build/regen/lint/typecheck/commit/push. NEVER force push. NEVER merge PR.","status":"open","priority":2,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-20T15:05:21Z","created_by":"gomes","updated_at":"2026-02-20T15:05:21Z","dependencies":[{"issue_id":"shapeshiftweb-33d.5","depends_on_id":"shapeshiftweb-33d","type":"parent-child","created_at":"2026-02-20T16:05:21Z","created_by":"gomes","metadata":"{}"},{"issue_id":"shapeshiftweb-33d.5","depends_on_id":"shapeshiftweb-33d.4","type":"blocks","created_at":"2026-02-20T16:05:21Z","created_by":"gomes","metadata":"{}"}]}
{"id":"shapeshiftweb-33d.6","title":"Mode (#11926): merge BOB, ensure no wrapped/popular bugs, sanity check icon","description":"gh pr checkout 11926. Merge BOB branch. Check wrapped native pattern (WETH on Mode). Sanity check popular assets + chain icon. Build/regen/lint/typecheck/commit/push. NEVER force push. NEVER merge PR.","status":"open","priority":2,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-20T15:05:21Z","created_by":"gomes","updated_at":"2026-02-20T15:05:21Z","dependencies":[{"issue_id":"shapeshiftweb-33d.6","depends_on_id":"shapeshiftweb-33d","type":"parent-child","created_at":"2026-02-20T16:05:21Z","created_by":"gomes","metadata":"{}"},{"issue_id":"shapeshiftweb-33d.6","depends_on_id":"shapeshiftweb-33d.5","type":"blocks","created_at":"2026-02-20T16:05:21Z","created_by":"gomes","metadata":"{}"}]}
{"id":"shapeshiftWeb-2f09","title":"Sanity check Ink + Scroll regen data","description":"Verify generatedAssetData.json has entries for both eip155:534352 (Scroll) and eip155:57073 (Ink). Verify relatedAssetIndex.json has inkAssetId in ETH related array. Verify no regressions. Run review-second-class-evm skill.","status":"closed","priority":1,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-19T13:51:24.013329+01:00","created_by":"gomes-bot","updated_at":"2026-02-19T17:01:37.198079+01:00","closed_at":"2026-02-19T17:01:37.198079+01:00","close_reason":"Popular assets + market data verified working after cache clear. All ink fixes merged, PR #11960 opened.","dependencies":[{"issue_id":"shapeshiftWeb-2f09","depends_on_id":"shapeshiftWeb-cgtg","type":"blocks","created_at":"2026-02-19T13:51:48.716437+01:00","created_by":"gomes-bot"}]}
{"id":"shapeshiftWeb-4uq9","title":"Checkout + merge-fix Ink PR #11904","description":"Checkout Ink PR, extract regen data before merge, merge origin/develop with -X theirs to resolve all conflicts in favor of develop. Result: branch has Ink code changes but develop's generated files.","status":"closed","priority":1,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-19T13:50:52.705351+01:00","created_by":"gomes-bot","updated_at":"2026-02-19T13:53:13.843624+01:00","closed_at":"2026-02-19T13:53:13.843624+01:00","close_reason":"Merged origin/develop with -X theirs, all conflicts resolved"}
{"id":"shapeshiftWeb-cgtg","title":"Cherry-pick Ink regen data into develop generated files","description":"Extract Ink (eip155:57073) entries from saved PR generated files. Merge into develop's generatedAssetData.json, relatedAssetIndex.json. Create coingecko adapter. Bump clearAssets migration. Regenerate manifest hashes + brotli/gzip compression.","status":"closed","priority":1,"issue_type":"task","owner":"[email protected]","created_at":"2026-02-19T13:51:03.136273+01:00","created_by":"gomes-bot","updated_at":"2026-02-19T13:56:30.878339+01:00","closed_at":"2026-02-19T13:56:30.878339+01:00","close_reason":"Added coingecko adapter, index.ts import/export, migration bump 293. User will run yarn generate:asset-data for actual regen.","dependencies":[{"issue_id":"shapeshiftWeb-cgtg","depends_on_id":"shapeshiftWeb-4uq9","type":"blocks","created_at":"2026-02-19T13:51:38.351227+01:00","created_by":"gomes-bot"}]}
{"id":"shapeshiftWeb-l6zn","title":"Add Ink native to ETH related asset index + recompress","status":"closed","priority":1,"issue_type":"bug","owner":"[email protected]","created_at":"2026-02-19T16:09:45.315585+01:00","created_by":"gomes-bot","updated_at":"2026-02-19T16:12:14.214097+01:00","closed_at":"2026-02-19T16:12:14.214097+01:00","close_reason":"Closed"}
140 changes: 140 additions & 0 deletions .claude/skills/benchmark-translate/SKILL.md
@@ -0,0 +1,140 @@
---
name: benchmark-translate
description: Run a quality benchmark of the /translate skill by selecting stratified test keys, capturing ground truth, translating, judging with sub-agents, and compiling a regression report. Invoke with /benchmark-translate.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(node *), Bash(git checkout*), Bash(git diff*), Bash(git status*), Bash(git rev-parse*), Task, Skill, AskUserQuestion
---

# Translation Quality Benchmark

Measures the quality of the `/translate` skill by comparing its output against existing human translations. Uses stratified key selection with a fixed/rotating split, LLM judges, and programmatic validation to produce a comprehensive quality report with regression tracking across all 9 supported locales.

## Data Artifacts

All benchmark data lives in `scripts/translations/benchmark/` (gitignored):

| File | Purpose |
|------|---------|
| `testKeys.json` | Selected test keys with categories and `fixed` flag |
| `coreKeys.json` | Persistent core key set (stable across runs) |
| `ground-truth.json` | Captured human translations before removal |
| `report.json` | Latest benchmark report (becomes baseline on next run) |
| `baseline.json` | Previous report (auto-copied by setup.js) |

## Pipeline (7 Steps)

### Step 1: Select Keys

```bash
node .claude/skills/benchmark-translate/scripts/select-keys.js [--count N] [--core N]
```

Selects N keys (default 150), stratified across 6 categories: glossary-term, financial-error, single-word, interpolation, defi-jargon, general. Validates that every selected key exists in `en` and in all 9 locales.

**Fixed/rotating split:**
- `--core N` (default 100): Number of fixed core keys for stable regression tracking
- `--count N` (default 150): Total keys (core + rotating)
- If `coreKeys.json` exists: loads it, validates keys still exist in all locales, tops up if needed
- If `coreKeys.json` doesn't exist: selects core keys via stratified sampling and saves them
- Remaining keys (default 50) are randomly selected as rotating keys from the non-core pool
- Each entry in `testKeys.json` has `"fixed": true` (core) or `"fixed": false` (rotating)

Outputs `scripts/translations/benchmark/testKeys.json`.
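
The fixed/rotating split described above can be sketched roughly as follows. This is a minimal illustration and not the actual `select-keys.js`: the function names (`shuffle`, `stratifiedSample`, `selectTestKeys`), the `{ key, category }` pool shape, and the round-robin approach to stratification are all assumptions made for the example.

```javascript
// Hypothetical sketch, not the real select-keys.js.
// `pool` is assumed to be an array of { key, category } candidates that
// already exist in en and in every locale.

// Fisher-Yates shuffle with an injectable random source.
function shuffle(items, random = Math.random) {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Pick up to `n` entries by cycling round-robin through the categories,
// so each category is represented as evenly as the pool allows.
function stratifiedSample(pool, n, random = Math.random) {
  const byCategory = new Map();
  for (const entry of shuffle(pool, random)) {
    if (!byCategory.has(entry.category)) byCategory.set(entry.category, []);
    byCategory.get(entry.category).push(entry);
  }
  const buckets = [...byCategory.values()];
  const picked = [];
  while (picked.length < n && buckets.some(bucket => bucket.length > 0)) {
    for (const bucket of buckets) {
      if (picked.length >= n) break;
      if (bucket.length > 0) picked.push(bucket.pop());
    }
  }
  return picked;
}

// Reuse saved core keys that still exist, top up to `core` via stratified
// sampling, then fill the remainder with random rotating keys.
function selectTestKeys(pool, savedCoreKeys, { count = 150, core = 100 } = {}, random = Math.random) {
  const available = new Map(pool.map(entry => [entry.key, entry]));
  const kept = savedCoreKeys
    .filter(key => available.has(key))
    .slice(0, core)
    .map(key => available.get(key));
  const coreSet = new Set(kept.map(entry => entry.key));
  const topUp = stratifiedSample(
    pool.filter(entry => !coreSet.has(entry.key)),
    core - kept.length,
    random,
  );
  topUp.forEach(entry => coreSet.add(entry.key));
  const coreEntries = [...kept, ...topUp];
  const rotatingCount = Math.max(0, count - coreEntries.length);
  const rotating = shuffle(pool.filter(entry => !coreSet.has(entry.key)), random)
    .slice(0, rotatingCount);
  return [
    ...coreEntries.map(entry => ({ ...entry, fixed: true })),
    ...rotating.map(entry => ({ ...entry, fixed: false })),
  ];
}
```

Passing the random source as a parameter is a design choice for the sketch only: it makes the rotating selection reproducible in tests while defaulting to `Math.random` in normal use.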

### Step 2: Setup

```bash
node .claude/skills/benchmark-translate/scripts/setup.js
```

- If `report.json` exists from a previous run, copies it to `baseline.json`
- Reads `testKeys.json`, captures ground truth translations for all 9 locales
- Writes `ground-truth.json`
- Removes test keys from locale files so `/translate` can regenerate them
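
A minimal sketch of the capture-then-remove step, assuming test keys are dotted paths into nested locale JSON and that no key segment itself contains a dot. The helper names (`getByPath`, `deleteByPath`, `captureAndRemove`) are hypothetical and the real `setup.js` may differ, for example in how it handles parent objects left empty after removal.

```javascript
// Hypothetical sketch, not the real setup.js.

// Read a nested value by dotted path, e.g. "trade.errors.title".
function getByPath(obj, dottedPath) {
  return dottedPath.split('.').reduce(
    (node, part) => (node !== null && typeof node === 'object' ? node[part] : undefined),
    obj,
  );
}

// Delete a nested value by dotted path; returns true if something was removed.
function deleteByPath(obj, dottedPath) {
  const parts = dottedPath.split('.');
  const leaf = parts.pop();
  const parent = parts.length > 0 ? getByPath(obj, parts.join('.')) : obj;
  if (parent === null || typeof parent !== 'object' || !(leaf in parent)) return false;
  delete parent[leaf];
  return true;
}

// Capture the current (human) translation of every test key per locale,
// then strip those keys from the in-memory locale data.
// `localeData` is assumed to look like { de: {...}, es: {...}, ... }.
function captureAndRemove(localeData, testKeys) {
  const groundTruth = {};
  for (const [locale, translations] of Object.entries(localeData)) {
    groundTruth[locale] = {};
    for (const key of testKeys) {
      groundTruth[locale][key] = getByPath(translations, key);
      deleteByPath(translations, key);
    }
  }
  return groundTruth;
}
```

Capturing before deleting in the same loop keeps the two operations in lockstep, so a key can never be removed without its ground truth having been recorded first.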

### Step 3: Translate

Invoke the `/translate` skill using the Skill tool. This regenerates the removed keys through the full translate-review-refine pipeline.

### Step 4: Judge (Sub-Agents)

Launch **9 sub-agents in 3 waves of 3** (matching `/translate`'s wave structure) using the Task tool. Each sub-agent receives the locale info, all key triplets, and glossary terms.

- **Wave 1:** de, es, fr
- **Wave 2:** pt, ru, tr
- **Wave 3:** ja, uk, zh

**For each locale, use this prompt:**

```
You are an expert multilingual localization quality assessor for a cryptocurrency/DeFi application.
Rate translations from English into {LANGUAGE_NAME} on a 1-5 scale.

1 = Wrong/misleading meaning
2 = Significant issues (wrong register, missing nuance)
3 = Acceptable but could be more natural
4 = Good, natural, accurate
5 = Excellent, indistinguishable from professional native translation

Check: meaning preservation, naturalness, register ({REGISTER}), UI conciseness,
glossary compliance (these stay English: {NEVER_TRANSLATE_TERMS}),
placeholder integrity (%{...} preserved), DeFi terminology conventions.

Rate each translation INDEPENDENTLY. Community translations can contain errors.

Input: JSON array of {key, english, human, skill}
{ITEMS_JSON}

Output: Return ONLY a JSON array of objects with these exact fields:
{key, humanScore, skillScore, humanJustification, skillJustification, preferenceNote}

Scores must be integers 1-5. Justifications should be 1-2 sentences. preferenceNote should say which is better and why, or "tie" if equal.
```

**Locale info for prompt substitution:**

| Locale | Language | Register |
|--------|----------|----------|
| `de` | German | Formal (Sie) |
| `es` | Spanish | Informal (tú) |
| `fr` | French | Formal (vous) |
| `ja` | Japanese | Polite (です/ます) |
| `pt` | Portuguese | Informal (você) |
| `ru` | Russian | Formal (вы) |
| `tr` | Turkish | Formal (siz) |
| `uk` | Ukrainian | Formal (ви) |
| `zh` | Chinese (Simplified) | Neutral/formal |
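
Filling the `{LANGUAGE_NAME}`, `{REGISTER}`, `{NEVER_TRANSLATE_TERMS}`, and `{ITEMS_JSON}` slots of the judge prompt could look something like the sketch below. `fillPrompt` is a hypothetical helper, not part of the skill. It only touches upper-case `{NAME}` tokens, so the `%{...}` interpolation markers and the lower-case `{key, english, human, skill}` field lists in the prompt are left alone.

```javascript
// Hypothetical helper: replace {UPPER_CASE} tokens in a prompt template.
// Tokens without a matching value are left as-is. A replacer function is used
// so that "$" sequences inside values (e.g. in ITEMS_JSON) are not interpreted.
function fillPrompt(template, values) {
  return template.replace(/\{([A-Z_]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match,
  );
}

const examplePrompt = fillPrompt(
  'Rate translations from English into {LANGUAGE_NAME} ({REGISTER}); keep %{amount}.',
  { LANGUAGE_NAME: 'German', REGISTER: 'Formal (Sie)' },
);
```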

**Building the items array for each locale:**

1. Read `scripts/translations/benchmark/ground-truth.json`
2. Read the current (post-translate) `src/assets/translations/{locale}/main.json`
3. For each test key, build: `{ key: dottedPath, english: groundTruth.english[key], human: groundTruth.groundTruth[locale][key], skill: getValueFromLocaleFile(key) }`
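
The three steps above might be sketched as follows. File reading is left out: the function takes already-parsed JSON. `buildItems` and `getByPath` are hypothetical names, and the `ground-truth.json` shape (`english` and `groundTruth[locale]` maps keyed by dotted path) is inferred from the expression in step 3.

```javascript
// Hypothetical sketch of assembling the per-locale items array.

// Read a nested value by dotted path, e.g. "trade.errors.title".
function getByPath(obj, dottedPath) {
  return dottedPath.split('.').reduce(
    (node, part) => (node !== null && typeof node === 'object' ? node[part] : undefined),
    obj,
  );
}

// groundTruth: parsed ground-truth.json, assumed shape
//   { english: { [key]: string }, groundTruth: { [locale]: { [key]: string } } }
// localeFile: parsed post-translate main.json for `locale`
function buildItems(groundTruth, localeFile, locale, testKeys) {
  return testKeys.map(key => ({
    key,
    english: groundTruth.english[key],
    human: groundTruth.groundTruth[locale][key],
    skill: getByPath(localeFile, key),
  }));
}
```

A key that `/translate` failed to regenerate shows up with `skill: undefined`, which makes missing output easy to spot before the items are sent to a judge.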

**Getting never-translate terms:** Read `src/assets/translations/glossary.json`, collect all keys where value is `null` (excluding `_meta`).
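
As a small sketch, assuming the glossary has already been parsed into an object (`neverTranslateTerms` is a hypothetical name):

```javascript
// Collect glossary keys whose value is null, skipping the _meta entry.
function neverTranslateTerms(glossary) {
  return Object.entries(glossary)
    .filter(([term, value]) => term !== '_meta' && value === null)
    .map(([term]) => term);
}
```

The strict `=== null` comparison matters here: entries with per-locale translations, empty strings, or other values are not treated as never-translate terms.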

**Each sub-agent's output must be written** to `/tmp/{locale}-judge-scores.json`: parse the JSON array from the sub-agent's response and write it to that path.

### Step 5: Compile Report

```bash
node .claude/skills/benchmark-translate/scripts/compile-report.js
```

Loads judge scores from `/tmp/{locale}-judge-scores.json`, runs programmatic validation (including a Cyrillic script check for ru/uk), computes summary stats, and writes `scripts/translations/benchmark/report.json`. If `baseline.json` exists, the report includes regression deltas. It also includes `coreSummary` and `rotatingSummary` alongside the overall `summary`.
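
To make the validation and summary ideas concrete, here is a rough sketch of what such checks could look like. It is not the actual `compile-report.js`: the function names, the exact rules (placeholder sets must match; ru/uk text must contain at least one Cyrillic letter), and the summary fields are assumptions for illustration only.

```javascript
// Hypothetical sketch, not the real compile-report.js.

// Extract %{...} placeholders, sorted so that reordering is not flagged.
function placeholders(text) {
  return (text.match(/%\{[^}]+\}/g) ?? []).sort();
}

// True if the translation carries exactly the same placeholders as the source.
function placeholdersPreserved(english, translated) {
  return JSON.stringify(placeholders(english)) === JSON.stringify(placeholders(translated));
}

// True if the text contains at least one Cyrillic letter
// (one plausible reading of a "Cyrillic script check" for ru/uk).
function hasCyrillic(text) {
  return /\p{Script=Cyrillic}/u.test(text);
}

function mean(values) {
  return values.length > 0 ? values.reduce((sum, v) => sum + v, 0) / values.length : 0;
}

// Summarize judge scores; `baselineMean` is the previous run's skill mean, if any.
function summarize(scores, baselineMean) {
  const skillMean = mean(scores.map(s => s.skillScore));
  const humanMean = mean(scores.map(s => s.humanScore));
  return {
    skillMean,
    humanMean,
    delta: baselineMean === undefined ? null : skillMean - baselineMean,
  };
}
```

Under these assumptions, a positive `delta` means the skill scored higher than in the baseline run, and `null` signals that no baseline was available.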

### Step 6: Restore

```bash
node .claude/skills/benchmark-translate/scripts/restore.js
```

Restores locale files via `git checkout --` and verifies that no diff remains.

### Step 7: Present Results

Read the compile output (printed to stdout) and present to the user:
- Overall score summary with baseline regression (if available)
- Core vs rotating stats (a large gap between the two suggests overfitting to the core keys)
- Notable improvements/regressions
- Per-locale and per-category highlights
- Any items needing attention (low scores, validation failures, glossary issues)