Commit 7265c7b

rescore(benchmark): qwen3.5:35b full 45-triple rescore (claude-opus-4-7-rubric-v1)

Adds a 2nd observation for qwen3.5:35b, rescoring all 45 translations end-to-end with a fresh Opus 4.7 pass under the current rubric (no post-hoc -reranked suffix, pre-§5). Stored alongside the original (-reranked) at qwen3.5-35b_rescore.json. Aggregator will surface n_obs=2 with median.

Notable score movements vs the original judge:
- ko->en (unsu_jouen_nal): 6.7 -> 4.5 (caught currency hallucination jeon vs won and direction omission "to" Dongkwang School)
- en->es (emerson_self_reliance): 6.7 -> 8.5 (Spanish "traicionar" carries the literary "reveal" sense; was wrongly penalized as contresens by mirroring the Chinese rendering's actual reversal)
- en->zh-Hans (pride_prejudice): 8.0 -> 7.0 (Austen's iconic opening reordered, ironic "must" weakened to 总想)
- en->fr (dorian_gray): scored 6.0 (cumulative lexical errors: odorat for odeur, délié for délicat, l'épines agreement break, laburnum untranslated, studio -> pièce)

Global avg overall: 7.70 -> 7.59 (-0.11). 18/45 moved by >=0.5. §5 rerank skipped per session decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
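The n_obs/median summary and the ">=0.5 moved" count described in the message can be illustrated with a minimal sketch. This is not the repository's aggregator: the dictionary layout, the triple identifiers, and the function names `summarize` and `count_movers` are assumptions made for illustration. Only the three score pairs are taken from the commit message.

```python
from statistics import median

# Hypothetical layout: triple id -> overall scores, ordered [original, rescore].
# The three score pairs come from the commit message; everything else is assumed.
observations = {
    "ko->en/unsu_jouen_nal": [6.7, 4.5],
    "en->es/emerson_self_reliance": [6.7, 8.5],
    "en->zh-Hans/pride_prejudice": [8.0, 7.0],
}


def summarize(obs):
    """Return the observation count and median score for each triple."""
    return {
        triple: {"n_obs": len(scores), "median": median(scores)}
        for triple, scores in obs.items()
    }


def count_movers(obs, threshold=0.5):
    """Count triples whose latest score differs from the first by >= threshold."""
    return sum(
        1
        for scores in obs.values()
        if len(scores) >= 2 and abs(scores[-1] - scores[0]) >= threshold
    )


if __name__ == "__main__":
    for triple, stats in summarize(observations).items():
        print(f"{triple}: n_obs={stats['n_obs']} median={stats['median']:.2f}")
    print(f"moved by >= 0.5: {count_movers(observations)}/{len(observations)}")
```

With exactly two observations the median is the midpoint of the two scores, so under this scheme a large disagreement between judges shows up as an averaged value rather than as either judge's score.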

1 parent f25e6a4 commit 7265c7b

1 file changed

benchmark/data/submissions
- 2026-05-10_hydropix_qwen3.5-35b_rescore.json
