Hi, really enjoyed digging into this!
We’ve investigated a related angle and thought you might find it handy: Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling. In short, we train a reward model on human pairwise preferences, then use it as an MT metric; it ends up outscoring XCOMET, MetricX, MT-Ranker, etc. on WMT22-24.
We found that pairwise training yields better and more robust performance than regression-based models. More importantly, unlike ranking approaches such as MT-Ranker, inference cost is not quadratic: MT-Ranker can only compare hypotheses pairwise, so ranking n candidates requires O(n²) comparisons, whereas a reward model scores each hypothesis independently. A rough sketch of the setup is below. It would be awesome if you could cite our work, and if you want to discuss more, I'm happy to grab coffee and geek out in Tokyo!
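For readers unfamiliar with the setup, here is a minimal sketch (in PyTorch) of what "train a reward model on pairwise preferences, then use it as a metric" can look like; the `RewardScorer`, `encoder`, and `preference_loss` names are hypothetical illustrations, not code from the Remedy paper.

```python
import torch
import torch.nn as nn

class RewardScorer(nn.Module):
    """Scores a single (source, hypothesis) pair with a scalar quality score."""

    def __init__(self, encoder, hidden_dim: int):
        super().__init__()
        self.encoder = encoder                 # e.g. a pretrained cross-encoder (assumption)
        self.head = nn.Linear(hidden_dim, 1)   # scalar reward head

    def forward(self, src, hyp):
        # One forward pass per hypothesis -> linear-time scoring at inference,
        # unlike a pairwise ranker that must compare every candidate pair.
        pooled = self.encoder(src, hyp)        # [batch, hidden_dim]
        return self.head(pooled).squeeze(-1)   # [batch]

def preference_loss(scorer, src, chosen, rejected):
    # Bradley-Terry-style objective on human pairwise preferences:
    # maximize P(chosen > rejected) = sigmoid(s_chosen - s_rejected).
    s_chosen = scorer(src, chosen)
    s_rejected = scorer(src, rejected)
    return -torch.nn.functional.logsigmoid(s_chosen - s_rejected).mean()
```

Because the scorer assigns each hypothesis its own score, ranking n candidates costs n forward passes rather than the n(n-1)/2 comparisons a pure pairwise ranker would need.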
Cheers