
A relevant paper #1

@stan-sony


Hi, really enjoyed digging into this!

We’ve investigated a related angle and thought you might find it relevant: Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling. In short, we train a reward model on human pairwise preferences and then use it as an MT metric; it outperforms XCOMET, MetricX, MT-Ranker, and other metrics on WMT22-24.

We found that pairwise training yields better and more robust performance than regression-based models. More importantly, unlike ranking approaches such as MT-Ranker, which can only compare translations pairwise, our model scores each candidate independently, so inference cost does not grow quadratically with the number of candidates (rough sketch below). It would be great if you could cite our work, and if you want to discuss more, I'm happy to grab coffee and geek out in Tokyo!
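For concreteness, here is a minimal sketch of the pairwise-preference idea, assuming pooled (source, translation) features from some encoder; the class names, loss, and shapes are illustrative only, not our actual Remedy code:

```python
# Illustrative sketch of a Bradley-Terry-style pairwise reward model for MT
# evaluation. Assumes pooled (source, hypothesis) features are already
# computed by an encoder; everything here is a toy stand-in, not Remedy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Maps a pooled (source, hypothesis) representation to a scalar reward."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: [batch, hidden_dim] -> rewards: [batch]
        return self.score(features).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: the human-preferred translation should
    # receive a higher reward than the dispreferred one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

if __name__ == "__main__":
    hidden_dim = 16
    model = RewardHead(hidden_dim)

    # Toy batch of human pairwise preferences (random features stand in
    # for encoded (source, translation) pairs).
    chosen = torch.randn(8, hidden_dim)
    rejected = torch.randn(8, hidden_dim)
    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()

    # At inference the trained model scores each candidate independently,
    # so ranking n translations costs n forward passes rather than the
    # O(n^2) pairwise comparisons a purely pairwise ranker requires.
    candidates = torch.randn(5, hidden_dim)
    with torch.no_grad():
        scores = model(candidates)
    print(scores.argsort(descending=True))  # best-to-worst ranking
```

The key design point is that preferences are only used as a training signal; at test time each translation gets its own scalar score, which is what keeps inference linear in the number of candidates.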

Cheers
