feat: evaluate translations #234
base: master
Conversation
Pull Request Overview
Adds a new translation evaluation feature and updates import paths to the new translation submodule.
- Updates test imports to use `ml_filter.translation.translate`.
- Introduces `translation_evaluation.py` with functions to load references, prepare inputs, and compute COMET scores.
- Extends the CLI (`__main__.py`) with an `evaluate_translations` command.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_translate.py | Updated import path for Translator. |
| tests/conftest.py | Updated import paths for translation clients and Translator. |
| src/ml_filter/translation/translation_evaluation.py | Added new module for reference loading, input prep, and COMET evaluation. |
| src/ml_filter/main.py | Imported evaluate_translations and added a CLI subcommand for translation evaluation. |
Comments suppressed due to low confidence (1)
src/ml_filter/translation/translation_evaluation.py:33
- The docstring for `_prepare_translation_input` lists a `lang` parameter that isn't in the signature. Please update the docstring to match the actual parameters.

> `lang: Language code.`
src/ml_filter/__main__.py
Outdated
```python
@click.option("--gold-path", required=True, help="Path to gold reference JSONL file")
@click.option("--model-name", default="Unbabel/wmt22-cometkiwi-da", help="COMET model to use")
@click.option("--languages", type=str, required=True, help="Comma-separated list of supported language codes")
@click.option("--batch-size", help="Batch size for processing translations")
```
Copilot
AI
Jun 28, 2025
The CLI option `--batch-size` has no type specified, so it will be parsed as a string. Consider adding `type=int` to ensure `batch_size` is passed as an integer.
```diff
-@click.option("--batch-size", help="Batch size for processing translations")
+@click.option("--batch-size", type=int, help="Batch size for processing translations")
```
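To illustrate the difference, here is a minimal sketch; the command name and the echoed output are hypothetical, only the `--batch-size` option follows the diff:

```python
import click

@click.command()
@click.option("--batch-size", type=int, default=8, help="Batch size for processing translations")
def evaluate(batch_size):
    # With type=int, click converts "--batch-size 16" to the integer 16
    # and rejects non-numeric values with a usage error instead of
    # silently passing a string downstream.
    click.echo(f"batch_size={batch_size} ({type(batch_size).__name__})")
```

Without `type=int`, `batch_size` would arrive as the string `"16"` and fail later wherever arithmetic or slicing expects an integer.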
```python
for filename in os.listdir(data_dir):
    if filename.endswith(".jsonl"):
        file_path = os.path.join(data_dir, filename)
        lang = filename.split("_")[5]
```
Copilot
AI
Jun 28, 2025
Splitting by underscore and accessing index 5 is brittle and may raise an `IndexError` for unexpected filenames. Add a check on the number of segments or use a more robust parsing method.
le1nux
left a comment
Functionality-wise I think it is all correct. There are a few stylistic issues.
I did not get to review the plotting functions themselves yet.
```python
    batch_size: int,
    model_name: str = "Unbabel/wmt22-cometkiwi-da",
) -> None:
    """Evaluate translation quality for a set of files using a COMET model.
```
Maybe explain how we actually want to evaluate it? Like 2-3 sentences about pitching the idea.
```python
target_texts = []
with open(file_path, "r") as f:
    for line_num, line in enumerate(f, 1):
        if not line:
```
`line` is never a boolean, but it is treated here like one. It also works with `None`, but in my opinion it is bad style.
```diff
-        if not line:
+        if line is None:
```
should we maybe even raise an exception here?
`line` will never be `None`, since it is a string.
The current solution checks for empty strings.
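For illustration, an explicit empty-string check could look like this; the generator name is hypothetical, and stripping trailing whitespace is an assumption about the intent:

```python
def non_empty_lines(lines):
    # Lines read from a file are always strings, never None, so we
    # check for emptiness explicitly instead of relying on truthiness.
    for line_num, line in enumerate(lines, 1):
        if line.strip() == "":
            continue  # skip blank lines (including bare newlines)
        yield line_num, line
```

This keeps the original line numbering intact for the error messages while making the "skip blanks" intent obvious.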
```python
    languages: str,
    batch_size: int,
):
    """CLI entry point for evaluating translation quality."""
```
We should document that the files in `data_dir` need to follow a certain naming convention.
```python
target_texts = _prepare_translation_input(file_path, gold_dict)

if target_texts:
```
```diff
-if target_texts:
+if len(target_texts) > 0:
```
```python
        except json.JSONDecodeError as e:
            logging.warning(f"Skipping invalid line {line_num} in {file_path}: {e}")
            continue
    return target_texts
```
We should add a warning if `len(target_texts) != len(gold_dict)`.
If a language misses certain translations, it introduces biases.
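A sketch of such a coverage warning; the function name and the `lang` argument are hypothetical:

```python
import logging

def warn_on_coverage_mismatch(target_texts: list, gold_dict: dict, lang: str) -> None:
    # Warn when a language has fewer (or more) translations than gold
    # references; uneven coverage biases per-language comparisons.
    if len(target_texts) != len(gold_dict):
        logging.warning(
            "Language %s has %d translations but %d gold references",
            lang, len(target_texts), len(gold_dict),
        )
```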
```python
# Step 3: Count translation scores
for sample_id, trans_score in id_to_translation_score.items():
    if sample_id in id_to_gt_quality_score:
```
What if this is not the case? Should we issue a warning?
```python
    if sample_id in id_to_gt_quality_score:
        gt_score = float(id_to_gt_quality_score[sample_id])
        trans_score = trans_score.lower()
        if trans_score in score_classes:
```
Also, what about the else case?
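One possible shape for handling the else case; the class labels, the `SCORE_CLASSES` set, and the helper name are all hypothetical:

```python
import logging

SCORE_CLASSES = {"good", "bad"}  # hypothetical label set for this sketch

def bucket_score(trans_score: str):
    # Normalize the label and validate it against the known classes;
    # warn instead of silently dropping unknown labels so data problems
    # surface in the logs.
    label = trans_score.lower()
    if label in SCORE_CLASSES:
        return label
    logging.warning("Unknown translation score %r; skipping sample", trans_score)
    return None
```

The caller can then treat a `None` return as "skip this sample", while the warning makes the skipped cases visible.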
```python
if filename.endswith(".json"):
    parts = filename.split("_")
    if len(parts) != 8:
        continue
```
No warning here?
```python
    languages: list[str],
    output_dir: Path,
) -> None:
    """Plot histograms for translation quality results.
```
We should also mention the file name convention for `data_dir` here.
```python
    output_path=output_path,
)

output_path = os.path.join(output_dir, f"{lang}_translation_quality_vs_gt_histogram.png")
```
If we want this for the paper, I would always plot as PDF
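A sketch of switching to vector output with matplotlib, which infers the format from the file suffix; the function and variable names are assumptions, only the filename stem follows the hunk above:

```python
import os
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted plotting
import matplotlib.pyplot as plt

def save_histogram(scores, lang: str, output_dir: str) -> str:
    # Vector PDF output scales cleanly when embedded in a paper,
    # unlike a rasterized PNG.
    fig, ax = plt.subplots()
    ax.hist(scores, bins=20)
    ax.set_title(f"{lang} translation quality vs. ground truth")
    output_path = os.path.join(output_dir, f"{lang}_translation_quality_vs_gt_histogram.pdf")
    fig.savefig(output_path)
    plt.close(fig)
    return output_path
```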