feat: implement multiple file support in Dataset #81
Merged
Abdeali099 merged 4 commits into test-suite on Mar 30, 2026
Abdeali099 commented on Mar 30, 2026
    self.log.file_content = content
    return content
    self.log.file_content = combined
Could we add a note here, something like: "An order can be made up of multiple files."
Also, make the separator a constant.
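A minimal sketch of what the "separator as a constant" suggestion might look like. The names `FILE_CONTENT_SEPARATOR` and `combine_file_contents`, and the separator value itself, are assumptions for illustration; the separator actually used in the runner is not shown in this conversation.

```python
# Hypothetical module-level constant; the real separator string in the PR is not shown here.
FILE_CONTENT_SEPARATOR = "\n\n---\n\n"


def combine_file_contents(contents: list[str]) -> str:
    """Join the text of each file into one string, preserving the given order."""
    return FILE_CONTENT_SEPARATOR.join(contents)


# Two files' contents become one combined string with the separator between them.
combined = combine_file_contents(["page one", "page two"])
```

Keeping the separator in one named constant means the runner and any code that later splits the combined content can refer to the same value.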
Confidence Score: 4/5

Safe to merge after restoring per-row file type validation; all other changes are well-structured.

One P1 finding: the SUPPORTED_FILE_TYPES guard was removed without replacement, meaning unsupported file types silently pass validation and only fail during background job execution. All other findings are P2 style/UX suggestions. Restoring the validation check in validate_files would bring this to 5/5.

parser_benchmark_dataset.py: the validate_files method needs a per-row file type check restored.

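The per-row check described above could be restored along these lines. This is a framework-free sketch, not the PR's code: the contents of `SUPPORTED_FILE_TYPES`, the plain list of file URLs standing in for child-table rows, and the use of `ValueError` are all assumptions. In the actual DocType the error would presumably be raised through Frappe's own mechanism, such as `frappe.throw`.

```python
from pathlib import PurePosixPath

# Assumed set of extensions; the real SUPPORTED_FILE_TYPES values are not shown in this review.
SUPPORTED_FILE_TYPES = {"pdf", "csv", "xlsx"}


def validate_files(file_urls: list[str]) -> None:
    """Raise ValueError listing every row whose file extension is unsupported."""
    errors = []
    for idx, url in enumerate(file_urls, start=1):
        # Compare extensions case-insensitively, without the leading dot.
        ext = PurePosixPath(url).suffix.lstrip(".").lower()
        if ext not in SUPPORTED_FILE_TYPES:
            errors.append(f"Row {idx}: unsupported file type '{ext or 'unknown'}' ({url})")
    if errors:
        raise ValueError("\n".join(errors))
```

Checking every row at validation time surfaces the problem when the document is saved, instead of later inside the background job.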
| Filename | Overview |
|---|---|
| transaction_parser/parser_benchmark/doctype/parser_benchmark_dataset/parser_benchmark_dataset.py | Core model refactored from single-file to child-table; file type validation was removed without replacement, creating a regression where unsupported file types pass validation and only fail at background job execution time. |
| transaction_parser/parser_benchmark/doctype/parser_benchmark_dataset/parser_benchmark_dataset.json | DocType fields migrated from single Attach+Data to a child Table; the depends_on visibility guard for the PDF Processor section was dropped, making it permanently visible regardless of uploaded file types. |
| transaction_parser/parser_benchmark/runner.py | Runner updated to iterate over multiple file docs and join content with a separator; logic is clean and the empty-list guard in _get_file_docs prevents IndexError. |
| transaction_parser/patches/populate_dataset_files_table.py | Migration patch correctly reads previously-attached File docs and inserts them as child rows; uses db_insert() which is appropriate for a data migration patch. |
| transaction_parser/patches/remove_dataset_file_field.py | Pre-model-sync patch ensures File docs are properly linked before the old file column is dropped; handles orphan and missing File docs gracefully. |
Reviews (1): Last reviewed commit: "refactor: enhance Parser Benchmark Datas..."
No description provided.