
feat: implement multiple file support in Dataset #81

Merged
Abdeali099 merged 4 commits into test-suite from multifiles-dataset on Mar 30, 2026

Conversation

@Abdeali099
Member

No description provided.


self.log.file_content = content
return content
self.log.file_content = combined
Member Author


We could add a note here, something like: an Order can be made up of multiple files.

Make the separator a constant.
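
A minimal sketch of what the separator suggestion could look like, assuming the runner joins the text of each file into one string. The constant name `FILE_CONTENT_SEPARATOR`, its value, and the helper `combine_file_contents` are hypothetical; the PR's actual names and separator are not shown in this thread.

```python
# Hypothetical constant: the real separator value used in runner.py is not shown here.
FILE_CONTENT_SEPARATOR = "\n\n---\n\n"


def combine_file_contents(contents: list[str]) -> str:
    """Join the extracted text of each file using one shared separator."""
    return FILE_CONTENT_SEPARATOR.join(contents)


if __name__ == "__main__":
    combined = combine_file_contents(["page one text", "page two text"])
    print(combined)
```

Defining the separator once at module level means the runner and any code that later splits or inspects the combined content refer to the same value.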

@Abdeali099 Abdeali099 marked this pull request as ready for review March 30, 2026 14:01
@Abdeali099 Abdeali099 merged commit 51ccd69 into test-suite Mar 30, 2026
2 of 3 checks passed
@Abdeali099 Abdeali099 deleted the multifiles-dataset branch March 30, 2026 14:01
@greptile-apps

greptile-apps bot commented Mar 30, 2026

Confidence Score: 4/5

Safe to merge after restoring per-row file type validation; all other changes are well-structured.

One P1 finding: the SUPPORTED_FILE_TYPES guard was removed without replacement, meaning unsupported file types silently pass validation and only fail during background job execution. All other findings are P2 style/UX suggestions. Restoring the validation check in validate_files would bring this to 5/5.

parser_benchmark_dataset.py — the validate_files method needs a per-row file type check restored.
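
The per-row check described above might look roughly like the following. This is a framework-free sketch, not the project's code: the contents of `SUPPORTED_FILE_TYPES` are assumed, rows are represented as plain file URLs, and `find_unsupported_files` is a hypothetical helper. In the actual DocType controller the error would more likely be raised with `frappe.throw` than with `ValueError`.

```python
from pathlib import PurePosixPath

# Assumed values: the real SUPPORTED_FILE_TYPES is not shown in this thread.
SUPPORTED_FILE_TYPES = {"pdf", "csv", "xlsx"}


def find_unsupported_files(file_urls: list[str]) -> list[str]:
    """Return the file URLs whose extension is not in SUPPORTED_FILE_TYPES."""
    unsupported = []
    for url in file_urls:
        # ".PDF" -> "pdf"; a URL with no extension yields "" and is rejected.
        extension = PurePosixPath(url).suffix.lstrip(".").lower()
        if extension not in SUPPORTED_FILE_TYPES:
            unsupported.append(url)
    return unsupported


def validate_files(file_urls: list[str]) -> None:
    """Fail at save time instead of later, inside the background job."""
    unsupported = find_unsupported_files(file_urls)
    if unsupported:
        raise ValueError(f"Unsupported file type(s): {', '.join(unsupported)}")
```

Checking every row, rather than only the first, keeps a single unsupported file in a multi-file dataset from slipping through to the background job.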

Important Files Changed

| Filename | Overview |
| --- | --- |
| transaction_parser/parser_benchmark/doctype/parser_benchmark_dataset/parser_benchmark_dataset.py | Core model refactored from single-file to child-table; file type validation was removed without replacement, creating a regression where unsupported file types pass validation and only fail at background job execution time. |
| transaction_parser/parser_benchmark/doctype/parser_benchmark_dataset/parser_benchmark_dataset.json | DocType fields migrated from single Attach+Data to a child Table; the depends_on visibility guard for the PDF Processor section was dropped, making it permanently visible regardless of uploaded file types. |
| transaction_parser/parser_benchmark/runner.py | Runner updated to iterate over multiple file docs and join content with a separator; logic is clean and the empty-list guard in _get_file_docs prevents IndexError. |
| transaction_parser/patches/populate_dataset_files_table.py | Migration patch correctly reads previously-attached File docs and inserts them as child rows; uses db_insert() which is appropriate for a data migration patch. |
| transaction_parser/patches/remove_dataset_file_field.py | Pre-model-sync patch ensures File docs are properly linked before the old file column is dropped; handles orphan and missing File docs gracefully. |

Reviews (1): Last reviewed commit: "refactor: enhance Parser Benchmark Datas..."
