update #1359
Conversation
Could you please see the comment below. Also, will the rule editor have a way to set the number of records in the report?
```python
        Create multiple workbooks when data exceeds Excel's row limit.
        """
        workbooks = []
        detailed_chunks = self._chunk_data(detailed_data, self.MAX_ROWS_PER_SHEET)
```
Could you lowercase `max_rows_per_sheet` here, as it is defined in the constructor.
The rule editor will not have a way to set the size--this is purely a feature for the CLI and the executable. The editor uses Excel data and should not receive a dataset with enough issues to reach Excel's row limit.
> Could you lowercase `max_rows_per_sheet` here, as it is defined in the constructor.

This is done.
- I ran multiple validations with different test cases. I found that the issue summary page is left empty from part 2 onwards; only the first part shows content on the issue summary page.
- Could we make the env variable name and the CLI flag the same?
- Is the rules report sheet omitted from the max-rows check?
I changed this so that if the summary is shorter than the row limit, it is printed in every workbook. If it is longer than the limit, each workbook is populated as much as possible but may leave blanks.

Done.

Yes--I could add this if we think it is necessary. The number of issues and the issue reports seemed like the logical things to split, as those can get quite long. I didn't think the rules report would grow to the point of needing to be split across multiple workbooks.
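For context, the splitting behavior described in this thread could be sketched roughly as below. This is a hypothetical `chunk_data` helper, not the engine's actual `_chunk_data`; a summary shorter than the row limit can then simply be written into every workbook, while a longer one is itself chunked and may leave later parts partially blank.

```python
def chunk_data(rows, max_rows_per_sheet):
    """Split issue rows into chunks that each fit on one sheet.

    A limit of 0 (or None) disables splitting and yields a single chunk.
    One workbook is then produced per chunk.
    """
    if not max_rows_per_sheet:
        return [rows]
    return [
        rows[i:i + max_rows_per_sheet]
        for i in range(0, len(rows), max_rows_per_sheet)
    ]
```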
I ran a validation using the test_dataset.xslx file in the test/resources folder. I set the standard to sdtmig, the version to 3.4, and -me to 1. The Issue Details tab of the generated report shows the record for CORE-000206 80 times for the same dataset, suppec.xpt. What I understood from the README is that when the max error limit is met or exceeded, the engine will cease execution of that rule on that dataset. This violates the expected behavior.

The report file is attached below.
@RamilCDISC -- the limit was met or exceeded for a single dataset. For suppec, it produced 80 errors, which is > 1, so it did not validate any of the other datasets against that rule. If you look at the results, only one dataset per rule is validated: once the first dataset is validated with issues, it exceeds the limit, and validation for that rule ceases for the remaining datasets. This is the expected behavior based on what you just pasted. Setting a limit and stopping validation mid-dataset is much more difficult than stopping after each dataset runs--which is why it is 'met or exceeded'. The engine runs a dataset against a rule, checks the errors returned, and sees if the count is at or over the limit.
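The "met or exceeded" semantics described above — the error count is only checked after each dataset finishes — can be sketched like this (hypothetical function and data shapes, not the engine's actual code):

```python
def validate_rule(rule, datasets, max_errors_per_rule=1000):
    """Run one rule against each dataset in turn.

    The error count is checked only *after* a dataset completes, so a
    single dataset can overshoot the limit (e.g. 80 errors with -me 1)
    before the remaining datasets are skipped. A limit of 0 disables
    the cap entirely.
    """
    results = []
    for dataset in datasets:
        errors = rule(dataset)       # a full-dataset run is not interrupted
        results.extend(errors)
        if max_errors_per_rule and len(results) >= max_errors_per_rule:
            break                    # remaining datasets are skipped
    return results
```

With a limit of 1, a first dataset producing 80 errors still reports all 80, but no further datasets are validated against that rule.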
I read your last comment. Thank you for making it clearer.

Could you please see the comment below. It's the final change, and then we are good to merge.
```
DATASET_SIZE_THRESHOLD = size_in_bytes to force dask implementation
MAX_REPORT_ROWS = maximum number of issues per excel sheet (plus headers) in result report
MAX_ERRORS_PER_RULE = maximum number of errors to report per rule during a validation run.
```
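A minimal sketch of how such environment variables could be read follows. The `read_limit` helper and the default values are assumptions for illustration; the engine's actual loading code may differ.

```python
import os

def read_limit(name, default):
    """Read an integer limit from the environment.

    A value of 0 means 'disabled' and is returned as None, so callers
    can treat None as 'no limit'.
    """
    raw = os.environ.get(name)
    value = int(raw) if raw is not None else default
    return value or None

# Names taken from the env.example snippet above; defaults are assumed.
max_report_rows = read_limit("MAX_REPORT_ROWS", 1_000_000)
max_errors_per_rule = read_limit("MAX_ERRORS_PER_RULE", 1000)
```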
Here the env variable is MAX_ERRORS_PER_RULE, but the README says MAX_REPORT_ERRORS. Could you please update the README?
It was incorrect in the README. MAX_ERRORS_PER_RULE is correct--I believe that name is more descriptive.
The PR adds two new features to the engine. Users can now set the max number of rows in the reports and also configure the max number of errors for one rule. The validation was done by:
- Reviewing the code for updated logic and correctness.
- Validating the code to ensure no unwanted code or comments remain.
- Validating that all unit tests, regression tests, and other automated tests pass.
- Running validation manually with the -mr flag set to a high value.
- Running validation manually with the -mr flag set to a low value.
- Running validation manually with the -mr flag set to 0.
- Running validation manually with the -me flag set to a high value.
- Running validation manually with the -me flag set to a low value.
- Running validation manually with the -me flag set to 0.
- Running validation using both the -mr and -me flags to ensure they integrate properly together.
This pull request introduces two major new configuration options for validation and reporting: limiting the number of rows per Excel report file and capping the number of errors reported per rule. These options can be set via CLI flags or environment variables, and the code now supports splitting Excel reports into multiple files when data exceeds the configured row limit. The changes are reflected in the CLI, reporting logic, and rule validation flow.

Reporting enhancements
- Added a configurable maximum number of rows per Excel report file (`max_report_rows`). This can be set via the CLI (`-mr`) or the `MAX_REPORT_ROWS` environment variable, and is documented in `README.md` and `env.example`. The logic ensures the larger value between CLI and env is used, and disables the limit if set to 0. [1] [2] [3] [4] [5] [6]
- Updated `ExcelReport` to handle multiple output files, updating file naming and metadata to reflect report parts when splitting occurs. [1] [2]

Validation logic improvements
- Added a configurable maximum number of errors per rule (`max_errors_per_rule`), settable via the CLI (`-me`) or the `MAX_ERRORS_PER_RULE` env variable, with logic to halt validation for a rule once the limit is reached. Defaults to 1000 if unspecified; disables the limit if set to 0. [1] [2] [3] [4] [5] [6]
- Added `set_max_errors_per_rule` to centralize the max-error logic, and integrated it into both CLI and script-based validation flows. [1] [2] [3] [4]

CLI and documentation updates
- Updated the CLI (`core.py`) to accept new flags for controlling report row and error limits, and ensured environment variables are loaded for these options. Also updated test and validation argument handling to support the new parameters. [1] [2] [3] [4] [5] [6] [7] [8]
- Updated `README.md` and `env.example` to document the new options and their behaviors. [1] [2]
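The "larger value between CLI and env wins, 0 disables" rule could be sketched as below. The `resolve_limit` helper is hypothetical, and exactly how a 0 on one side interacts with a value on the other is an assumption here:

```python
def resolve_limit(cli_value, env_value, default):
    """Pick the effective limit from a CLI flag and an env variable.

    Per the summary above: the larger of the two provided values wins,
    and 0 from either source disables the limit (returned as None).
    """
    if cli_value == 0 or env_value == 0:
        return None
    provided = [v for v in (cli_value, env_value) if v is not None]
    return max(provided) if provided else default
```

For `max_errors_per_rule` this would be called with `default=1000`, matching the "defaults to 1000 if unspecified" behavior described above.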