
Conversation

RamilCDISC
Collaborator

This pull request adds a performance test to the `tests` folder. The performance test can be run with the following command:
python tests/PerformanceTest.py -dd path/to/folder/with/datasets -rd path/to/folder/with/rule/files -total_calls 1
The total_calls argument defines how many times each rule is executed. The report will be saved in the directory from which the test file was executed.

Collaborator


Can you add the new CLI instructions to the README?

README.md Outdated
To execute the performance test, navigate to the root directory of the project and run the following command:

```sh
python tests/PerformanceTest.py -dd <DATASET_DIRECTORY> -rd <RULES_DIRECTORY> -total_calls <NUMBER_OF_CALLS> -od <OUTPUT_DIRECTORY>
```
Collaborator


If this is the same as the -d, --data TEXT and -lr, --local_rules TEXT args in the validate command, it would be good to use the same arg names.
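
A minimal sketch of what the aligned argument names could look like, assuming the script parses its flags with argparse (the parser setup below is illustrative, not the script's actual code):

```python
import argparse

# Sketch only: reuse the validate command's flag names instead of -dd / -rd.
parser = argparse.ArgumentParser(description="Performance test runner")
parser.add_argument("-d", "--data", required=True,
                    help="Directory containing the dataset files (.json or .xpt)")
parser.add_argument("-lr", "--local_rules", required=True,
                    help="Directory containing the rule files")
parser.add_argument("-total_calls", type=int, default=1,
                    help="Number of times each rule is executed")
args = parser.parse_args()
```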

README.md Outdated

This repository includes a performance testing script located in the `tests` folder under the filename `PerformanceTest.py`. The script is designed to evaluate the execution time of rules against datasets by running multiple test iterations.

### Running the Performance Test
Collaborator


This and the next header should be nested under the Performance Testing header. IOW, use 4 #

README.md Outdated
### Performance Test Command-Line Flags

- **`-dd` (Dataset Directory)**: The directory containing the dataset files in `.json` or `.xpt` format.
Collaborator


Format the documentation similarly to the existing documentation. For example, use a code block for the args.

@gerrycampion gerrycampion added this to the v0.10.0 milestone Feb 19, 2025
@gerrycampion gerrycampion added large datasets testing Unit, regression, performance, QA, test automation labels Feb 20, 2025
@gerrycampion gerrycampion linked an issue Feb 21, 2025 that may be closed by this pull request
@RamilCDISC
Copy link
Collaborator Author

Since the engine has been updated to remove the test column, I will go back over my code and update it to match the updated engine. Putting this back to in progress for now. @SFJohnson24 @gerrycampion

@nickdedonder nickdedonder modified the milestones: v0.10.0, V0.11.0 Apr 28, 2025
Collaborator


One of the unit tests is failing

"core.py",
"test",
"-s",
"sdtmig",
Collaborator


This should be a CLI option.

"-s",
"sdtmig",
"-v",
"3.4",
Collaborator


This should be a CLI option.

for num_call in range(total_calls):
rule_path = os.path.join(rule_dir, rule)
command = [
"python",
Collaborator


This should use sys.executable so that it uses the same Python executable and environment as the caller.

command = [
"python",
"core.py",
"test",
Collaborator


This should be `validate` now.
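
Combining the two suggestions above, the command construction could look roughly like this (a sketch only; the standard and version values are hardcoded here just to keep it self-contained, since the earlier comments ask for them to become CLI options):

```python
import sys

# Values assumed to come from the performance test's own CLI options
# (hardcoded here only to keep the sketch self-contained).
standard = "sdtmig"
version = "3.4"

command = [
    sys.executable,   # same Python executable and environment as the caller
    "core.py",
    "validate",       # the "test" subcommand has been replaced by "validate"
    "-s", standard,
    "-v", version,
]
```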


for num_call in range(total_calls):
rule_path = os.path.join(rule_dir, rule)
command = [
Collaborator


There should be an ability to include a Define-XML.
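
As one possible shape for this, the Define-XML path could be an optional argument appended to the command list from the sketch above when a path is supplied (the flag name below is a placeholder, not necessarily the engine's actual option):

```python
# Hypothetical: an optional Define-XML path appended to the validate command.
define_xml_path = "path/to/define.xml"  # would come from a new CLI option
if define_xml_path:
    command += ["--define-xml-path", define_xml_path]  # placeholder flag name
```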


for num_call in range(total_calls):
rule_path = os.path.join(rule_dir, rule)
command = [
Collaborator


See the previous comments.

return results, rule_results


def all_datset_against_each_rule(dataset_dir, rule_dir, total_calls):
Collaborator


Should be "dataset".

Comment on lines 483 to 499
writer, sheet_name=f"Rule_{sanitized_rule_name[:28]}", index=False
) # Truncate to 31 chars

# Overall collective dataset results
collective_dataset_df = pd.DataFrame(collective_dataset_result)
collective_dataset_df.to_excel(
writer, sheet_name="Collective Dataset Result", index=False
)

# Individual dataset results
for dataset_name, dataset_data in individual_dataset_result.items():
sanitized_dataset_name = re.sub(
r"[\\/*?:[\]]", "_", dataset_name
) # Replace invalid characters with '_'
dataset_df = pd.DataFrame(dataset_data)
dataset_df.to_excel(
writer, sheet_name=f"Dataset_{sanitized_dataset_name[:28]}", index=False
Collaborator


I'm still getting a warning that some titles are too long. The math might be off.
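
Excel caps sheet names at 31 characters, and the prefix counts toward that cap: "Rule_" is 5 characters, so a 28-character slice gives 33, and "Dataset_" plus 28 gives 36. A small helper that truncates the combined name would avoid the warning (a sketch, not the PR's code):

```python
MAX_SHEET_NAME = 31  # Excel's hard limit on sheet name length

def make_sheet_name(prefix: str, name: str) -> str:
    """Build a sheet name that never exceeds Excel's 31-character limit."""
    return f"{prefix}{name}"[:MAX_SHEET_NAME]

# "Rule_" (5 chars) leaves room for 26 name characters, "Dataset_" (8) for 23.
print(make_sheet_name("Rule_", "A" * 40))     # 31 characters
print(make_sheet_name("Dataset_", "B" * 40))  # 31 characters
```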

Comment on lines 253 to 263
for rule in rules:
rule_name = os.path.basename(rule)

# Initialize variables to collect times for the dataset across all rules
all_time_taken = []
all_preprocessing_times = []
all_operator_times = {}
all_operation_times = []

rule_names = []
for dataset_path in dataset_files:
Collaborator


It looks like this method is essentially running the same set of commands as the previous function, just in a different order. I think it is redundant.
I don't think it supports rules that require multiple datasets joined together, since only a single dataset is being passed at a time. I think the timing feedback needs to be given from within the engine. Maybe an additional logging mechanism can be used.
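
As a purely illustrative sketch of that idea (the names here are not the engine's actual API), per-step timing could be emitted from inside the engine through the standard logging module instead of being measured around the subprocess call:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("performance")

@contextmanager
def timed(step: str):
    """Log how long a named step took, e.g. preprocessing or an operator."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.3f s", step, time.perf_counter() - start)

# Hypothetical usage inside the engine:
# with timed("preprocessing"):
#     preprocess(dataset)
```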

@RamilCDISC RamilCDISC requested a review from gerrycampion May 12, 2025 19:26
@OGarcia11 OGarcia11 modified the milestones: V0.12.0, v1.0.0 Jul 28, 2025