Performance testing #990
Conversation
Can you add the new CLI instructions to the README?
README.md (outdated)
To execute the performance test, navigate to the root directory of the project and run the following command:

```sh
python tests/PerformanceTest.py -dd <DATASET_DIRECTORY> -rd <RULES_DIRECTORY> -total_calls <NUMBER_OF_CALLS> -od <OUTPUT_DIRECTORY>
```
If this is the same as the `-d, --data TEXT` and `-lr, --local_rules TEXT` args in the `validate` command, it would be good to use the same arg names.
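For illustration, a minimal sketch of what reusing the `validate` arg names could look like, assuming the script parses its flags with `argparse` (the `-od` output flag is taken from the PR description; the parser itself is hypothetical, not the PR's actual code):

```python
import argparse

# Hypothetical parser for PerformanceTest.py that mirrors the validate command's
# -d/--data and -lr/--local_rules names instead of introducing -dd/-rd.
parser = argparse.ArgumentParser(description="Run the engine performance test")
parser.add_argument("-d", "--data", required=True, help="Directory with .json/.xpt datasets")
parser.add_argument("-lr", "--local_rules", required=True, help="Directory with local rule files")
parser.add_argument("-total_calls", type=int, default=1, help="How many times to execute each rule")
parser.add_argument("-od", "--output_directory", default=".", help="Where to write the report")
args = parser.parse_args()
```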
README.md (outdated)
This repository includes a performance testing script located in the `tests` folder under the filename `PerformanceTest.py`. The script is designed to evaluate the execution time of rules against datasets by running multiple test iterations.

### Running the Performance Test
This and the next header should be nested under the Performance Testing header. In other words, use 4 `#`.
README.md (outdated)
### Performance Test Command-Line Flags

- **`-dd` (Dataset Directory)**: The directory containing the dataset files in `.json` or `.xpt` format.
Format the documentation similarly to the existing documentation. For example, use a code block for the args.
Since the engine has been updated to remove the test column, I will go over my code again and update it to match the updated engine. Putting this back to in progress for now. @SFJohnson24 @gerrycampion
One of the unit tests is failing
tests/PerformanceTest.py (outdated)
"core.py", | ||
"test", | ||
"-s", | ||
"sdtmig", |
This should be a CLI option.
tests/PerformanceTest.py (outdated)
"-s", | ||
"sdtmig", | ||
"-v", | ||
"3.4", |
This should be a CLI option.
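A sketch of what that could look like, using the same hypothetical `argparse` parser as above; the defaults simply preserve the currently hardcoded values:

```python
# Expose the standard and version as options instead of hardcoding "sdtmig"/"3.4",
# then forward them to the engine's -s/-v flags when building the command.
parser.add_argument("-s", "--standard", default="sdtmig", help="Standard to validate against")
parser.add_argument("-v", "--version", default="3.4", help="Standard version")
args = parser.parse_args()

engine_args = ["-s", args.standard, "-v", args.version]
```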
tests/PerformanceTest.py (outdated)
for num_call in range(total_calls):
    rule_path = os.path.join(rule_dir, rule)
    command = [
        "python",
This should use `sys.executable` so that it uses the same Python exe and env as the caller.
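For example, a sketch of the suggested change (the rest of the command list stays as in the diff above):

```python
import sys

# Launch the engine with the interpreter running this script, so it uses the
# same Python executable and virtual environment as the caller.
command = [
    sys.executable,
    "core.py",
    # ... remaining arguments unchanged
]
```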
tests/PerformanceTest.py (outdated)
command = [
    "python",
    "core.py",
    "test",
This should be `validate` now.
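Roughly, combining this with the earlier comments (a sketch only; `standard` and `version` stand for the hypothetical CLI options sketched above, and it assumes the engine's `validate` subcommand accepts the same `-s`/`-v` flags shown in the diff):

```python
command = [
    sys.executable,
    "core.py",
    "validate",  # per the review: use "validate" instead of "test"
    "-s", standard,
    "-v", version,
]
```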
tests/PerformanceTest.py (outdated)
for num_call in range(total_calls):
    rule_path = os.path.join(rule_dir, rule)
    command = [
There should be an ability to include a Define-XML.
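One way that could look (a sketch only: the option name here is hypothetical, the engine-side flag is an assumption that should be checked against the `validate` command's actual options, and `parser`/`command` refer to the hypothetical parser and command list above):

```python
# Hypothetical: accept an optional Define-XML path and forward it to the engine.
parser.add_argument("-dxp", "--define_xml_path", default=None, help="Optional Define-XML file")
args = parser.parse_args()

if args.define_xml_path:
    command.extend(["-dxp", args.define_xml_path])
```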
tests/PerformanceTest.py (outdated)
for num_call in range(total_calls):
    rule_path = os.path.join(rule_dir, rule)
    command = [
see previous comments
tests/PerformanceTest.py (outdated)
    return results, rule_results


def all_datset_against_each_rule(dataset_dir, rule_dir, total_calls):
should be "dataset"
tests/PerformanceTest.py (outdated)
    writer, sheet_name=f"Rule_{sanitized_rule_name[:28]}", index=False
)  # Truncate to 31 chars

# Overall collective dataset results
collective_dataset_df = pd.DataFrame(collective_dataset_result)
collective_dataset_df.to_excel(
    writer, sheet_name="Collective Dataset Result", index=False
)

# Individual dataset results
for dataset_name, dataset_data in individual_dataset_result.items():
    sanitized_dataset_name = re.sub(
        r"[\\/*?:[\]]", "_", dataset_name
    )  # Replace invalid characters with '_'
    dataset_df = pd.DataFrame(dataset_data)
    dataset_df.to_excel(
        writer, sheet_name=f"Dataset_{sanitized_dataset_name[:28]}", index=False
I'm still getting a warning that some titles are too long. The math might be off.
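The math is indeed off: Excel caps sheet names at 31 characters, and the prefix is not counted in the truncation, so `Dataset_` (8 characters) plus 28 gives 36, and `Rule_` (5 characters) plus 28 gives 33. A sketch of one way to fix it (the helper name is hypothetical):

```python
def sheet_title(prefix: str, name: str, limit: int = 31) -> str:
    # Excel caps sheet names at 31 characters; truncate the full title,
    # prefix included, rather than only the name.
    return (prefix + name)[:limit]

dataset_df.to_excel(
    writer, sheet_name=sheet_title("Dataset_", sanitized_dataset_name), index=False
)
```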
tests/PerformanceTest.py (outdated)
for rule in rules:
    rule_name = os.path.basename(rule)

    # Initialize variables to collect times for the dataset across all rules
    all_time_taken = []
    all_preprocessing_times = []
    all_operator_times = {}
    all_operation_times = []

    rule_names = []
    for dataset_path in dataset_files:
It looks like this method is essentially running the same set of commands as the previous function, just in a different order. I think it is redundant.

I don't think it supports rules where the rule might require multiple datasets that are joined together, since only a single dataset is being passed at a time. I think the timing feedback needs to be given from within the engine. Maybe an additional logging mechanism could be used.
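As a rough illustration of that idea, a timing helper the engine could log from (names are hypothetical; this is not part of the current engine). The performance script could then read these log lines instead of timing whole subprocess calls:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("timing")

@contextmanager
def timed(step: str):
    # Log wall-clock time for a named step (e.g. preprocessing, an operator, an operation).
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.3f s", step, time.perf_counter() - start)

# Hypothetical usage inside the engine:
# with timed("preprocessing"):
#     dataset = preprocess(dataset)
```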
The pull request adds a performance test to the tests folder. The performance test can be run with the following command:

python tests/PerformanceTest.py -dd path/to/folder/with/datasets -rd path/to/folder/with/rule/files -total_calls 1

`total_calls` defines how many times you want to execute a rule. The report will be saved in the directory from which the test file was executed.