Benchmark baseline testing - Reports and test data #881

@gerrycampion

Description

Reporting

For each of the following, we want the type, name, number of calls, cumulative time, mean time, median time, min time, and max time. For example:

    Rule #       Dataset   Time (s)
    CORE-00001   LB        5.1

    Rule #       # of Calls   Cumulative Time (s)   Mean Time (s)   Median Time (s)   Min Time (s)   Max Time (s)
    CORE-00001   5            5.1                   1.1             1.0               0.5            8
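
A minimal sketch of how these columns could be derived from raw per-call timings; the class and field names are illustrative, not part of the engine:

    from dataclasses import dataclass
    from statistics import mean, median
    from typing import List

    @dataclass
    class RuleTimingSummary:
        """Aggregates raw per-call durations (in seconds) into the report columns."""
        rule_id: str
        durations: List[float]

        def as_row(self) -> dict:
            return {
                "rule": self.rule_id,
                "calls": len(self.durations),
                "cumulative": sum(self.durations),
                "mean": mean(self.durations),
                "median": median(self.durations),
                "min": min(self.durations),
                "max": max(self.durations),
            }
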
  • Time taken for each dataset #908
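
    One way to capture this, as a minimal sketch: a small context manager that records named wall-clock timings. time_block and the timings store are hypothetical names, not existing engine APIs.

    import time
    from collections import defaultdict
    from contextlib import contextmanager

    timings = defaultdict(list)  # phase name -> list of durations in seconds

    @contextmanager
    def time_block(name):
        # Record wall-clock time of the enclosed block under `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            timings[name].append(time.perf_counter() - start)

    # Hypothetical usage around one dataset's validation:
    # with time_block(f"dataset:{dataset_path}"):
    #     validate_dataset(rule, dataset_path)  # placeholder for the engine call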

  • Time taken to preprocess each dataset #909

    dataset = deepcopy(dataset)
    # preprocess dataset
    dataset_preprocessor = DatasetPreprocessor(
        dataset, domain, dataset_path, self.data_service, self.cache
    )
    dataset = dataset_preprocessor.preprocess(rule_copy, datasets)
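
    Reusing the hypothetical time_block helper from the sketch above, the preprocessing step could be timed at the call site:

    with time_block(f"preprocess:{dataset_path}"):
        dataset = dataset_preprocessor.preprocess(rule_copy, datasets)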

  • Time taken for each Operator #910
    Refer to the feature "valid_codelist operator working, needs extensibility logic" (#898): dataframe_operators.py has a new wrapper that can be used.
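
    As an illustration of that wrapper pattern only (names here are assumptions, not the actual dataframe_operators.py API), a decorator can record each operator call into the same hypothetical timings store from the sketch above:

    import functools
    import time

    def timed_operator(func):
        # Record execution time of each operator call under the function's name.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                timings[f"operator:{func.__name__}"].append(
                    time.perf_counter() - start
                )
        return wrapper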

  • Time taken for each Operation #911

    result = self._execute_operation()
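
    A sketch of timing this call site with the same hypothetical helper:

    with time_block(f"operation:{type(self).__name__}"):
        result = self._execute_operation()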

  • Time taken for each Dataset Builder #912

    kwargs = {}
    builder = self.get_dataset_builder(rule, dataset_path, datasets, domain)
    dataset = builder.get_dataset()
    # Update rule for certain rule types
    # SPECIAL CASES FOR RULE TYPES ###############################
    # TODO: Handle these special cases better.
    if self.library_metadata:
        kwargs[
            "variable_codelist_map"
        ] = self.library_metadata.variable_codelist_map
        kwargs[
            "codelist_term_maps"
        ] = self.library_metadata.get_all_ct_package_metadata()
    if rule.get("rule_type") == RuleTypes.DEFINE_ITEM_METADATA_CHECK.value:
        if self.library_metadata:
            kwargs[
                "variable_codelist_map"
            ] = self.library_metadata.variable_codelist_map
            kwargs[
                "codelist_term_maps"
            ] = self.library_metadata.get_all_ct_package_metadata()
    elif (
        rule.get("rule_type")
        == RuleTypes.VARIABLE_METADATA_CHECK_AGAINST_DEFINE.value
    ):
        self.rule_processor.add_comparator_to_rule_conditions(
            rule, comparator=None, target_prefix="define_"
        )
    elif (
        rule.get("rule_type")
        == RuleTypes.VALUE_LEVEL_METADATA_CHECK_AGAINST_DEFINE.value
    ):
        value_level_metadata: List[dict] = self.get_define_xml_value_level_metadata(
            dataset_path, domain
        )
        kwargs["value_level_metadata"] = value_level_metadata
    elif (
        rule.get("rule_type")
        == RuleTypes.DATASET_CONTENTS_CHECK_AGAINST_DEFINE_AND_LIBRARY.value
    ):
        library_metadata: dict = self.library_metadata.variables_metadata.get(
            domain, {}
        )
        define_metadata: List[dict] = builder.get_define_xml_variables_metadata()
        targets: List[
            str
        ] = self.data_processor.filter_dataset_columns_by_metadata_and_rule(
            dataset.columns.tolist(), define_metadata, library_metadata, rule
        )
        rule_copy = deepcopy(rule)
        updated_conditions = RuleProcessor.duplicate_conditions_for_all_targets(
            rule_copy["conditions"], targets
        )
        rule_copy["conditions"].set_conditions(updated_conditions)
        # When duplicating conditions,
        # rule should be copied to prevent updates to concurrent rule executions
        return self.execute_rule(
            rule_copy, dataset, dataset_path, datasets, domain, **kwargs
        )
    kwargs["ct_packages"] = list(self.ct_packages)
    logger.info(f"Using dataset build by: {builder.__class__}")
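
    To attribute time to each builder, the builder call above could be wrapped with the hypothetical helper from the first sketch:

    builder = self.get_dataset_builder(rule, dataset_path, datasets, domain)
    with time_block(f"builder:{builder.__class__.__name__}"):
        dataset = builder.get_dataset()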

  • Any remaining time unaccounted for #913
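
    A sketch of deriving the residual once the named phases are recorded (assumes the hypothetical timings store above):

    def unaccounted_time(total_elapsed):
        # Total wall time minus everything attributed to a named phase.
        accounted = sum(sum(durations) for durations in timings.values())
        return total_elapsed - accounted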

  • Telemetry reports (OpenTelemetry) #914

    • Need to ensure that these reports do not contain any data or proprietary information
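
    A minimal sketch of emitting one span per rule with the OpenTelemetry Python SDK; the tracer name and attribute key are illustrative, and the comment marks the privacy constraint above:

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("benchmark-baseline")

    with tracer.start_as_current_span("validate_rule") as span:
        # Attach only non-proprietary identifiers; never dataset contents.
        span.set_attribute("rule.id", "CORE-00001")
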
  • Test Data #915
    Test data should include:

    • A set of small datasets and define-xml (Nick's certification data)
    • A set containing at least one large (~1GB) dataset
    • All currently published rules, given as a list of rules
  • Output #916

    • JSON will be the primary output
    • JSON converted to Excel
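
    A minimal sketch of the Excel conversion, assuming the primary JSON output is a list of row objects like the summary table above (file names are placeholders):

    import json
    import pandas as pd

    with open("benchmark_report.json") as f:
        rows = json.load(f)

    # One worksheet containing the per-rule summary table.
    pd.DataFrame(rows).to_excel("benchmark_report.xlsx", index=False)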

Labels: large datasets, testing (Unit, regression, performance, QA, test automation)