
Commit cb728ca

feat: added entity name override option in data contract error details to align with business rules
1 parent d1743a1 commit cb728ca

File tree

7 files changed: +61 −55 lines changed

docs/json_schemas/contract/components/field_error_detail.schema.json

Lines changed: 5 additions & 6 deletions
@@ -11,12 +11,11 @@
   },
   "error_message": {
     "description": "The message to be used for the field and error type specified. This can include templating (specified using jinja2 conventions). During templating, the full record will be available with an additional __error_value to easily obtain nested offending values.",
-    "type": "string",
-    "enum": [
-      "record_rejection",
-      "file_rejection",
-      "warning"
-    ]
+    "type": "string"
+  },
+  "reporting_entity": {
+    "description": "The entity name to be given for grouping in error report. If left blank will default to the contract entity name",
+    "type": "string"
   }
 },
 "required": [

src/dve/core_engine/message.py

Lines changed: 3 additions & 1 deletion
@@ -30,6 +30,7 @@ class DataContractErrorDetail(BaseModel):

     error_code: str
     error_message: Optional[str] = None
+    reporting_entity: Optional[str] = None

     def template_message(
         self,
@@ -232,7 +233,8 @@ def from_pydantic_error(
         messages.append(
             cls(
-                entity=entity,
+                entity=error_detail.reporting_entity or entity,
+                original_entity=entity,
                 record=record,
                 failure_type=failure_type,
                 is_informational=is_informational,

tests/features/movies.feature

Lines changed: 44 additions & 44 deletions
@@ -1,47 +1,47 @@
 Feature: Pipeline tests using the movies dataset
   Tests for the processing framework which use the movies dataset.

   This tests submissions in JSON format, with configuration in JSON config files.
   Complex types are tested (arrays, nested structs)

   Some validation of entity attributes is performed: SQL expressions and Python filter
   functions are used, and templatable business rules feature in the transformations.

   Scenario: Validate and filter movies (spark)
     Given I submit the movies file movies.json for processing
     And A spark pipeline is configured
     And I create the following reference data tables in the database movies_refdata
       | table_name | parquet_path                                         |
       | sequels    | tests/testdata/movies/refdata/movies_sequels.parquet |
     And I add initial audit entries for the submission
     Then the latest audit record for the submission is marked with processing status file_transformation
     When I run the file transformation phase
     Then the movies entity is stored as a parquet after the file_transformation phase
     And the latest audit record for the submission is marked with processing status data_contract
     When I run the data contract phase
     Then there are 3 record rejections from the data_contract phase
     And there are errors with the following details and associated error_count from the data_contract phase
-      | ErrorCode | ErrorMessage                              | error_count |
-      | BLANKYEAR | year not provided                         | 1           |
-      | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
-      | DODGYDATE | date_joined value is not valid: daft_date | 1           |
+      | Entity             | ErrorCode | ErrorMessage                              | error_count |
+      | movies             | BLANKYEAR | year not provided                         | 1           |
+      | movies_rename_test | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
+      | movies             | DODGYDATE | date_joined value is not valid: daft_date | 1           |
     And the movies entity is stored as a parquet after the data_contract phase
     And the latest audit record for the submission is marked with processing status business_rules
     When I run the business rules phase
     Then The rules restrict "movies" to 4 qualifying records
     And there are errors with the following details and associated error_count from the business_rules phase
       | ErrorCode       | ErrorMessage                                           | error_count |
       | LIMITED_RATINGS | Movie has too few ratings ([6.1])                      | 1           |
       | RUBBISH_SEQUEL  | The movie The Greatest Movie Ever has a rubbish sequel | 1           |
     And the latest audit record for the submission is marked with processing status error_report
     When I run the error report phase
     Then An error report is produced
     And The statistics entry for the submission shows the following information
       | parameter                | value |
       | record_count             | 5     |
       | number_record_rejections | 4     |
       | number_warnings          | 1     |
     And the error aggregates are persisted

   Scenario: Validate and filter movies (duckdb)
     Given I submit the movies file movies.json for processing
@@ -57,10 +57,10 @@ Feature: Pipeline tests using the movies dataset
     When I run the data contract phase
     Then there are 3 record rejections from the data_contract phase
     And there are errors with the following details and associated error_count from the data_contract phase
-      | ErrorCode | ErrorMessage                              | error_count |
-      | BLANKYEAR | year not provided                         | 1           |
-      | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
-      | DODGYDATE | date_joined value is not valid: daft_date | 1           |
+      | Entity             | ErrorCode | ErrorMessage                              | error_count |
+      | movies             | BLANKYEAR | year not provided                         | 1           |
+      | movies_rename_test | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
+      | movies             | DODGYDATE | date_joined value is not valid: daft_date | 1           |
     And the movies entity is stored as a parquet after the data_contract phase
     And the latest audit record for the submission is marked with processing status business_rules
     When I run the business rules phase

tests/test_core_engine/test_backends/fixtures.py

Lines changed: 4 additions & 2 deletions
@@ -567,9 +567,11 @@ def nested_parquet_custom_dc_err_details(temp_dir):
     err_details = {
         "id": {
             "Blank": {"error_code": "TESTIDBLANK",
-                      "error_message": "id cannot be null"},
+                      "error_message": "id cannot be null",
+                      "reporting_entity": "test_rename"},
             "Bad value": {"error_code": "TESTIDBAD",
-                          "error_message": "id is invalid: id - {{id}}"}
+                          "error_message": "id is invalid: id - {{id}}",
+                          "reporting_entity": "test_rename"}
         },
         "datetimefield": {
             "Bad value": {"error_code": "TESTDTFIELDBAD",

tests/test_core_engine/test_backends/test_implementations/test_duckdb/test_data_contract.py

Lines changed: 2 additions & 1 deletion
@@ -360,4 +360,5 @@ def test_duckdb_data_contract_custom_error_details(nested_all_string_parquet_w_e
     assert messages[0].error_code == "SUBFIELDTESTIDBAD"
     assert messages[0].error_message == "subfield id is invalid: subfield.id - WRONG"
     assert messages[1].error_code == "TESTIDBAD"
-    assert messages[1].error_message == "id is invalid: id - WRONG"
+    assert messages[1].error_message == "id is invalid: id - WRONG"
+    assert messages[1].entity == "test_rename"

tests/test_core_engine/test_backends/test_implementations/test_spark/test_data_contract.py

Lines changed: 1 addition & 0 deletions
@@ -235,5 +235,6 @@ def test_spark_data_contract_custom_error_details(nested_all_string_parquet_w_er
     assert messages[0].error_message == "subfield id is invalid: subfield.id - WRONG"
     assert messages[1].error_code == "TESTIDBAD"
     assert messages[1].error_message == "id is invalid: id - WRONG"
+    assert messages[1].entity == "test_rename"
tests/testdata/movies/movies_contract_error_details.json

Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,8 @@
   },
   "Bad value": {
     "error_code": "DODGYYEAR",
-    "error_message": "year value ({{year}}) is invalid"
+    "error_message": "year value ({{year}}) is invalid",
+    "reporting_entity": "movies_rename_test"
   }
 },
 "cast.date_joined": {
