 Feature: Pipeline tests using the movies dataset
   Tests for the processing framework which use the movies dataset.

   This tests submissions in JSON format, with configuration in JSON config files.
   Complex types are tested (arrays, nested structs)

   Some validation of entity attributes is performed: SQL expressions and Python filter
   functions are used, and templatable business rules feature in the transformations.

   Scenario: Validate and filter movies (spark)
     Given I submit the movies file movies.json for processing
     And A spark pipeline is configured
     And I create the following reference data tables in the database movies_refdata
       | table_name | parquet_path                                          |
       | sequels    | tests/testdata/movies/refdata/movies_sequels.parquet |
     And I add initial audit entries for the submission
     Then the latest audit record for the submission is marked with processing status file_transformation
     When I run the file transformation phase
     Then the movies entity is stored as a parquet after the file_transformation phase
     And the latest audit record for the submission is marked with processing status data_contract
     When I run the data contract phase
     Then there are 3 record rejections from the data_contract phase
     And there are errors with the following details and associated error_count from the data_contract phase
-      | ErrorCode | ErrorMessage                              | error_count |
-      | BLANKYEAR | year not provided                         | 1           |
-      | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
-      | DODGYDATE | date_joined value is not valid: daft_date | 1           |
+      | Entity             | ErrorCode | ErrorMessage                              | error_count |
+      | movies             | BLANKYEAR | year not provided                         | 1           |
+      | movies_rename_test | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
+      | movies             | DODGYDATE | date_joined value is not valid: daft_date | 1           |
     And the movies entity is stored as a parquet after the data_contract phase
     And the latest audit record for the submission is marked with processing status business_rules
     When I run the business rules phase
     Then The rules restrict "movies" to 4 qualifying records
     And there are errors with the following details and associated error_count from the business_rules phase
       | ErrorCode       | ErrorMessage                                           | error_count |
       | LIMITED_RATINGS | Movie has too few ratings ([6.1])                      | 1           |
       | RUBBISH_SEQUEL  | The movie The Greatest Movie Ever has a rubbish sequel | 1           |
     And the latest audit record for the submission is marked with processing status error_report
     When I run the error report phase
     Then An error report is produced
     And The statistics entry for the submission shows the following information
       | parameter                | value |
       | record_count             | 5     |
       | number_record_rejections | 4     |
       | number_warnings          | 1     |
     And the error aggregates are persisted

   Scenario: Validate and filter movies (duckdb)
     Given I submit the movies file movies.json for processing
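
The substance of this change, in both hunks, is the new Entity column in the expected data_contract errors, so the step that checks this table now has to key rejections by entity as well as by error code. The repository's step definitions aren't shown here; below is a minimal sketch assuming a behave step, where get_phase_errors is a hypothetical helper returning one dict per distinct error with the same column names the Gherkin table uses.

# Sketch only: assumes behave; get_phase_errors() is hypothetical.
from behave import then

@then("there are errors with the following details and associated "
      "error_count from the {phase} phase")
def check_phase_errors(context, phase):
    # Key on whichever columns the scenario's table declares: the
    # business_rules table has no Entity column, the data_contract
    # table now does, so the keys adapt to the table headings.
    key_cols = [h for h in context.table.headings if h != "error_count"]
    expected = {tuple(r[c] for c in key_cols): int(r["error_count"])
                for r in context.table}
    actual = {tuple(e[c] for c in key_cols): e["error_count"]
              for e in get_phase_errors(context, phase)}
    assert actual == expected, f"{phase} errors differed: {actual} != {expected}"

Keying on the declared headings lets the same step serve both tables, which is presumably why the business_rules expectations did not need to change.
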
@@ -57,10 +57,10 @@ Feature: Pipeline tests using the movies dataset
     When I run the data contract phase
     Then there are 3 record rejections from the data_contract phase
     And there are errors with the following details and associated error_count from the data_contract phase
-      | ErrorCode | ErrorMessage                              | error_count |
-      | BLANKYEAR | year not provided                         | 1           |
-      | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
-      | DODGYDATE | date_joined value is not valid: daft_date | 1           |
+      | Entity             | ErrorCode | ErrorMessage                              | error_count |
+      | movies             | BLANKYEAR | year not provided                         | 1           |
+      | movies_rename_test | DODGYYEAR | year value (NOT_A_NUMBER) is invalid      | 1           |
+      | movies             | DODGYDATE | date_joined value is not valid: daft_date | 1           |
     And the movies entity is stored as a parquet after the data_contract phase
     And the latest audit record for the submission is marked with processing status business_rules
     When I run the business rules phase
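
Both scenarios run the same steps against a spark and a duckdb pipeline, and each needs the sequels parquet registered as movies_refdata.sequels before the business rules phase can join against it. The following is a sketch of how the "I create the following reference data tables" step could register the file under either engine; create_refdata_table and the engine switch are assumptions, and only the schema name, table name, and parquet path come from the feature.

# Sketch only: register a parquet file as a reference data table under
# whichever engine the scenario configured.
def create_refdata_table(engine, table_name, parquet_path, spark=None, con=None):
    if engine == "spark":
        spark.sql("CREATE DATABASE IF NOT EXISTS movies_refdata")
        (spark.read.parquet(parquet_path)
              .write.mode("overwrite")
              .saveAsTable(f"movies_refdata.{table_name}"))
    else:  # duckdb: con is a duckdb.DuckDBPyConnection
        con.execute("CREATE SCHEMA IF NOT EXISTS movies_refdata")
        con.execute(f"CREATE OR REPLACE TABLE movies_refdata.{table_name} AS "
                    f"SELECT * FROM read_parquet('{parquet_path}')")

# e.g. create_refdata_table("duckdb", "sequels",
#          "tests/testdata/movies/refdata/movies_sequels.parquet", con=con)

Keeping the registration behind a single parameterised step is what lets the two scenarios stay textually identical apart from the pipeline they configure.
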