Add db014 #28

Open
harjotxsaini wants to merge 1 commit into main from DB014---Dataset-Validation-Script
@harjotxsaini
Collaborator

Ticket DB014 - Dataset Validation Script

Description

This PR implements the DB014 Dataset Validation Script, which performs comprehensive validation of the product dataset. The script checks for:

Basic schema compliance (required fields, types)
Nutrient structure validation
Allergen structure validation
Barcode validation (format, empty values, duplicates)
Advanced schema checks (enum validation, missing subfields)
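As an illustration of the barcode check in the list above, here is a minimal sketch. The field name `barcode`, the accepted lengths (8, 12, 13 or 14 digits) and the function name `check_barcodes` are assumptions made for this example, not taken from the PR's code.

```python
import re
from collections import Counter

# Assumption: a well-formed barcode is 8, 12, 13 or 14 digits
# (EAN-8, UPC-A, EAN-13, GTIN-14). The real script may use other rules.
BARCODE_RE = re.compile(r"\d{8}|\d{12,14}")


def check_barcodes(records, field="barcode"):
    """Return record indexes grouped by barcode problem: empty, invalid_format, duplicate."""
    issues = {"empty": [], "invalid_format": [], "duplicate": []}
    # Normalise missing or None values to an empty string.
    values = [str(record.get(field) or "").strip() for record in records]
    counts = Counter(value for value in values if value)
    for index, value in enumerate(values):
        if not value:
            issues["empty"].append(index)
        elif not BARCODE_RE.fullmatch(value):
            issues["invalid_format"].append(index)
        elif counts[value] > 1:
            issues["duplicate"].append(index)
    return issues
```

Each record lands in at most one bucket, so the three counts can be summed into a single "barcode issues" figure without double counting.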

The script generates a JSON validation report (schema_validation_report.json) and logs all validation results. The latest run confirms that:

Basic schema issues: 4757 detected (missing fields like productName and nutriscoreGrade)
Nutrients and allergens structure: all valid
Barcode issues: 118 invalid formats detected
Advanced schema invalid records: 38
The validation report is saved and ready for review

Together, these results show that every dataset integrity check in the ticket is implemented and runs end to end.
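For reviewers who want a picture of the report step, the sketch below shows one way the counts could be collected and written to `schema_validation_report.json`. The key names and the `build_report` / `write_report` helpers are hypothetical; the actual report layout in this PR may differ.

```python
import json
from pathlib import Path


def build_report(basic_issues, barcode_issues, advanced_invalid):
    """Summarise issue lists as counts plus an overall pass/fail flag."""
    report = {
        # Hypothetical key names, chosen for this example only.
        "basic_schema_issues": len(basic_issues),
        "barcode_issues": len(barcode_issues),
        "advanced_schema_invalid_records": len(advanced_invalid),
    }
    # The dataset passes only when every count is zero.
    report["passed"] = all(count == 0 for count in report.values())
    return report


def write_report(report, path="schema_validation_report.json"):
    """Write the report as indented JSON and return the path written."""
    target = Path(path)
    target.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return target
```

Writing with an explicit `encoding="utf-8"` keeps the file independent of the platform's default encoding.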

Run: python -m database.Validation.db021_validator

Checks

  • All requirements of the ticket have been implemented, or I have commented on any exclusions
  • Unit tests have been added or updated for any backend changes (if applicable)
  • I have reviewed the Files Changed tab and verified it only contains relevant changes (comment if unsure about any)
  • This PR has been reviewed and approved

Screenshots

Screenshot 2026-04-01 132729 Screenshot 2026-04-01 132759 Screenshot 2026-04-01 132760

@s223503101
Collaborator

Screenshot 2026-04-02 144249 Screenshot 2026-04-02 144304

Please remove the emoji from log and print messages (plain text only). On Windows the console encoding cannot handle those characters, so the run crashes on them. Once that is fixed, a dataset that fails validation will still exit with code 1, which is intentional; a crash caused by encoding is a separate bug we should avoid.
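A rough sketch of what this could look like, assuming the report is a dict with a boolean `passed` key. The helper names `to_plain_ascii`, `exit_code_for` and `main` are made up for illustration and are not from the PR.

```python
import logging


def to_plain_ascii(message: str) -> str:
    """Drop non-ASCII characters (emoji included) so any console encoding can print the text."""
    return message.encode("ascii", errors="ignore").decode("ascii").strip()


def exit_code_for(report: dict) -> int:
    """Return 1 when the dataset failed validation, 0 when it passed."""
    return 0 if report.get("passed", False) else 1


def main(report: dict) -> int:
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    status = "PASSED" if report.get("passed", False) else "FAILED"
    # "\u2705" is a check-mark emoji; it is stripped before the message is logged.
    logging.info(to_plain_ascii(f"\u2705 Validation {status}"))
    return exit_code_for(report)
```

The entry point would then call `sys.exit(main(report))`, so a non-zero exit only ever means "validation failed". Stripping characters is lossy, so it suits status messages rather than product data; another option would be `sys.stdout.reconfigure(errors="replace")` at startup.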

2 participants