Add data sanity check between ETL and aggregation #88

Open
chorsley opened this issue Jan 28, 2017 · 0 comments

Create an independent script that checks:

  • Are all expected data sources present on S3?
  • Is the number of files available on S3 correct, or at least the right order of magnitude?
  • Are the individual file sizes sane?
  • Does datapackage.json include all sources and files?

Ideally, this script would be forced to run before the aggregation script, or be integrated with it. Any warnings and errors would need to be acknowledged or resolved before the aggregation process is kicked off.
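As a starting point for discussion, here is a rough sketch of what such a check could look like. Everything in it is an assumption rather than existing project code: the names `Finding`, `check_listing` and `may_aggregate` are hypothetical, the first path segment of each S3 key is assumed to be the source name, the "order of magnitude" test is taken to mean within a factor of 10, and datapackage.json is assumed to list files under `resources[].path`. The S3 listing is passed in as a plain `{key: size_in_bytes}` dict (which could be built from the `Key` and `Size` fields returned by boto3's `list_objects_v2`) so the checks can be tested without touching S3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    level: str    # "warning" or "error"
    message: str


def check_listing(listing, expected, datapackage, min_bytes=1):
    """Run the sanity checks and return a list of Finding objects.

    listing:     {s3_key: size_in_bytes}
    expected:    {source_name: expected_file_count} (hypothetical config)
    datapackage: parsed datapackage.json (assumed resources[].path layout)
    """
    findings = []

    # Group keys by source, assuming the first path segment is the source.
    by_source = defaultdict(list)
    for key in listing:
        by_source[key.split("/", 1)[0]].append(key)

    # Checks 1 and 2: source present, file count within a factor of 10.
    for source, expected_count in expected.items():
        count = len(by_source.get(source, []))
        if count == 0:
            findings.append(Finding("error", f"source {source!r} is missing"))
        elif not expected_count / 10 <= count <= expected_count * 10:
            findings.append(Finding(
                "warning",
                f"source {source!r} has {count} files, expected ~{expected_count}",
            ))

    # Check 3: individual file sizes (here only a lower bound).
    for key, size in sorted(listing.items()):
        if size < min_bytes:
            findings.append(Finding("warning", f"{key} is only {size} bytes"))

    # Check 4: every file on S3 is declared in datapackage.json.
    declared = set()
    for resource in datapackage.get("resources", []):
        path = resource.get("path", [])
        declared.update([path] if isinstance(path, str) else path)
    for key in sorted(set(listing) - declared):
        findings.append(Finding("error", f"{key} not listed in datapackage.json"))

    return findings


def may_aggregate(findings, acknowledged=False):
    """Allow aggregation only if there are no findings or they were acknowledged."""
    return not findings or acknowledged
```

The aggregation script could call `check_listing` first and refuse to start unless `may_aggregate` returns `True`, for example by requiring an explicit acknowledgement flag from whoever runs it. Whether errors should be acknowledgeable at all, or only warnings, is left open here.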
