-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split covid_hosp into daily & timeseries tables #1126
base: dev
Are you sure you want to change the base?
Conversation
Sonarcloud is overzealous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicely done! i think the suggestions i made in the .sql files might make sonarcloud chillax
@@ -45,7 +45,7 @@ def __init__(self, | |||
self.connection = connection | |||
self.table_name = table_name | |||
self.hhs_dataset_id = hhs_dataset_id | |||
self.publication_col_name = "issue" if table_name == 'covid_hosp_state_timeseries' else \ | |||
self.publication_col_name = "issue" if table_name == 'covid_hosp_state_timeseries' or table_name == "covid_hosp_state_daily" else \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is getting a little gnarly, but we will fix it with the other yaml-schema changes so you can leave it for now
integrations/acquisition/covid_hosp/state_timeseries/test_scenarios.py
Outdated
Show resolved
Hide resolved
I've swapped the quality gate profile to one with a 20% duplication threshold. It should take effect on the next push to this branch (I attempted a manual re-run but it doesn't want to listen to me) (the duplicated lines are from database.py constructors so wouldn't be affected by .sql file changes) |
d07299f
to
caa796c
Compare
oof, looks like that force-push lost some changes 😢 |
@melange396 that one's intended - I reverted a couple of changes because of #1126 (comment) |
hmmm, this makes it look like you lost a change to |
@melange396 good point, re-added those two changes! (The tests actually pass just fine without the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more points...
@melange396 applied the changes from this round as well! |
this looks pretty good, though i havent given it a very deep inspection yet. in the meantime, you should include explain plans and timing for some "mocked" sample queries against real data (where we can use |
3734972
to
3893f62
Compare
Requesting re-review after many iterations on the The testing results of this query compared to the current one in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
his is awesome. its much easier to understand (IMNSHO), even though we moved away from the one overloaded table and into two (getting rid of the WITH
expressions is a big help). plus the performance increases speak for themselves....
all comments marked 'nit:' are elective, you can decide whether you want to do them or not.
Kudos, SonarCloud Quality Gate passed!
|
@melange396 applied the fixes! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome!
we probbly want to hold off on merging just yet, and first figure out whether to do this as a migration or as a re-load of all the data from the external sources |
Prerequisites:
dev
branchdev
Summary
Currently, the
covid_hosp_state_timeseries
table stores data from two datasets at once - daily snapshots and timeseries data; the nature of this data is distinguished by therecord_type
column. This PR by:state_daily
DDL with two tables instead of one