Add hsanci to nssp #2162
base: main
dshemetov
left a comment
Looks good to me.
nssp/delphi_nssp/constants.py
Outdated
    "state",
    "county",
    "hhs",
    "hsanci",
why not this?
-    "hsanci",
+    "hsa_nci",
acquisition will think the geo is hsa and the name of the signal is nci_actual_signal_name based on the csv file name.
geos with underscores in them break an acquisition regex
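To illustrate the parsing problem described above, here is a minimal sketch. The real acquisition regex is not shown in this thread, so `FILENAME_RE`, `parse_filename`, and the example file names are assumptions made up for illustration; the only thing taken from the discussion is that the geo and signal are read from an underscore-delimited CSV file name.

```python
import re

# Hypothetical file name pattern: <YYYYMMDD>_<geo>_<signal>.csv
# This is NOT the actual acquisition regex, only a stand-in that
# splits on underscores the way the comments above describe.
FILENAME_RE = re.compile(r"^(\d{8})_(\w+?)_(\w+)\.csv$")


def parse_filename(name):
    """Return (date, geo, signal), or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groups() if match else None


# Geo without an underscore: geo and signal are split as intended.
print(parse_filename("20240601_hsanci_pct_ed_visits_covid.csv"))

# Geo with an underscore: the geo is cut at the first underscore,
# and the remainder is absorbed into the signal name.
print(parse_filename("20240601_hsa_nci_pct_ed_visits_covid.csv"))
```

Under this assumed pattern, the second call yields `hsa` as the geo and a signal name starting with `nci_`, which matches the failure mode described in the comments.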
how about this then?
-    "hsanci",
+    "hsa-nci",
nssp/delphi_nssp/run.py
Outdated
    elif geo == "hsanci":
        df = df[["hsa_nci_id", "val", "timestamp"]]
        df = df[df["hsa_nci_id"] != "All"]
        df = df.groupby(["hsa_nci_id", "timestamp"])['val'].min().reset_index()
what is this for?
The data source reports at the HSA-NCI level and duplicates the same value across the constituent counties. This picks out just one of those per key.
ah ok, for deduplicating. in that case, the .min() is misleading/confusing and could use a small explanatory comment
Fair enough. Wrote a similar comment on the epidata-etl side.
an alternative could've been:
df = df.drop_duplicates()
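As a rough comparison of the two deduplication approaches discussed above, here is a small sketch on made-up data. The IDs, dates, and values are invented for illustration; only the column names and the two pandas expressions come from the thread.

```python
import pandas as pd

# Made-up rows: the same HSA-NCI value repeated once per constituent county.
df = pd.DataFrame(
    {
        "hsa_nci_id": ["1", "1", "2", "2", "2"],
        "timestamp": pd.to_datetime(["2024-06-01"] * 5),
        "val": [3.5, 3.5, 1.2, 1.2, 1.2],
    }
)

# Approach from the PR: one row per (hsa_nci_id, timestamp) key.
# min() only serves to pick a representative, since every value
# within a key is assumed to be identical.
by_min = df.groupby(["hsa_nci_id", "timestamp"])["val"].min().reset_index()

# Alternative from the review: drop rows that are exact duplicates.
by_drop = df.drop_duplicates().reset_index(drop=True)

print(by_min)
print(by_drop)
```

On this data both approaches leave one row per key. They differ if values within a key ever disagree: `drop_duplicates()` would keep each distinct row, while `groupby(...).min()` would still collapse to a single row holding the smallest value, which is why an explanatory comment on the `.min()` call is useful.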
Description
Add hsanci geo level to nssp
Ran the extensive existing unit tests for nssp, ran the indicators, and looked at the CSV files. Results look good.