Merged
1 change: 1 addition & 0 deletions nssp/delphi_nssp/constants.py
@@ -9,6 +9,7 @@
     "state",
     "county",
     "hhs",
+    "hsanci",
 ]

 SIGNALS_MAP = {
2 changes: 1 addition & 1 deletion nssp/delphi_nssp/pull.py
@@ -177,5 +177,5 @@ def pull_nssp_data(
     # Format county fips to all be 5 digits with leading zeros
     df_ervisits["fips"] = df_ervisits["fips"].apply(lambda x: str(x).zfill(5) if str(x) != "0" else "0")

-    keep_columns = ["timestamp", "geography", "county", "fips"]
+    keep_columns = ["timestamp", "geography", "county", "fips", "hsa_nci_id"]
     return df_ervisits[SIGNALS + keep_columns]
5 changes: 5 additions & 0 deletions nssp/delphi_nssp/run.py
@@ -137,6 +137,11 @@ def run_module(params, logger=None):
         df = geo_mapper.add_geocode(df, "state_code", "hhs", from_col="state_code", new_col="geo_id")
         df = geo_mapper.aggregate_by_weighted_sum(df, "geo_id", "val", "timestamp", "population")
         df = df.rename(columns={"weighted_val": "val"})
+    elif geo == "hsanci":
+        df = df[["hsa_nci_id", "val", "timestamp"]]
+        df = df[df["hsa_nci_id"] != "All"]
+        df = df.groupby(["hsa_nci_id", "timestamp"])['val'].min().reset_index()
Contributor: what is this for?

Contributor: The data source reports at the HSA-NCI level and duplicates the same value across the constituent counties. This picks out just one of those per key.
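
For illustration only, a minimal sketch of that deduplication on made-up data (the IDs, dates, and values below are hypothetical, not from the NSSP feed):

```python
import pandas as pd

# Toy frame mimicking the described shape: the same HSA-NCI value is
# repeated once per constituent county (IDs/values are hypothetical).
df = pd.DataFrame({
    "hsa_nci_id": ["101", "101", "101"],
    "timestamp": ["2024-01-06"] * 3,
    "val": [2.5, 2.5, 2.5],
})

# Because every row in a group carries the identical value, any reducer
# that returns that shared value (min, max, first, ...) deduplicates it.
deduped = df.groupby(["hsa_nci_id", "timestamp"])["val"].min().reset_index()
print(deduped)
#   hsa_nci_id   timestamp  val
# 0        101  2024-01-06  2.5
```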

Contributor: Ah ok, for deduplicating. In that case, the .min() is misleading/confusing and could use a small explanatory comment.
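
One illustrative way such a comment could read (a suggestion only, not what was merged):

```python
# The source repeats each HSA-NCI value on every constituent county row;
# .min() just collapses those identical rows to one per (hsa_nci_id, timestamp).
df = df.groupby(["hsa_nci_id", "timestamp"])["val"].min().reset_index()
```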

Contributor: Fair enough. Wrote a similar comment on the epidata-etl side.

Contributor: An alternative could've been:

df = df.drop_duplicates()
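
A small self-contained sketch (made-up values) showing that, when the repeated rows carry identical values, drop_duplicates() and the groupby/.min() approach give the same result:

```python
import pandas as pd

# Hypothetical duplicated rows, as the source is described to produce
df = pd.DataFrame({
    "hsa_nci_id": ["101", "101", "101"],
    "timestamp": ["2024-01-06"] * 3,
    "val": [2.5, 2.5, 2.5],
})

via_min = df.groupby(["hsa_nci_id", "timestamp"])["val"].min().reset_index()
via_drop = df.drop_duplicates().reset_index(drop=True)

# Identical results when the duplicates carry identical values
assert via_min.equals(via_drop)
```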

+        df = df.rename(columns={"hsa_nci_id": "geo_id"})
     else:
         df = df[df["county"] != "All"]
         df["geo_id"] = df["fips"]