Skip to content

Conversation

@minhkhul
Copy link
Contributor

@minhkhul minhkhul commented Nov 7, 2025

Description

Add hsanci geo level to nssp

Ran through extensive existing unit tests for nssp, ran the indicators and looked at the csv files. Result looks good.

@minhkhul minhkhul requested a review from dshemetov November 7, 2025 22:53
Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

"state",
"county",
"hhs",
"hsanci",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not this?

Suggested change
"hsanci",
"hsa_nci",

Copy link
Contributor Author

@minhkhul minhkhul Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acquisition will think the geo is hsa and the name of the signal is nci_actual_signal_name based on the csv file name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geos with underscores in them break an acquisition regex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this then?

Suggested change
"hsanci",
"hsa-nci",

elif geo == "hsanci":
df = df[["hsa_nci_id", "val", "timestamp"]]
df = df[df["hsa_nci_id"] != "All"]
df = df.groupby(["hsa_nci_id", "timestamp"])['val'].min().reset_index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data source reports at the HSA-NCI level and duplicates the same value across the constituent counties. This picks out just one of those per key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, for deduplicating. in that case, the .min() is misleading/confusing and could use a small explanatory comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Wrote a similar comment on the epidata-etl side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative couldve been:

df = df.drop_duplicates()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants