fluview: in-archive CSV filename changes #1638

Merged
melange396 merged 1 commit into dev from fluview_csvfile_rename
Apr 16, 2025

melange396
Collaborator

Fluview acquisition stopped working correctly. Data files are downloaded and archived, and some national-level data is added to the DB successfully before the job crashes. According to cronicle job logs, this started happening between 8am and 4pm on March 24.

The names of certain CSV files (inside of the .zip archives we download from them) have changed for what appears to be redaction "reasons". Particularly, the string "WHO" has been replaced with "ICL", which seems to stand for "Influenza Collaborating Laboratories".

I have not yet tested this as much as I would like, though I am fairly confident that it should work for now. Eventually we might have to change this line too:

"DatasourceDT": [get_entry(1, "ILINet"), get_entry(0, "WHO_NREVSS")],

I will patch and put this into production after the weekend, probably on Monday (it is Friday at the time of writing). The missing data will also be backfilled then; currently, all but the national-level data is behind by ~2 weeks (two data points per location).

@aysim319
Contributor

aysim319 commented Apr 11, 2025

Checked that the new filenames do get processed and that the load_zipped_csv function returns rows for them.
I did notice that in main() of fluview_update.py we don't actually update the PHL files. Is the public data going to be live at some point?
See lines 554-555:

# TODO: header row has changed for public health lab data
 # update_from_file_public(issue, date, filename, test_mode=args.test)

@melange396
Collaborator Author

This is fixed in the live database.

I added the changes from this PR in-place on the production machine, then ran the following as the automation user:

cd ~/driver

python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202512 --file flu_data/ilinet_cen_202512_20250328_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202512 --file flu_data/ilinet_hhs_202512_20250328_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202512 --file flu_data/ilinet_nat_202512_20250328_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202512 --file flu_data/ilinet_sta_202512_20250328_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log

python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202513 --file flu_data/ilinet_cen_202513_20250404_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202513 --file flu_data/ilinet_hhs_202513_20250404_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202513 --file flu_data/ilinet_nat_202513_20250404_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202513 --file flu_data/ilinet_sta_202513_20250404_160001.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log

python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202514 --file flu_data/ilinet_cen_202514_20250411_160003.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202514 --file flu_data/ilinet_hhs_202514_20250411_160003.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202514 --file flu_data/ilinet_nat_202514_20250411_160003.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log
python3 -m delphi.epidata.acquisition.fluview.fluview_update --issue 202514 --file flu_data/ilinet_sta_202514_20250411_160003.zip 2>&1 | tee -a /var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log

and then ran this SQL on the database:

UPDATE epidata.fluview SET release_date='2025-03-28' WHERE issue=202512;
UPDATE epidata.fluview SET release_date='2025-04-04' WHERE issue=202513;
UPDATE epidata.fluview SET release_date='2025-04-11' WHERE issue=202514;
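The commands and UPDATE statements above follow a regular pattern, so they could be generated from the archive filenames instead of typed by hand. This is a minimal sketch, assuming the `ilinet_<level>_<issue>_<YYYYMMDD>_<HHMMSS>.zip` naming seen above and assuming the release date equals the date stamp in the filename (true for the three issues here, not verified in general). The helper names are hypothetical, and the sketch only builds strings; it does not run anything.

```python
import re
from datetime import datetime

MODULE = "delphi.epidata.acquisition.fluview.fluview_update"
LOG = "/var/log/filebeat-pickup/epidata.acquisition.fluview.fluview_update.log"

# Assumed naming scheme: ilinet_<level>_<issue>_<YYYYMMDD>_<HHMMSS>.zip
ARCHIVE_RE = re.compile(r"ilinet_(cen|hhs|nat|sta)_(\d{6})_(\d{8})_(\d{6})\.zip$")


def _parse(path):
    """Extract (issue, date stamp) from an archive path, or raise ValueError."""
    match = ARCHIVE_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognized archive name: {path}")
    return match.group(2), match.group(3)


def backfill_command(path):
    """Build the shell command that re-runs acquisition for one archive."""
    issue, _ = _parse(path)
    return f"python3 -m {MODULE} --issue {issue} --file {path} 2>&1 | tee -a {LOG}"


def release_date_sql(path):
    """Build the UPDATE that sets release_date from the filename's date stamp."""
    issue, stamp = _parse(path)
    release_date = datetime.strptime(stamp, "%Y%m%d").date().isoformat()
    return (
        f"UPDATE epidata.fluview SET release_date='{release_date}' "
        f"WHERE issue={int(issue)};"
    )


if __name__ == "__main__":
    for level in ("cen", "hhs", "nat", "sta"):
        print(backfill_command(f"flu_data/ilinet_{level}_202512_20250328_160001.zip"))
    print(release_date_sql("flu_data/ilinet_nat_202512_20250328_160001.zip"))
```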

@melange396
Collaborator Author

> I did notice that in main() of fluview_update.py we don't actually update the PHL files. Is the public data going to be live at some point?

@aysim319 Correct, the "public" dataset is not currently being updated, but its file still exists in the archives (and it was also renamed to "ICL"). The history is a little weird:

That was all within a week, 6.5 years ago (before my time here). Strangely, there is data in the fluview_public table with issue dates between Oct 2018 and Aug 2020. Perhaps the code was un-commented-out by hand on the production system way back then, but then it got overwritten or replaced during the flurry of activity that Delphi had in 2020. ¯\_(ツ)_/¯

@melange396 melange396 merged commit 4043173 into dev Apr 16, 2025
8 checks passed
@melange396 melange396 deleted the fluview_csvfile_rename branch April 16, 2025 02:00