The PyRenew-HEW project aims to create short-term forecasts of respiratory disease burden using the PyRenew library and several data sources:
- Hospital Admissions from the National Healthcare Safety Network
- Emergency Department Visits from the National Syndromic Surveillance Program
- Wastewater virus concentration from the National Wastewater Surveillance System
This is a work in progress, and not all data sources are currently integrated into the model.
This repository contains code for the PyRenew-HEW model itself, as well as pipelines for running the model in production, and utilities for working with model outputs.
The project uses GitHub Actions to automatically build container images from the project's Containerfile. The images are currently hosted on the GitHub Container Registry and are built and pushed via the containers.yaml GitHub Actions workflow.
Images can also be built locally. The Makefile contains several targets for building and pushing images. Although the Makefile uses Docker as the default engine, the ENGINE environment variable can be set to podman to use Podman instead, for example:
```shell
ENGINE=podman make container_build
# Equivalent to:
# podman build . -t cfa-stf-routine-forecasting -f Containerfile
```

Container images pushed to the Azure Container Registry are automatically tagged as either latest (if the commit is on the main branch) or with the branch name (if the commit is on a different branch). After a branch is deleted, the image tag is removed from the registry via the delete-container-tag.yaml GitHub Actions workflow.
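The tagging rule can be sketched as a small function. This is an illustration only; the actual logic lives in the GitHub Actions workflows, not in a Python helper:

```python
def image_tag(branch: str) -> str:
    """Mirror the workflow's tagging rule: 'latest' on main, branch name otherwise."""
    return "latest" if branch == "main" else branch
```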
> **Note**
> Azure Batch Forecasting Pipelines can only be run by CDC internal users on the CFA Virtual Analyst Platform.
There are two ways to run the Azure Batch modeling code:
- The Azure Command Center: interactive/manual.
- Dagster workflow orchestration: automated, with a feature-rich GUI.
Specific environment setup steps required can be found in the Routine Forecasting Standard Operating Procedure.
You can run `uv run pipelines/azure_command_center.py` (or `make acc`) to launch the Azure Command Center.
- The Azure Command Center will check for necessary data before offering to run pipelines.
- You must have previously configured your Azure credentials and environment variables. To do this, run `make config`, or follow the steps in the SOP.
- The Azure Command Center is meant to be a streamlined interface for interactively running in production.
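The "check for necessary data" step can be pictured with a minimal sketch. The filenames below are hypothetical; the real checks live in pipelines/azure_command_center.py:

```python
from pathlib import Path

# Hypothetical required inputs; the actual list lives in the command center itself
REQUIRED_FILES = ["nhsn_admissions.parquet", "nssp_ed_visits.parquet"]

def missing_inputs(data_dir: str) -> list[str]:
    """Return the required files absent from data_dir, in a stable order."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).exists()]
```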
To execute Dagster workflows fully locally with this project, you'll need to have blobs mounted. However, you can also launch jobs locally and have them submit to Azure Batch.
Prerequisites:
- `uv`
- Docker
- a VAP VM with a registered managed identity in Azure
- permissions to push to the container registry, and both `$GH_USERNAME` and `$GH_PAT` set as environment variables in your shell
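For example, in bash (values below are placeholders; a personal access token with the write:packages scope is typically what GHCR requires for pushes):

```shell
# Placeholder values; substitute your own credentials
export GH_USERNAME="your-github-username"
export GH_PAT="<personal-access-token>"
```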
The following instructions will set up Dagster on your VAP. However, based on the current configuration, actual execution will still run in the cloud via Azure Batch. You can change the executor option in `dagster_defs.py` to test using the local Docker Executor; this requires you to have set up Blobfuse. See Using the local Docker Executor.
- Build and push the `cfa-stf-routine-forecasting` container, as described above:
  - `make container_build` (requires Docker or Podman)
  - `make container_push` to build and push (set `$GH_USERNAME` and `$GH_PAT` first)
  - `make container_explore` for local testing
- Run `uv run dagster_defs.py` and open the terminal link (usually http://127.0.0.1:4000/).
Dagster is now ready to use locally.
> **Note**
> The following process has been changing frequently. We will work to firm it up over the coming weeks and months.
- To run a full PyRenew model pipeline run: go to `Jobs` → `weekly_pyrenew_via_backfill`.
- To run individual models: navigate to `Lineage` and select specific assets, making sure to check the Launchpad config and that you've selected the appropriate partitions (State x Disease combinations).
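The State x Disease partition space is just a cross product. The state and disease lists below are illustrative subsets, and the key format is hypothetical; the real partition definitions live in `dagster_defs.py`:

```python
from itertools import product

# Illustrative subsets; the real lists cover all states and modeled diseases
states = ["CA", "NY", "TX"]
diseases = ["COVID-19", "Influenza"]

# One partition key per State x Disease combination
partition_keys = [f"{state}/{disease}" for state, disease in product(states, diseases)]
```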
In development, whenever you update code, rerun `make container_push` and then Reload Definitions from the Dagster lineage page.
Pushing your code to GitHub will also rebuild and push the container image, but this typically takes longer, and you will have to wait for the workflow to complete in GitHub Actions.
By default, on this repository, Dagster will submit tasks to Azure Batch for execution.
If you'd like to test a few "tasks" locally, you can have dagster execute on your machine, which is much faster than waiting for Azure Batch to pick up jobs. Dagster can leverage your VM's own docker daemon to emulate Azure Batch. When doing this, take care not to run more than two or three state x disease combinations at a time or you will quickly put your VM into a coma.
When using the Docker Executor, Dagster assumes mounts at ./blobfuse/mounts/ in the working directory.
- `make mount`: mounts the pyrenew-relevant blobs using Blobfuse. Use this before launching locally executed Dagster jobs.
- `make unmount`: gracefully unmounts the pyrenew-relevant blobs.
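A small helper can sanity-check that something is actually mounted before launching a locally executed job. This is a sketch; it only assumes the `./blobfuse/mounts/` layout described above:

```python
from pathlib import Path

def mounted_blobs(mount_root: str = "./blobfuse/mounts") -> list[str]:
    """Return the names of mounted blob directories, or [] if none are mounted."""
    root = Path(mount_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```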
From our production dagster server, you can run and schedule model runs and see other projects' pipelines at CFA.
- Pushes to main will automatically update this server via a GitHub Actions workflow.
- Before pushing to `main`, make sure you have thoroughly tested your own branch and gotten a PR review.
- It is good practice to periodically re-sync (`uv sync`), and even re-create your virtual environment if your branch has been open a while, to make sure dependencies are up to date.
- `cfa-dagster`, our own implementation of Dagster, updates frequently. To specifically update that package, run `uv lock --upgrade-package cfa-dagster`.
This repository was created for use by CDC programs to collaborate on public health related projects in support of the CDC mission. GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.
This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.
This repository is licensed under ASL v2 or later.
The source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.
The source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.
You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html
Source code forked from other open source projects inherits the license of the original project.
This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/other/privacy.html.
Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.
All comments, messages, pull requests, and other submissions received through CDC, including this GitHub page, may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.
This repository is not a source of government records but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.