Implement Google Cloud Batch for parallel policy impact calculations #28

@PavelMakarchuk

Problem

Running 1,200 simulations (8 reforms × 75 years × 2 scoring types, static and dynamic) takes ~200 hours sequentially. Each simulation takes ~20 minutes to calculate the baseline and reform income tax totals.

Goal: Complete in 2-3 hours using Google Cloud Batch

Solution

Two-phase parallel approach:

  1. Phase 1: Compute 75 baseline totals (one per year) → 2 minutes
  2. Phase 2: Compute 1,200 reform totals using cached baselines → 2 hours

Time: ~2 hours | Cost: ~$4/run
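
The time and cost figures above can be reproduced with a back-of-the-envelope calculation. This is a rough sketch, not a measurement: it assumes one task per VM, treats the ~20 minutes per simulation from the Problem section as an upper bound for each reform task (baselines are cached, so real tasks may be shorter), ignores VM startup time, and uses the spot price quoted under Technical Details.

```python
import math

SIMULATIONS = 8 * 75 * 2        # reforms x years x scoring types = 1,200
MINUTES_PER_SIMULATION = 20     # upper bound quoted in the Problem section
PARALLEL_WORKERS = 200          # Phase 2 parallelism
SPOT_PRICE_PER_VM_HOUR = 0.01   # USD, from Technical Details

# With 200 workers, 1,200 tasks run in ceil(1200 / 200) sequential waves.
waves = math.ceil(SIMULATIONS / PARALLEL_WORKERS)
wall_clock_hours = waves * MINUTES_PER_SIMULATION / 60

# Billing follows total VM time, not wall-clock time.
vm_hours = SIMULATIONS * MINUTES_PER_SIMULATION / 60
cost_usd = vm_hours * SPOT_PRICE_PER_VM_HOUR

print(f"{waves} waves, ~{wall_clock_hours:.1f} h wall clock, ~${cost_usd:.2f}")
```

Under these assumptions Phase 2 dominates both wall-clock time and cost; Phase 1 adds only 75 short tasks.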

Implementation

Prerequisites (Already Complete ✅)

  • Google Cloud CLI installed
  • Authenticated with [email protected]
  • Project: policyengine-api with billing enabled
  • APIs enabled: Batch, Compute, Storage, Artifact Registry

1. Create Cloud Storage Bucket

gsutil mb -l us-central1 gs://crfb-ss-analysis-results/

2. Create Files in batch/ Directory

Core files:

  • compute_baseline.py - Worker: calculates baseline for one year, saves to bucket
  • compute_reform.py - Worker: downloads baseline, calculates reform, saves result
  • submit_baselines.py - Submits 75 baseline jobs (75 parallel workers)
  • submit_reforms.py - Submits 1,200 reform jobs (200 parallel workers)
  • download_results.py - Combines all results into data/policy_impacts_dynamic.csv
  • Dockerfile - Container with Python 3.13 + PolicyEngine + dependencies
  • requirements.txt - policyengine-us, policyengine-core, pandas, google-cloud-storage
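
One way a Phase 2 worker could decide which simulation to run is to map its task index onto the reform × year × scoring grid. The sketch below is illustrative only: the reform IDs `option1` through `option8` are an assumption (the issue names only `option1` and `option2`), and `task_for_index` is a hypothetical helper. `BATCH_TASK_INDEX` is the environment variable Cloud Batch sets for each task in a task group.

```python
import itertools
import os

# Assumed grid; only option1 and option2 appear explicitly in this issue.
REFORM_IDS = [f"option{i}" for i in range(1, 9)]
YEARS = list(range(2026, 2101))          # 75 years, 2026 through 2100
SCORING_TYPES = ["static", "dynamic"]

# 8 x 75 x 2 = 1,200 tasks; the scoring type varies fastest.
TASKS = list(itertools.product(REFORM_IDS, YEARS, SCORING_TYPES))


def task_for_index(index):
    """Return (reform_id, year, scoring_type) for a zero-based task index."""
    if not 0 <= index < len(TASKS):
        raise IndexError(f"task index {index} outside 0..{len(TASKS) - 1}")
    return TASKS[index]


if __name__ == "__main__":
    # Fall back to task 0 when run outside Cloud Batch (local testing).
    index = int(os.environ.get("BATCH_TASK_INDEX", "0"))
    print(task_for_index(index))
```

Deriving the task from the index keeps the submission script simple: it only needs to request 1,200 tasks, and each worker works out its own parameters.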

Result format:
{
  "reform_id": "option1",
  "reform_name": "Full Repeal...",
  "year": 2026,
  "baseline_total": 1234567890,
  "reform_total": 1111111111,
  "revenue_impact": -123456779,
  "scoring_type": "dynamic"
}
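
A minimal sketch of how a worker might assemble this record. The helper name `build_result` is hypothetical, and the sign convention (`revenue_impact = reform_total - baseline_total`) is inferred from the example values above rather than stated in the issue.

```python
import json


def build_result(reform_id, reform_name, year,
                 baseline_total, reform_total, scoring_type):
    """Assemble one result record in the column order of the final CSV."""
    return {
        "reform_id": reform_id,
        "reform_name": reform_name,
        "year": year,
        "baseline_total": baseline_total,
        "reform_total": reform_total,
        # Assumed convention: negative means the reform lowers revenue.
        "revenue_impact": reform_total - baseline_total,
        "scoring_type": scoring_type,
    }


record = build_result("option1", "Full Repeal...", 2026,
                      1234567890, 1111111111, "dynamic")
print(json.dumps(record, indent=2))
```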

File Structure

crfb-tob-impacts/
├── batch/
│   ├── compute_baseline.py    # Phase 1 worker
│   ├── compute_reform.py      # Phase 2 worker
│   ├── submit_baselines.py    # Phase 1 submission
│   ├── submit_reforms.py      # Phase 2 submission
│   ├── download_results.py    # Result aggregation
│   ├── Dockerfile             # Container definition
│   └── requirements.txt       # Python dependencies
├── src/
│   └── reforms.py             # Existing reform definitions
└── data/
    └── policy_impacts_dynamic.csv    # Final output

Cloud Storage:
gs://crfb-ss-analysis-results/
├── baselines/
│   ├── 2026.json → {"year": 2026, "baseline_total": 1234567890}
│   ├── 2027.json
│   └── ... (75 files, ~10KB total)
└── results/
    └── reform-abc123/
        ├── option1_2026.json
        └── ... (1,200 files, ~50KB total)

3. Build Docker Container

gcloud config set project policyengine-api
gcloud builds submit --tag gcr.io/policyengine-api/ss-calculator:latest batch/

4. Test (2 years, 2 reforms = 4 simulations)

python batch/submit_baselines.py --years 2026,2027
python batch/submit_reforms.py --reforms option1,option2 --years 2026,2027
python batch/download_results.py --job-id [JOB_ID]

5. Run Full Job

python batch/submit_baselines.py --years 2026-2100 # 2 mins
python batch/submit_reforms.py --years 2026-2100 # 2 hours
python batch/download_results.py --job-id [JOB_ID] # saves CSV with 1,200 rows
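
The commands above pass `--years` either as a comma-separated list (`2026,2027`) or as an inclusive range (`2026-2100`). A possible parser for both forms is sketched below; `parse_years` is a hypothetical helper, and accepting a mix of the two forms in one argument is an assumption rather than something the issue specifies.

```python
def parse_years(spec):
    """Expand a --years value such as '2026,2027' or '2026-2100' into a list."""
    years = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            if end < start:
                raise ValueError(f"descending year range: {part!r}")
            years.extend(range(start, end + 1))  # ranges are inclusive
        else:
            years.append(int(part))  # raises ValueError on empty or bad input
    return years


print(len(parse_years("2026-2100")))
print(parse_years("2026,2027"))
```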

File Structure

batch/
├── compute_baseline.py
├── compute_reform.py
├── submit_baselines.py
├── submit_reforms.py
├── download_results.py
├── Dockerfile
└── requirements.txt

Cloud Storage:
gs://crfb-ss-analysis-results/
├── baselines/{year}.json # 75 files
└── results/{job_id}/{reform}_{year}.json # 1,200 files
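
The object names above could be generated by small helpers shared between the workers and `download_results.py`, so the layout is defined in one place. The function names here are hypothetical. Note that the result template has no scoring-type component; the sketch follows the template as written and does not guess how static and dynamic results are kept apart (for example, by separate job IDs).

```python
BUCKET = "crfb-ss-analysis-results"


def baseline_blob_name(year):
    """Object name for a cached Phase 1 baseline."""
    return f"baselines/{year}.json"


def result_blob_name(job_id, reform_id, year):
    """Object name for a Phase 2 result, following the template above."""
    return f"results/{job_id}/{reform_id}_{year}.json"


def gs_uri(blob_name):
    """Full gs:// URI for an object in the results bucket."""
    return f"gs://{BUCKET}/{blob_name}"


print(gs_uri(baseline_blob_name(2026)))
print(gs_uri(result_blob_name("reform-abc123", "option1", 2026)))
```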

Success Criteria

  • Test run (4 sims) completes and matches notebook results
  • Full run (1,200 sims) completes in < 3 hours
  • Final CSV has 1,200 rows with columns: reform_id, reform_name, year, baseline_total, reform_total, revenue_impact, scoring_type
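
The CSV criterion can be checked mechanically. The following is a sketch of such a check using only the standard library; `validate_csv` is a hypothetical helper, and the consistency test on `revenue_impact` assumes it equals `reform_total - baseline_total`, as in the example record.

```python
import csv
import io
import math

EXPECTED_COLUMNS = ["reform_id", "reform_name", "year", "baseline_total",
                    "reform_total", "revenue_impact", "scoring_type"]
EXPECTED_ROWS = 1200


def validate_csv(text, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {reader.fieldnames}")
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            expected = float(row["reform_total"]) - float(row["baseline_total"])
            if not math.isclose(float(row["revenue_impact"]), expected):
                problems.append(f"line {line_no}: inconsistent revenue_impact")
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: missing or non-numeric totals")
    return problems


sample = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "option1,Full Repeal...,2026,1234567890,1111111111,-123456779,dynamic\n"
)
print(validate_csv(sample, expected_rows=1))
```

For the real output, the file contents of data/policy_impacts_dynamic.csv would be passed in with the default row count.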

Technical Details

  • Uses CBO labor supply elasticities for dynamic scoring
  • Spot instances for cost savings ($0.01/hr per VM)
  • Phase 1 avoids computing each baseline 8× (once per reform)
  • All logic matches jupyterbook/policy-impacts-dynamic.ipynb
