| 1 | +# CaTSG for Causal Time Series Generation |
| 2 | + |
| 3 | +This repository contains the implementation of our paper [Causal Time Series Generation via Diffusion Models](https://arxiv.org/pdf/2509.20846). |
| 4 | +The code is implemented with PyTorch 1.13.0 and PyTorch Lightning 1.4.2, and was developed on a server with an NVIDIA A100 80GB PCIe GPU. |
| 5 | + |
| 6 | +## Description |
| 7 | +In the paper, we introduce *causal time series generation* as a new family of time series generation tasks, formalized within Pearl’s causal ladder, that includes *interventional* and *counterfactual* settings. |
| 8 | + |
| 9 | +To instantiate these tasks, we develop **CaTSG**, a unified diffusion-based generative framework with backdoor-adjusted guidance that steers sampling toward desired interventional and individual counterfactual distributions. |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +## Installation |
| 14 | + |
| 15 | +### Requirements |
| 16 | +CaTSG uses the following dependencies: |
| 17 | +- PyTorch 1.13.0 and PyTorch Lightning 1.4.2 |
| 18 | +- NumPy and SciPy |
| 19 | +- Python 3.8 |
| 20 | +- CUDA 11.7 or later, and cuDNN |
| 21 | + |
| 22 | +### Setup Environment |
| 23 | + |
| 24 | +Please first clone the **TimeCraft** repository and then set up the environment for CaTSG. |
| 25 | + |
| 26 | +```bash |
| 27 | +# Clone the repository |
| 28 | +git clone https://github.com/microsoft/TimeCraft.git |
| 29 | +cd TimeCraft/CaTSG |
| 30 | + |
| 31 | +# Create and activate conda environment |
| 32 | +conda env create -f environment.yml |
| 33 | +conda activate catsg |
| 34 | +``` |
| 35 | + |
| 36 | +## Dataset Preparation |
| 37 | + |
| 38 | +This project supports both **synthetic datasets** for controlled experiments and **real-world datasets** for practical evaluation. |
| 39 | + |
| 40 | +### Overview |
| 41 | + |
| 42 | +- **Synthetic datasets** |
| 43 | +We construct two synthetic datasets that simulate a class of damped mechanical oscillators governed by the second-order differential equation $m \cdot \ddot{x}(t) + \gamma \cdot \dot{x}(t) + k \cdot x(t) = 0$, where $m$ is the mass, $\gamma$ the damping coefficient, and $k$ the spring constant. Details are presented in the appendix of our paper. |
| 44 | + - **Harmonic-VM**: Harmonic Oscillator with Variable Mass |
| 45 | + - **Harmonic-VP**: Harmonic Oscillator with Variable Parameters |
| 46 | + |
| 47 | +- **Real-world datasets** |
| 48 | + - **[Air Quality](https://archive.ics.uci.edu/dataset/501/beijing+multi+site+air+quality+data)**: |
| 49 | + Four years of hourly air quality and meteorological measurements from **12 monitoring stations** in Beijing, China. |
| 50 | + - **[Traffic](https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume)**: |
| 51 | + Hourly traffic volume recorded on Interstate 94 near Minneapolis–St Paul, USA, including weather and holiday indicators. |
| 52 | + |
| 53 | +### Synthetic Datasets |
| 54 | + |
| 55 | +You can create the synthetic datasets from scratch: |
| 56 | +```bash |
| 57 | +# Harmonic-VM |
| 58 | +python utils/tsg_dataset_creator.py --config configs/dataset_config/harmonic_vm.yaml |
| 59 | + |
| 60 | +# Harmonic-VP |
| 61 | +python utils/tsg_dataset_creator.py --config configs/dataset_config/harmonic_vp.yaml |
| 62 | +``` |
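
For intuition about what these datasets contain, the sketch below integrates the damped-oscillator equation $m \cdot \ddot{x}(t) + \gamma \cdot \dot{x}(t) + k \cdot x(t) = 0$ and records position, velocity, and acceleration at every step. It is not the generator in `utils/tsg_dataset_creator.py`: the function name, the semi-implicit Euler integrator, the initial conditions, and all parameter values are assumptions chosen for illustration only.

```python
def simulate_damped_oscillator(m, gamma, k, x0=1.0, v0=0.0, dt=0.01, steps=1000):
    """Integrate m*x'' + gamma*x' + k*x = 0 with semi-implicit Euler.

    Hypothetical helper (not part of the repository). Returns three lists
    (position, velocity, acceleration) with one entry per time step.
    """
    xs, vs, accs = [], [], []
    x, v = x0, v0
    for _ in range(steps):
        # Acceleration follows directly from the ODE at the current state.
        a = -(gamma * v + k * x) / m
        xs.append(x)
        vs.append(v)
        accs.append(a)
        # Semi-implicit Euler: update velocity first, then position
        # using the freshly updated velocity.
        v += a * dt
        x += v * dt
    return xs, vs, accs


# Illustrative parameter values (assumed, not taken from the dataset configs).
xs, vs, accs = simulate_damped_oscillator(m=1.0, gamma=0.5, k=4.0)
```

In the default setup, acceleration is the generation target and velocity and position are the context, which is why the sketch returns all three series.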
| 63 | + |
| 64 | +### Real-World Datasets |
| 65 | + |
| 66 | +#### Step 1: Download Raw Data |
| 67 | + |
| 68 | +- **Air Quality**: Download [here](https://archive.ics.uci.edu/dataset/501/beijing+multi+site+air+quality+data) and unzip the dataset. Place all `.csv` files from `PRSA_Data_20130301-20170228` (data for 12 stations) into the `./data_raw/AQ/` folder. |
| 69 | +- **Traffic**: Download [here](https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume) and unzip the dataset. Place the single CSV file at `./data_raw/Metro_Interstate_Traffic_Volume.csv`. |
| 70 | + |
| 71 | +After downloading, the directory should look like: |
| 72 | + |
| 73 | +  ```text |
| 74 | + data_raw |
| 75 | + ├── AQ |
| 76 | + │ ├── PRSA_Data_Aotizhongxin_20130301-20170228.csv |
| 77 | + │ ├── ... |
| 78 | + │ └── PRSA_Data_Wanshouxigong_20130301-20170228.csv |
| 79 | + └── Metro_Interstate_Traffic_Volume.csv |
| 80 | + ``` |
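
Before preprocessing, it can help to confirm that the raw files are where the layout above expects them. The following is a minimal sketch, not a script shipped with the repository: the function name `check_raw_data` is hypothetical, and the `PRSA_Data_*.csv` pattern is inferred from the two file names shown in the tree.

```python
from pathlib import Path


def check_raw_data(root="data_raw"):
    """Return a list of human-readable problems with the raw data layout.

    Hypothetical helper: an empty list means the expected files were found.
    """
    root = Path(root)
    aq_dir = root / "AQ"
    traffic_file = root / "Metro_Interstate_Traffic_Volume.csv"

    # The Air Quality archive contains one CSV per monitoring station (12 total).
    aq_files = sorted(aq_dir.glob("PRSA_Data_*.csv")) if aq_dir.is_dir() else []

    problems = []
    if len(aq_files) != 12:
        problems.append(
            f"expected 12 Air Quality station files in {aq_dir}, found {len(aq_files)}"
        )
    if not traffic_file.is_file():
        problems.append(f"missing traffic file: {traffic_file}")
    return problems


for problem in check_raw_data():
    print(problem)
```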
| 81 | + |
| 82 | +#### Step 2: Preprocess into Required Format |
| 83 | +Run the following commands to generate processed datasets: |
| 84 | +```bash |
| 85 | +# Air Quality dataset |
| 86 | +python utils/tsg_dataset_creator.py --config configs/dataset_config/aq.yaml |
| 87 | + |
| 88 | +# Traffic dataset |
| 89 | +python utils/tsg_dataset_creator.py --config configs/dataset_config/traffic.yaml |
| 90 | +``` |
| 91 | + |
| 92 | +### Dataset Split |
| 93 | + |
| 94 | +The default dataset splits used in our experiments are listed below. |
| 95 | +You can modify them in `configs/dataset_config/{dataset}.yaml`. |
| 96 | +For the Air Quality dataset split, an interactive map of station locations is available [**here**](./assets/aq_station_loc.html). |
| 97 | + |
| 98 | +| Type | Dataset | Target ($x$) | Context ($c$) | Default split strategy | Samples (Train/Val/Test) | |
| 99 | +|------------|-------------|----------------|---------------|------------------------|--------------------------| |
| 100 | +| Synthetic | Harmonic-VM | Acceleration | Velocity, Position | $\alpha$-based: Train $[0.0, 0.2]$; Val $[0.3, 0.5]$; Test $[0.6, 1.0]$ | 3,000/1,000/1,000 | |
| 101 | +| Synthetic | Harmonic-VP | Acceleration | Velocity, Position | Combination-based: Train ($\alpha \in [0.0, 0.2]$, $\beta \in [0.0, 0.01]$, $\eta \in [0.002, 0.08]$); Val ($\alpha \in [0.3, 0.5]$, $\beta \in [0.018, 0.022]$, $\eta \in [0.18, 0.22]$); Test ($\alpha \in [0.6, 1.0]$, $\beta \in [0.035, 0.04]$, $\eta \in [0.42, 0.5]$) | 3,000/1,000/1,000 | |
| 102 | +| Real-world | Air Quality | $PM_{2.5}$ | TEMP, PRES, DEWP, WSPM, RAIN, wd | Station-based: Train (Dongsi, Guanyuan, Tiantan, Wanshouxigong, Aotizhongxin, Nongzhanguan, Wanliu, Gucheng); Val (Changping, Dingling); Test (Shunyi, Huairou) | 11,664/2,916/2,916 | |
| 103 | +| Real-world | Traffic | traffic_volume | rain_1h, snow_1h, clouds_all, weather_main, holiday | Temperature-based: Train (<12°C); Val ([12, 22]°C); Test (>22°C) | 26,477/16,054/5,578 | |
| 104 | + |
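
The two real-world split rules in the table can be written as simple lookup functions. This is a sketch under stated assumptions, not the logic in `utils/tsg_dataset_creator.py`: the function names are hypothetical, the Kelvin-to-Celsius conversion assumes the raw traffic `temp` column is in Kelvin as described on the UCI dataset page, and the repository may apply the temperature threshold per window rather than per row.

```python
# Station-to-split mapping copied from the default split table.
AQ_STATION_SPLIT = {
    "train": ["Dongsi", "Guanyuan", "Tiantan", "Wanshouxigong",
              "Aotizhongxin", "Nongzhanguan", "Wanliu", "Gucheng"],
    "val": ["Changping", "Dingling"],
    "test": ["Shunyi", "Huairou"],
}


def assign_aq_split(station):
    """Return 'train', 'val', or 'test' for an Air Quality station name."""
    for split, stations in AQ_STATION_SPLIT.items():
        if station in stations:
            return split
    raise KeyError(f"unknown station: {station}")


def kelvin_to_celsius(temp_k):
    """Convert Kelvin to Celsius (assumes the raw `temp` column is in Kelvin)."""
    return temp_k - 273.15


def assign_traffic_split(temp_c, low=12.0, high=22.0):
    """Temperature-based split: train below `low`, val in [low, high], test above."""
    if temp_c < low:
        return "train"
    if temp_c <= high:
        return "val"
    return "test"


# Example: a reading of 300 K is about 26.85 °C and falls in the test range.
split = assign_traffic_split(kelvin_to_celsius(300.0))
```

Splitting by station or by temperature range, rather than at random, means the validation and test sets come from environments that are not seen during training.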
| 105 | +## Quick Start |
| 106 | + |
| 107 | +Train CaTSG on the Harmonic-VM dataset and evaluate both the interventional (`int`) and counterfactual (`cf`) tasks: |
| 108 | + |
| 109 | +```bash |
| 110 | +# 1) Training (automatically runs both int and cf evaluation after training) |
| 111 | +python main.py --base configs/catsg.yaml --dataset harmonic_vm --train |
| 112 | + |
| 113 | +# 2) Testing specific tasks |
| 114 | +python main.py --base configs/catsg.yaml --dataset harmonic_vm --test int |
| 115 | +python main.py --base configs/catsg.yaml --dataset harmonic_vm --test cf_harmonic |
| 116 | +``` |
| 117 | + |
| 118 | +**Outputs** |
| 119 | +- **Logs**: saved under `logs/<dataset>/CaTSG/<exp_name>/` |
| 120 | +- **Results**: saved as `.csv` under `results/<dataset>/` |
| 121 | + |
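
To inspect the saved results programmatically, one option is to collect every `.csv` file under `results/<dataset>/` into a list of rows. The sketch below assumes nothing about the column names or file names, which are not documented here; `load_results` is a hypothetical helper, and it simply records which file each row came from.

```python
import csv
from pathlib import Path


def load_results(dataset, results_dir="results"):
    """Read all CSV files under <results_dir>/<dataset>/ into a list of dicts.

    Hypothetical helper: values are kept as strings, exactly as stored.
    """
    folder = Path(results_dir) / dataset
    rows = []
    if not folder.is_dir():
        return rows
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                row["source_file"] = path.name  # remember where the row came from
                rows.append(row)
    return rows


rows = load_results("harmonic_vm")
print(f"loaded {len(rows)} result rows")
```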
| 122 | +## Citation |
| 123 | + |
| 124 | +If you find our work useful, please cite: |
| 125 | + |
| 126 | +```bibtex |
| 127 | +@article{xia2025causal, |
| 128 | + title={Causal Time Series Generation via Diffusion Models}, |
| 129 | + author={Xia, Yutong and Xu, Chang and Liang, Yuxuan and Wen, Qingsong and Zimmermann, Roger and Bian, Jiang}, |
| 130 | + journal={arXiv preprint arXiv:2509.20846}, |
| 131 | + year={2025} |
| 132 | +} |
| 133 | +``` |