Commit 6a6217e

Merge pull request #19 from yutong-xia/integrate-catsg
Integrate CaTSG: Causal Time Series Generation Model
2 parents d07e60f + 9093431 commit 6a6217e

31 files changed, 9,564 additions and 0 deletions

CaTSG/README.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# CaTSG for Causal Time Series Generation

This repository contains the implementation of our paper [Causal Time Series Generation via Diffusion Models](https://arxiv.org/pdf/2509.20846).
The code is implemented with PyTorch 1.13.0 and PyTorch Lightning 1.4.2 and was developed on a server with an NVIDIA A100 80GB PCIe GPU.

## Description
In the paper, we introduce *causal time series generation* as a new family of time series generation tasks, formalized within Pearl’s causal ladder and including *interventional* and *counterfactual* settings.

To instantiate these tasks, we develop **CaTSG**, a unified diffusion-based generative framework with backdoor-adjusted guidance that steers sampling toward the desired interventional and individual counterfactual distributions.

![image](./assets/framework.png)

## Installation

### Requirements
CaTSG uses the following dependencies:
- PyTorch 1.13.0 and PyTorch Lightning 1.4.2
- NumPy and SciPy
- Python 3.8
- CUDA 11.7 or later, cuDNN

### Setup Environment

Please first clone the **TimeCraft** repository and then set up the environment for CaTSG.

```bash
# Clone the repository
git clone https://github.com/microsoft/TimeCraft.git
cd TimeCraft/CaTSG

# Create and activate conda environment
conda env create -f environment.yml
conda activate catsg
```

## Dataset Preparation

This project supports both **synthetic datasets** for controlled experiments and **real-world datasets** for practical evaluation.

### Overview

- **Synthetic datasets**
  We construct two synthetic datasets that simulate a class of damped mechanical oscillators governed by the second-order differential equation $m \cdot \ddot{x}(t) + \gamma \cdot \dot{x}(t) + k \cdot x(t) = 0$. Details are presented in the appendix of our paper.
  - **Harmonic-VM**: Harmonic Oscillator with Variable Mass
  - **Harmonic-VP**: Harmonic Oscillator with Variable Parameters

- **Real-world datasets**
  - **[Air Quality](https://archive.ics.uci.edu/dataset/501/beijing+multi+site+air+quality+data)**:
    Four years of hourly air quality and meteorological measurements from **12 monitoring stations** in Beijing, China.
  - **[Traffic](https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume)**:
    Hourly traffic volume recorded on Interstate 94 near Minneapolis–St Paul, USA, including weather and holiday indicators.

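
For intuition about the synthetic data, the damped-oscillator equation above can be integrated numerically. The following is a minimal sketch, not the generator in `utils/tsg_dataset_creator.py`: the function names, the default parameter values, and the semi-implicit Euler scheme are all assumptions chosen for illustration. It returns position, velocity, and acceleration, the three quantities that appear as target and context variables in the Harmonic datasets.

```python
def simulate_damped_oscillator(m=1.0, gamma=0.2, k=1.0,
                               x0=1.0, v0=0.0, dt=0.01, steps=1000):
    """Integrate m*x'' + gamma*x' + k*x = 0 with semi-implicit Euler.

    All default values are illustrative assumptions, not the paper's settings.
    Returns (positions, velocities, accelerations) as equal-length lists.
    """
    xs, vs, accs = [], [], []
    x, v = x0, v0
    for _ in range(steps):
        # Acceleration follows directly from the ODE: x'' = -(gamma*x' + k*x) / m
        a = -(gamma * v + k * x) / m
        xs.append(x)
        vs.append(v)
        accs.append(a)
        # Semi-implicit Euler: update velocity first, then position with the new velocity
        v += dt * a
        x += dt * v
    return xs, vs, accs


def mechanical_energy(m, k, x, v):
    """Kinetic plus potential energy; it should decay over time when gamma > 0."""
    return 0.5 * m * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    xs, vs, accs = simulate_damped_oscillator()
    print("initial energy:", mechanical_energy(1.0, 1.0, xs[0], vs[0]))
    print("final energy:  ", mechanical_energy(1.0, 1.0, xs[-1], vs[-1]))
```

Varying `m` alone, or `m`, `gamma`, and `k` together, gives a rough feel for how different parameter regimes change the trajectories.
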
### Synthetic Datasets

You can generate the synthetic datasets from scratch:
```bash
# Harmonic-VM
python utils/tsg_dataset_creator.py --config configs/dataset_config/harmonic_vm.yaml

# Harmonic-VP
python utils/tsg_dataset_creator.py --config configs/dataset_config/harmonic_vp.yaml
```

### Real-World Datasets

#### Step 1: Download Raw Data

- **Air Quality**: Download [here](https://archive.ics.uci.edu/dataset/501/beijing+multi+site+air+quality+data) and unzip the dataset. Place all `.csv` files from `PRSA_Data_20130301-20170228` (data from 12 stations) into the `./data_raw/AQ/` folder.
- **Traffic**: Download [here](https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume) and unzip the dataset. Place the single CSV file at `./data_raw/Metro_Interstate_Traffic_Volume.csv`.

After downloading, the directory should look like this:

```bash
data_raw
├── AQ
│   ├── PRSA_Data_Aotizhongxin_20130301-20170228.csv
│   ├── ...
│   └── PRSA_Data_Wanshouxigong_20130301-20170228.csv
└── Metro_Interstate_Traffic_Volume.csv
```

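
Before preprocessing, it may help to confirm that the raw files are where the layout above expects them. The helper below is a hypothetical convenience script, not part of the repository: the function name `check_raw_data` is an assumption, and it only checks file names and counts, not file contents.

```python
from pathlib import Path


def check_raw_data(root="data_raw"):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []

    # The Air Quality download should contribute 12 station files under AQ/
    aq_dir = root / "AQ"
    aq_files = sorted(aq_dir.glob("PRSA_Data_*.csv")) if aq_dir.is_dir() else []
    if len(aq_files) != 12:
        problems.append(
            f"expected 12 station CSV files in {aq_dir}, found {len(aq_files)}"
        )

    # The Traffic download is a single CSV placed directly under the root
    traffic_file = root / "Metro_Interstate_Traffic_Volume.csv"
    if not traffic_file.is_file():
        problems.append(f"missing traffic file: {traffic_file}")

    return problems


if __name__ == "__main__":
    issues = check_raw_data()
    print("raw data layout OK" if not issues else "\n".join(issues))
```
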
#### Step 2: Preprocess into Required Format
Run the following commands to generate processed datasets:
```bash
# Air Quality dataset
python utils/tsg_dataset_creator.py --config configs/dataset_config/aq.yaml

# Traffic dataset
python utils/tsg_dataset_creator.py --config configs/dataset_config/traffic.yaml
```

### Dataset Split

The default dataset splits used in our experiments are listed below.
You can modify them in `configs/dataset_config/{dataset}.yaml`.
For the Air Quality dataset split, an interactive map of station locations is available [**here**](./assets/aq_station_loc.html).

| Type | Dataset | Target ($x$) | Context ($c$) | Default split strategy | Samples (Train/Val/Test) |
|------------|-------------|----------------|---------------|------------------------|--------------------------|
| Synthetic | Harmonic-VM | Acceleration | Velocity, Position | $\alpha$-based: Train $[0.0, 0.2]$; Val $[0.3, 0.5]$; Test $[0.6, 1.0]$ | 3,000/1,000/1,000 |
| Synthetic | Harmonic-VP | Acceleration | Velocity, Position | Combination-based: Train: $\alpha \in [0.0, 0.2]$, $\beta \in [0.0, 0.01]$, $\eta \in [0.002, 0.08]$; Val: $\alpha \in [0.3, 0.5]$, $\beta \in [0.018, 0.022]$, $\eta \in [0.18, 0.22]$; Test: $\alpha \in [0.6, 1.0]$, $\beta \in [0.035, 0.04]$, $\eta \in [0.42, 0.5]$ | 3,000/1,000/1,000 |
| Real-world | Air Quality | $PM_{2.5}$ | TEMP, PRES, DEWP, WSPM, RAIN, wd | Station-based: Train (Dongsi, Guanyuan, Tiantan, Wanshouxigong, Aotizhongxin, Nongzhanguan, Wanliu, Gucheng); Val (Changping, Dingling); Test (Shunyi, Huairou) | 11,664/2,916/2,916 |
| Real-world | Traffic | traffic_volume | rain_1h, snow_1h, clouds_all, weather_main, holiday | Temperature-based: Train (<12°C); Val ([12,22]°C); Test (>22°C) | 26,477/16,054/5,578 |

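
The station-based and temperature-based rules in the table can be expressed as simple lookups. This is an illustrative sketch only: the function and constant names are hypothetical, the splits used in practice are defined by `configs/dataset_config/{dataset}.yaml`, and the temperature is assumed to be in degrees Celsius already.

```python
# Station assignment copied from the Air Quality row of the table above
AQ_STATION_SPLIT = {
    "train": ["Dongsi", "Guanyuan", "Tiantan", "Wanshouxigong",
              "Aotizhongxin", "Nongzhanguan", "Wanliu", "Gucheng"],
    "val": ["Changping", "Dingling"],
    "test": ["Shunyi", "Huairou"],
}


def split_for_station(station):
    """Map an Air Quality station name to 'train', 'val', or 'test'."""
    for split, stations in AQ_STATION_SPLIT.items():
        if station in stations:
            return split
    raise KeyError(f"unknown station: {station}")


def split_for_temperature(temp_celsius, low=12.0, high=22.0):
    """Map a temperature to a Traffic split: <12 train, [12, 22] val, >22 test."""
    if temp_celsius < low:
        return "train"
    if temp_celsius <= high:
        return "val"
    return "test"


if __name__ == "__main__":
    print(split_for_station("Dongsi"))
    print(split_for_temperature(18.5))
```

Because every split covers a disjoint set of stations or a disjoint temperature range, the validation and test sets come from conditions not seen during training.
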
## Quick Start

Train CaTSG on the Harmonic-VM dataset and evaluate both the interventional and counterfactual tasks:

```bash
# 1) Training (automatically runs both int and cf evaluation after training)
python main.py --base configs/catsg.yaml --dataset harmonic_vm --train

# 2) Testing specific tasks
python main.py --base configs/catsg.yaml --dataset harmonic_vm --test int
python main.py --base configs/catsg.yaml --dataset harmonic_vm --test cf_harmonic
```

**Outputs**
- **Logs**: saved under `logs/<dataset>/CaTSG/<exp_name>/`
- **Results**: saved as `.csv` under `results/<dataset>/`

## Citation

If you find our work useful, please cite:

```bibtex
@article{xia2025causal,
  title={Causal Time Series Generation via Diffusion Models},
  author={Xia, Yutong and Xu, Chang and Liang, Yuxuan and Wen, Qingsong and Zimmermann, Roger and Bian, Jiang},
  journal={arXiv preprint arXiv:2509.20846},
  year={2025}
}
```
