StarEmbed: Benchmarking Time Series Foundation Models on Astronomical Observations of Variable Stars
The first benchmark to test state-of-the-art time series foundation models (TSFMs) on stellar time series observations ("light curves").
A complete benchmark framework for astronomical time series. This repository includes tools for (1) preprocessing raw light curves, (2) generating embeddings (with TSFMs and Astromer), (3) engineering handcrafted features, and (4) comprehensive evaluations on clustering, classification, and out-of-distribution detection.
| 🏠Benchmark Page | 🤗Huggingface Dataset | 📖Paper |
Raw light curve preprocessing and data preparation scripts
→ See datasets/README.md for detailed preprocessing workflows
Time series foundation model implementations and embedding generation
- Astromer 1&2: Transformer-based astronomical time series model
- Chronos: Amazon's forecasting foundation model
- Moirai: Salesforce's universal time series model
compute_avg_embeddings.py: Generate combined embeddings from multi-band data
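As a rough illustration of what multi-band aggregation can look like, the sketch below averages per-band embedding vectors element-wise into a single vector. It is an assumption based only on the script's name, not a description of what `compute_avg_embeddings.py` actually does; the function `average_band_embeddings`, the band names `g` and `r`, and the toy 3-dimensional vectors are all hypothetical.

```python
from statistics import fmean


def average_band_embeddings(band_embeddings):
    """Average per-band embedding vectors element-wise into one vector.

    band_embeddings maps a band name to a list of floats. This helper is
    hypothetical and only illustrates the idea of multi-band averaging.
    """
    vectors = list(band_embeddings.values())
    if not vectors:
        raise ValueError("need at least one band")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all bands must share the same embedding dimension")
    # zip(*vectors) groups the i-th component of every band together.
    return [fmean(component) for component in zip(*vectors)]


# Toy example: two bands (assumed names) with 3-dimensional embeddings.
combined = average_band_embeddings({"g": [1.0, 2.0, 3.0], "r": [3.0, 4.0, 5.0]})
print(combined)
```

A plain mean treats every band equally; a weighted mean or concatenation would be alternative design choices if bands differ in quality or coverage.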
Evaluation pipeline with pre-computed embeddings
- Classification: kNN, Linear models, MLPs, Random Forest with HPO
- Clustering: K-Means, hierarchical clustering, t-SNE visualization
→ See benchmark/README.md for complete evaluation workflows
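To make the idea of evaluating pre-computed embeddings concrete, here is a minimal kNN classification sketch using only the Python standard library. It is not the repository's evaluation code; the function `knn_predict`, the class labels, and the 2-dimensional toy embeddings are hypothetical stand-ins for real embedding vectors and variable-star classes.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training embeddings, using Euclidean distance."""
    if k < 1 or k > len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy embeddings for two hypothetical classes, well separated in 2-D.
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]

pred_near_a = knn_predict(train_x, train_y, [0.5, 0.5], k=3)
pred_near_b = knn_predict(train_x, train_y, [5.5, 5.5], k=3)
print(pred_near_a, pred_near_b)
```

Because the embeddings are computed once and the classifier is cheap to fit, the same vectors can be reused across many classifiers and hyperparameter settings.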
Job scripts for evaluation with hyperparameter search, plus a multi-run script
- Preprocess data: datasets/ → Raw light curves to standardized format
- Generate embeddings: model/ → Extract features using TSFMs
- Create combined embeddings: model/compute_avg_embeddings.py → Multi-band aggregation
- Run evaluations: benchmark/ → Classification, clustering, visualization
All code is released under the MIT license.