TL;DR: ALSO is an adaptive distributionally robust optimizer that is compatible with standard DL pipelines.
This repository contains the official implementation of the paper "Aligning Distributionally Robust Optimization with Practical Deep Learning Needs".
The repository provides:
- The official implementation of the ALSO method.
- The code for experiments used in the paper.
Deep learning (DL) models often struggle with real-world data heterogeneity, such as class imbalance or varied data sources, as standard training methods treat all samples equally. Distributionally Robust Optimization (DRO) offers a principled approach by optimizing for a worst-case data distribution. However, a significant gap exists between DRO and current DL practices. DRO methods often lack adaptive parameter updates (like Adam), struggle with the non-convexity of neural networks, and are difficult to integrate with group-based weighting in standard mini-batch training pipelines. This paper aims to bridge this gap by introducing ALSO – Adaptive Loss Scaling Optimizer – a novel optimizer that integrates an adaptive, Adam-like update for the model parameters with an efficient, principled mechanism for learning worst-case data weights. Crucially, it supports stochastic updates for both model parameters and data weights, making it fully compatible with group-based weighting and standard Deep Learning training pipelines. We prove the convergence of our proposed algorithm for non-convex objectives, which is the typical case for DL models. Empirical evaluation across diverse Deep Learning tasks characterized by different types of data heterogeneity demonstrates that ALSO outperforms both traditional DL approaches and existing DRO methods.
To use ALSO outside of this repository, you only need the following:
- `also.py`: the minimal single-file implementation.
- `example.ipynb`: an end-to-end example of training a model with ALSO.
- The section about hyperparameters below.
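As a quick orientation, the sketch below shows how ALSO might be dropped into a training script. The constructor arguments are assumptions inferred from the hyperparameter tables in this README, and a PyTorch-optimizer-style interface is assumed; consult `also.py` and `example.ipynb` for the actual API.

```python
# Hypothetical usage sketch: the ALSO constructor signature below is an
# assumption based on the hyperparameters documented in this README,
# not the verified interface of also.py.
import torch.nn as nn
from also import ALSO  # the single-file implementation from this repo

n_groups, batch_size = 10, 64
model = nn.Linear(128, n_groups)

optimizer = ALSO(
    model.parameters(),
    mode="optimistic",
    alpha=1.0,
    loss_scale=n_groups / batch_size,
    lr=1e-3,
    weight_decay=1e-3,
    pi_lr=1e-3,        # learning rate for the data weights
    pi_decay=1e-2,
    pi_reg="uniform",  # or the softmax of static class weights (see below)
    pi_init="uniform",
)
# The full training loop (per-sample losses, group ids, optimizer.step())
# is demonstrated end-to-end in example.ipynb.
```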
This section covers default hyperparameters and hyperparameter tuning.
Default hyperparameters
Note
This section only provides general advice on default hyperparameters.
Our experiments indicate that ALSO is robust to hyperparameter tuning in terms of gains over baselines; however, additional tuning can lead to a significant performance boost.
In any case, we strongly recommend using `pi_reg` and `pi_init` computed as the softmax of static weights (i.e., the inverse of the ratio of samples of the given class); a sketch of this computation is given after the table below.
Hyperparameter | Default value |
---|---|
mode | optimistic |
alpha | 1.0 |
loss_scale | n_groups / batch_size |
lr | 0.001 |
weight_decay | 0.001 |
pi_lr | 0.001 |
pi_decay | 0.01 |
pi_reg | Uniform |
pi_init | Uniform |
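To make the recommendation above concrete, here is a minimal sketch of computing such static weights from per-class sample counts; the exact format that `pi_reg` and `pi_init` expect is defined in `also.py`, so treat this only as an illustration of the formula.

```python
import torch

def static_weights(class_counts: torch.Tensor) -> torch.Tensor:
    """Softmax of the inverse class ratios, as recommended above.

    class_counts[g] is the number of training samples in group/class g.
    """
    inv_ratio = class_counts.sum() / class_counts  # inverse of each class's share
    return torch.softmax(inv_ratio, dim=0)

# Example: an imbalanced 3-class dataset; rare classes receive larger weight.
pi = static_weights(torch.tensor([600.0, 300.0, 100.0]))
print(pi, pi.sum())  # sums to 1 by construction
```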
Hyperparameter tuning
During our experiments, we identified a reasonable search space for the hyperparameters; however, feel free to adjust it to better suit your concrete task.
Hyperparameter | Search space |
---|---|
mode | optimistic |
alpha | 1.0 |
loss_scale | n_groups / batch_size |
lr | LogUniform[0.00003, 0.001] |
weight_decay | LogUniform[0.0001, 0.1] |
pi_lr | LogUniform[0.00001, 0.001] |
pi_decay | LogUniform[0.001, 1] |
pi_reg | Uniform/Static Weights |
pi_init | Uniform/Static Weights |
In our experiments, we used 100 trials of the TPE sampler from Optuna, so we suggest using the same number of trials. However, in a compute-limited setup, 40-50 trials should still yield reasonable results (see the tuning sketch below).
Additionally, we do not recommend including `pi_reg` and `pi_init` in the search space; instead, simply use Static Weights when reasonable and Uniform otherwise (at the very least, use the same value for both parameters). However, if you do want to tune these parameters, we recommend running the tuning for each variant separately, potentially with fewer trials.
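For reference, here is a sketch of such a tuning loop with Optuna's TPE sampler over the search space from the table above; `train_and_evaluate` is a hypothetical placeholder for your own routine that trains a model with ALSO and returns a validation metric.

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Search space from the table above; mode, alpha and loss_scale stay fixed,
    # and pi_reg / pi_init are intentionally not tuned here.
    params = {
        "lr": trial.suggest_float("lr", 3e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True),
        "pi_lr": trial.suggest_float("pi_lr", 1e-5, 1e-3, log=True),
        "pi_decay": trial.suggest_float("pi_decay", 1e-3, 1.0, log=True),
    }
    # Placeholder: train with ALSO using `params` and return a validation score.
    return train_and_evaluate(**params)

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=0),
)
study.optimize(objective, n_trials=100)  # 40-50 trials in a compute-limited setup
print(study.best_params)
```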
We publish the experiments from the paper in the corresponding folders:
- Distributed Training Experiments
- Robust Training to Adversarial Attacks Experiments
- Ablation Studies Code
@article{feoktistov2025aligning,
title={Aligning distributionally robust optimization with practical deep learning needs},
author={Feoktistov, Dmitrii and Ignashin, Igor and Veprikov, Andrey and Borovko, Nikita and Bogdanov, Alexander and Chezhegov, Savelii and Beznosikov, Aleksandr},
journal={arXiv preprint arXiv:2508.16734},
year={2025}
}