Welcome to JaxAHT! This is a Jax-based benchmark repository for Ad Hoc Teamwork. For a quick introduction to the benchmark, please see our Colab tutorial notebook.
If you find this repository useful for your research, please cite:

```bibtex
@misc{jaxaht2025,
  author = {Learning Agents Research Group},
  title = {JaxAHT},
  year = {2025},
  month = {September},
  note = {Version 1.0.0},
  url = {https://github.com/LARG/jax-aht},
}
```

The JaxAHT library is designed to (1) facilitate research across the entire lifecycle of ad hoc teamwork, and (2) ease the evaluation of ad hoc agents (ego agents) on commonly used AHT benchmark tasks. As such, the benchmark includes:
- AHT algorithms
- Environments
- Evaluation teammates for each environment
The library includes a variety of MARL/AHT algorithms, as AHT research often requires orchestrating multiple types of algorithms:
- An ego agent training algorithm
- A teammate generation algorithm
- A multi-agent reinforcement learning (MARL) algorithm
- A single-agent reinforcement learning method to act as a best response operator
Research focused on one type of algorithm can require other types for training/evaluation purposes. For example, to evaluate a teammate generation method, an ego agent training method is necessary. Similarly, an ego agent training method requires a set of training teammates, which may be generated by a teammate generation algorithm or a MARL algorithm.
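To make this orchestration concrete, the sketch below chains the three stages end to end. Every function in it is a stand-in stub invented for illustration; none of these names are actual jax-aht APIs.

```python
"""Illustrative sketch only: stub functions showing how teammate generation,
ego agent training, and heldout evaluation chain together in an AHT workflow.
None of these names correspond to actual jax-aht APIs."""
from typing import Dict, List


def generate_teammates(num_teammates: int) -> List[str]:
    # Stub: a teammate generation or MARL algorithm would return trained policies.
    return [f"training_teammate_{i}" for i in range(num_teammates)]


def train_ego_agent(training_teammates: List[str]) -> str:
    # Stub: an ego agent training algorithm learns against the training population.
    return f"ego_agent_trained_vs_{len(training_teammates)}_teammates"


def evaluate(ego_agent: str, heldout_teammates: List[str]) -> Dict[str, float]:
    # Stub: heldout evaluation pairs the ego agent with unseen evaluation teammates.
    return {partner: 0.0 for partner in heldout_teammates}


if __name__ == "__main__":
    population = generate_teammates(num_teammates=8)
    ego = train_ego_agent(population)
    print(evaluate(ego, heldout_teammates=["heuristic_a", "heuristic_b"]))
```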
This codebase aims to provide a unified interface for the above AHT procedures, to enable research on both individual procedures and combinations thereof. On the other hand, to facilitate fast iteration, we take inspiration from the single-file model used by projects such as JaxMARL and CleanRL and minimally modularize the code. Algorithms are largely implemented in a single-file format to enable researchers to easily understand and build upon existing methods. However, the agent interface is shared by all methods, to allow agents trained by one algorithm to easily be used by another algorithm---a common workflow for AHT research.
Our modularization is restricted to environments, agents, and populations, which allows us to cleanly interface the algorithm types above, while placing most of the logic for any single algorithm within a single file.
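As an illustration of what a shared interface enables, here is a minimal sketch of a policy interface in the spirit described above. The names (`AgentPolicy`, `init_hstate`, `get_action`) and the exact signature are assumptions for exposition, not the interface actually defined in `agents/`.

```python
# Minimal sketch of a shared, recurrent-friendly policy interface (illustrative
# names only; not the actual interface defined in agents/).
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp


class AgentPolicy(NamedTuple):
    """Bundle of pure functions so any algorithm can query any trained agent."""
    init_hstate: Any   # () -> initial recurrent state (or None for feedforward policies)
    get_action: Any    # (params, obs, done, hstate, rng) -> (action, new_hstate)


def make_random_policy(num_actions: int) -> AgentPolicy:
    """A trivial policy implementing the interface, useful as a placeholder."""

    def init_hstate():
        return None  # no recurrent state

    def get_action(params, obs, done, hstate, rng):
        action = jax.random.randint(rng, shape=(), minval=0, maxval=num_actions)
        return action, hstate

    return AgentPolicy(init_hstate=init_hstate, get_action=get_action)


if __name__ == "__main__":
    policy = make_random_policy(num_actions=6)
    act, _ = policy.get_action(None, jnp.zeros(10), False, policy.init_hstate(),
                               jax.random.PRNGKey(0))
    print(act)
```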
| Category | Algorithm | Description | Paper |
|---|---|---|---|
| Ego Agent Training | PPO Ego | Trains a PPO agent against a population of homogeneous partner agents. | - |
| | LIAM Ego | Trains a LIAM agent against a population of homogeneous partner agents. | Papoudakis et al. 2021 |
| | MeLIBA Ego | Trains a MeLIBA agent against a population of homogeneous partner agents. | Zintgraf et al. 2022 |
| Teammate Generation | FCP (Fictitious Co-Play) | Generates diverse teammates using varying seeds and checkpoints of IPPO. | Strouse et al. 2021 |
| | BRDiv | Generates diverse teammates using the best response diversity (BRDiv) metric. | Rahman et al. 2022 |
| | LBRDiv | Generates diverse teammates by emulating the minimum coverage set. | Rahman et al. 2024 |
| | CoMeDi | Generates diverse teammates by optimizing mixed-play. | Sarkar et al. 2023 |
| MARL | IPPO | Multi-agent reinforcement learning using independent PPO agents with parameter sharing. | Yu et al. 2022 |
| Open-Ended Training | ROTATE | Open-ended training using cooperative regret maximization. | Wang et al. 2025 |
| | PAIRED | Open-ended training based on the PAIRED algorithm from the unsupervised environment design literature. | Dennis et al. 2020 |
| | Open-Ended Minimax | Open-ended training baseline using minimax return optimization. | - |
| Environment | Source | Description | Variants | Evaluation Teammates |
|---|---|---|---|---|
| Level-Based Foraging (LBF) | Jumanji | Cooperative foraging environment where agents must work together to collect food | 7x7 grid with full observability | ✅ |
| Overcooked-v1 | JaxMARL | Cooperative cooking environment where agents must coordinate to prepare and serve dishes | asymm_advantages, coord_ring, counter_circuit, cramped_room, forced_coord | ✅ |
- Installation Guide
- Getting Started
- Code Overview
- Project Structure
- License
- See Also
Follow the instructions at `docs/install_instructions.md` to install the necessary libraries.
Evaluating trained agents against the heldout evaluation set requires downloading the evaluation agents. We also provide the best returns achieved against each evaluation agent in our experiments. Directories containing both can be obtained by running the provided data download script:
```bash
python download_eval_data.py
```

For a quick introduction to the benchmark, please see our Colab tutorial notebook.
Algorithms are sorted into four main directories in this codebase.
- `ego_agent_training/`: contains algorithms for training an AHT agent against a pre-specified set of teammates
- `marl/`: contains MARL algorithms for training a team of agents from scratch
- `open_ended_training/`: contains open-ended AHT algorithms
- `teammate_generation/`: contains teammate generation algorithms
Each directory contains a `run.py` that serves as its entry point.
For open-ended and teammate generation methods, we provide an `experiments.sh` script that runs the algorithm specified at the top of `experiments.sh` on the LBF and Overcooked tasks.
JaxMARL follows a single-script training paradigm, which enables jit-compiling the entire RL training loop and makes it simple for researchers to modify algorithms. We follow a similar paradigm, but use agent and population interfaces, along with some common utility functions to avoid code duplication.
The code makes the following assumptions:
- Agent policies are assumed to handle "done" signals and reset internally.
- Environments have homogeneous agents and discrete actions.
- Environments are assumed to "auto-reset": when an episode is done, the step function should check for this and reset the environment if needed.
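The auto-reset convention can be implemented with a standard JAX pattern, sketched below. The `reset`/`step` signatures are assumptions chosen for illustration and are not the exact wrappers in `envs/`.

```python
# Generic auto-reset sketch (assumed reset/step signatures; not the envs/ wrappers).
import jax
import jax.numpy as jnp


class AutoResetWrapper:
    """Resets the underlying environment inside step() whenever done is True."""

    def __init__(self, env):
        self._env = env

    def reset(self, key):
        return self._env.reset(key)  # assumed to return (obs, state)

    def step(self, key, state, action):
        obs, state, reward, done, info = self._env.step(key, state, action)
        # On episode end, swap in a freshly reset observation/state so callers
        # never need to call reset() themselves (done is assumed to be a scalar).
        reset_obs, reset_state = self._env.reset(key)
        obs = jax.tree_util.tree_map(lambda r, o: jnp.where(done, r, o), reset_obs, obs)
        state = jax.tree_util.tree_map(lambda r, s: jnp.where(done, r, s), reset_state, state)
        return obs, state, reward, done, info
```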
Gotchas
- The metric `returned_episode_returns` is automatically tracked and logged by the LogWrapper. It corresponds to summing the reward returned by `env.step()` over an episode. Thus, if an environment returns a shaped reward, this metric corresponds to the shaped return.
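The bookkeeping behind this metric follows a standard pattern, sketched generically below. This is not the LogWrapper source, only an illustration of the summation described above.

```python
# Generic sketch of per-episode return tracking; not the actual LogWrapper code.
import jax.numpy as jnp


def update_return_stats(running_return, returned_episode_return, reward, done):
    """Sum env.step() rewards within an episode; freeze the sum when done is True."""
    new_running = running_return + reward  # shaped rewards are summed as-is
    returned = jnp.where(done, new_running, returned_episode_return)
    running = jnp.where(done, 0.0, new_running)  # restart the accumulator after an episode
    return running, returned
```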
The project structure is described here. Additional notes about some folders are provided below.
- `agents/`: Contains agent-related implementations.
- `common/`: Shared utilities and common code.
- `envs/`: Environment implementations and wrappers.
- `evaluation/`: Evaluation and visualization scripts.
- `ego_agent_training/`: All ego agent learning implementations (PPO, LIAM, and MeLIBA).
- `marl/`: MARL algorithm implementations. Currently only supports IPPO.
- `open_ended_training/`: Open-ended learning methods (ROTATE, PAIRED, Minimax Return).
- `teammate_generation/`: Teammate generation algorithms (BRDiv, FCP, CoMeDi).
- `tests/`: Test scripts used during development.
The algorithms in this codebase are divided into four categories, and each is stored in its own directory:
- MARL algorithms, located at `marl/`
- AHT (Ad Hoc Teamwork) algorithms
  - Ego agent training methods, located at `ego_agent_training/`
  - Two-stage teammate generation methods, located at `teammate_generation/`
  - Open-ended AHT methods, located at `open_ended_training/`
Note that algorithms from the `marl/` and `ego_agent_training/` categories are called as subroutines in the other two categories.
For example:
- FCP uses the `marl/ippo` implementation as the teammate generation subroutine.
- Two-stage teammate generation methods use `ego_agent_training/ppo_ego.py` as the ego agent training routine.
Within each directory, there is a `run.py` that serves as the entry point for all algorithms implemented in that directory.
We use Hydra to manage algorithm and task configurations.
In each directory above, there is a `configs/` directory with the following subdirectories:
- `configs/algorithm/`: Contains algorithm configs for each algorithm and task combination.
- `configs/hydra/`: Contains Hydra settings.
- `configs/task/`: Contains environment configs necessary to specify a task.
Given an algorithm and task, Hydra retrieves the appropriate configs from the subdirectories above
and merges them into the master config found in `configs/base_config_<method_type>.yaml` (e.g., `configs/base_config_teammate_generation.yaml`).
The algorithm and task may be manually specified by modifying the master config, or by using
Hydra's command line argument support.
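Concretely, a Hydra-based entry point has roughly the following shape. This is a minimal sketch assuming the master config name mentioned above; the actual contents of each `run.py` may differ.

```python
# Minimal Hydra entry-point sketch; the real run.py scripts may differ.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="configs",
            config_name="base_config_teammate_generation")
def main(cfg: DictConfig) -> None:
    # cfg holds the merged task + algorithm config groups, including any
    # command-line overrides (e.g., algorithm.TOTAL_TIMESTEPS=1e5).
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```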
For example, the following command runs Fictitious Co-Play on the Level-Based Foraging (LBF) task:
```bash
python teammate_generation/run.py task=lbf algorithm=fcp/lbf
```

Note that Hydra allows the user to modify any config value specified in the algorithm/task config files from the command line. For example, to set the number of training interactions for FCP, use the following command:
```bash
python teammate_generation/run.py task=lbf algorithm=fcp/lbf algorithm.TOTAL_TIMESTEPS=1e5
```

By default, results are logged to a local `results/` directory, as specified within the `configs/hydra/hydra_simple.yaml` file for each method type, and to the Weights & Biases (wandb) project specified in the master config.
All metrics are logged using wandb and can be viewed using the wandb web interface.
Please see the wandb documentation for general information about wandb.
Logging settings in each master config allow the user to enable or disable logging.
The `agents/` directory contains:
- Heuristic agents for Overcooked and LBF environments.
- Various actor-critic architectures.
- Population and agent interfaces for RL agents.
You can test the Overcooked heuristic agents by running `python tests/test_overcooked_agents.py`, and the LBF heuristic agents by running `python tests/test_lbf_agents.py`.
Certain workflows within this project (namely, ego agent training and heldout evaluation) require teammate policies as inputs. The user provides these teammate policies by specifying a partner config, which may point to heuristic or RL-based partner policies.
By default, the heldout evaluation workflow uses the downloaded evaluation teammates, and the corresponding partner config is specified at `evaluation/configs/global_heldout_settings.yaml`; thus, no user intervention is necessary to perform heldout evaluations.
However, the ego agent training workflow requires the user to specify a partner agent config. A quick example of how to run an ego agent training algorithm with a particular partner config is provided in our tutorial notebook. More details on how to specify the partner config are provided at the top of the ego agent training scripts.
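To give a sense of the idea, the snippet below builds a partner config programmatically with OmegaConf. The keys and values are entirely hypothetical and do not match the repository's actual schema; consult the ego agent training scripts for the real format.

```python
# Hypothetical partner config for illustration only; the keys below are NOT the
# schema used by jax-aht (see the ego agent training scripts for the real format).
from omegaconf import OmegaConf

partner_cfg = OmegaConf.create({
    "partners": [
        {"type": "heuristic", "name": "example_heuristic"},         # hypothetical
        {"type": "rl", "checkpoint_path": "path/to/partner_ckpt"},  # hypothetical
    ]
})
print(OmegaConf.to_yaml(partner_cfg))
```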
This codebase uses the Jumanji LBF implementation. The wrapper for the Jumanji LBF environment is stored in the `envs/` directory, at `envs/lbf/lbf_wrapper.py`. A corresponding test script is stored at `tests/test_lbf_wrapper.py`.
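As background, adapting a Jumanji environment generally means translating its `(state, TimeStep)` API into a JaxMARL-style tuple. The skeleton below only illustrates that pattern under assumed method signatures; it is not the code in `envs/lbf/lbf_wrapper.py`.

```python
# Illustrative skeleton of adapting Jumanji's (state, TimeStep) API; not the
# actual envs/lbf/lbf_wrapper.py implementation.
class JumanjiToMARLWrapper:
    def __init__(self, env):
        # env is assumed to be a Jumanji environment, e.g. Level-Based Foraging.
        self._env = env

    def reset(self, key):
        state, timestep = self._env.reset(key)
        return timestep.observation, state

    def step(self, key, state, action):
        state, timestep = self._env.step(state, action)
        done = timestep.last()  # Jumanji TimeSteps mark terminal steps via last()
        return timestep.observation, state, timestep.reward, done, {}
```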
We made some modifications to the JaxMARL Overcooked environment to improve the functionality and ensure environments are solvable.
- Initialization randomization: Previously, setting `random_reset` would lead to random initial agent positions and randomized initial object states (e.g., a pot might be initialized with onions already in it, agents might be initialized holding plates, etc.). We separate the functionality of the `random_reset` argument into two arguments, `random_reset` and `random_obj_state`, where `random_reset` only controls the initial positions of the two agents.
- Agent initial positions: Previously, in a map with disconnected components, it was possible for both agents to be spawned in the same component, making it impossible to solve the task. The Overcooked-v1 environment initializes agents such that one is always spawned on each side of the map.
This project is licensed under the MIT License - see the LICENSE file for details.
This project was inspired by the following Jax-based RL repositories. Please check them out!
- JaxMARL: a library with Jax-based MARL algorithms and environments
- Jumanji: a library with Jax implementations of several MARL environments
- Minimax: a library with Jax implementations of single-agent UED algorithms
- ROTATE: code for the ROTATE paper (Wang et al. 2025), on which this benchmark is built.
