jax-aht

Welcome to JaxAHT! This is a Jax-based benchmark repository for Ad Hoc Teamwork. For a quick introduction to the benchmark, please see our Colab tutorial notebook.

If you find this repository useful for your research, please cite:

@misc{jaxaht2025,
  author = {Learning Agents Research Group},
  title = {JaxAHT},
  year = {2025},
  month = {September},
  note = {Version 1.0.0},
  url = {https://github.com/LARG/jax-aht},
}

Design Philosophy

The JaxAHT library is designed to (1) facilitate research across the entire lifecycle of ad hoc teamwork, and (2) ease the evaluation of ad hoc agents (ego agents) for commonly used AHT benchmark tasks. As such, the benchmark includes:

  • AHT algorithms
  • Environments
  • Evaluation teammates for each environment

The library includes a variety of MARL/AHT algorithms, because AHT research often requires orchestrating several types of algorithms:

  • An ego agent training algorithm
  • A teammate generation algorithm
  • A multi-agent reinforcement learning (MARL) algorithm
  • A single-agent reinforcement learning method that acts as a best-response operator

Research focused on one type of algorithm can require other types for training/evaluation purposes. For example, to evaluate a teammate generation method, an ego agent training method is necessary. Similarly, an ego agent training method requires a set of training teammates, which may be generated by a teammate generation algorithm or a MARL algorithm.

This codebase aims to provide a unified interface for the above AHT procedures, to enable research on both individual procedures and combinations thereof. At the same time, to facilitate fast iteration, we take inspiration from the single-file model used by projects such as JaxMARL and CleanRL and keep modularization to a minimum. Algorithms are largely implemented in a single-file format so that researchers can easily understand and build upon existing methods. However, the agent interface is shared by all methods, allowing agents trained by one algorithm to be reused by another, a common workflow in AHT research.

Our modularization is restricted to environments, agents, and populations, which allows us to cleanly interface the algorithm types above, while placing most of the logic for any single algorithm within a single file.
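
To make the shared interface concrete, the sketch below shows roughly what a common agent-policy surface can look like. It is a minimal illustration in Python; the class and method names (AgentState, HypotheticalAgentPolicy, init_state, get_action) are assumptions for exposition, not the repository's actual API.

# Illustrative agent-policy interface (names are hypothetical, not the repository's API).
from typing import Any, NamedTuple

import jax


class AgentState(NamedTuple):
    params: Any   # network parameters
    hstate: Any   # recurrent hidden state (None for feedforward policies)


class HypotheticalAgentPolicy:
    """Illustrative interface shared by training, generation, and evaluation code."""

    def init_state(self, rng: jax.Array) -> AgentState:
        """Initialize parameters and (optionally) a recurrent hidden state."""
        raise NotImplementedError

    def get_action(self, state: AgentState, obs: jax.Array, done: jax.Array,
                   rng: jax.Array) -> tuple[jax.Array, AgentState]:
        """Return a discrete action and the updated agent state.

        Policies are expected to reset their own hidden state when done is True,
        matching the done-handling assumption described later in this README.
        """
        raise NotImplementedError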

Available Algorithms and Environments

| Category | Algorithm | Description | Paper |
| --- | --- | --- | --- |
| Ego Agent Training | PPO Ego | Trains a PPO agent against a population of homogeneous partner agents. | - |
| Ego Agent Training | LIAM Ego | Trains a LIAM agent against a population of homogeneous partner agents. | Papoudakis et al. 2021 |
| Ego Agent Training | MeLIBA Ego | Trains a MeLIBA agent against a population of homogeneous partner agents. | Zintgraf et al. 2022 |
| Teammate Generation | FCP (Fictitious Co-Play) | Generates diverse teammates using varying seeds and checkpoints of IPPO. | Strouse et al. 2021 |
| Teammate Generation | BRDiv | Generates diverse teammates using the best response diversity (BRDiv) metric. | Rahman et al. 2022 |
| Teammate Generation | LBRDiv | Generates diverse teammates by emulating the minimum coverage set. | Rahman et al. 2024 |
| Teammate Generation | CoMeDi | Generates diverse teammates by optimizing mixed-play. | Sarkar et al. 2023 |
| MARL | IPPO | Multi-agent reinforcement learning using independent PPO agents with parameter sharing. | Yu et al. 2022 |
| Open-Ended Training | ROTATE | Open-ended training using cooperative regret maximization. | Wang et al. 2025 |
| Open-Ended Training | PAIRED | Open-ended training based on the PAIRED algorithm from the unsupervised environment design literature. | Dennis et al. 2020 |
| Open-Ended Training | Open-Ended Minimax | Open-ended training baseline using minimax return optimization. | - |

Supported Environments

| Environment | Source | Description | Variants | Evaluation Teammates |
| --- | --- | --- | --- | --- |
| Level-Based Foraging (LBF) | Jumanji | Cooperative foraging environment where agents must work together to collect food. | 7x7 grid with full observability | ✅ |
| Overcooked-v1 | JaxMARL | Cooperative cooking environment where agents must coordinate to prepare and serve dishes. | asymm_advantages, coord_ring, counter_circuit, cramped_room, forced_coord | ✅ |

🚀 Installation Guide

Follow the instructions at docs/install_instructions.md to install the necessary libraries.

Evaluating trained agents against the heldout evaluation set requires downloading the evaluation agents. We also provide the best returns achieved against each evaluation agent in our experiments. Directories containing both the agents and the returns can be obtained by running the provided data download script:

python download_eval_data.py

โ–ถ๏ธ Getting Started:

For a quick introduction to the benchmark, please see our Colab tutorial notebook.

Algorithms are sorted into four main directories in this codebase.

  • ego_agent_training/: contains algorithms for training an AHT agent against a pre-specified set of teammates
  • marl/: contains MARL algorithms for training a team of agents from scratch
  • open_ended_training/: contains open-ended AHT algorithms
  • teammate_generation/: contains teammate generation algorithms

Each directory contains a run.py that serves as its entry point. For the open-ended and teammate generation methods, we also provide an experiments.sh script that runs the algorithm specified at the top of the file on the LBF and Overcooked tasks.

๐Ÿ“ Code Overview

🎨 Code Style

JaxMARL follows a single-script training paradigm, which enables jit-compiling the entire RL training loop and makes it simple for researchers to modify algorithms. We follow a similar paradigm, but use agent and population interfaces, along with some common utility functions to avoid code duplication.

โœ”๏ธ Code Assumptions/Gotchas

The code makes the following assumptions:

  • Agent policies are assumed to handle "done" signals and reset their internal state themselves.
  • Environments have homogeneous agents and discrete actions.
  • Environments are assumed to "auto-reset": when the episode is done, the step function should check for this and reset the environment itself (a minimal sketch of this convention is given below).
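
The auto-reset convention can be implemented inside a wrapper's step function. The sketch below is a minimal illustration, assuming an environment whose reset(key) returns (obs, state) and whose step(key, state, actions) returns (obs, state, reward, done, info); the wrapper name and exact signatures are assumptions, not the exact code in envs/.

# Minimal sketch of the auto-reset convention (illustrative; the actual
# wrappers in envs/ may differ). When done is True, step() returns the
# observation and state of a freshly reset environment.
import jax
import jax.numpy as jnp


class AutoResetWrapper:
    def __init__(self, env):
        self._env = env

    def reset(self, key):
        return self._env.reset(key)

    def step(self, key, state, actions):
        obs, state, reward, done, info = self._env.step(key, state, actions)
        reset_key, _ = jax.random.split(key)
        reset_obs, reset_state = self._env.reset(reset_key)
        # Select the freshly reset state/observation when the episode ended.
        # tree_map keeps this jit- and vmap-compatible for pytree states.
        state = jax.tree_util.tree_map(lambda r, s: jnp.where(done, r, s),
                                       reset_state, state)
        obs = jax.tree_util.tree_map(lambda r, o: jnp.where(done, r, o),
                                     reset_obs, obs)
        return obs, state, reward, done, info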

Gotchas

  • The metric returned_episode_returns is automatically tracked and logged by the LogWrapper. It is computed by summing the reward returned by env.step() over an episode. Thus, if an environment returns a shaped reward, this metric corresponds to the shaped return.
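
To make this concrete, the accumulation behind such a metric looks roughly like the following (a schematic only, with illustrative names; the actual LogWrapper implementation may differ):

import jax.numpy as jnp


def update_return_stats(running_return, logged_return, reward, done):
    # Sum whatever reward env.step() emits; if that reward is shaped,
    # the logged episode return is therefore the shaped return.
    running_return = running_return + reward
    logged_return = jnp.where(done, running_return, logged_return)
    running_return = jnp.where(done, 0.0, running_return)
    return running_return, logged_return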

๐Ÿ—บ๏ธ Project Structure

The project structure is described below, with additional notes about some of the folders.

  • agents/: Contains agent related implementations.
  • common/: Shared utilities and common code.
  • envs/: Environment implementations and wrappers.
  • evaluation/: Evaluation and visualization scripts.
  • ego_agent_training/: All ego agent learning implementations (PPO, LIAM, and MeLIBA).
  • marl/: MARL algorithm implementations. Currently only supports IPPO.
  • open_ended_training/: Open-ended learning methods (ROTATE, PAIRED, Minimax Return).
  • teammate_generation/: Teammate generation algorithms (BRDiv, FCP, CoMeDi).
  • tests/: Test scripts used during development.

💡 Algorithm Implementations

The algorithms in this codebase are divided into four categories, and each is stored in its own directory:

  • MARL algorithms, located at marl/
  • AHT (Ad Hoc Teamwork) algorithms
    • Ego agent training methods, located at ego_agent_training/
    • Two-stage teammate generation methods, located at teammate_generation/
    • Open-ended AHT methods, located at open_ended_training/

Note that algorithms from the marl/ and ego_agent_training/ categories are called as subroutines in the other two categories. For example:

  • FCP uses the marl/ippo implementation as the teammate generation subroutine.
  • Two-stage teammate generation methods use ego_agent_training/ppo_ego.py as the ego agent training routine.

Running an Algorithm on a Task

Within each directory, there is a run.py which serves as the entry point for all algorithms implemented within the directory.

We use Hydra to manage algorithm and task configurations. In each directory above, there is a configs/ directory with the following subdirectories:

  • configs/algorithm/: Contains algorithm configs for each algorithm and task combination.
  • configs/hydra/: Contains Hydra settings.
  • configs/task/: Contains environment configs necessary to specify a task.

Given an algorithm and task, Hydra retrieves the appropriate configs from the subdirectories above and merges them into the master config found in configs/base_config_<method_type>.yaml (e.g., configs/base_config_teammate_generation.yaml). The algorithm and task may be manually specified by modifying the master config, or by using Hydra's command line argument support.
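
For reference, a Hydra entry point typically looks like the sketch below. The decorator arguments reuse names mentioned above (configs/, base_config_teammate_generation), but the body is illustrative and is not the repository's actual run.py.

# Illustrative Hydra entry point (not the repository's actual run.py).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="configs",
            config_name="base_config_teammate_generation")
def main(cfg: DictConfig) -> None:
    # cfg is the merged result of the master config plus the selected
    # task and algorithm configs, e.g. task=lbf algorithm=fcp/lbf.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()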

For example, the following command runs Fictitious Co-Play on the Level-Based Foraging (LBF) task:

python teammate_generation/run.py task=lbf algorithm=fcp/lbf

Note that Hydra allows the user to modify any config value specified in the algorithm/task config files from the command line. For example, to set the number of training interactions for FCP, use the following command:

python teammate_generation/run.py task=lbf algorithm=fcp/lbf algorithm.TOTAL_TIMESTEPS=1e5

Logging

By default, results are logged to a local results/ directory, as specified within the configs/hydra/hydra_simple.yaml file for each method type, and to the Weights & Biases (wandb) project specified in the master config. All metrics are logged using wandb and can be viewed using the wandb web interface. Please see the wandb documentation for general information about wandb.

Logging settings in each master config allow the user to control whether logging is enabled/disabled.
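
For reference, metrics typically reach wandb through the standard wandb API, sketched below; the project name, config values, and metric key are placeholders rather than this repository's settings.

# Illustrative wandb usage (project name, config, and metric key are placeholders).
import wandb

run = wandb.init(project="my-aht-experiments",
                 config={"TOTAL_TIMESTEPS": 1e5},
                 mode="online")  # use mode="disabled" to turn logging off
for step in range(3):
    wandb.log({"returned_episode_returns": 0.0}, step=step)
run.finish()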

🤖 Agents

The agents/ directory contains:

  • Heuristic agents for Overcooked and LBF environments.
  • Various actor-critic architectures.
  • Population and agent interfaces for RL agents.

You can test the Overcooked heuristic agents by running python tests/test_overcooked_agents.py, and the LBF heuristic agents by running python tests/test_lbf_agents.py.

🚶 Loading Teammates

Certain workflows within this project (namely, ego agent training and heldout evaluation) require teammate policies as inputs. The user provides these by specifying a partner config, which may point to heuristic or RL-based partner policies.

By default, the heldout evaluation workflow uses the downloaded evaluation teammates, and the corresponding partner config is specified at evaluation/configs/global_heldout_settings.yaml, so no user intervention is necessary to perform heldout evaluations.

However, the ego agent training workflow requires the user to specify a partner agent config. A quick example of how to run an ego agent training algorithm with a particular partner config is provided in our tutorial notebook. More details on how to specify the partner config are provided at the top of the ego agent training scripts.

🌳 Environments

Level-Based Foraging (LBF)

This codebase uses the Jumanji LBF implementation. The wrapper for the Jumanji LBF environment is stored in the envs/ directory, at envs/lbf/lbf_wrapper.py. A corresponding test script is stored at tests/test_lbf_wrapper.py.
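
The wrapper exposes a JAX-style reset/step interface driven by PRNG keys. The sketch below illustrates the general usage pattern under the assumption that reset(key) returns (obs, state) and step(key, state, actions) returns (obs, state, reward, done, info); see envs/lbf/lbf_wrapper.py and tests/test_lbf_wrapper.py for the actual interface.

# Illustrative rollout loop for a JAX-style multi-agent environment
# (see envs/lbf/lbf_wrapper.py and tests/test_lbf_wrapper.py for actual usage).
import jax


def rollout(env, policy_fn, num_steps, seed=0):
    key = jax.random.PRNGKey(seed)
    key, reset_key = jax.random.split(key)
    obs, state = env.reset(reset_key)
    for _ in range(num_steps):
        key, act_key, step_key = jax.random.split(key, 3)
        actions = policy_fn(obs, act_key)  # one discrete action per agent
        obs, state, reward, done, info = env.step(step_key, state, actions)
    return state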

Overcooked-v1

We made some modifications to the JaxMARL Overcooked environment to improve its functionality and ensure that environments are solvable.

  • Initialization randomization: Previously, setting random_reset would randomize both the initial agent positions and the initial object states (e.g., a pot might start with onions already in it, or agents might start out holding plates). We split this functionality into two arguments, random_reset and random_obj_state, where random_reset controls only the initial positions of the two agents.
  • Agent initial positions: Previously, in a map with disconnected components, both agents could be spawned in the same component, making the task impossible to solve. The Overcooked-v1 environment now initializes agents such that one is spawned on each side of the map.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 See Also

This project was inspired by the following Jax-based RL repositories. Please check them out!

  • JaxMARL: a library with Jax-based MARL algorithms and environments
  • Jumanji: a library with Jax implementations of several MARL environments
  • Minimax: a library with Jax implementations of single-agent UED algorithms
  • ROTATE: code for the ROTATE paper (Wang et al. 2025), which this benchmark is built on.