Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
MATRIX-Tuned-Model outperforms other models, including Llama-3-8B-Instruct, while using significantly less training data.
To do: 🔥 Code release
10/18/2024: We released the preprint paper on arXiv.
Our MATRIX simulator generates realistic and diverse scenarios with 1,000 real-world-grounded agents and structured communication (agent grouping, with inter- and intra-group communication).
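For intuition, here is a minimal, self-contained sketch of agent-grouping-based structured communication; all names (`Agent`, `simulate_round`, the moderator convention) are illustrative placeholders, not the actual MATRIX API.

```python
# A minimal sketch of agent-grouping-based structured communication.
# All names here are illustrative, not the MATRIX API.
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    agent_id: int
    profile: str                       # real-world-grounded persona text
    inbox: list = field(default_factory=list)

def simulate_round(agents, num_groups=10):
    """One round: group agents, exchange messages within each group,
    then let one moderator per group talk across groups."""
    random.shuffle(agents)
    size = max(1, len(agents) // num_groups)
    groups = [agents[i:i + size] for i in range(0, len(agents), size)]

    # Intra-group communication: every agent broadcasts to its own group.
    for group in groups:
        for sender in group:
            message = f"[agent {sender.agent_id}] acting as: {sender.profile}"
            for receiver in group:
                if receiver is not sender:
                    receiver.inbox.append(message)

    # Inter-group communication: the first agent of each group acts as a
    # moderator and forwards a summary to the other moderators.
    moderators = [group[0] for group in groups]
    for sender in moderators:
        summary = f"[moderator {sender.agent_id}] digest of {len(sender.inbox)} messages"
        for receiver in moderators:
            if receiver is not sender:
                receiver.inbox.append(summary)

agents = [Agent(i, f"persona #{i}") for i in range(1000)]
simulate_round(agents)
```

Dense messaging stays inside each group while only moderators communicate across groups, which is one plausible way to keep a 1,000-agent simulation tractable.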
Overview of the proposed post-training data generation process (MATRIX-Gen) from scenarios
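The sketch below illustrates one plausible way a simulated scenario could be turned into an SFT example; `query_llm` (any text-in/text-out LLM call) and the prompt wording are our assumptions, not the paper's exact MATRIX-Gen pipeline.

```python
# Illustrative sketch of deriving one SFT pair from a simulated scenario.
# `query_llm` and the prompt wording are assumptions, not the exact pipeline.
def scenario_to_sft_pair(scenario: str, query_llm) -> dict:
    """Ask an LLM for an instruction grounded in the scenario, then answer it;
    the resulting (instruction, response) pair becomes one SFT example."""
    instruction = query_llm(
        "Given the following simulated scenario, write one realistic user "
        f"instruction it could give rise to:\n{scenario}"
    )
    response = query_llm(instruction)
    return {"instruction": instruction, "response": response}
```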
- MATRIX-Gen-SFT
Models instruction-tuned from Llama-3-8B on MATRIX-Gen-SFT consistently outperform those trained on baseline datasets of the same size across both benchmarks.
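As a concrete reference point, a minimal SFT run over such data could look like the following; this assumes Hugging Face's `trl` library (our choice of tooling, not necessarily the authors'), and the file path and hyperparameters are placeholders.

```python
# A minimal SFT run, assuming Hugging Face's trl library; the dataset path
# and hyperparameters are placeholders, not the paper's setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each row is expected to carry a chat-style "messages" list built from the
# MATRIX-Gen-SFT (instruction, response) pairs.
dataset = load_dataset("json", data_files="matrix_gen_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",      # base model used in the paper
    train_dataset=dataset,
    args=SFTConfig(output_dir="matrix-sft-model"),
)
trainer.train()
```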
- MATRIX-Gen-DPO
Models preference-tuned from the MATRIX-SFT-Model on MATRIX-Gen-DPO outperform baselines trained with equivalent data quantities on both benchmarks.
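Analogously, a minimal preference-tuning step with `trl`'s `DPOTrainer` might look like this; again, the library choice, paths, and hyperparameters are assumptions, not the authors' exact recipe.

```python
# A minimal preference-tuning step with trl's DPOTrainer; library choice,
# paths, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("matrix-sft-model")   # SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained("matrix-sft-model")

# Each row needs "prompt", "chosen", and "rejected" fields, here built
# from the MATRIX-Gen-DPO preference pairs.
dataset = load_dataset("json", data_files="matrix_gen_dpo.jsonl", split="train")

trainer = DPOTrainer(
    model=model,                    # the reference model defaults to a frozen copy
    args=DPOConfig(output_dir="matrix-dpo-model", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```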
- MATRIX-Gen-Code & MATRIX-Gen-Safe
- MATRIX-Gen-MT
Increasing the number of agents and scenarios significantly improves model performance. Agent-grouping-based structured communication produces the highest-quality scenarios, while random communication and no communication yield lower-quality results.
Please cite our paper if you find the repository helpful.
```bibtex
@article{Tang2024SynthesizingPD,
  title={Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation},
  author={Shuo Tang and Xianghe Pang and Zexi Liu and Bohan Tang and Rui Ye and Xiaowen Dong and Yanfeng Wang and Siheng Chen},
  journal={arXiv preprint arXiv:2410.14251},
  year={2024},
  url={https://arxiv.org/abs/2410.14251}
}
```