Co3SOP: A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving

News

[2025/06/20] The preprint version is available on arXiv (arXiv:2506.17004).
[2025/03/10] The annotations for 3D semantic occupancy prediction are uploaded here.

Introduction

3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving that provides a voxel-level representation of both geometric details and semantic categories. However, the perception capability of a single vehicle is inherently constrained by occlusion, restricted sensor range, and narrow viewpoints. To address these limitations, collaborative perception lets agents exchange complementary information, improving the completeness and accuracy of the perceived scene. Because no dedicated dataset exists for collaborative 3D semantic occupancy prediction, we augment an existing collaborative perception dataset by replaying it in CARLA with a high-resolution semantic voxel sensor, which provides dense and comprehensive occupancy annotations. In addition, we establish benchmarks with varying prediction ranges to systematically assess the impact of spatial extent on collaborative prediction. We further develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation. Experimental results demonstrate that our baseline model consistently outperforms single-agent models, with increasing gains as the prediction range expands.
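To make "attention aggregation" concrete, the following is a minimal sketch of the general idea and not the repository's implementation. It assumes that the feature vectors from neighboring agents have already been spatially aligned to the ego frame, and it fuses them at a single location with scaled dot-product attention, using the ego feature as the query. The function name `attention_fuse` and the toy feature vectors are hypothetical.

```python
import math


def attention_fuse(ego, neighbors):
    """Fuse feature vectors at one spatial location (illustrative only).

    ego:       feature vector of the ego agent (list of floats).
    neighbors: feature vectors of other agents, assumed to be already
               aligned to the ego coordinate frame.
    Returns (fused_vector, attention_weights).
    """
    candidates = [ego] + list(neighbors)
    scale = math.sqrt(len(ego))

    # Scaled dot-product score between the ego query and every candidate.
    scores = [sum(q * k for q, k in zip(ego, cand)) / scale
              for cand in candidates]

    # Softmax over agents (subtract the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the candidate features.
    fused = [sum(w * cand[i] for w, cand in zip(weights, candidates))
             for i in range(len(ego))]
    return fused, weights


# Toy example: one neighbor agrees with the ego feature, one does not.
fused, weights = attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)
```

In this toy example the agreeing neighbor receives the same weight as the ego feature, while the dissimilar one is down-weighted. A real model would apply learned projections and run this over every location of the feature volume.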

Annotation Pipeline

Baseline Model

Getting Started

Dataset Preparation

Installation

Baseline Training and Evaluation

Customized Annotation Collection (Optional)

Benchmark Results

1. Benchmark Results for Range [25.6, 25.6, 4.8] m

Methodology	Modality	mIoU	Empty	Buildings	Fences	Other	Poles	Roadlines	Roads	Sidewalks	Vegetation	Vehicles	Walls	Trafficsigns	Ground	Guardrail	Trafficlight	Static	Dynamic	Terrain	Unlabeled
SSCNet	Lidar	13.21	93.01	1.84	0.16	0.00	3.60	0.00	0.23	19.22	41.43	71.73	0.26	0.00	37.73	8.22	0.25	3.68	0.07	26.41	9.26
LMSCNet	Lidar	24.92	96.90	8.67	22.27	0.00	29.57	2.57	86.70	42.24	43.77	85.35	9.97	18.19	62.68	12.02	0.00	18.39	1.57	36.11	21.16
OccFormer	Camera	29.48	97.03	11.63	14.17	0.00	19.67	39.64	87.40	45.32	42.78	75.70	13.41	9.73	67.08	35.53	5.43	16.14	1.82	86.95	38.01
SurroundOcc	Camera	28.71	97.33	10.63	11.06	0.00	17.22	26.78	86.87	46.61	44.92	75.95	12.37	17.27	53.80	48.49	2.11	12.86	2.89	76.58	45.44
Co3SOP-Ego	Camera	29.36	97.30	9.96	12.56	0.01	18.73	36.19	88.53	44.69	45.51	77.53	11.13	11.10	55.08	48.61	1.50	14.61	4.03	82.49	45.26
Co3SOP-Base	Camera	30.04	97.41	10.05	12.37	0.00	20.02	38.43	89.24	46.12	46.36	80.55	12.11	11.16	55.84	53.23	1.27	14.71	3.68	82.93	45.53

2. Benchmark Results for Range [51.2, 51.2, 4.8] m

Methodology	Modality	mIoU	Empty	Buildings	Fences	Other	Poles	Roadlines	Roads	Sidewalks	Vegetation	Vehicles	Walls	Trafficsigns	Ground	Bridge	Guardrail	Trafficlight	Static	Dynamic	Terrain	Unlabeled
SSCNet	Lidar	9.58	91.18	0.17	1.48	0.00	0.14	0.16	25.88	9.57	30.89	48.09	0.49	0.00	0.08	0.03	12.72	0.00	0.94	3.09	2.74	2.31
LMSCNet	Lidar	20.35	95.79	3.09	18.01	0.00	24.95	0.57	75.84	48.66	34.90	75.63	10.39	0.02	31.81	0.00	6.07	0.00	4.37	0.04	36.93	21.47
OccFormer	Camera	25.41	95.04	11.93	12.57	0.35	12.62	22.10	75.30	51.41	39.77	51.26	15.53	7.68	57.79	2.95	41.41	3.75	11.61	7.10	53.91	35.83
SurroundOcc	Camera	25.76	95.33	7.57	11.60	1.77	13.51	22.13	79.53	45.23	35.60	52.34	12.92	11.72	52.90	2.32	42.17	2.03	10.08	6.46	75.08	37.88
Co3SOP-Ego	Camera	25.85	95.21	7.52	13.18	1.36	10.91	24.78	78.95	43.38	35.72	54.02	13.13	10.35	54.45	2.17	38.22	3.25	11.70	8.45	75.21	38.53
Co3SOP-Base	Camera	27.50	95.30	8.19	13.70	0.52	16.16	29.12	82.35	42.92	36.30	66.48	13.99	9.16	51.96	2.26	48.54	3.30	12.09	7.84	80.75	37.18

3. Benchmark Results for Range [76.8, 76.8, 4.8] m

Methodology	Modality	mIoU	Empty	Buildings	Fences	Other	Poles	Roadlines	Roads	Sidewalks	Vegetation	Vehicles	Walls	Trafficsigns	Ground	Bridge	Guardrail	Trafficlight	Static	Dynamic	Terrain	Unlabeled
SSCNet	Lidar	10.04	87.33	0.19	0.41	16.18	0.00	0.00	0.14	20.28	22.91	39.35	0.18	0.00	22.18	0.07	10.21	0.00	3.40	0.62	17.01	0.53
LMSCNet	Lidar	17.62	93.70	1.79	9.26	0.00	17.92	0.00	67.99	53.27	23.91	62.94	10.08	0.04	20.59	0.00	3.37	0.00	2.82	0.00	33.43	21.72
OccFormer	Camera	24.12	92.67	13.43	11.04	2.74	10.17	15.53	73.69	59.51	35.30	33.35	11.55	2.49	64.75	5.45	36.90	0.11	13.44	9.11	53.05	34.63
SurroundOcc	Camera	24.68	93.61	6.91	8.84	12.50	4.78	14.09	74.87	55.30	29.08	29.20	10.03	6.23	64.77	3.74	39.65	1.10	10.22	8.33	75.08	44.03
Co3SOP-Ego	Camera	24.81	93.46	7.32	9.88	12.04	4.29	14.55	75.54	53.53	31.18	34.32	10.54	7.41	62.58	4.02	40.02	2.18	11.18	8.80	71.54	41.13
Co3SOP-Base	Camera	27.00	93.78	9.15	11.38	6.83	7.12	16.08	80.28	55.47	34.26	50.98	12.70	10.02	68.88	4.39	44.89	2.42	13.16	9.48	74.43	42.44
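
The effect of prediction range on the benefit of collaboration can be read directly from the mIoU columns. The short script below copies the mIoU values of Co3SOP-Ego and Co3SOP-Base from the three tables and prints the difference per range. It assumes that Co3SOP-Ego is the single-agent variant and Co3SOP-Base the collaborative baseline, which the names suggest but this page does not state explicitly. The dictionary layout and the function name `collaboration_gain` are illustrative.

```python
# mIoU values copied from the three benchmark tables above.
MIOU = {
    "25.6 m": {"Co3SOP-Ego": 29.36, "Co3SOP-Base": 30.04},
    "51.2 m": {"Co3SOP-Ego": 25.85, "Co3SOP-Base": 27.50},
    "76.8 m": {"Co3SOP-Ego": 24.81, "Co3SOP-Base": 27.00},
}


def collaboration_gain(miou_by_range):
    """Return the mIoU difference (Base minus Ego) for each range."""
    return {
        rng: round(scores["Co3SOP-Base"] - scores["Co3SOP-Ego"], 2)
        for rng, scores in miou_by_range.items()
    }


gains = collaboration_gain(MIOU)
for rng, gain in gains.items():
    print(f"{rng}: +{gain:.2f} mIoU")
# 25.6 m: +0.68 mIoU
# 51.2 m: +1.65 mIoU
# 76.8 m: +2.19 mIoU
```

The difference grows from 0.68 to 2.19 mIoU points as the range increases from 25.6 m to 76.8 m.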

Citation

If you find our work useful for your research, please consider citing the paper:

@article{wu2025synthetic,
  title={A Synthetic Benchmark for Collaborative 3D Semantic Occupancy Prediction in V2X Autonomous Driving},
  author={Wu, Hanlin and Lin, Pengfei and Javanmardi, Ehsan and Bao, Naren and Qian, Bo and Si, Hao and Tsukada, Manabu},
  journal={arXiv preprint arXiv:2506.17004},
  year={2025}
}

Acknowledgements

Many thanks to these excellent projects:
