Hello Cellpose team,
I have been benchmarking Cellpose v4 on various datasets, but I run into issues with Platynereis-CBG, on which an earlier version of Cellpose performed incredibly well (TL;DR: see the last paragraph). I would appreciate your help in figuring out whether the updated Cellpose-SAM truly has an unconsidered failure mode or whether I missed something (although I only exchanged folder paths between experiments). Below are the results per dataset (3 runs with different splits each), so you can judge how plausible they are. Note that I fine-tuned with the following settings:
`weight_decay=0.1, learning_rate=1e-5, n_epochs=100, min_train_masks=5`
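For completeness, this is roughly how those settings were passed in; a sketch, not my exact script. The argument names assume Cellpose's `cellpose.train.train_seg` entry point, the `finetune` wrapper and data arguments are placeholders:

```python
# Fine-tuning settings used for every run below (assumed to be forwarded
# to cellpose.train.train_seg; the wrapper itself is hypothetical).
FINETUNE_KWARGS = dict(
    weight_decay=0.1,    # strong regularization
    learning_rate=1e-5,
    n_epochs=100,
    min_train_masks=5,   # skip training images with fewer than 5 masks
)

def finetune(net, train_data, train_labels, **overrides):
    """Fine-tune a Cellpose network with the benchmark settings."""
    from cellpose import train  # lazy import so the sketch loads without cellpose
    kwargs = {**FINETUNE_KWARGS, **overrides}
    return train.train_seg(net, train_data=train_data,
                           train_labels=train_labels, **kwargs)
```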
Parhyale (the StarDist 3D dataset):
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.603 | 0.034 |
| 0.2 | 0.569 | 0.017 |
| 0.3 | 0.522 | 0.010 |
| 0.4 | 0.445 | 0.029 |
| 0.5 | 0.346 | 0.045 |
| 0.6 | 0.249 | 0.041 |
| 0.7 | 0.135 | 0.027 |
| 0.8 | 0.042 | 0.010 |
| 0.9 | 0.002 | 0.002 |
Mouse Skull (also provided by the EmbedSeg team):
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.702 | 0.017 |
| 0.2 | 0.690 | 0.011 |
| 0.3 | 0.669 | 0.016 |
| 0.4 | 0.648 | 0.023 |
| 0.5 | 0.608 | 0.018 |
| 0.6 | 0.560 | 0.024 |
| 0.7 | 0.481 | 0.021 |
| 0.8 | 0.365 | 0.004 |
| 0.9 | 0.044 | 0.006 |
Confocal Zebrafish Neurons (a custom dataset we provide):
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.793 | 0.054 |
| 0.2 | 0.769 | 0.076 |
| 0.3 | 0.702 | 0.113 |
| 0.4 | 0.602 | 0.111 |
| 0.5 | 0.439 | 0.118 |
| 0.6 | 0.207 | 0.080 |
| 0.7 | 0.043 | 0.023 |
| 0.8 | 0.000 | 0.000 |
| 0.9 | 0.000 | 0.000 |
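For context on how the tables are scored: each run is evaluated with matching-based average precision, AP = TP / (TP + FP + FN) at each IoU threshold, the same definition as `cellpose.metrics.average_precision`. The sketch below is a simplified numpy reimplementation (greedy matching rather than the optimal matching the library uses), not my exact evaluation code:

```python
import numpy as np

def average_precision(gt, pred, thresholds=(0.5,)):
    """AP = TP / (TP + FP + FN) per IoU threshold, with greedy one-to-one
    matching of predicted to ground-truth instances. A simplified stand-in
    for cellpose.metrics.average_precision."""
    gt_ids = [g for g in np.unique(gt) if g > 0]
    pr_ids = [p for p in np.unique(pred) if p > 0]
    # pairwise IoU between every ground-truth and predicted instance
    iou = np.zeros((len(gt_ids), len(pr_ids)))
    for i, g in enumerate(gt_ids):
        gm = gt == g
        for j, p in enumerate(pr_ids):
            pm = pred == p
            inter = np.logical_and(gm, pm).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(gm, pm).sum()
    aps = []
    for t in thresholds:
        used, tp = set(), 0
        for i in range(len(gt_ids)):
            best_j, best = -1, t
            for j in range(len(pr_ids)):
                if j not in used and iou[i, j] >= best:
                    best_j, best = j, iou[i, j]
            if best_j >= 0:   # gt instance i matched above threshold
                used.add(best_j)
                tp += 1
        fp, fn = len(pr_ids) - tp, len(gt_ids) - tp
        aps.append(tp / (tp + fp + fn) if tp + fp + fn else 1.0)
    return aps
```

The mAP per threshold in the tables is then the mean of this score over the 3 splits, with the standard deviation in the last column.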
However, for the Platy-CBG dataset (see here: https://github.com/juglab/EmbedSeg/releases/tag/v0.1.0),
results are far from what the EmbedSeg team reported. In fact, with `model.eval(x=testing_image, do_3D=True, channel_axis=-1, z_axis=0, anisotropy=5.0)` I get an mAP of almost 0 already at an IoU threshold of 0.1. When I switch to 2D slices with stitching, `model.eval(x=testing_image, do_3D=False, channel_axis=-1, z_axis=0, stitch_threshold=0.3, anisotropy=5.0)`, things improve a little:
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.364 | 0.146 |
| 0.2 | 0.268 | 0.113 |
| 0.3 | 0.181 | 0.094 |
| 0.4 | 0.094 | 0.065 |
| 0.5 | 0.038 | 0.028 |
| 0.6 | 0.009 | 0.006 |
| 0.7 | 0.001 | 0.001 |
| 0.8 | 0.000 | 0.000 |
| 0.9 | 0.000 | 0.000 |
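For reproducibility, the two inference configurations I compared can be wrapped as follows. This is a sketch: `eval_volume` is my naming, and the keyword arguments are the ones quoted above, assumed to match the Cellpose v4 `model.eval` signature:

```python
def eval_volume(model, testing_image, stitch=False):
    """Run a Cellpose model on a 3D volume, either natively in 3D or as
    2D planes stitched along z (the two configurations compared above).
    `model` is any object exposing Cellpose's `eval` method."""
    if stitch:
        # per-plane 2D segmentation; masks linked across z by IoU overlap
        return model.eval(testing_image, do_3D=False, channel_axis=-1,
                          z_axis=0, stitch_threshold=0.3, anisotropy=5.0)
    # full 3D mode
    return model.eval(testing_image, do_3D=True, channel_axis=-1,
                      z_axis=0, anisotropy=5.0)
```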
Still, this is miles away from the previous results. The Platy-CBG dataset is perhaps more challenging because it combines high sparsity in some regions with high density in others, variable cell sizes and shapes, and brightness variability within cells. However, good segmentation approaches routinely perform well on it thanks to its rather high SNR. Can you help me figure out what went wrong? I attached a picture of one of the predictions (left: image, middle: ground truth, right: predictions):
TL;DR: Cellpose-SAM v4 performs well on several 3D datasets, but drastically underperforms on the Platy-CBG dataset, where earlier Cellpose achieved near-perfect results. High SNR and reproducible training suggest this may be a regression in 3D/SAM behavior rather than a data issue.