[BUG] Cellpose-3D produces inferior results for specific datasets #1389

@postnubilaphoebus

Description

Hello Cellpose team,
I have been benchmarking Cellpose v4 on various datasets, but I run into issues with Platynereis-CBG, a dataset on which an earlier version of Cellpose performed incredibly well (TL;DR: last paragraph). I would appreciate your help in figuring out whether the updated Cellpose-SAM truly has an unconsidered failure mode or whether I missed something (although I only exchanged folder paths between experiments). Below I report the results per dataset (3 runs with different splits per dataset), and you can judge how plausible they are. Note that I fine-tuned with the following settings:

`weight_decay=0.1, learning_rate=1e-5, n_epochs=100, min_train_masks=5`
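For reproducibility, a fine-tuning invocation with those settings via the Cellpose CLI would look roughly like this (the data directory is a placeholder; flag names are my assumption of the CLI interface, so please correct me if my setup diverges from the intended one):

```shell
# Hypothetical reproduction command; /path/to/train_data is a placeholder.
python -m cellpose --train --dir /path/to/train_data \
    --learning_rate 1e-5 --weight_decay 0.1 \
    --n_epochs 100 --min_train_masks 5 --verbose
```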

Parhyale (the StarDist-3D dataset):

| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.603 | 0.034 |
| 0.2 | 0.569 | 0.017 |
| 0.3 | 0.522 | 0.010 |
| 0.4 | 0.445 | 0.029 |
| 0.5 | 0.346 | 0.045 |
| 0.6 | 0.249 | 0.041 |
| 0.7 | 0.135 | 0.027 |
| 0.8 | 0.042 | 0.010 |
| 0.9 | 0.002 | 0.002 |
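For clarity on the metric: the mAP values in these tables come from matching predicted instances to ground-truth instances at each IoU threshold. A minimal sketch of such a matching-based AP computation (StarDist-style AP = TP / (TP + FP + FN), with greedy one-to-one matching; this illustrates the idea and is not necessarily the exact evaluation code I used):

```python
import numpy as np

def average_precision(gt: np.ndarray, pred: np.ndarray, iou_thresh: float) -> float:
    """AP between two integer label images/volumes (0 = background)."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]
    # Pairwise IoU between every GT instance and every predicted instance.
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for gi, g in enumerate(gt_ids):
        gmask = gt == g
        for pi, p in enumerate(pred_ids):
            pmask = pred == p
            inter = np.logical_and(gmask, pmask).sum()
            union = np.logical_or(gmask, pmask).sum()
            iou[gi, pi] = inter / union if union else 0.0
    # Greedy one-to-one matching, highest IoU first, above the threshold.
    matched_gt, matched_pred = set(), set()
    for gi, pi in sorted(np.ndindex(iou.shape), key=lambda ij: -iou[ij]):
        if iou[gi, pi] < iou_thresh:
            break
        if gi not in matched_gt and pi not in matched_pred:
            matched_gt.add(gi)
            matched_pred.add(pi)
    tp = len(matched_gt)
    fp = len(pred_ids) - tp
    fn = len(gt_ids) - tp
    return tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
```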

Mouse Skull (also provided by the EmbedSeg team):

| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.702 | 0.017 |
| 0.2 | 0.690 | 0.011 |
| 0.3 | 0.669 | 0.016 |
| 0.4 | 0.648 | 0.023 |
| 0.5 | 0.608 | 0.018 |
| 0.6 | 0.560 | 0.024 |
| 0.7 | 0.481 | 0.021 |
| 0.8 | 0.365 | 0.004 |
| 0.9 | 0.044 | 0.006 |

Confocal Zebrafish Neurons (a custom dataset we provide):

| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.793 | 0.054 |
| 0.2 | 0.769 | 0.076 |
| 0.3 | 0.702 | 0.113 |
| 0.4 | 0.602 | 0.111 |
| 0.5 | 0.439 | 0.118 |
| 0.6 | 0.207 | 0.080 |
| 0.7 | 0.043 | 0.023 |
| 0.8 | 0.000 | 0.000 |
| 0.9 | 0.000 | 0.000 |

However, for the Platy-CBG dataset (see here: https://github.com/juglab/EmbedSeg/releases/tag/v0.1.0),
results are far from what the EmbedSeg team reported. In fact, with `model.eval(x=testing_image, do_3D=True, channel_axis=-1, z_axis=0, anisotropy=5.0)`, I get an mAP of almost 0 already at an IoU threshold of 0.1. When I switch to 2D segmentation with stitching, `model.eval(x=testing_image, do_3D=False, channel_axis=-1, z_axis=0, stitch_threshold=0.3, anisotropy=5.0)`, things improve a little:

| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.364 | 0.146 |
| 0.2 | 0.268 | 0.113 |
| 0.3 | 0.181 | 0.094 |
| 0.4 | 0.094 | 0.065 |
| 0.5 | 0.038 | 0.028 |
| 0.6 | 0.009 | 0.006 |
| 0.7 | 0.001 | 0.001 |
| 0.8 | 0.000 | 0.000 |
| 0.9 | 0.000 | 0.000 |
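My understanding of what `stitch_threshold` controls in the second call: masks are segmented per 2D slice, and an instance in slice z+1 inherits the 3D label of the slice-z instance it overlaps with IoU at or above the threshold, otherwise it starts a new 3D instance. A minimal sketch of that idea (mirroring the concept, not Cellpose's actual implementation), in case my mental model of the stitching is itself the problem:

```python
import numpy as np

def stitch_by_iou(masks: np.ndarray, stitch_threshold: float = 0.3) -> np.ndarray:
    """masks: (Z, Y, X) integer labels, segmented independently per slice."""
    stitched = masks.copy()
    next_label = int(stitched[0].max())
    for z in range(1, stitched.shape[0]):
        prev, cur = stitched[z - 1], masks[z]
        out = np.zeros_like(cur)
        for p in np.unique(cur):
            if p == 0:
                continue
            pmask = cur == p
            # Find the best-overlapping instance in the previous slice.
            overlap_ids, counts = np.unique(prev[pmask], return_counts=True)
            best, best_iou = 0, 0.0
            for q, inter in zip(overlap_ids, counts):
                if q == 0:
                    continue
                union = pmask.sum() + (prev == q).sum() - inter
                iou = inter / union
                if iou > best_iou:
                    best, best_iou = int(q), iou
            if best_iou >= stitch_threshold:
                out[pmask] = best          # continue the 3D instance
            else:
                next_label += 1            # start a new 3D instance
                out[pmask] = next_label
        stitched[z] = out
    return stitched
```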

Still, this is miles away from the previous results. The Platy-CBG dataset is perhaps more challenging because it combines high sparsity in some regions with high density in others, variable cell sizes and shapes, and brightness variability within cells. However, good segmentation approaches routinely perform well on it thanks to its rather high SNR. Can you help me figure out what went wrong? I attached a picture of one of the predictions (left: image, middle: ground truth, right: predictions):

[Attached image: raw image | ground truth | predictions]

TL;DR: Cellpose-SAM v4 performs well on several 3D datasets, but drastically underperforms on the Platy-CBG dataset, where earlier Cellpose achieved near-perfect results. High SNR and reproducible training suggest this may be a regression in 3D/SAM behavior rather than a data issue.
