Hello Cellpose team,
I have been benchmarking Cellpose v4 on various datasets, but I run into issues with Platynereis-CBG, on which an earlier version of Cellpose performed incredibly well (TL;DR: see the last paragraph). I would appreciate your help in figuring out whether the updated Cellpose-SAM truly has an unconsidered failure mode or whether I missed something (although I only exchanged folder paths between experiments). Below are the results per dataset (3 runs with different splits each), so you can judge how plausible they are. Note that I fine-tuned with the following settings:
`weight_decay=0.1, learning_rate=1e-5, n_epochs=100, min_train_masks=5`
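For completeness, this is roughly how those settings were passed in; a sketch, not my exact script. The argument names assume Cellpose's `cellpose.train.train_seg` entry point, the `finetune` wrapper and data arguments are placeholders:

```python
# Fine-tuning settings used for every run below (assumed to be forwarded
# to cellpose.train.train_seg; the wrapper itself is hypothetical).
FINETUNE_KWARGS = dict(
    weight_decay=0.1,    # strong regularization
    learning_rate=1e-5,
    n_epochs=100,
    min_train_masks=5,   # skip training images with fewer than 5 masks
)

def finetune(net, train_data, train_labels, **overrides):
    """Fine-tune a Cellpose network with the benchmark settings."""
    from cellpose import train  # lazy import so the sketch loads without cellpose
    kwargs = {**FINETUNE_KWARGS, **overrides}
    return train.train_seg(net, train_data=train_data,
                           train_labels=train_labels, **kwargs)
```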
Parhyale (the StarDist 3D dataset):
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.603 | 0.034 |
| 0.2 | 0.569 | 0.017 |
| 0.3 | 0.522 | 0.010 |
| 0.4 | 0.445 | 0.029 |
| 0.5 | 0.346 | 0.045 |
| 0.6 | 0.249 | 0.041 |
| 0.7 | 0.135 | 0.027 |
| 0.8 | 0.042 | 0.010 |
| 0.9 | 0.002 | 0.002 |
Mouse Skull (also provided by the EmbedSeg team):
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.702 | 0.017 |
| 0.2 | 0.690 | 0.011 |
| 0.3 | 0.669 | 0.016 |
| 0.4 | 0.648 | 0.023 |
| 0.5 | 0.608 | 0.018 |
| 0.6 | 0.560 | 0.024 |
| 0.7 | 0.481 | 0.021 |
| 0.8 | 0.365 | 0.004 |
| 0.9 | 0.044 | 0.006 |
Confocal Zebrafish Neurons (a custom dataset we provide):
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.793 | 0.054 |
| 0.2 | 0.769 | 0.076 |
| 0.3 | 0.702 | 0.113 |
| 0.4 | 0.602 | 0.111 |
| 0.5 | 0.439 | 0.118 |
| 0.6 | 0.207 | 0.080 |
| 0.7 | 0.043 | 0.023 |
| 0.8 | 0.000 | 0.000 |
| 0.9 | 0.000 | 0.000 |
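For context on how the tables are scored: each run is evaluated with matching-based average precision, AP = TP / (TP + FP + FN) at each IoU threshold, the same definition as `cellpose.metrics.average_precision`. The sketch below is a simplified numpy reimplementation (greedy matching rather than the optimal matching the library uses), not my exact evaluation code:

```python
import numpy as np

def average_precision(gt, pred, thresholds=(0.5,)):
    """AP = TP / (TP + FP + FN) per IoU threshold, with greedy one-to-one
    matching of predicted to ground-truth instances. A simplified stand-in
    for cellpose.metrics.average_precision."""
    gt_ids = [g for g in np.unique(gt) if g > 0]
    pr_ids = [p for p in np.unique(pred) if p > 0]
    # pairwise IoU between every ground-truth and predicted instance
    iou = np.zeros((len(gt_ids), len(pr_ids)))
    for i, g in enumerate(gt_ids):
        gm = gt == g
        for j, p in enumerate(pr_ids):
            pm = pred == p
            inter = np.logical_and(gm, pm).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(gm, pm).sum()
    aps = []
    for t in thresholds:
        used, tp = set(), 0
        for i in range(len(gt_ids)):
            best_j, best = -1, t
            for j in range(len(pr_ids)):
                if j not in used and iou[i, j] >= best:
                    best_j, best = j, iou[i, j]
            if best_j >= 0:   # gt instance i matched above threshold
                used.add(best_j)
                tp += 1
        fp, fn = len(pr_ids) - tp, len(gt_ids) - tp
        aps.append(tp / (tp + fp + fn) if tp + fp + fn else 1.0)
    return aps
```

The mAP per threshold in the tables is then the mean of this score over the 3 splits, with the standard deviation in the last column.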
However, for the Platy-CBG dataset (see here: https://github.com/juglab/EmbedSeg/releases/tag/v0.1.0),
results are far from what the EmbedSeg team reported. In fact, with `model.eval(x=testing_image, do_3D=True, channel_axis=-1, z_axis=0, anisotropy=5.0)` I get an mAP of almost 0 already at an IoU threshold of 0.1. When I switch to 2D slices with stitching, `model.eval(x=testing_image, do_3D=False, channel_axis=-1, z_axis=0, stitch_threshold=0.3, anisotropy=5.0)`, things improve a little:
| IoU threshold | mAP | Std. dev |
|---|---|---|
| 0.1 | 0.364 | 0.146 |
| 0.2 | 0.268 | 0.113 |
| 0.3 | 0.181 | 0.094 |
| 0.4 | 0.094 | 0.065 |
| 0.5 | 0.038 | 0.028 |
| 0.6 | 0.009 | 0.006 |
| 0.7 | 0.001 | 0.001 |
| 0.8 | 0.000 | 0.000 |
| 0.9 | 0.000 | 0.000 |
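For reproducibility, the two inference configurations I compared can be wrapped as follows. This is a sketch: `eval_volume` is my naming, and the keyword arguments are the ones quoted above, assumed to match the Cellpose v4 `model.eval` signature:

```python
def eval_volume(model, testing_image, stitch=False):
    """Run a Cellpose model on a 3D volume, either natively in 3D or as
    2D planes stitched along z (the two configurations compared above).
    `model` is any object exposing Cellpose's `eval` method."""
    if stitch:
        # per-plane 2D segmentation; masks linked across z by IoU overlap
        return model.eval(testing_image, do_3D=False, channel_axis=-1,
                          z_axis=0, stitch_threshold=0.3, anisotropy=5.0)
    # full 3D mode
    return model.eval(testing_image, do_3D=True, channel_axis=-1,
                      z_axis=0, anisotropy=5.0)
```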
Still, this is miles away from the previous results. The Platy-CBG dataset is perhaps more challenging because it combines high sparsity in some regions with high density in others, variable cell sizes and shapes, and brightness variability within cells. However, good segmentation approaches routinely perform well on it thanks to its rather high SNR. Can you help me figure out what went wrong? I attached a picture of one of the predictions (left: image, middle: ground truth, right: predictions):
TL;DR: Cellpose-SAM v4 performs well on several 3D datasets, but drastically underperforms on the Platy-CBG dataset, where earlier Cellpose achieved near-perfect results. High SNR and reproducible training suggest this may be a regression in 3D/SAM behavior rather than a data issue.