Add precision recall calculation #235
Conversation
I propose to replace pycocotools validation with faster-coco-eval: https://nbviewer.org/github/MiXaiLL76/faster_coco_eval/blob/main/examples/curve_example.ipynb
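For reference, faster-coco-eval keeps a pycocotools-compatible interface, so the swap should stay small. A minimal sketch, assuming the API shown in the faster-coco-eval examples; the file names are placeholders:

```python
from faster_coco_eval import COCO, COCOeval_faster

coco_gt = COCO("annotations.json")             # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes("predictions.json")  # model detections (placeholder path)

# Same evaluate/accumulate/summarize flow as pycocotools' COCOeval
coco_eval = COCOeval_faster(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

print(coco_eval.stats_as_dict)  # named stats, e.g. "AP_all", "AP_50"
```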
@Matvezy Maybe it will be better this way?

```python
def extended_metrics(self):
    """Computes extended evaluation metrics for object detection results.

    Calculates per-class and overall (macro) metrics such as mean average
    precision (mAP) at IoU thresholds, precision, recall, and F1-score, using
    the evaluation results stored on the object. If categories are used,
    metrics are reported per class as well as for the overall dataset.

    Returns:
        dict: A dictionary with the following keys:
            - 'class_map' (list of dict): Per-class and overall metrics, each entry containing:
                - 'class' (str): Class name, or "all" for the macro metrics.
                - 'map@50:95' (float): Mean average precision at IoU 0.50:0.95.
                - 'map@50' (float): Mean average precision at IoU 0.50.
                - 'precision' (float): Precision at the best-F1 recall threshold.
                - 'recall' (float): Recall at the best-F1 recall threshold.
            - 'map' (float): Overall mean average precision at IoU 0.50.
            - 'precision' (float): Macro-averaged precision at the best F1-score.
            - 'recall' (float): Macro-averaged recall at the best F1-score.

    Notes:
        - Uses COCO-style evaluation results (precision and scores arrays).
        - Classes with NaN results in any metric are filtered out.
        - The best F1-score across recall thresholds is used to select macro
          precision and recall.
    """
    # Extract IoU and recall thresholds from the evaluation parameters
    iou_thrs, rec_thrs = self.params.iouThrs, self.params.recThrs
    # Indices for IoU=0.50, the first area range, and the last maxDets setting
    iou50_idx, area_idx, maxdet_idx = (int(np.argwhere(np.isclose(iou_thrs, 0.50))), 0, -1)
    P = self.eval["precision"]  # [IoU x recall x class x area x maxDets]
    S = self.eval["scores"]
    # Precision at IoU=0.50 for every recall threshold and class
    prec_raw = P[iou50_idx, :, :, area_idx, maxdet_idx]
    prec = prec_raw.copy().astype(float)
    prec[prec < 0] = np.nan  # -1 marks missing entries; treat them as NaN
    # F1 score for each recall threshold (rows) and class (columns)
    f1_cls = 2 * prec * rec_thrs[:, None] / (prec + rec_thrs[:, None])
    f1_macro = np.nanmean(f1_cls, axis=1)
    best_j = int(f1_macro.argmax())
    # Macro precision and recall at the best F1 score
    macro_precision = float(np.nanmean(prec[best_j]))
    macro_recall = float(rec_thrs[best_j])
    # Score vector at the best recall threshold
    score_vec = S[iou50_idx, best_j, :, area_idx, maxdet_idx].astype(float)
    score_vec[prec_raw[best_j] < 0] = np.nan
    per_class = []
    if self.params.useCats:
        # Map category IDs to names
        cat_ids = self.params.catIds
        cat_id_to_name = {c["id"]: c["name"] for c in self.cocoGt.loadCats(cat_ids)}
        for k, cid in enumerate(cat_ids):
            # Precision slice for this category over all IoU and recall thresholds
            p_slice = P[:, :, k, area_idx, maxdet_idx]
            valid = p_slice > -1
            ap_50_95 = float(p_slice[valid].mean()) if valid.any() else float("nan")
            ap_50 = (
                float(p_slice[iou50_idx][p_slice[iou50_idx] > -1].mean())
                if (p_slice[iou50_idx] > -1).any()
                else float("nan")
            )
            pc = float(prec[best_j, k]) if prec_raw[best_j, k] > -1 else float("nan")
            rc = macro_recall
            # Skip the class if any metric is NaN
            if np.isnan(ap_50_95) or np.isnan(ap_50) or np.isnan(pc) or np.isnan(rc):
                continue
            per_class.append({
                "class": cat_id_to_name[int(cid)],
                "map@50:95": ap_50_95,
                "map@50": ap_50,
                "precision": pc,
                "recall": rc,
            })
    # Add metrics for all classes combined
    per_class.append({
        "class": "all",
        "map@50:95": self.stats_as_dict["AP_all"],
        "map@50": self.stats_as_dict["AP_50"],
        "precision": macro_precision,
        "recall": macro_recall,
    })
    return {
        "class_map": per_class,
        "map": self.stats_as_dict["AP_50"],
        "precision": macro_precision,
        "recall": macro_recall,
    }
```

Thanks for the feature, I liked it and I "stole" it a little)
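In case it's useful, here is roughly how the method above could be wired in and called. A minimal sketch, assuming `extended_metrics` is defined at module level and attached to faster-coco-eval's `COCOeval_faster` (the attachment and file names are illustrative, not library API):

```python
import numpy as np  # used by extended_metrics above
from faster_coco_eval import COCO, COCOeval_faster

# Hypothetical wiring: attach the function defined above as a method
COCOeval_faster.extended_metrics = extended_metrics

coco_gt = COCO("annotations.json")             # ground truth (placeholder path)
coco_dt = coco_gt.loadRes("predictions.json")  # detections (placeholder path)

ev = COCOeval_faster(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()

metrics = ev.extended_metrics()
print(metrics["map"], metrics["precision"], metrics["recall"])
```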
Hi @MiXaiLL76 thanks a lot for both of your suggestions! We are definitely interested in a faster way of computing the metrics :)
Is it a bug to have the same recall for every class? Why set `rc = macro_recall`? For example, we could read the recall array from the evaluation results and then take per-class recall inside the per-class loop, as in the sketch below. But maybe I'm missing something.
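A possible sketch of that change (assuming pycocotools-style results, where `eval["recall"]` is laid out as [IoU thresholds x categories x area ranges x maxDets] and -1 marks categories without ground truth; the helper name is made up):

```python
import numpy as np

def per_class_recall(coco_eval, iou50_idx, area_idx=0, maxdet_idx=-1):
    """Recall per category at one IoU threshold, from a finished accumulate() run."""
    rec = coco_eval.eval["recall"][iou50_idx, :, area_idx, maxdet_idx].astype(float)
    rec[rec < 0] = np.nan  # -1 means no ground truth for that category
    return rec  # one value per entry in coco_eval.params.catIds
```

Inside the per-class loop, `rc` would then come from this array (e.g. `rc = float(recalls[k])` with `recalls = per_class_recall(self, iou50_idx)`) instead of the shared `macro_recall`.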
Description
Adding precision and recall calculation, as well as per-class metrics.
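For reference, the returned structure would look like this (class names and numbers are purely illustrative):

```python
{
    "class_map": [
        {"class": "person", "map@50:95": 0.42, "map@50": 0.61, "precision": 0.70, "recall": 0.58},
        {"class": "all", "map@50:95": 0.39, "map@50": 0.57, "precision": 0.66, "recall": 0.58},
    ],
    "map": 0.57,        # mAP at IoU 0.50 over all classes
    "precision": 0.66,  # macro precision at the best F1
    "recall": 0.58,     # macro recall at the best F1
}
```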
Type of change
How has this change been tested? Please provide a testcase or example of how you tested the change.
Tested locally
Any specific deployment considerations
No
Docs