When reading the source code for cytominer_eval.operations.precision_recall() I noticed that the similarity_melted_df variable counts each replicate pair twice, e.g. A1 --> A2 and A2 --> A1.
This becomes a problem because only the first replicate_group_col in lines 49-52 is subsequently used for grouping:
49 replicate_group_cols = [
50 "{x}{suf}".format(x=x, suf=pair_ids[list(pair_ids)[0]]["suffix"]) # [0] keeps only the first of two grouping columns
51 for x in replicate_groups
52 ]
In the next step, each group is passed to calculate_precision_recall():
59 precision_recall_df_at_k = similarity_melted_df.groupby(
60 replicate_group_cols
61 ).apply(lambda x: calculate_precision_recall(x, k=k_))
62 precision_recall_df = precision_recall_df.append(precision_recall_df_at_k)
The effect is that every within-group pair is counted twice, whereas a pair that crosses groups is counted only once per group, because groupby keeps only one of its two directions.
Let me clarify this with an example. Consider 5 samples, the first 3 from group 'A' and the other 2 from group 'B', both with greater within-group than between-group correlations:

Then what calculate_precision_recall will see is this:

For example, the sample_pair_a column has a row for A1-->A2 and one for A2-->A1, but only one for A1-->B1. B1-->A1 is missing because of the way the melted data frame is generated and the grouping is performed. As a result, the similarity values for within-group connections appear in duplicate.
Accordingly the outcome for precision and recall at k=4 is the following:

Precision: all 4 closest connections are from within group for A but only 2 for group B.
Recall: 4/6 connections found for A but all 2 found for B.
In summary, the computed precision and recall are not entirely correct, and the distortion is largest for small groups. Note also that with an odd k the cutoff can split a symmetric pair, so that only one of its two rows is counted.
Admittedly, this is a bit mind-boggling. I recommend using a debugger if you want to trace all the steps in detail by yourself.
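As an alternative to stepping through with a debugger, here is a minimal sketch that reproduces the numbers above. It does not call cytominer_eval; the column names (pair_a, group_a, group_replicate), the similarity values (0.9 within group, 0.1 between groups) and the helper precision_recall_at_k are my own assumptions, chosen only to mimic grouping by the first of the two group columns.

```python
import itertools

import pandas as pd

# Hypothetical toy data: 3 samples in group A, 2 in group B.
samples = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B"}


def similarity(a, b):
    # Assumed values: high within group, low between groups.
    return 0.9 if samples[a] == samples[b] else 0.1


# Melted frame with BOTH directions of every pair (A1->A2 and A2->A1).
melted = pd.DataFrame(
    [
        {
            "pair_a": a,
            "pair_b": b,
            "group_a": samples[a],
            "group_b": samples[b],
            "similarity": similarity(a, b),
        }
        for a, b in itertools.permutations(samples, 2)
    ]
)
melted["group_replicate"] = melted["group_a"] == melted["group_b"]


def precision_recall_at_k(group_df, k):
    # Simplified stand-in for calculate_precision_recall (an assumption).
    top_k = group_df.nlargest(k, "similarity")
    hits = int(top_k["group_replicate"].sum())
    total = int(group_df["group_replicate"].sum())
    return {"precision": hits / k, "recall": hits / total, "replicate_rows": total}


# Group by the first group column only, as in lines 49-52.
results = {
    name: precision_recall_at_k(group_df, k=4)
    for name, group_df in melted.groupby("group_a")
}
print(results)
```

Group A ends up with 6 replicate rows for only 3 distinct pairs, and group B with 2 rows for a single pair, which is where the 4/6 and 2/2 recall values come from.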
Proposed solution: I suggest counting each pair only once when creating the melted data frame.
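A rough sketch of what counting each pair once could look like, again on hypothetical toy data rather than the real similarity_melted_df. The order-independent pair key and the helper pairs_seen_by are assumptions of mine. One caveat I am not sure how the library should handle: once pairs are unique, grouping by the first group column alone would hide cross-group pairs from one of the two groups, so this sketch selects a pair for a group if either member belongs to it.

```python
import itertools

import pandas as pd

# Hypothetical toy data: 3 samples in group A, 2 in group B.
samples = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B"}

# Melted frame containing both directions of every pair.
melted = pd.DataFrame(
    [(a, b, samples[a], samples[b]) for a, b in itertools.permutations(samples, 2)],
    columns=["pair_a", "pair_b", "group_a", "group_b"],
)

# Order-independent key: A1->A2 and A2->A1 both map to ("A1", "A2").
pair_key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(melted["pair_a"], melted["pair_b"])],
    index=melted.index,
)

# Keep the first occurrence of each unordered pair.
unique_pairs = melted.loc[~pair_key.duplicated()].reset_index(drop=True)


def pairs_seen_by(group):
    # Assumed selection rule: a pair belongs to a group if either member does.
    touches = (unique_pairs["group_a"] == group) | (unique_pairs["group_b"] == group)
    view = unique_pairs[touches]
    within = int((view["group_a"] == view["group_b"]).sum())
    return {"within": within, "cross": len(view) - within}


print(len(unique_pairs))
print(pairs_seen_by("A"), pairs_seen_by("B"))
```

With this selection every pair carries the same weight: group A sees its 3 within-group pairs and group B its single one, each exactly once, next to the 6 cross-group pairs.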