
Add overall predictive probability utility and validate model-comparison math (#18)

Open
shawnrhoads wants to merge 1 commit into main from review-functions-for-model-comparison-metrics

Conversation

@shawnrhoads
Owner

Motivation

  • Align the integrated BIC implementation with the model-comparison math that aggregates per-subject log-mean-exp of likelihoods and then applies the parameter-count penalty.
  • Provide a simple, numerically-robust utility to report the overall predictive probability (geometric-mean predictive probability across all modeled choices) from per-subject NLL outputs.
  • Harden existing likelihood-based utilities to prevent silent misuse of aggregation metrics.
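The integrated-BIC aggregation described above can be sketched as follows. This is a minimal illustration, not the pyem API: the function name, signature, and the way trial counts are passed in are assumptions; only the math (per-subject log-mean-exp over posterior samples, summed across subjects, plus an `npar * log(total_trials)` penalty) comes from the PR description.

```python
import numpy as np
from scipy.special import logsumexp

def bic_int_sketch(loglik_samples_per_subject, npar, n_trials_total):
    """Integrated-BIC sketch (illustrative names, not pyem's calc_BICint).

    loglik_samples_per_subject: list of 1-D arrays, each holding one subject's
    data log-likelihood evaluated at posterior parameter samples.
    """
    total = 0.0
    for ll in loglik_samples_per_subject:
        ll = np.asarray(ll, dtype=float)
        # per-subject log-mean-exp over posterior samples, computed in log
        # space via logsumexp for numerical stability
        total += logsumexp(ll) - np.log(ll.size)
    # complexity penalty uses the trial count summed across *all* subjects
    return -2.0 * total + npar * np.log(n_trials_total)
```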

Description

  • Added overall_predictive_probability_from_nll in pyem.utils.stats which returns the geometric-mean predictive probability (or its log) computed as exp(-sum(nll) / nchoices_total).
  • Kept calc_BICint behavior as the integrated approach: per-subject log-mean-exp over posterior samples using logsumexp, summing across subjects, and adding the complexity penalty npar * log(total_trials), and ensured total-trial counting sums across all subjects.
  • Retained/ensured validation in pseudo_r2_from_nll to raise ValueError for unsupported metric values.
  • Added/updated unit tests in tests/test_stats.py covering overall_predictive_probability_from_nll (formula match, log-return, and input validation), calc_BICint trial-count aggregation and numerical stability, and pseudo_r2_from_nll metric validation.
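The geometric-mean predictive probability formula given above, `exp(-sum(nll) / nchoices_total)`, can be sketched like this. The function name, signature, and NaN policy here are assumptions for illustration; the PR's actual function is `overall_predictive_probability_from_nll` in `pyem.utils.stats`.

```python
import numpy as np

def overall_pp_sketch(nll_per_subject, n_choices_total, return_log=False):
    """Geometric-mean predictive probability sketch (illustrative names)."""
    nll = np.asarray(nll_per_subject, dtype=float)
    if n_choices_total <= 0:
        raise ValueError("n_choices_total must be a positive integer")
    if np.isnan(nll).any():
        # NaN policy is a guess; the PR only says NaNs are handled
        raise ValueError("nll_per_subject contains NaN")
    # log of exp(-sum(nll) / N); staying in log space avoids underflow
    log_pp = -np.sum(nll) / n_choices_total
    return log_pp if return_log else np.exp(log_pp)
```

For example, two subjects whose NLLs sum to `5 * log(2)` over 5 total choices give a per-choice predictive probability of 0.5, i.e. chance for binary choices.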

Testing

  • Ran pytest -q tests/test_stats.py and the test suite passed.
  • Ran pytest -q tests/test_compare.py::test_model_compare_basic tests/test_utils.py::test_model_identifiability and both tests passed.
  • All added tests and the targeted existing checks completed successfully (no failures).

Codex Task

Contributor

Copilot AI left a comment


Pull request overview

This PR enhances likelihood-based statistical utilities in the pyem package by fixing the integrated BIC calculation to properly aggregate trials across all subjects, improving numerical stability, and adding a new utility to compute overall predictive probability from per-subject NLL values.

Changes:

  • Fixed calc_BICint to count total trials across all subjects (previously it counted only the first subject's trials) and improved numerical stability by using logsumexp
  • Added overall_predictive_probability_from_nll utility function to compute geometric-mean predictive probability with proper NaN handling
  • Added input validation to pseudo_r2_from_nll to prevent silent misuse of invalid metric values

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • pyem/utils/stats.py — Fixed the BIC trial-counting bug, added logsumexp for numerical stability, added the new predictive probability utility with validation, and hardened pseudo_r2_from_nll with metric validation
  • tests/test_stats.py — Added comprehensive unit tests covering trial aggregation, numerical stability, input validation, and formula correctness for all modified and new functions


@shawnrhoads shawnrhoads marked this pull request as ready for review February 18, 2026 20:40
