Add Whole-Session Time-Binned Analysis and Correlation with Continuous Behavioral Variables #213

@pauladkisson

Description

Summary

The Nelson lab bins recordings into fixed-duration segments (e.g., 2-minute bins) and correlates a per-bin photometry statistic (mean fluorescence or transient count) with a continuous behavioral variable measured at the same temporal resolution (e.g., akinesia severity score, locomotion velocity). This analysis pattern is distinct both from PSTH (event-triggered) analysis and from tonic epoch comparison: it is a time-resolved, whole-session correlation that GuPPy does not currently support.

Motivation

  • Rodrigo Paz (Nelson lab) described this as a primary analysis for their Parkinson's disease model experiments: "we take two minutes of signal, and we average either the number of transients or the raw fluorescence, and we correlate that with a manually scored thing [akinesia severity], for that bin"
  • The correlation spans the whole session — comparing the fluorescence bin at one time point to the behavioral score at the same time point — to ask whether signal and behavior co-vary across the session
  • This is specifically useful for long experiments where a manipulation happens and the lab wants to see how fluorescence tracks behavioral recovery or deterioration over time
  • The existing cross-correlation module (src/guppy/analysis/cross_correlation.py) computes sample-by-sample correlation between two continuous time series; it is not designed for computing per-bin statistics and correlating them with an independent behavioral score vector
  • Alexandra noted that photobleaching correction (Issue 01) is a prerequisite for this analysis to be valid: "if nothing was happening, for the signal to be dead flat for three hours. So that we can say, how does the velocity correlate during this two-minute bin here, and this two-minute bin here"
  • The lab currently performs this analysis in Excel or custom scripts after exporting GuPPy z-scores, fragmenting the workflow

Proposed Solution

  • Add a new src/guppy/analysis/binned_correlation.py module implementing:
    • Time-binning of the processed signal into fixed-duration, non-overlapping bins with a configurable width (e.g., bin_width_sec = 120)
    • Per-bin statistic computation: mean z-score or dF/F; optionally transient count per bin if Step 5 has already been run
    • Pearson and Spearman correlation of the per-bin signal statistic against a user-supplied continuous behavioral variable vector of matching length
    • Output: a CSV per session with bin start times, per-bin signal statistic, per-bin behavioral variable value, and overall correlation coefficient and p-value
    • Optional scatter plot (signal statistic vs. behavioral variable per bin) saved alongside the CSV
  • The behavioral variable is provided as a CSV with one column per measure, sampled at a frequency compatible with the chosen bin width (the module resamples/averages to bin resolution if needed)
  • Add a step6_binned_correlation keyword-only function to src/guppy/testing/api.py accepting behavioral_data_path and bin_width_sec as parameters
  • Add a GUI panel or sub-step for configuring and running this analysis, ideally accessible from the visualization dashboard after Step 5
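The core computation in the proposed module can be sketched as follows. This is a minimal illustration, not an existing GuPPy API: the function name, parameters, and return structure are assumptions, and it covers only the mean-signal statistic (not per-bin transient counts, CSV output, or plotting).

```python
import numpy as np
from scipy import stats


def binned_correlation(signal, fs, behavior_per_bin, bin_width_sec=120.0):
    """Correlate per-bin mean signal with a per-bin behavioral score.

    Hypothetical sketch: `signal` is the processed z-score or dF/F trace
    sampled uniformly at `fs` Hz; `behavior_per_bin` holds one behavioral
    value per bin. A partial trailing bin is dropped so lengths match.
    """
    samples_per_bin = int(round(bin_width_sec * fs))
    n_bins = len(signal) // samples_per_bin

    # Trim to a whole number of bins, then average within each bin.
    trimmed = np.asarray(signal[: n_bins * samples_per_bin], dtype=float)
    per_bin_mean = trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)

    behavior = np.asarray(behavior_per_bin, dtype=float)[:n_bins]

    # Report both correlation methods, as suggested in the open questions.
    pearson_r, pearson_p = stats.pearsonr(per_bin_mean, behavior)
    spearman_r, spearman_p = stats.spearmanr(per_bin_mean, behavior)

    return {
        "bin_start_sec": np.arange(n_bins) * bin_width_sec,
        "per_bin_mean": per_bin_mean,
        "pearson": (pearson_r, pearson_p),
        "spearman": (spearman_r, spearman_p),
    }
```

A wrapper such as the proposed `step6_binned_correlation(behavioral_data_path=..., bin_width_sec=...)` would then load the behavioral CSV, call a routine like this per session, and write the per-bin CSV and scatter plot described above.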

Open Questions

  • Should the bin width be fixed across all sessions in a batch, or per-session? Fixed is simpler and more reproducible for group comparisons; per-session would require the user to set it individually
  • How should the behavioral variable be aligned to photometry bins when the behavioral data has a different sampling rate? Simple bin-averaging of the behavioral variable is the safest default; interpolation is an alternative
  • Should computing transient count per bin require that Step 5 (transient detection) has been run first, or should the binned analysis be independently runnable using only z-score output from Step 4?
  • Is there a multi-session group-level requirement — e.g., pooling bins across sessions before computing correlation, or computing per-session correlations and then averaging the coefficients? Rodrigo's description suggests per-session, but group-level analysis may be needed for publication
  • Which correlation method should be the default: Pearson (assumes normality, sensitive to outliers) or Spearman (rank-based, more robust)? Both should likely be computed and reported
  • Should bins with high artifact content (flagged during artifact removal) be excluded from the correlation, or is that outside scope for a first version?
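The alignment question above suggests bin-averaging as the safest default when the behavioral data has its own sampling rate. A sketch of that default (function name and signature are invented for illustration):

```python
import numpy as np


def bin_average_behavior(behavior, behavior_fs, bin_width_sec):
    """Average a behavioral trace into fixed-width bins.

    Hypothetical sketch of the bin-averaging default: each bin's value is
    the mean of all behavioral samples that fall inside it; a partial
    trailing bin is dropped so the result aligns with the photometry bins.
    """
    samples_per_bin = int(round(bin_width_sec * behavior_fs))
    n_bins = len(behavior) // samples_per_bin
    trimmed = np.asarray(behavior[: n_bins * samples_per_bin], dtype=float)
    return trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)
```

Interpolation would instead be needed when the behavioral sampling is sparser than one sample per bin, which is one reason to keep the alignment strategy configurable.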
