Add Whole-Session Time-Binned Analysis and Correlation with Continuous Behavioral Variables #213

@pauladkisson

Description

Summary

The Nelson lab bins recordings into fixed-duration segments (e.g., 2-minute bins) and correlates a per-bin photometry statistic (mean fluorescence or transient count) with a continuous behavioral variable measured at the same temporal resolution (e.g., akinesia severity score, locomotion velocity). This analysis pattern is distinct both from PSTH (event-triggered) analysis and from tonic epoch comparison: it is a time-resolved, whole-session correlation that GuPPy does not currently support.

Motivation

  • Rodrigo Paz (Nelson lab) described this as a primary analysis for their Parkinson's disease model experiments: "we take two minutes of signal, and we average either the number of transients or the raw fluorescence, and we correlate that with a manually scored thing [akinesia severity], for that bin"
  • The correlation spans the whole session — comparing the fluorescence bin at one time point to the behavioral score at the same time point — to ask whether signal and behavior co-vary across the session
  • This is specifically useful for long experiments where a manipulation happens and the lab wants to see how fluorescence tracks behavioral recovery or deterioration over time
  • The existing cross-correlation module (src/guppy/analysis/cross_correlation.py) computes sample-by-sample correlation between two continuous time series; it is not designed for computing per-bin statistics and correlating them with an independent behavioral score vector
  • Alexandra noted that photobleaching correction (Issue 01) is a prerequisite for this analysis to be valid: "if nothing was happening, for the signal to be dead flat for three hours. So that we can say, how does the velocity correlate during this two-minute bin here, and this two-minute bin here"
  • The lab currently performs this analysis in Excel or custom scripts after exporting GuPPy z-scores, fragmenting the workflow

Proposed Solution

  • Add a new src/guppy/analysis/binned_correlation.py module implementing:
    • Time-binning of the processed signal into fixed-duration, non-overlapping bins with a configurable width (e.g., bin_width_sec = 120)
    • Per-bin statistic computation: mean z-score or dF/F; optionally transient count per bin if Step 5 has already been run
    • Pearson and Spearman correlation of the per-bin signal statistic against a user-supplied continuous behavioral variable vector of matching length
    • Output: a CSV per session with bin start times, per-bin signal statistic, per-bin behavioral variable value, and overall correlation coefficient and p-value
    • Optional scatter plot (signal statistic vs. behavioral variable per bin) saved alongside the CSV
  • The behavioral variable is provided as a CSV with one column per measure, sampled at a frequency compatible with the chosen bin width (the module resamples/averages to bin resolution if needed)
  • Add a step6_binned_correlation keyword-only function to src/guppy/testing/api.py accepting behavioral_data_path and bin_width_sec as parameters
  • Add a GUI panel or sub-step for configuring and running this analysis, ideally accessible from the visualization dashboard after Step 5
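The core computation in the proposed module can be sketched as follows. This is a minimal illustration, not an existing GuPPy API: the function name, parameters, and return structure are assumptions, and it covers only the mean-signal statistic (not per-bin transient counts, CSV output, or plotting).

```python
import numpy as np
from scipy import stats


def binned_correlation(signal, fs, behavior_per_bin, bin_width_sec=120.0):
    """Correlate per-bin mean signal with a per-bin behavioral score.

    Hypothetical sketch: `signal` is the processed z-score or dF/F trace
    sampled uniformly at `fs` Hz; `behavior_per_bin` holds one behavioral
    value per bin. A partial trailing bin is dropped so lengths match.
    """
    samples_per_bin = int(round(bin_width_sec * fs))
    n_bins = len(signal) // samples_per_bin

    # Trim to a whole number of bins, then average within each bin.
    trimmed = np.asarray(signal[: n_bins * samples_per_bin], dtype=float)
    per_bin_mean = trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)

    behavior = np.asarray(behavior_per_bin, dtype=float)[:n_bins]

    # Report both correlation methods, as suggested in the open questions.
    pearson_r, pearson_p = stats.pearsonr(per_bin_mean, behavior)
    spearman_r, spearman_p = stats.spearmanr(per_bin_mean, behavior)

    return {
        "bin_start_sec": np.arange(n_bins) * bin_width_sec,
        "per_bin_mean": per_bin_mean,
        "pearson": (pearson_r, pearson_p),
        "spearman": (spearman_r, spearman_p),
    }
```

A wrapper such as the proposed `step6_binned_correlation(behavioral_data_path=..., bin_width_sec=...)` would then load the behavioral CSV, call a routine like this per session, and write the per-bin CSV and scatter plot described above.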

Open Questions

  • Should the bin width be fixed across all sessions in a batch, or per-session? Fixed is simpler and more reproducible for group comparisons; per-session would require the user to set it individually
  • How should the behavioral variable be aligned to photometry bins when the behavioral data has a different sampling rate? Simple bin-averaging of the behavioral variable is the safest default; interpolation is an alternative
  • Should computing transient count per bin require that Step 5 (transient detection) has been run first, or should the binned analysis be independently runnable using only z-score output from Step 4?
  • Is there a multi-session group-level requirement — e.g., pooling bins across sessions before computing correlation, or computing per-session correlations and then averaging the coefficients? Rodrigo's description suggests per-session, but group-level analysis may be needed for publication
  • Which correlation method should be the default: Pearson (assumes normality, sensitive to outliers) or Spearman (rank-based, more robust)? Both should likely be computed and reported
  • Should bins with high artifact content (flagged during artifact removal) be excluded from the correlation, or is that outside scope for a first version?
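The alignment question above suggests bin-averaging as the safest default when the behavioral data has its own sampling rate. A sketch of that default (function name and signature are invented for illustration):

```python
import numpy as np


def bin_average_behavior(behavior, behavior_fs, bin_width_sec):
    """Average a behavioral trace into fixed-width bins.

    Hypothetical sketch of the bin-averaging default: each bin's value is
    the mean of all behavioral samples that fall inside it; a partial
    trailing bin is dropped so the result aligns with the photometry bins.
    """
    samples_per_bin = int(round(bin_width_sec * behavior_fs))
    n_bins = len(behavior) // samples_per_bin
    trimmed = np.asarray(behavior[: n_bins * samples_per_bin], dtype=float)
    return trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)
```

Interpolation would instead be needed when the behavioral sampling is sparser than one sample per bin, which is one reason to keep the alignment strategy configurable.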
