Conversation
The additive term is responsible for predicting; the overall model handles the likelihoods. Training works. TODO:
- refactor priors
- have the terms and the model save their state to disk
- port R2 calculation
- API + docs
- plotting
- .tl namespace
There is now only one class per prior, without the split into Pyro priors and wrapper classes. This counteracts the proliferation of wrapper classes and allows us to use the same architecture for priors and terms.
This makes the likelihoods stateful and responsible for calculating whatever statistics they need and for transforming the model prediction. This fully decouples the likelihoods from the preprocessing and is a prerequisite for a unified R2 calculation that is decoupled from how predictions are computed: the NB and Bernoulli likelihoods now also shift the prediction, so a zero factor is transformed to the null model and no longer produces negative R2 values. The R2 calculation can thus operate directly on the full prediction without any knowledge of how that prediction was obtained.
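As a rough illustration of the shift (class and method names below are made up, not the actual MofaFlex API): a stateful likelihood records per-feature statistics once and uses them to transform predictions so that a prediction of zero corresponds to the null model.

```python
import numpy as np

# Hypothetical sketch, not the actual MofaFlex API.
class NegativeBinomialLikelihood:
    def fit(self, y: np.ndarray):
        # Per-feature log means: the statistic this likelihood needs.
        self._log_mean = np.log(y.mean(axis=0) + 1e-8)

    def transform_prediction(self, prediction: np.ndarray) -> np.ndarray:
        # Shift so that a zero prediction maps to the per-feature mean,
        # i.e. the null model; a zero factor then cannot yield negative R2.
        return np.exp(prediction + self._log_mean)
```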
R2 is now being calculated for the entire model, for each additive term, and for each component (e.g. factor) of each additive term.
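Since every level — full model, single term, or single component — yields a prediction of the same shape as the data, one function can score all of them. A minimal sketch (the `r2` helper is illustrative, not the actual implementation):

```python
import numpy as np

def r2(y, prediction):
    # R2 computed directly from a prediction, whatever produced it.
    ss_res = np.sum((y - prediction) ** 2)
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
z, w = rng.normal(size=(100, 3)), rng.normal(size=(3, 20))
y = z @ w + 0.1 * rng.normal(size=(100, 20))

print(r2(y, z @ w))                    # the entire (single-term) model
print(r2(y, np.outer(z[:, 0], w[0])))  # one component (factor) of the term
```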
this will make it possible to construct a term outside of the main MOFAFLEX class
move the validation of subclasses and subclass registry handling to a class decorator
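A sketch of the pattern (the `register` decorator and its validation check are hypothetical): validation runs once at class definition time, and the decorator records the subclass in a module-level registry.

```python
_REGISTRY: dict[str, type] = {}

def register(cls):
    # Validate the subclass once, at definition time.
    if not hasattr(cls, "name"):
        raise TypeError(f"{cls.__name__} must define a 'name' attribute")
    _REGISTRY[cls.name] = cls
    return cls

@register
class HorseshoePrior:
    name = "horseshoe"
```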
return read-only mappings from the public API
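One standard way to do this in Python (a sketch, not necessarily what is used here) is `types.MappingProxyType`, which gives callers a live but immutable view of the internal dict:

```python
from types import MappingProxyType

class Model:
    def __init__(self):
        self._terms = {"mofaflex": object()}

    @property
    def terms(self):
        # Read-only view: item assignment on it raises TypeError.
        return MappingProxyType(self._terms)
```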
apparently the ability to use class method properties was removed in Python 3.13
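For reference, the affected pattern: chaining the two decorators worked from Python 3.9, was deprecated in 3.11, and was removed in 3.13 (the attribute name below is made up).

```python
class Prior:
    @classmethod
    @property
    def registry_key(cls):  # no longer a class-level property on 3.13
        return cls.__name__.lower()
```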
With the new API, it is now possible to use different configurations for the same prior in different groups/views. The dynamic API now handles that case by merging the results of all priors of the same class.
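Roughly, the merging could look like this (a sketch with hypothetical names): the same call is dispatched to every instance of the prior class, and the per-group/per-view results are combined into a single mapping.

```python
def dispatch_merged(priors, method, *args, **kwargs):
    # priors: all instances of the same prior class, e.g. one per group/view.
    merged = {}
    for prior in priors:
        merged.update(getattr(prior, method)(*args, **kwargs))
    return merged
```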
Warping is now applied to all groups of the GP prior. If warping is not required for some groups, a separate GaussianProcess instance with warping turned off can be used.
All covariates are now handled through get_datasets in the priors; the MofaFlex term class takes care of constructing CovariateDatasets for factor priors and of storing the covariates for weight priors. All covariates are now passed as kwargs to model and guide.
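This relies on `pyro.infer.SVI.step` forwarding its arguments to both model and guide; a toy sketch (the model itself is made up):

```python
import pyro
import pyro.distributions as dist
import torch

def model(data, *, covariates):
    loc = pyro.param("loc", torch.zeros(()))
    with pyro.plate("obs", data.shape[0]):
        pyro.sample("y", dist.Normal(loc + covariates.sum(-1), 1.0), obs=data)

def guide(data, *, covariates):
    pass  # no latent sample sites in this toy model

svi = pyro.infer.SVI(model, guide, pyro.optim.Adam({"lr": 0.01}),
                     loss=pyro.infer.Trace_ELBO())
svi.step(torch.randn(8), covariates=torch.randn(8, 2))  # kwargs reach both
```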
This improves sparsity with the spike-and-slab prior, as well as the quality of the results.
annotations, re-run citeseq tutorial
initialization now happens in a device context
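Assuming this refers to PyTorch >= 2.0, where `torch.device` works as a context manager, a sketch of the benefit: tensors created inside the block land directly on the target device instead of being created on the CPU and copied over.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
with torch.device(device):
    loc = torch.zeros(10, 5)   # created directly on `device`
    scale = torch.ones(10, 5)
```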
fewer warnings
pass them to term hooks
handle methods called on an untrained model that only work on a trained model, and vice versa
Codecov Report ❌

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #163      +/-   ##
==========================================
- Coverage   90.05%   89.01%   -1.05%
==========================================
  Files          53       53
  Lines        4758     5080     +322
==========================================
+ Hits         4285     4522     +237
- Misses        473      558      +85
```
sphinx-tabs is not yet compatible with Sphinx 9, see executablebooks/sphinx-tabs#209
```python
for view_name, view in group.items():
    if view.numel() == 0:  # can occur in the last batch of an epoch if the batch is small
        continue
    prediction = None
```
I think this might be simplified a bit; I could not wrap my head around the multi-nested structure. Maybe something like:
```python
prediction = 0
has_prediction = False
for term in self._terms.values():
    try:
        term_prediction = term[group_name][view_name]
    except KeyError:
        continue
    prediction += term_prediction
    has_prediction = True
if has_prediction:
    ...
```
I'm sorry, I don't really see how that's different from the existing code, except you're using an additional boolean instead of the `None` sentinel. Starting with `prediction = 0` will result in an additional element-wise addition, which may be expensive.
I just assumed the 0 instead of None would be safer wrt arithmetic errors.
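For what it's worth, a quick way to see the difference (assuming PyTorch tensors): the `0` start pays for one extra element-wise addition that the `None` sentinel avoids.

```python
import torch

t = torch.randn(1000, 1000)

prediction = 0
prediction = prediction + t  # scalar-add kernel: allocates a new tensor

prediction = None
prediction = t if prediction is None else prediction + t  # no extra op
```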
```python
        return lr_func

    def on_train_start(self, data: MofaFlexDataset):
```
I wonder if performing some sort of validation / error handling here would save some time, compared to errors being raised later or propagated during training. Maybe something like:

```python
def _validate_terms(self, data):
    for term in self._terms.values():
        term.validate(data, self._likelihoods)
```

before `on_train_start`, and then call `self._validate_terms(data)` inside `on_train_start`. Not sure what could go wrong, but I remember the issue we had where all the processing ran for a while, and then the model config (now term config) caused problems during training.
Validation of arguments should be performed in the constructor as much as possible. But I don't think it makes sense to implement validation ourselves; it's probably better to use something like Pydantic for this. That is out of scope for this PR, though, IMHO.
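For illustration, constructor-time validation with something like Pydantic (the `TermConfig` model and its fields are made up):

```python
from pydantic import BaseModel, PositiveInt

class TermConfig(BaseModel):
    n_factors: PositiveInt
    prior: str = "normal"

TermConfig(n_factors=10)  # ok
TermConfig(n_factors=-1)  # raises pydantic.ValidationError at construction
```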
This is a major refactor that completely modularizes the code base and adds support for multiple additive terms, i.e. models of the form $Y = Z_1 W_1 + Z_2 W_2 + X$. Currently, only one type of term is implemented: the MofaFlex term, which takes the form $Y = Z W$, but additional term types can easily be added by subclassing the `Term` class.

This builds on previous work modularizing the priors and introducing a dynamic API. Each term provides its own API, which can be accessed from the user-facing model object as e.g. `model.terms[term_name].get_factors`. To simplify the common special case of only a single additive term, in that situation the user-facing model object forwards requests for any unknown attributes to its single term.

There are still some more unit tests needed, and the Getting started tutorial needs a major overhaul, but that can all be done incrementally after this is merged: since this touches every single part of the code base, it's blocking all other work, so it's time to merge and get on with it.