docs: update sysvi docs and images (#3225)
@Hrovatin --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent d28a67f, commit 859cc1e. Showing 10 changed files with 168 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,20 +1,153 @@ | ||
# SysVI | ||
|
||
:::{note} | ||
This page is under construction. | ||
::: | ||
|
||
**sysVI** (Python class {class}`~scvi.external.SysVI`) is a ... | ||
**sysVI** (cross-SYStem Variational Inference, | ||
Python class {class}`~scvi.external.SysVI`) | ||
is a representation learning model that can remove substantial batch effects. | ||
|
||
The advantages of SysVI are: | ||
|
||
- ... | ||
- Improved integration: For datasets with **substantial batch effects** | ||
(e.g., cross-species or organoid-tissue), where other models often fail. | ||
It provides a good tradeoff between batch correction and preservation of | ||
cell-type and sub-cell-type biological variation. | ||
- Tunable integration: The **integration strength is directly tunable** | ||
via the cycle-consistency loss weight. | ||
- Generally applicable: The model operates on | ||
**approximately normally distributed data** | ||
(e.g. normalized and log(x+1)-transformed scRNA-seq data), which makes it | ||
applicable to more than just scRNA-seq data. | ||
- Scalable: Can integrate very large datasets if using a GPU. | ||
|
||
The limitations of SysVI include: | ||
|
||
- ... | ||
- Weak batch effects: For datasets with **small batch effects** | ||
(e.g. multiple subjects from a single laboratory) we recommend using scVI instead, | ||
as it has slightly higher biological preservation in this setting. | ||
To determine whether a dataset has substantial batch effects, | ||
please refer to our paper. | ||
- Model selection: The best performance is achieved by | ||
**selecting the best model** from multiple | ||
runs with a few different cycle-consistency loss weights and random seed | ||
initialisations, as explained in the tutorial. | ||
However, we provide **defaults** that generate decent results in | ||
many settings. | ||
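
The model-selection loop described above can be sketched as a small grid over cycle-consistency loss weights and seeds. This is a generic illustration, not the procedure from the tutorial: `select_best_run`, `dummy_train_and_score`, the weight values, and the scoring rule are all hypothetical placeholders. In practice, `train_and_score` would train a SysVI model and return whichever integration score you trust.

```python
from itertools import product


def select_best_run(train_and_score, cycle_weights, seeds):
    """Score one run per (weight, seed) pair and return the best run plus all runs.

    `train_and_score` is a hypothetical user-supplied callable that trains a
    model with the given cycle-consistency weight and seed and returns a
    single score where higher is better.
    """
    results = [
        {"cycle_weight": w, "seed": s, "score": train_and_score(w, s)}
        for w, s in product(cycle_weights, seeds)
    ]
    best = max(results, key=lambda run: run["score"])
    return best, results


def dummy_train_and_score(cycle_weight, seed):
    # Placeholder scoring rule so the sketch runs without training anything:
    # it simply prefers weights close to 5 and lower seed values.
    return -abs(cycle_weight - 5) - 0.01 * seed


best, all_runs = select_best_run(dummy_train_and_score, [2, 5, 10], [0, 1, 2])
print(best["cycle_weight"], best["seed"], len(all_runs))
```

Keeping all runs, not only the best one, makes it possible to inspect how sensitive the result is to the weight and to the random seed.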
|
||
|
||
```{topic} Tutorials: | ||
- {doc}`/tutorials/notebooks/scrna/sysVI` | ||
``` | ||
|
||
```{topic} References: | ||
- Paper: Hrovatin and Moinfar, et al. | ||
Integrating single-cell RNA-seq datasets with substantial batch effects. | ||
bioRxiv (2023): https://doi.org/10.1101/2023.11.03.565463 | ||
- Talk on caveats of scRNA-seq integration and strategies for removing | ||
substantial batch effects: https://www.youtube.com/watch?v=i-a4BjAn90E | ||
``` | ||
|
||
## Method background | ||
|
||
The model is based on a variational autoencoder (VAE), with the integrated | ||
representation corresponding to the latent space embedding of the cells. | ||
|
||
### Stronger batch correction with cycle-consistency loss | ||
|
||
Vanilla VAEs struggle to achieve strong batch correction without losing | ||
substantial biological variation. This issue arises because the VAE loss | ||
does not directly penalize the presence of batch covariate information in the | ||
latent space. | ||
Instead, conditional VAEs rely on the assumption that batch covariate information will be | ||
omitted from the limited-capacity latent space | ||
because it is separately injected into the decoder, which makes its presence in the | ||
latent space "unnecessary" for the reconstruction (Hrovatin and Moinfar, 2023). | ||
|
||
To achieve stronger integration than vanilla VAEs, SysVI employs | ||
cycle-consistency loss in the latent space. In particular, the model embeds a cell | ||
from one system (i.e. the covariate representing substantial batch effect) | ||
into latent space and then decodes it using another category of the system covariate. | ||
In this way it generates a biologically identical cell with a | ||
different batch effect. The generated cell is then likewise embedded into the | ||
latent space, and the distance between the embeddings of the original and | ||
the switched-batch cell is computed. The model is trained to minimize this distance. | ||
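
The encode, switch, decode, re-encode cycle described above can be sketched with NumPy. This is a toy illustration under stated assumptions, not the SysVI implementation: the linear `encode` and `decode` functions stand in for the real networks, only two systems are assumed, and the squared Euclidean distance is one possible choice of latent distance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent, n_systems = 20, 5, 2

# Toy linear weights standing in for the real encoder and decoder networks.
W_enc = rng.normal(scale=0.1, size=(n_genes + n_systems, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent + n_systems, n_genes))


def one_hot(system):
    """One-hot encode an integer array of system labels."""
    return np.eye(n_systems)[system]


def encode(x, system):
    """Embed expression (cells x genes) together with the system covariate."""
    return np.concatenate([x, one_hot(system)], axis=1) @ W_enc


def decode(z, system):
    """Generate expression from a latent embedding and a system covariate."""
    return np.concatenate([z, one_hot(system)], axis=1) @ W_dec


def cycle_consistency_loss(x, system):
    """Mean squared latent distance between original and switched-system cells."""
    z = encode(x, system)                # embed the original cell
    switched = 1 - system                # assumes exactly two systems (0 and 1)
    x_cycle = decode(z, switched)        # same biology, other system
    z_cycle = encode(x_cycle, switched)  # embed the generated cell
    return float(np.mean(np.sum((z - z_cycle) ** 2, axis=1)))


x = rng.normal(size=(8, n_genes))
system = rng.integers(0, n_systems, size=8)
loss = cycle_consistency_loss(x, system)
print(loss)
```

During training, a term like this would be multiplied by the cycle-consistency loss weight and added to the usual VAE loss, which is what makes the integration strength tunable.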
|
||
:::{figure} figures/sysvi_cycleconsistency.png | ||
:align: center | ||
:alt: Cycle consistency loss used to increase batch correction in SysVI. | ||
:class: img-fluid | ||
::: | ||
|
||
Benefits of this approach: | ||
- As only cells with identical biological background are compared, this method | ||
retains good biological preservation even when removing | ||
substantial batch effects. This distinguishes it from alternative approaches | ||
that compare cells with different biological backgrounds | ||
(e.g. via adversarial loss; see Hrovatin and Moinfar (2023) for details). | ||
- The integration strength can be directly tuned via the cycle-consistency | ||
loss weight. | ||
|
||
### Improved biological preservation via the VampPrior | ||
|
||
Vanilla VAEs employ a standard normal prior for regularizing the latent space. | ||
However, this prior is very restrictive and can lead to loss of | ||
important biological variation in the latent space. | ||
|
||
Instead, we use the | ||
VampPrior ([Tomczak, 2017](https://doi.org/10.48550/arXiv.1705.07120)), | ||
which permits a more expressive latent space. The VampPrior is a multi-modal | ||
prior whose mode positions are learned during training. | ||
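
As a rough sketch of the idea, the VampPrior density can be written as an equally weighted mixture of the encoder's posteriors evaluated at a set of learnable pseudo-inputs. The NumPy code below illustrates only the density computation; `toy_encoder`, the pseudo-input values, and the dimensions are assumptions made for the example and do not reflect SysVI internals.

```python
import numpy as np


def log_normal_diag(z, mean, log_var):
    """Log density of a diagonal Gaussian, summed over latent dimensions."""
    return -0.5 * np.sum(
        np.log(2 * np.pi) + log_var + (z - mean) ** 2 / np.exp(log_var), axis=-1
    )


def mixture_log_prob(z, comp_means, comp_log_vars):
    """log p(z) under an equally weighted mixture of K diagonal Gaussians.

    z has shape (n, d); comp_means and comp_log_vars have shape (K, d).
    """
    # (n, K) matrix of per-component log densities via broadcasting.
    log_comp = log_normal_diag(z[:, None, :], comp_means[None], comp_log_vars[None])
    # Numerically stable log-sum-exp over the K components.
    m = log_comp.max(axis=1, keepdims=True)
    lse = (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).squeeze(1)
    return lse - np.log(comp_means.shape[0])


rng = np.random.default_rng(0)
n_genes, n_latent, n_components = 10, 3, 4
W_mean = rng.normal(scale=0.1, size=(n_genes, n_latent))


def toy_encoder(x):
    """Hypothetical encoder returning a posterior mean and unit log-variance."""
    return x @ W_mean, np.zeros((x.shape[0], n_latent))


# Pseudo-inputs live in input space; in a real model they are trained parameters.
pseudo_inputs = rng.normal(size=(n_components, n_genes))
comp_means, comp_log_vars = toy_encoder(pseudo_inputs)

z = rng.normal(size=(6, n_latent))
log_prior = mixture_log_prob(z, comp_means, comp_log_vars)
print(log_prior.shape)
```

Because the mixture components are produced by passing the pseudo-inputs through the encoder, updating the pseudo-inputs during training moves the modes of the prior.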
|
||
:::{figure} figures/sysvi_vampprior.png | ||
:align: center | ||
:alt: VampPrior used to increase the preservation of biological variation in SysVI. | ||
:class: img-fluid | ||
::: | ||
|
||
Benefits of this approach: | ||
- More expressive latent space leads to increased preservation of | ||
biological variability. | ||
- VampPrior was more robust with respect to the number of modes than the | ||
better-known Gaussian mixture prior. | ||
|
||
### Application flexibility due to using normally distributed inputs | ||
|
||
Many scRNA-seq integration models are specifically designed to work with | ||
scRNA-seq data, e.g. raw counts that follow a negative binomial distribution. | ||
As a result, these models cannot be directly used for other | ||
types of data. | ||
|
||
We observed that for representation learning this specialised setup is not | ||
strictly required. SysVI is designed for normally distributed data, while | ||
performing competitively in comparison to the more specialised models | ||
on scRNA-seq data. | ||
To make scRNA-seq data approximately normally distributed, we preprocess it via | ||
size-factor normalization and log(x+1) transformation. | ||
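
A minimal NumPy sketch of this preprocessing is shown below. The helper name `normalize_log1p` and the `target_sum` value of 10,000 are assumptions made for illustration. With scanpy, the same steps are commonly done with `sc.pp.normalize_total` followed by `sc.pp.log1p`.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to the same total count, then apply log(x + 1).

    `counts` is a cells x genes array of raw counts; `target_sum` is an
    illustrative choice, not a value prescribed by SysVI.
    """
    counts = np.asarray(counts, dtype=float)
    size_factors = counts.sum(axis=1, keepdims=True)
    size_factors[size_factors == 0] = 1.0  # leave empty cells as all zeros
    return np.log1p(counts / size_factors * target_sum)


counts = np.array([[0, 2, 8], [5, 5, 0]])
x = normalize_log1p(counts)
print(x.shape)
```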
|
||
Thus, SysVI could also be applied to other types of normally distributed data. | ||
However, we did not specifically test its performance on other data types. | ||
|
||
## Other tips & tricks for data integration | ||
|
||
Besides the benefits of the SysVI model, our paper | ||
([Hrovatin and Moinfar, 2023](https://doi.org/10.1101/2023.11.03.565463)) | ||
and | ||
[talk](https://www.youtube.com/watch?v=i-a4BjAn90E) | ||
provide additional advice on scRNA-seq integration that applies beyond SysVI. | ||
The two most important insights are: | ||
- Try to make the **integration task as easy for the model** as possible. | ||
This means that data should be pre-processed in a way that already eliminates | ||
some of the batch differences, when possible: | ||
- Use the intersection of highly variable genes (HVGs) across batches with substantial batch effects | ||
(e.g. the systems). | ||
- Mitigate known technical artefacts, such as ambient gene expression | ||
([Hrovatin and Sikkema, 2024](https://doi.org/10.1038/s41592-024-02532-y)). | ||
- Ensure that **the metrics used to evaluate integration are of high quality**: | ||
- They should be able to capture the key properties required for downstream tasks. | ||
For example, the standard cell-type based biological preservation metrics do | ||
not assess whether subtler biological differences, such as within-cell-type | ||
disease effects, are preserved. | ||
- Be cautious of potential biases within integration metric scores. | ||
The scores may not directly correspond to the desired data property, | ||
as they can be influenced by other factors, or | ||
certain models may be able to trick the metrics. |
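
The HVG-intersection tip above can be sketched as follows. This is a simplified illustration: genes are ranked by plain variance within each system, whereas dedicated HVG methods (for example in scanpy) use more refined dispersion estimates. The function names and the toy data are assumptions made for the example.

```python
import numpy as np


def top_variable_genes(x, gene_names, n_top):
    """Return the names of the n_top genes with the highest variance across cells."""
    variances = x.var(axis=0)
    top_idx = np.argsort(variances)[::-1][:n_top]
    return {str(g) for g in gene_names[top_idx]}


def shared_hvgs(x, gene_names, systems, n_top):
    """Select variable genes separately per system and keep only the shared ones."""
    per_system = [
        top_variable_genes(x[systems == s], gene_names, n_top)
        for s in np.unique(systems)
    ]
    return sorted(set.intersection(*per_system))


gene_names = np.array(["A", "B", "C", "D"])
systems = np.array([0, 0, 1, 1])  # system label of each cell
x = np.array(
    [
        [0.0, 0.0, 1.0, 1.0],
        [4.0, 6.0, 1.0, 1.0],
        [1.0, 0.0, 0.0, 1.0],
        [1.0, 6.0, 4.0, 1.0],
    ]
)

hvgs = shared_hvgs(x, gene_names, systems, n_top=2)
print(hvgs)  # ['B']
```

Gene B is the only gene that is highly variable in both systems, so it is the only one kept. Genes that vary in just one system are dropped.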