Convergent evolution in protein antigens is common across pathogens and has also been documented in SARS-CoV-2 (hCoV-19); it is most likely driven by the selective pressure exerted by immunity elicited by previous infection or vaccination. There is a pressing need for tools that allow automated analysis of convergent mutations.
In response to this need, we developed ConvMut, a tool to analyze genetic sequence data to identify patterns of recurrent mutations in SARS-CoV-2 evolution. To this end, we exploited the granular phylogenetic tree representation developed by PANGO, allowing us to observe what we call deltas, i.e., groups of mutations that are acquired on top of the immediately upstream tree nodes. Deltas comprise amino acid substitutions, insertions, and deletions. ConvMut can perform individual protein analysis to identify the most common single mutations acquired independently in a given subtree (starting from a user-selected root). Such mutations are represented in a barplot that can be sorted by frequency or position, and filtered by region of interest. Lineages are then gathered into clusters according to their sets of shared mutations. Finally, an interactive graph orders the evolutionary steps of clusters, details the acquired amino acid changes for each sublineage, and allows tracing the evolutionary path up to a selected lineage.
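The core notion of a delta can be illustrated with a small, self-contained sketch: the delta of a lineage is the set difference between its mutations and those of its immediate parent. The lineage names, mutation sets, and function below are purely illustrative toy data, not ConvMut's actual API or real PANGO designations:

```python
# Toy sketch of the "delta" concept: the set of mutations a lineage
# acquires on top of its immediate parent in the phylogenetic tree.
# All lineage/mutation data below is illustrative, not real designations.

parents = {"BA.2.75": "BA.2", "XBB.1.5": "XBB.1"}
mutations = {
    "BA.2":    {"S:G339D", "S:N501Y"},
    "BA.2.75": {"S:G339D", "S:N501Y", "S:K356T", "S:G446S"},
    "XBB.1":   {"S:G339D", "S:V83A"},
    "XBB.1.5": {"S:G339D", "S:V83A", "S:F486P"},
}

def delta(lineage: str) -> set[str]:
    """Mutations acquired by `lineage` relative to its immediate parent."""
    parent = parents.get(lineage)
    if parent is None:  # tree root: every mutation counts as acquired
        return set(mutations.get(lineage, set()))
    return mutations[lineage] - mutations[parent]
```

Counting how often the same mutation appears in the deltas of independent branches is then enough to surface candidate convergent mutations.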
Other unique tools are paired with the main functionality of ConvMut to support a complete analysis, such as a frequency analysis of the nucleotide or amino acid changes observed at a given residue across a selected phylogenetic subtree.
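As a rough illustration of what such a residue-level frequency analysis computes, the sketch below counts the amino acid changes observed at a chosen Spike residue across a collection of mutation lists. The helper function, the `S:<ref><pos><alt>` mutation notation, and the toy data are assumptions for illustration, not ConvMut's implementation:

```python
# Hypothetical sketch of a residue-level frequency analysis: count which
# amino acid changes occur at a chosen Spike residue across a set of
# per-lineage mutation lists (toy data, illustrative only).
from collections import Counter
import re

def changes_at_residue(mutation_lists, residue: int) -> Counter:
    """Count amino acid changes observed at `residue` (e.g. S:F486P)."""
    counts: Counter = Counter()
    for muts in mutation_lists:
        for m in muts:
            match = re.fullmatch(r"S:([A-Z])(\d+)([A-Z*])", m)
            if match and int(match.group(2)) == residue:
                counts[match.group(1) + str(residue) + match.group(3)] += 1
    return counts
```

A sorted view of these counts is essentially what a frequency barplot over a subtree would display for that residue.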
ConvMut will facilitate the design of antiviral anti-Spike monoclonal antibodies and Spike-based vaccines with longer-lasting efficacy, minimizing development and marketing failures.
The ConvMut software is deployed on the GISAID platform (https://gisaid.org/) and usable by all users with a valid registration, leveraging the real-time updated EpiCoV database.
For use on local servers, we provide an open implementation that can be downloaded from this GitHub repository, run on any correctly formatted input, and deployed as described next.
This software requires a computer with:
- Docker CLI v2 (or above) or Docker Desktop >= 4.0.0
- a terminal / command prompt
- storage space: minimum 16 GB
- memory: minimum 16 GB
All the commands are intended to be used in a terminal window inside the directory where this file resides.
Software dependencies are installed through pip within a Docker container (convmut-open-data-base-image) that prepares the virtual environment for running the software. To prepare the virtual environment, start Docker and run:
```shell
docker compose build base && docker compose build
```

Note for software developers: whenever a change to the dependencies (i.e., the requirements.txt file) is made, rebuild the virtual environment with the `--no-cache` option to apply the change.
Open a new terminal on your system, navigate to the ConvMut directory and launch the Input Data Updater Service by executing the command
```shell
docker compose run --rm open-data-updater
```

Whenever you need to update an outdated version of the data, repeat the instructions in this section. If the Input Data Updater Service detects a newer version of the data, it will be downloaded. The data update will be automatically reflected in the Application Frontend with some delay, or one can force the update immediately by restarting the application as described in the next section.
Troubleshooting tips: if the files are not downloaded or updated, there might be an issue with your internet connection, or the source files might have changed location or access protocol. The Input Data Updater Service normally silences any failure, but you can make failures explicit by appending the options `--no-silent-date-check --no-silent-download` to the above command.
- Open a new terminal on your system, navigate to the ConvMut directory and launch both the Application Data Updater Service and the Streamlit Service by executing the command

  ```shell
  docker compose up open-data-frontend
  ```

- Open a browser and navigate to the address https://localhost:65265.
- Start exploring!
> [!WARNING]
> **Startup times**: on the very first run, the application usually takes ~10 minutes before getting ready. During this time, the necessary Application Data is generated. Subsequent runs won't introduce any delay, even if you stop/restart the application, until the Dynamic Input Data is updated.
Open Docker Desktop and delete the images and containers related to ConvMut. Then delete this repository.
The system can be viewed as a stack of layers, including data and services.
The Data provider layer includes the application's input data sources; specifically, we employ Pangolin designations and UCSC-built lineage constellations.
The download of the Dynamic Input Data into a Docker Data Volume is managed by the Input Data Updater Service of the Application Backend. When launched, the Input Data Updater Service compares the local files with those available from the corresponding sources and downloads the related files whenever an updated version is available. Within Dynamic Input Data, we expect some files containing information about the current clades (e.g., Pango-lineages), their mutations (along a chosen phylogeny), and hierarchical relationships.
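The freshness check described above can be pictured as a simple comparison between the local copy and the remote source. The sketch below is a hypothetical simplification: the function name and the use of file modification times are assumptions for illustration, not the service's actual logic:

```python
# Illustrative sketch (not ConvMut's actual code) of the update check the
# Input Data Updater Service performs: download only when the remote copy
# of an input file is newer than the local one (or the local one is missing).
from pathlib import Path

def needs_update(local: Path, remote_last_modified: float) -> bool:
    """True when the local file is missing or older than the remote version."""
    if not local.exists():
        return True
    return local.stat().st_mtime < remote_last_modified
```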
The Application Backend (packaged as a Docker Application Container) integrates:
- the Input Data Updater Service providing and updating the Dynamic Input Data
- the Application Data Updater Service, which transforms the dynamic input data (served through the Docker Data Volume) into a format that offers better performance for the computation of queries (see Application Data); the task runs at application startup and afterwards in the background, re-computing the application data whenever the input source is updated.
- the Streamlit Service, backed by the Streamlit application framework, enclosing the API, the UI modules, and the main application logic.
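One way to picture the kind of transformation the Application Data Updater Service performs is re-indexing per-lineage mutation lists into a reverse lookup, so that a query such as "which lineages carry this mutation?" becomes a single dictionary access. This is a hypothetical sketch of the idea; the data layout and function name are assumptions, not the actual Application Data format:

```python
# Hypothetical sketch of a query-friendly re-indexing step: invert
# lineage -> mutations lists into a mutation -> lineages lookup table.
# The input/output shapes are illustrative, not ConvMut's real format.
from collections import defaultdict

def build_mutation_index(lineage_mutations: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each mutation to the set of lineages that carry it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for lineage, muts in lineage_mutations.items():
        for m in muts:
            index[m].add(lineage)
    return dict(index)
```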
By Application Data we refer to a collection of files encoded in a format that helps the application answer user queries efficiently. This is composed of:
- files representing the domain knowledge (e.g., protein annotations, the reference sequence) – as this information is not supposed to change in the future, we refer to this as "static data" and ship it together with the application;
- data files dynamically generated by the Application Data Updater Service.
Finally, we have the Application Frontend, a graphical interface allowing the user to explore the convergent mutations.
We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.
Consider citing this work in your research as:
Tommaso Alfonsi, Anna Bernasconi, Emma Fanfoni, Cesare Ernesto Maria Gruber, Fabrizio Maggi, Daniele Focosi.
ConvMut: Exploration of viral convergent mutations along phylogenies. bioRxiv
https://doi.org/10.1101/2024.12.16.628620
https://annabernasconi.faculty.polimi.it/
Phone: +39 02 2399 3494