VADR - Viral Annotation DefineR

Version 1.7; September 2025

VADR is a suite of tools for classifying and analyzing sequences homologous to a set of reference models of viral genomes or gene families. It includes models that can be used to validate and annotate Norovirus, Dengue virus, and SARS-CoV-2, as well as other flaviviruses, caliciviruses, and coronaviruses, plus influenza virus, mpox virus, and respiratory syncytial virus (RSV). Additional models are available to download or can be created using the v-build.pl program.


Quick-start: install VADR and classify and annotate viral sequences using v-scan.pl

Install VADR:

Download this file:

https://raw.githubusercontent.com/ncbi/vadr/master/vadr-install.sh

for example with a command like:

curl -o vadr-install.sh https://raw.githubusercontent.com/ncbi/vadr/master/vadr-install.sh

Then execute it with one of the following commands, depending on your system type:

sh ./vadr-install.sh linux

OR

sh ./vadr-install.sh macosx-silicon

OR

sh ./vadr-install.sh macosx-intel

Then follow the instructions output at the end of the installation for updating your .bashrc or .cshrc file and defining important environment variables that VADR relies on.
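
If you script the installation, the choice among the three install commands above can be derived from uname. The following is a minimal sketch, not part of VADR: vadr_platform is a hypothetical helper name, and it assumes that uname reports Linux or Darwin for the system and arm64 or x86_64 for the macOS architecture.

```shell
#!/bin/sh
# vadr_platform is a hypothetical helper, not part of VADR.
# It prints the platform argument expected by vadr-install.sh
# (linux, macosx-silicon, or macosx-intel) based on `uname` output.
# Optional arguments override the detected system and architecture.
vadr_platform() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os" in
        Linux)
            echo linux ;;
        Darwin)
            case "$arch" in
                arm64)  echo macosx-silicon ;;
                x86_64) echo macosx-intel ;;
                *) echo "unsupported macOS architecture: $arch" >&2; return 1 ;;
            esac ;;
        *)
            echo "unsupported system: $os" >&2; return 1 ;;
    esac
}
```

With this function defined, the install step could be run as sh ./vadr-install.sh "$(vadr_platform)". On any other system the function prints an error and returns non-zero rather than guessing.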

Run v-scan.pl to annotate viral sequences

Given a FASTA sequence file called in.fa with any combination of flavivirus, calicivirus, coronavirus, influenza, RSV, or mpox sequences, run:

v-scan.pl -m in.fa out

This will list each stage of the processing and ultimately create an output directory called out and fill it with output files. Short descriptions of the output files will be printed to the screen. More detailed explanation of output file types can be found here. For a more detailed walk-through example of v-scan.pl see this page.
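
Before starting a long run, it can help to confirm that the input file is readable and at least looks like FASTA. The sketch below is an illustration only and not part of VADR: looks_like_fasta is a hypothetical helper that checks a single thing, namely that the first non-blank line starts with '>'. It does not validate the sequences themselves.

```shell
#!/bin/sh
# looks_like_fasta is a hypothetical helper, not part of VADR.
# It succeeds if the file is readable and its first non-blank line
# starts with '>', the FASTA header marker. It fails otherwise.
looks_like_fasta() {
    [ -r "$1" ] || return 1
    # awk prints the first line that contains any non-whitespace field
    first=$(awk 'NF { print; exit }' "$1")
    case "$first" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

A typical use would be to call looks_like_fasta in.fa first and run the v-scan.pl command shown above only if it succeeds.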


VADR programs

The VADR v-scan.pl script classifies and annotates sequences that match any of your VADR model libraries. Once v-scan.pl determines the library to use for a given set of sequences, it runs a different VADR program, v-annotate.pl, which identifies the appropriate model in the library for each sequence and defines the annotation based on that most similar model. v-scan.pl automatically runs v-annotate.pl using the recommended settings (v-annotate.pl command-line options) for each library; alternatively, users can run v-annotate.pl separately. Example usage of v-annotate.pl can be found here.

Another VADR script, v-build.pl, is used to create the models from individual sequences from GenBank or from input multiple sequence alignments, potentially with secondary structure annotation. v-build.pl stores the GenBank feature annotation in the model, and v-annotate.pl maps that annotation (e.g. CDS coordinates) onto the sequences it annotates. Example usage of v-build.pl can be found here. An advanced tutorial on building VADR models using RSV as an example can be found here.

v-annotate.pl identifies unexpected or divergent attributes of the sequences it annotates (e.g. invalid or early stop codons in CDS features) and reports them to the user in the form of alerts. A subset of alerts are fatal and cause a sequence to fail. A sequence passes if zero fatal alerts are reported for it. VADR is used by GenBank staff to evaluate incoming sequence submissions of some viruses (currently Norovirus, Dengue virus, and SARS-CoV-2). Submitted Norovirus, Dengue virus and SARS-CoV-2 sequences that pass v-annotate.pl are accepted into GenBank.
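
The pass/fail rule above (a sequence fails if it has at least one fatal alert, and passes otherwise) can be illustrated with a small sketch. It assumes a hypothetical tab-separated alert listing with three columns: sequence name, alert code, and yes or no for whether the alert is fatal. This is an assumed format for illustration, not a real VADR output file, and failed_sequences is a hypothetical helper name.

```shell
#!/bin/sh
# failed_sequences is a hypothetical helper, not part of VADR.
# Input: a file of tab-separated lines <sequence> <alert-code> <fatal: yes|no>
#        (an assumed format for illustration, not a VADR output format).
# Output: each sequence with at least one fatal alert, printed once,
#         in order of first appearance. Sequences not printed pass.
failed_sequences() {
    awk -F '\t' '$3 == "yes" && !seen[$1]++ { print $1 }' "$1"
}
```

A sequence that has only non-fatal alerts, or no alerts at all, never appears in the output, which matches the rule that zero fatal alerts means the sequence passes.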

The homology search and alignment components of VADR scripts, which are the most computationally expensive steps, are performed by the Infernal, HMMER, FASTA, MINIMAP2 and BLAST software packages. These packages are downloaded and installed as part of VADR installation.


VADR model libraries

VADR installation includes the following model libraries:

| library | model key (short name) | rigorously tested? | number of models | notes |
|---|---|---|---|---|
| Caliciviridae | calici | norovirus models only | 49 | norovirus models used by GenBank |
| Flaviviridae | flavi | dengue and HCV models only | 156 | dengue models used by GenBank |
| Coronaviridae | corona | SARS-CoV-2 only | 55 | SARS-CoV-2 models used by GenBank |
| influenza | flu | yes | 70 | described in Database article |
| Mpox | mpxv | yes | 1 | |
| respiratory syncytial virus (RSV) | rsv | yes | 2 | |

Additional models are available. See this page for a list of all available models and additional information.


VADR documentation


Contributors

  • VADR includes contributions and input from current and former colleagues at NCBI, including:

    Rodney Brister

    Vince Calhoun

    Sergiy Gotvyanskyy

    Eneida Hatcher

    Sophia Hu

    Ilene Karsch-Mizrachi

    Rich McVeigh

    Susan Schafer

    Alejandro Schäffer

    Lara Shonkwiler

    Beverly Underwood

    Yuri Wolf

    Linda Yankie


Reference

  • The recommended citation for influenza analysis using VADR is: Vincent C Calhoun, Eneida L Hatcher, Linda Yankie, Eric P Nawrocki; Influenza sequence validation and annotation using VADR. Database, baae091 (2024). https://doi.org/10.1093/database/baae091

  • The recommended citation for SARS-CoV-2 analysis using VADR is: Eric P Nawrocki; Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR. NAR Genom Bioinform 5(1), lqad002 (2023). https://doi.org/10.1093/nargab/lqad002

  • The recommended citation for all other uses of VADR is: Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3


Questions, comments, or feature requests? Send an email to [email protected].
