https://hpc.nih.gov/training/gatk_tutorial/preproc.html#preproc-tools
https://speciationgenomics.github.io/
https://genome.sph.umich.edu/wiki/Abecasis_Lab
An Introduction to Statistical Genetic Data Analysis
Handbook of statistical genomics
http://faculty.washington.edu/tathornt/SISG2019.html
https://yanglab.westlake.edu.cn/
https://github.com/AstraZeneca-NGS/VarDict
http://samtools.github.io/bcftools/howtos/cnv-calling.html
1000 Genomes: https://www.internationalgenome.org/home
The 1000 Genomes Project created a catalogue of common human genetic variation, using openly consented samples from people who declared themselves to be healthy. The reference data resources generated by the project remain heavily used by the biomedical science community.
dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.
HapMap (short for "haplotype map") is the nickname of the International HapMap Project, an international project that sought to relate variations in human DNA sequences to genes associated with health. A haplotype is a set of DNA variations, or polymorphisms, that tend to be inherited together. A haplotype can refer to a combination of alleles or to a set of single nucleotide polymorphisms (SNPs) found on the same chromosome. The HapMap describes common patterns of genetic variation among people.
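To make "alleles found on the same chromosome" concrete: in a phased VCF a genotype such as `0|1` records which allele sits on which chromosome copy, so reading the left allele across consecutive sites gives one haplotype and the right allele gives the other. A minimal sketch, assuming biallelic sites and already-phased diploid genotypes; the function name and site data are hypothetical:

```python
def haplotypes(ref_alt, genotypes):
    """Return the two haplotypes of one diploid sample.

    ref_alt: list of (REF, ALT) pairs, one per biallelic site, in genomic order.
    genotypes: phased GT strings such as "0|1", in the same order.
    """
    hap1, hap2 = [], []
    for (ref, alt), gt in zip(ref_alt, genotypes):
        if "|" not in gt:
            # "0/1" does not say which allele is on which chromosome copy
            raise ValueError(f"unphased genotype: {gt}")
        left, right = gt.split("|")
        alleles = (ref, alt)  # allele index 0 = REF, 1 = ALT
        hap1.append(alleles[int(left)])
        hap2.append(alleles[int(right)])
    return "".join(hap1), "".join(hap2)


sites = [("A", "G"), ("C", "T"), ("G", "A")]
print(haplotypes(sites, ["0|1", "1|0", "1|1"]))  # ('ATA', 'GCA')
```

Unphased `0/1` calls cannot be split this way, which is why the phasing tools listed further down (Shapeit, Eagle, HapCUT2) are needed before haplotypes can be read off.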
ClinVar: https://www.ncbi.nlm.nih.gov/clinvar
The ClinVar database integrates information about genomic variants and their relationship to human health.
dbVar is NCBI's database of human genomic Structural Variation — large variants >50 bp including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants.
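The 50 bp cutoff above is what separates structural variants from small variants. A minimal sketch of applying it to a sequence-resolved REF/ALT pair, assuming the event size is the number of substituted bases for SNVs/MNVs and the length difference for indels; symbolic alleles such as `<DEL>` (which carry `SVLEN` instead) are out of scope, and the function names are hypothetical:

```python
SV_MIN_LENGTH = 50  # dbVar: structural variants are larger than 50 bp


def event_length(ref: str, alt: str) -> int:
    """Size of a sequence-resolved REF/ALT change (not symbolic alleles like <DEL>)."""
    if len(ref) == len(alt):         # SNV or MNV: number of substituted bases
        return len(ref)
    return abs(len(ref) - len(alt))  # indel: inserted or deleted bases


def is_structural(ref: str, alt: str) -> bool:
    return event_length(ref, alt) > SV_MIN_LENGTH


print(is_structural("A", "G"))             # False
print(is_structural("A", "A" + "T" * 60))  # True (60 bp insertion)
```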
ClinGen: https://clinicalgenome.org/
ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.
gnomAD: http://www.gnomad-sg.org/
The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.
VerifyBamID2: https://github.com/Griffan/VerifyBamID (checks whether the reads in a BAM file match previously known genotypes of a specific sample)
LUMPY: https://github.com/arq5x/lumpy-sv
DELLY: https://github.com/dellytools/delly
ClinSV: https://github.com/KCCG/ClinSV
Paragraph: https://github.com/Illumina/paragraph
GRIDSS - the Genomic Rearrangement IDentification Software Suite: https://github.com/PapenfussLab/gridss
Manta: https://github.com/Illumina/manta
https://github.com/lgmgeo/AnnotSV
truvari: https://github.com/acenglish/truvari
SURVIVOR: https://github.com/fritzsedlazeck/SURVIVOR
Shapeit2: https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html
Shapeit4: https://github.com/odelaneau/shapeit4
Shapeit5: https://github.com/odelaneau/shapeit5
hapcut2: https://github.com/vibansal/hapcut2
nPhase: https://github.com/OmarOakheart/nPhase
Impute2: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
Beagle: https://faculty.washington.edu/browning/beagle/beagle.html
Minimac4: https://github.com/statgen/Minimac4
eagleimp: https://github.com/ikmb/eagleimp
eagle2: https://alkesgroup.broadinstitute.org/Eagle/
https://www.jurgott.org/linkage/runPM.html
https://github.com/BoevaLab/FREEC (download a release version, then unpack and install it)
https://github.com/abyzovlab/CNVpytor
https://bioconductor.org/packages/release/bioc/vignettes/cn.mops/inst/doc/cn.mops.pdf
https://zzz.bwh.harvard.edu/xhmm/index.shtml
https://github.com/BoevaLab/ONCOCNV
https://github.com/yuchaojiang/CODEX2
https://github.com/lima1/PureCN
https://github.com/sztup/scarHRD
https://github.com/mskcc/facets
https://github.com/igm-team/ERDS
SnpEff: https://pcingola.github.io/SnpEff
VEP: https://github.com/Ensembl/ensembl-vep
gvanno: https://github.com/sigven/gvanno.git
Nirvana: https://github.com/Illumina/Nirvana
https://github.com/brentp/vcfanno
https://annovar.openbioinformatics.org/en/latest/
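Functional annotation of genetic variants (including structural variants) from human, mouse, worm, fly, yeast and other genomes. ANNOVAR supports three main types of annotation:
1 Gene-based annotation
Determine whether SNPs and CNVs change the encoded protein, which amino acids are affected, and the resulting effect
2 Region-based annotation
Identify variants falling in specific genomic regions, such as transcription factor binding sites, segmental duplication regions, GWAS hit regions, and histone modification regions
3 Filter-based annotation (SNPs and indels)
Determine whether a variant is recorded in specific databases such as dbSNP, the 1000 Genomes Project, or gnomAD, and compute scores such as deleteriousness predictions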
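For gene-based annotation, the core step for a coding SNV is to locate the affected codon and translate the reference and alternate codons. A simplified sketch, assuming the coding sequence (CDS) of one forward-strand transcript is given as a string starting at ATG and the variant position is 1-based within that CDS; it is not ANNOVAR's implementation, which also maps genomic coordinates, handles introns, reverse-strand transcripts, indels and multiple transcripts from databases such as refGene. Function names are hypothetical and the effect labels are only loosely modeled on ANNOVAR's:

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def coding_snv_effect(cds: str, pos: int, alt: str) -> str:
    """Protein change for an SNV at 1-based position `pos` of a forward-strand CDS."""
    idx = pos - 1
    start = idx - idx % 3  # 0-based index of the first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = idx - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    aa_pos = start // 3 + 1  # 1-based amino acid position
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stopgain"
    elif ref_aa == "*":
        kind = "stoploss"
    else:
        kind = "nonsynonymous"
    return f"p.{ref_aa}{aa_pos}{alt_aa} ({kind})"


print(coding_snv_effect("ATGGAGTAA", 4, "T"))  # p.E2* (stopgain)
print(coding_snv_effect("ATGGAGTAA", 6, "A"))  # p.E2E (synonymous)
```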
#https://www.biostars.org/p/196985/
perl annotate_variation.pl -downdb -webfrom annovar avdblist humandb/ -buildver hg38
perl annotate_variation.pl -downdb clinvar_20220320 -webfrom annovar humandb/ -buildver hg19
perl convert2annovar.pl -format vcf4 case_22BY12800_filter.vcf > case_22BY12800_filter.vcf_variant.avinput
perl annotate_variation.pl -filter -dbtype clinvar_20220320 --thread 30 -buildver hg19 case_22BY12800_filter.vcf_variant.avinput humandb/
# annotation
perl table_annovar.pl example/ex1.avinput humandb/ \
-buildver hg19 \
-out myanno \
-remove \
-protocol refGene,cytoBand,exac03,avsnp147,dbnsfp30a \
-operation g,r,f,f,f \
-nastring . \
-csvout
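# -buildver hg19: use hg19 as the reference genome build
# -out myanno: set the output file prefix to myanno
# -remove: delete intermediate files
# -protocol: names of the annotation databases; separate protocol names with exactly one comma and no whitespace
# -operation: annotation type for each protocol, in the same order as -protocol; g = gene-based, r = region-based, f = filter-based
# -nastring .: use . for missing values
# -csvout: write the final output as a .csv file
# Use the -vcfinput option to annotate a VCF file directly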
perl table_annovar.pl example/ex2.vcf humandb/ -buildver hg19 -out myanno -remove -protocol refGene,cytoBand,genomicSuperDups,esp6500si_all,1000g2012apr_all,snp138,ljb23_all -operation g,r,r,f,f,f,f -nastring . -vcfinput
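The -protocol and -operation rules in the comments above (same number of entries, same order, single commas, no whitespace) are easy to break when editing a long command. A small pre-flight check, sketched under the assumption that only the g/r/f operation codes described above are used (ANNOVAR accepts other codes that this hypothetical helper does not cover):

```python
VALID_OPERATIONS = {"g": "gene-based", "r": "region-based", "f": "filter-based"}


def pair_protocols(protocol: str, operation: str) -> list[tuple[str, str]]:
    """Pair each -protocol database with its -operation type, validating both lists."""
    for flag, value in (("-protocol", protocol), ("-operation", operation)):
        if any(ch.isspace() for ch in value):
            raise ValueError(f"{flag} must not contain whitespace: {value!r}")
    dbs, ops = protocol.split(","), operation.split(",")
    if "" in dbs or "" in ops:
        raise ValueError("empty entry: use exactly one comma between names")
    if len(dbs) != len(ops):
        raise ValueError(f"{len(dbs)} protocols but {len(ops)} operations")
    for db, op in zip(dbs, ops):
        if op not in VALID_OPERATIONS:
            raise ValueError(f"unknown operation {op!r} for {db}")
    return [(db, VALID_OPERATIONS[op]) for db, op in zip(dbs, ops)]


for db, kind in pair_protocols("refGene,cytoBand,exac03,avsnp147,dbnsfp30a", "g,r,f,f,f"):
    print(f"{db}: {kind}")
```

Running it against both table_annovar.pl commands before launching a long annotation job catches a missing operation code immediately instead of after the download/annotation step fails.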
https://cran.r-project.org/web/packages/detectRUNS/vignettes/detectRUNS.vignette.html
https://gatk.broadinstitute.org/hc/en-us/articles/360035531432-Genotype-Refinement-workflow-for-germline-short-variants
https://denovo-db.gs.washington.edu/denovo-db/
https://samtools.github.io/bcftools/howtos/plugin.trio-dnm2.html
https://varscan.sourceforge.net/trio-calling-de-novo-mutations.html
akt: https://github.com/Illumina/akt
KING: https://www.kingrelatedness.com
https://github.com/getian107/PRScs
https://gwaslab.com/
https://pbreheny.github.io/adv-gwas-tutorial/index.html
https://github.com/Cloufield/GWASTutorial
EMMAX
https://genome.sph.umich.edu/wiki/EMMAX
GEMMA
https://github.com/genetics-statistics/GEMMA#installation
GMMAT
https://github.com/hanchenphd/GMMAT
BOLT-LMM
https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html
FaST-LMM
https://github.com/fastlmm/FaST-LMM
GENESIS
https://github.com/UW-GAC/GENESIS
regenie
https://github.com/rgcgithub/regenie
SAIGE
https://github.com/saigegit/SAIGE
mtag
https://github.com/JonJala/mtag
susieR
https://github.com/stephenslab/susieR
java -jar ~/software/snpEff/SnpSift.jar split -j *bed.vcf > merged.vcf
java -jar ~/software/snpEff/SnpSift.jar sort *bed.vcf > merged.vcf
# The two outputs differ only in their header (meta-information) lines; the records are otherwise identical
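What merging several per-region VCFs involves can be sketched as: keep a single header, pool the data records, and sort them by contig and then position. This is not SnpSift's implementation; it assumes the header is taken from the first file, contig order comes from its `##contig=<ID=...>` lines, and contigs without a header entry are appended in the order they are first seen. The function name is hypothetical:

```python
def merge_vcf_lines(vcfs: list[list[str]]) -> list[str]:
    """Merge VCFs given as lists of lines: one header, records sorted by contig then POS."""
    header = [line for line in vcfs[0] if line.startswith("#")]
    records = [line for vcf in vcfs for line in vcf if line and not line.startswith("#")]

    # Contig rank: ##contig order first, then unseen contigs in first-seen order
    prefix = "##contig=<ID="
    rank: dict[str, int] = {}
    for line in header:
        if line.startswith(prefix):
            contig = line[len(prefix):].split(",")[0].rstrip(">")
            rank.setdefault(contig, len(rank))
    for line in records:
        rank.setdefault(line.split("\t", 1)[0], len(rank))

    def sort_key(line: str) -> tuple[int, int]:
        chrom, pos = line.split("\t", 2)[:2]
        return rank[chrom], int(pos)  # numeric POS, so 10 sorts before 100

    return header + sorted(records, key=sort_key)
```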
https://github.com/reneshbedre/bioinfokit
https://www.bioconductor.org/packages/release/bioc/vignettes/TitanCNA/inst/doc/TitanCNA.pdf
https://www.biostars.org/p/386231/
https://gatk.broadinstitute.org/hc/en-us/articles/360035531912?id=11029
http://samtools.github.io/hts-specs/VCFv4.3.pdf (see page 8)
https://www.biostars.org/p/343818/
https://zhuanlan.zhihu.com/p/373217037