One Health genomic analysis of CTX-M-producing E. coli

This repository outlines the analysis pipeline for the paper: S. Jiang et al., "Cross-sectoral sharing of CTX-M-producing Escherichia coli: A One Health analysis to understand dissemination modes" (in review).

Genome assembly and QC assessment

High-quality reads were assembled using SPAdes (https://github.com/ablab/spades)

python spades.py --pe1-1 file1 --pe1-2 file2 -o assembly --careful -k 21,33,55,77,99,127

Species confirmation by GTDB-Tk (https://github.com/Ecogenomics/GTDBTk)

gtdbtk classify_wf --genome_dir genomes --out_dir gtdbtk/classify --cpus 10 --skip_ani_screen

QC assessment by CheckM (https://github.com/Ecogenomics/CheckM)

checkm lineage_wf -x fasta input_bins output_folder

Genome annotation and population genomics

Antibiotic resistance genes (ARGs) were identified using AMRFinderPlus (https://github.com/ncbi/amr)

amrfinder -n seq.fna --organism Escherichia

The lineages were assigned by PopPUNK (https://poppunk.readthedocs.io/en/latest/index.html)

poppunk --create-db --output EC --r-files list.txt --threads 8
poppunk --fit-model lineages --ref-db EC --ranks 1,2,3
poppunk_visualise --ref-db EC --cytoscape --network-file EC/EC_graph.gt
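
PopPUNK's `--r-files` option takes a tab-separated text file with one line per genome: a sample name followed by the path to its assembly. The sketch below shows one way to build `list.txt` from a folder of assemblies. The folder name `genomes`, the `.fasta` extension, and the function name `write_rfiles` are assumptions for illustration, not part of the published pipeline.

```python
from pathlib import Path


def write_rfiles(genome_dir, out_path, suffix=".fasta"):
    """Write a PopPUNK --r-files list: sample name, TAB, assembly path."""
    genomes = sorted(Path(genome_dir).glob(f"*{suffix}"))
    with open(out_path, "w") as handle:
        for fasta in genomes:
            # The file name without its extension is used as the sample name.
            handle.write(f"{fasta.stem}\t{fasta}\n")
    return len(genomes)


if __name__ == "__main__":
    # "genomes" and "list.txt" are placeholder paths (assumed).
    if Path("genomes").is_dir():
        print(write_rfiles("genomes", "list.txt"), "genomes listed")
```

Sample names must be unique, so assemblies that share a file stem would need renaming before this list is passed to `poppunk --create-db`.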

Phylogenetic analysis

A core genome alignment was generated using Snippy (https://github.com/tseemann/snippy), and recombinant regions were removed with Gubbins (https://github.com/nickjcroucher/gubbins). A maximum-likelihood phylogenetic tree was then constructed with IQ-TREE (http://www.iqtree.org/) from the recombination-filtered core genome SNP alignment.

snippy --outdir mut1 --ref ref.gbk --ctgs mut1.fasta
run_gubbins.py -p gubbins clean.full.aln
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln
iqtree -s clean.core.aln --boot-trees --wbtl -m GTR+I+G -B 1000 -nt 18
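
`snp-sites -c` keeps only alignment columns that are polymorphic and consist exclusively of A, C, G and T. As a rough illustration of that filtering step (not a replacement for snp-sites; the toy sequences and function names are made up), the logic can be sketched as:

```python
def core_snp_columns(seqs):
    """Return indices of columns that vary and contain only A/C/G/T."""
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    keep = []
    for i, column in enumerate(zip(*(s.upper() for s in seqs.values()))):
        bases = set(column)
        # Variable column with no gaps or ambiguous bases.
        if len(bases) > 1 and bases <= set("ACGT"):
            keep.append(i)
    return keep


def extract_core_snps(seqs):
    """Reduce each aligned sequence to its core SNP columns."""
    cols = core_snp_columns(seqs)
    return {name: "".join(s[i] for i in cols).upper() for name, s in seqs.items()}


if __name__ == "__main__":
    toy = {"s1": "ACGTNA", "s2": "ACTTAA", "s3": "GCGT-A"}
    print(extract_core_snps(toy))
```

In the toy alignment, column 5 varies but is dropped because it contains `N` and `-`, which is why gap-rich or low-quality genomes can sharply reduce the number of core SNPs.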

Source prediction using DAPC

Call the core SNPs

snippy-core --ref ref.gbk s1.fna s2.fna ...

Discriminant Analysis of Principal Components (DAPC)

if (!requireNamespace("vcfR", quietly = TRUE)) install.packages("vcfR")
if (!requireNamespace("adegenet", quietly = TRUE)) install.packages("adegenet")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")

library(vcfR)
library(adegenet)
library(ggplot2)

# The training list (train.list) used for source prediction is provided in this repository
train_vcf_file <- "train_population_data.vcf"
supplementary_vcf_file <- "HK_individuals.vcf"
train_vcf <- read.vcfR(train_vcf_file)
supplementary_vcf <- read.vcfR(supplementary_vcf_file)
train_genlight <- vcfR2genlight(train_vcf)
predict_genlight <- vcfR2genlight(supplementary_vcf)

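# grp is not defined in this snippet; grp$grp must give the source label of each training isolate
# Without n.pca and n.da, dapc() asks for these values interactively
dapc_fit <- dapc(train_genlight, grp$grp)
pred.sup <- predict(dapc_fit, newdata = predict_genlight)
predict_coords <- pred.sup$ind.scores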

Mobile genetic elements identification

Plasmid contigs were identified using PLASMe (https://github.com/HubertTang/PLASMe)

python PLASMe.py input.fasta plasme_predict.fna

All contigs were further mapped to the E. coli K-12 chromosome for validation

Blastn_script.py -i contig.fna -db K-12 -o map.results.txt --minid 0.7 --mincov 0.7 -t 8
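
The `--minid 0.7` and `--mincov 0.7` options suggest that a hit is kept only when both identity and query coverage reach 70%. The script itself is not reproduced here, so the sketch below is an assumption about that kind of filter. It expects tabular BLAST output written with a custom column set (`-outfmt "6 qseqid sseqid pident length qlen"`), and the function name `filter_hits` is hypothetical.

```python
def filter_hits(lines, min_id=0.7, min_cov=0.7):
    """Keep hits whose identity and query coverage both reach the thresholds.

    Each line is assumed to hold, tab-separated: qseqid, sseqid,
    pident (0-100), alignment length, and query length.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue
        qseqid, sseqid, pident, length, qlen = line.rstrip("\n").split("\t")[:5]
        identity = float(pident) / 100.0
        # Alignment length includes gaps, so cap the approximate coverage at 1.
        coverage = min(1.0, int(length) / int(qlen))
        if identity >= min_id and coverage >= min_cov:
            kept.append((qseqid, sseqid, identity, coverage))
    return kept


if __name__ == "__main__":
    example = [
        "c1\tK12\t99.5\t900\t1000",   # passes both thresholds
        "c2\tK12\t65.0\t1000\t1000",  # identity too low
        "c3\tK12\t95.0\t300\t1000",   # coverage too low
    ]
    for hit in filter_hits(example):
        print(hit)
```

This version scores each hit separately. A contig covered by several shorter hits would need its hits merged before coverage is computed.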

The predicted plasmids were further clustered by genetic distance

## Plasmid gene annotation
prokka /path/to/"$sample".fasta --quiet --outdir /path/to/prokka_output/"$sample" --force --prefix "$sample"

## The pangenome of plasmids was generated by Roary
roary *.gff -cd 95 -f plasmid_pangenome

## Pairwise Jaccard similarity coefficients between plasmids were calculated using the script pw_similarity.py
python pw_similarity.py -i binary_presc_absc.tsv -o example1 -r "isolates" -s "jaccard" -f 0
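
The Jaccard similarity between two plasmids is the number of gene clusters present in both, divided by the number present in either. A minimal sketch of that calculation follows. It does not reproduce `pw_similarity.py`, and the input layout (one gene cluster per row, one plasmid per column, 0/1 values, tab-separated with a header row) is an assumption.

```python
from itertools import combinations


def read_presence_absence(lines):
    """Parse a tab-separated 0/1 matrix into {plasmid: set of gene clusters}."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    genes_by_plasmid = {name: set() for name in header[1:]}
    for row in body:
        gene = row[0]
        for name, value in zip(header[1:], row[1:]):
            if value == "1":
                genes_by_plasmid[name].add(gene)
    return genes_by_plasmid


def jaccard(a, b):
    """Shared items divided by all items; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(genes_by_plasmid):
    """Return {(plasmid1, plasmid2): similarity} for every unordered pair."""
    return {
        (p, q): jaccard(genes_by_plasmid[p], genes_by_plasmid[q])
        for p, q in combinations(sorted(genes_by_plasmid), 2)
    }


if __name__ == "__main__":
    table = [
        "Gene\tp1\tp2\tp3",
        "g1\t1\t1\t0",
        "g2\t1\t0\t0",
        "g3\t1\t1\t1",
        "g4\t0\t1\t1",
    ]
    for pair, sim in pairwise_jaccard(read_presence_absence(table)).items():
        print(pair, round(sim, 3))
```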

## Communities were detected from the pairwise similarities using the Louvain algorithm (https://github.com/taynaud/python-louvain)
usage: louvain_community.py [-h] -i INPUT -o OUTPUT [--resolution RESOLUTION]

Calculate Louvain communities from Mash distance results.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input Mash distance results file (tab-delimited).
  -o OUTPUT, --output OUTPUT
                        Output file for Louvain community results.
  --resolution RESOLUTION
                        Resolution parameter for Louvain algorithm (default:
                        1.0)
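
The Louvain algorithm assigns nodes to communities by greedily maximising modularity. To make that objective concrete, the sketch below computes the standard weighted modularity of a given partition in plain Python, which corresponds to the default resolution of 1.0. It does not perform the Louvain optimisation itself and is not taken from `louvain_community.py`; the toy graph and the function name are illustrative only.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Standard weighted modularity of a partition of an undirected graph.

    edges: iterable of (node_a, node_b, weight)
    partition: dict mapping each node to a community label
    """
    total_weight = 0.0
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for a, b, w in edges:
        total_weight += w
        degree[partition[a]] += w
        degree[partition[b]] += w
        if partition[a] == partition[b]:
            internal[partition[a]] += w
    if total_weight == 0:
        return 0.0
    score = 0.0
    for community, deg in degree.items():
        # Observed internal fraction minus the fraction expected by chance.
        score += internal[community] / total_weight
        score -= (deg / (2.0 * total_weight)) ** 2
    return score


if __name__ == "__main__":
    # Two triangles joined by a single edge (toy data).
    edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    merged = {node: 0 for node in split}
    print(modularity(edges, split), modularity(edges, merged))
```

Splitting the toy graph into its two triangles scores higher than placing all nodes in one community, which is the kind of comparison Louvain repeats while moving nodes between communities.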

Mobile genetic elements were predicted by mapping contigs against a reference using conseq.py

usage: conseq.py [-h] -r REFERENCE -q QUERY [-p PREFIX] -c COVERAGE -t TAB_OUTPUT -o FASTA_OUTPUT

Run nucmer, calculate coverage, and filter query contigs by coverage.

options:
  -h, --help            show this help message and exit
  -r REFERENCE, --reference REFERENCE
                        Path to the reference input file.
  -q QUERY, --query QUERY
                        Path to the query input file.
  -p PREFIX, --prefix PREFIX
                        Prefix for nucmer output files (default: nucmer_output).
  -c COVERAGE, --coverage COVERAGE
                        Minimum coverage threshold for filtering contigs.
  -t TAB_OUTPUT, --tab_output TAB_OUTPUT
                        Path to the output tab file with contig lengths, coverage, and reference.
  -o FASTA_OUTPUT, --fasta_output FASTA_OUTPUT
                        Path to the output FASTA file for filtered contigs
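
The help text describes three steps: align with nucmer, compute per-contig coverage, and keep contigs above a threshold. The coverage step can be sketched as merging the aligned intervals on each query contig and dividing the covered bases by the contig length. This illustrates the idea rather than the code in `conseq.py`; the interval input format, the function names, and the 0 to 1 coverage scale are assumptions.

```python
def covered_bases(intervals):
    """Count bases covered by 1-based inclusive intervals, merging overlaps."""
    total = 0
    current_start = current_end = None
    # Reverse-strand alignments may be given as (end, start); normalise first.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if current_end is None or start > current_end + 1:
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def filter_by_coverage(alignments, lengths, min_cov):
    """Return {contig: coverage} for contigs with coverage >= min_cov.

    alignments: {contig: [(start, end), ...]} in query-contig coordinates
    lengths: {contig: contig length in bp}
    """
    kept = {}
    for contig, length in lengths.items():
        cov = covered_bases(alignments.get(contig, [])) / length
        if cov >= min_cov:
            kept[contig] = cov
    return kept


if __name__ == "__main__":
    alignments = {"c1": [(1, 100), (50, 150), (300, 200)]}
    lengths = {"c1": 400, "c2": 500}
    print(filter_by_coverage(alignments, lengths, 0.6))
```

Merging intervals before counting prevents overlapping alignments from inflating coverage beyond the true fraction of the contig that maps to the reference.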
