CTS (CDM Task Service) job wrapper for skani, a fast average nucleotide identity (ANI) calculator for genomes, contigs, and MAGs.
This is the generic, no-refdata variant. For querying user genomes against the GTDB R232 reference sketch set, use cdm_skani_gtdb instead.
- Published to ghcr.io/kbaseincubator/cdm_skani
- skani version: 0.3.1 (pinned, see below)
- Entrypoint: skani (no subcommand); append dist, triangle, search, or sketch as the first argument
The skani binary is copied out of ecogenomic/gtdbtk:2.7.2, which ships skani 0.3.1. We pin to that exact binary so that cdm_skani_gtdb (which shares this same binary) is guaranteed to read the skani sketches that gtdbtk 2.7.2 built inside the GTDB R232 reference bundle. Sketch-format compatibility across skani versions is documented as "use the same version that built the database"; sticking to 0.3.1 across the pair removes that failure mode by construction.
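A workflow that relies on the pin could verify it before running anything. The following is a minimal sketch, assuming that skani --version prints a line such as "skani 0.3.1"; that output format is an assumption, and parse_skani_version and check_pin are hypothetical helper names, not part of this image.

```python
import re

PINNED_SKANI_VERSION = "0.3.1"  # version stated in this README


def parse_skani_version(version_output: str) -> str:
    """Extract an X.Y.Z version from text such as 'skani 0.3.1' (assumed format)."""
    match = re.search(r"(\d+\.\d+\.\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return match.group(1)


def check_pin(version_output: str, pinned: str = PINNED_SKANI_VERSION) -> None:
    """Raise if the reported version differs from the pinned one."""
    found = parse_skani_version(version_output)
    if found != pinned:
        raise RuntimeError(
            f"skani {found} found, but the sketches were built with {pinned}"
        )
```

Inside the container, the text to check could come from subprocess.run(["skani", "--version"], capture_output=True, text=True).stdout.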
skani has four subcommands: dist, triangle, search, and sketch. CTS args follow the pattern args=["<subcommand>", ...flags..., tscli.insert_files()].
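The args pattern can be captured in a small helper. This is only a sketch: build_skani_args and FILES_PLACEHOLDER are hypothetical names, and the placeholder string is a stand-in for whatever tscli.insert_files() returns in a real submission.

```python
SUBCOMMANDS = ("dist", "triangle", "search", "sketch")

# Hypothetical stand-in for the value returned by tscli.insert_files().
FILES_PLACEHOLDER = "<input files>"


def build_skani_args(subcommand, flags, files_placeholder=None):
    """Return [subcommand, *flags, placeholder], rejecting unknown subcommands."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown skani subcommand: {subcommand!r}")
    args = [subcommand, *flags]
    # The placeholder goes last, so input files land after all flags.
    if files_placeholder is not None:
        args.append(files_placeholder)
    return args


triangle_args = build_skani_args(
    "triangle",
    ["-o", "/out/ani_matrix.tsv", "-E", "-t", "4"],
    FILES_PLACEHOLDER,
)
```

Validating the subcommand up front catches a typo before the job is submitted rather than after the container starts.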
dist (ANI of a query genome against a reference genome):

job = tscli.submit_job(
    "ghcr.io/kbaseincubator/cdm_skani:0.1.0",
    [query_genome, reference_genome],
    "cts/io/<user>/output/skani_dist/run1",
    cluster="kbase",
    declobber=True,
    output_mount_point="/out",
    args=[
        "dist",
        "-o", "/out/ani.tsv",
        "-q", "/path/to/query.fna",  # see CTS docs for input mount layout
        "-r", "/path/to/reference.fna",
        "-t", "4",
    ],
    num_containers=1,
    cpus=4, memory="8GB", runtime="PT30M",
)

triangle (all-vs-all comparison of the input genomes):

args=[
    "triangle",
    "-o", "/out/ani_matrix.tsv",
    "-E",  # edge-list output (otherwise: similarity matrix)
    "-t", "4",
    tscli.insert_files(),  # all user genomes via placeholder
]

search (query genomes against a sketch database built earlier with skani sketch). For querying against the GTDB R232 reference set, use cdm_skani_gtdb instead; it has the refdata bundled at registration time.

args=[
    "search",
    "-d", "/path/to/sketch_db/",  # built earlier with `skani sketch`
    "-o", "/out/hits.tsv",
    "-t", "4",
    tscli.insert_files(),  # query genomes
]

sketch (build a sketch database from reference genomes):

args=[
    "sketch",
    "-o", "/out/sketch_db",
    "-t", "4",
    tscli.insert_files(),  # reference genomes to sketch
]

skani writes a TSV with the following columns (for skani dist, triangle -E, and search):
Ref_file, Query_file, ANI, Align_fraction_ref, Align_fraction_query, Ref_name, Query_name.
For triangle without -E, output is a phylip-format similarity matrix instead.
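Because the edge-list output is plain TSV, it can be read with Python's csv module. The sketch below assumes the file begins with a header row carrying the column names listed above. The function name read_hits, the 95% ANI threshold, and the sample rows are illustrative only; the rows are made up and are not real skani results.

```python
import csv
import io


def read_hits(handle, min_ani=95.0):
    """Yield (query file, reference file, ANI) for rows at or above min_ani."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        ani = float(row["ANI"])
        if ani >= min_ani:
            yield row["Query_file"], row["Ref_file"], ani


# Made-up rows in the documented column order, for illustration only.
sample = (
    "Ref_file\tQuery_file\tANI\tAlign_fraction_ref\tAlign_fraction_query"
    "\tRef_name\tQuery_name\n"
    "ref1.fna\tq1.fna\t98.50\t90.10\t88.20\tref one\tquery one\n"
    "ref2.fna\tq1.fna\t81.30\t40.00\t35.50\tref two\tquery one\n"
)

hits = list(read_hits(io.StringIO(sample)))
```

For a real output file, pass an open file handle instead, for example open("ani.tsv", newline="").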
Shaw & Yu, Nature Methods (2023), DOI 10.1038/s41592-023-02018-3.