Replies: 2 comments
-
The The easiest way to achieve what you're looking for would be to set the QUAL of the variants such that those present in the first sample have a higher quality than all of the others and use I'm actively refactoring the truvari api to facilitate vcf manipulations, so as a test of that work here's an example of how to do this using truvari v5 (unreleased, but available in the repo's develop branch) # Set the quality score to 1 for variants present in the sample of interest, 0 otherwise
import joblib
import truvari
vcf = truvari.VariantFile("merged.vcf.gz")
out = truvari.VariantFile("merged.setqual.vcf", 'w', header=vcf.header)
qual_lookup = {}
for entry in vcf:
qual_lookup[entry.id] = entry.qual
entry.qual = 1 if entry.is_present(sample="SampOfInterest") else 0
out.write(entry)
out.close()
truvari.compress_index_vcf("merged.setqual.vcf")
# Save quals to put back in later
joblib.dump(qual_lookup, "qual_lookup.jl")
# run truvari collapse --keep maxqual -i merged.setqual.vcf.gz
# Replacing QUALs after collapse
vcf = truvari.VariantFile("collapsed.output.vcf.gz")
out = truvari.VariantFile("collapsed.replacedqual.vcf", 'w', header=vcf.header)
qual_lookup = joblib.load("qual_lookup.jl")
for entry in vcf:
entry.qual = qual_lookup[entry.id]
out.write(entry)
out.close()
truvari.compress_index_vcf("collapsed.replacedqual.vcf")
|
Beta Was this translation helpful? Give feedback.
-
Thank you so much for the quick response and clarification! This is a
perfect work around.
…On Tue, Jan 7, 2025 at 10:34 AM Adam English ***@***.***> wrote:
The --keep first chooses the first variant by position as the
representative of a collapsed cluster of variants and does not have any
consideration of the samples' presence.
The easiest way to achieve what you're looking for would be to set the
QUAL of the variants such that those present in the first sample have a
higher quality than all of the others and use --keep maxqual. This would
have the downside of losing the QUAL information, but that could be
replaced in a second step. I'll assume you have IDs in the VCF for each
entry and if not you can use truvari anno addid merged.vcf.gz if you'd
like to add them.
I'm actively refactoring the truvari api to facilitate vcf manipulations,
so as a test of that work here's an example of how to do this using truvari
v5 (unreleased, but available in the repo's develop branch)
# Set the quality score to 1 for variants present in the sample of interest, 0 otherwiseimport joblibimport truvari
vcf = truvari.VariantFile("merged.vcf.gz")out = truvari.VariantFile("merged.setqual.vcf", 'w', header=vcf.header)qual_lookup = {}for entry in vcf:
qual_lookup[entry.id] = entry.qual
entry.qual = 1 if entry.is_present(sample="NA24385") else 0
out.write(entry)out.close()truvari.compress_index_vcf("merged.setqual.vcf")# Save quals to put back in laterjoblib.dump(qual_lookup, "qual_lookup.jl")
# run truvari collapse --keep maxqual -i merged.setqual.vcf.gz
# Replacing QUALs after collapsevcf = truvari.VariantFile("collapsed.output.vcf.gz")out = truvari.VariantFile("collapsed.replacedqual.vcf", 'w', header=vcf.header)qual_lookup = joblib.load("qual_lookup.jl")for entry in vcf:
entry.qual = qual_lookup[entry.id]
out.write(entry)out.close()truvari.compress_index_vcf("collapsed.replacedqual.vcf")
- Disclaimer: I only did 1 quick test of this script. It should work,
but no guarantees.
—
Reply to this email directly, view it on GitHub
<https://urldefense.com/v3/__https://github.com/ACEnglish/truvari/discussions/253*discussioncomment-11764430__;Iw!!K-Hz7m0Vt54!g4reuIuwNxz4qL1ArBVlxkmhjDNnyhhXqN8mf5U7sK0lPuYfIGww_0cwiTHDtwdUlO5ETX3zxlCUWSyk2OAWv-A$>,
or unsubscribe
<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/A3LED72ZQA5U3SEX56K2NB32JQM3FAVCNFSM6AAAAABUYGDXVWVHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTCNZWGQ2DGMA__;!!K-Hz7m0Vt54!g4reuIuwNxz4qL1ArBVlxkmhjDNnyhhXqN8mf5U7sK0lPuYfIGww_0cwiTHDtwdUlO5ETX3zxlCUWSyk5pkp6Xo$>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Beta Was this translation helpful? Give feedback.
-
Hi There!
I am using Truvari collapse (Truvari v4.2.2) and I am trying to utilize the -k first option, but it doesn't seem to be working (ie. the variants in the merged output do not have the coordinates from the first sample in the input file), so I am wondering if I am using the command correctly.
The command I am using is:
truvari collapse -i bcftools_merged.vcf.gz -o truvari_merged.vcf -c truvari_collapsed.vcf -f hg38.fasta -k first -s 50 -S 10000000 --passonly -B 50 -r 2000 -p 0 -P 0.2 -O 0.2 --chain
My input bcftools_merged vcf has a "sample of interest" in the first sample column. If I understand the Truvari command correctly, I would expect that if a variant from that first sample is merged with variants in other samples, the representation of the merged variant in the truvari_merged.vcf would have the coordinates of the first sample in the truvari_merged.vcf... But that is not what I am seeing in my output.
Am I misunderstanding the -k first option?
Thank you!!
Beta Was this translation helpful? Give feedback.
All reactions