Hello,
I am using degenotate to do per-site degeneracy annotation for several genes in a reference genome, in order to search for non-synonymous mutations in a population genetics dataset. For some genes, the base reported in the "Reference nucleotide" column of the output table did not correspond to the base in the reference genome (or to the REF allele in a VCF called against the same reference). This confused me until I realized that, for these genes, each base reported by degenotate is the complement of the reference base at that site (i.e. the reverse complement of the reference when read along the transcript). Here is an example:
$ head Degeneracy_ASIP/degeneracy-all-sites.bed
chr20 2283028 2283029 rna-XM_038159009.1:419 2 A * T:Y;C:Y
chr20 2283029 2283030 rna-XM_038159009.1:418 2 A * T:L;C:S
chr20 2283030 2283031 rna-XM_038159009.1:417 0 T * A:K;C:Q;G:E
chr20 2283031 2283032 rna-XM_038159009.1:416 2 T C A:*;G:W
chr20 2283032 2283033 rna-XM_038159009.1:415 0 G C A:Y;T:F;C:S
chr20 2283033 2283034 rna-XM_038159009.1:414 0 T C A:S;C:R;G:G
chr20 2283034 2283035 rna-XM_038159009.1:413 2 G K T:N;C:N
chr20 2283035 2283036 rna-XM_038159009.1:412 0 A K T:M;C:T;G:R
chr20 2283036 2283037 rna-XM_038159009.1:411 0 A K T:*;C:Q;G:E
chr20 2283037 2283038 rna-XM_038159009.1:410 4 C P
$ samtools faidx ref.fasta chr20:2283029-2283038
>chr20:2283029-2283038
TTAACACTTG
This is not always the case: for other genes, the output of degenotate matches the reference genome:
$ head Degeneracy_BCO2/degeneracy-all-sites.bed
chr24 5872641 5872642 rna-XM_038161555.1:0 0 A M T:L;C:L;G:V
chr24 5872642 5872643 rna-XM_038161555.1:1 0 T M A:K;C:T;G:R
chr24 5872643 5872644 rna-XM_038161555.1:2 0 G M A:I;T:I;C:I
chr24 5872644 5872645 rna-XM_038161555.1:3 0 A M T:L;C:L;G:V
chr24 5872645 5872646 rna-XM_038161555.1:4 0 T M A:K;C:T;G:R
chr24 5872646 5872647 rna-XM_038161555.1:5 0 G M A:I;T:I;C:I
chr24 5872647 5872648 rna-XM_038161555.1:6 2 A R T:*;G:G
chr24 5872648 5872649 rna-XM_038161555.1:7 0 G R A:K;T:I;C:T
chr24 5872649 5872650 rna-XM_038161555.1:8 2 A R T:S;C:S
chr24 5872650 5872651 rna-XM_038161555.1:9 0 G G A:S;T:C;C:R
$ samtools faidx ref.fasta chr24:5872642-5872651
>chr24:5872642-5872651
ATGATGAGAG
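To make the pattern easier to check, here is a minimal Python sketch. The helper names `classify_sites` and `transcript_direction` are hypothetical and not part of degenotate. It assumes the whitespace-separated layout shown above (transcript:position in column 4, reported base in column 6) and rows covering the same region as the `samtools faidx` sequence. It compares the reported bases with the reference and reads column 4 to see whether the transcript position increases or decreases along the chromosome.

```python
# Hypothetical helpers (not part of degenotate) for comparing degenotate's
# per-site reported base against the forward-strand reference sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_sites(bed_lines, ref_seq):
    """Return 'forward' if the reported bases equal the reference,
    'complement' if they equal its per-site complement, else 'mixed'.

    Assumes whitespace-separated rows, in genomic order, covering the same
    region as ref_seq, with the reported base in the 6th column.
    """
    reported = "".join(line.split()[5] for line in bed_lines)
    if reported == ref_seq:
        return "forward"
    if reported == ref_seq.translate(COMPLEMENT):
        return "complement"
    return "mixed"


def transcript_direction(bed_lines):
    """Read 'transcript:position' from the 4th column and report whether
    the transcript position rises or falls along the chromosome."""
    positions = [int(line.split()[3].rsplit(":", 1)[1]) for line in bed_lines]
    return "ascending" if positions[-1] > positions[0] else "descending"


# First three rows of each example above, with the matching faidx bases.
asip = [
    "chr20 2283028 2283029 rna-XM_038159009.1:419 2 A * T:Y;C:Y",
    "chr20 2283029 2283030 rna-XM_038159009.1:418 2 A * T:L;C:S",
    "chr20 2283030 2283031 rna-XM_038159009.1:417 0 T * A:K;C:Q;G:E",
]
bco2 = [
    "chr24 5872641 5872642 rna-XM_038161555.1:0 0 A M T:L;C:L;G:V",
    "chr24 5872642 5872643 rna-XM_038161555.1:1 0 T M A:K;C:T;G:R",
    "chr24 5872643 5872644 rna-XM_038161555.1:2 0 G M A:I;T:I;C:I",
]
print(classify_sites(asip, "TTA"), transcript_direction(asip))
print(classify_sites(bco2, "ATG"), transcript_direction(bco2))
```

On these rows the ASIP sites come out as `complement` with a descending transcript position, and the BCO2 sites as `forward` with an ascending one.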
Is this expected behaviour? If so, what is the reason?
Sorry if this is an obvious question. I could not find an answer in the documentation.
Best,
Loïs Rancilhac