Degeneracy per site: Degenotate sometimes reports the reverse-complement of the base in the reference genome #56

@rancilhac

Hello,

I am using degenotate to annotate per-site degeneracy for several genes in a reference genome, in order to search for non-synonymous mutations in a population genetics dataset. For some genes, the base reported in the "Reference nucleotide" column of the output table does not match the base in the reference genome (or the REF allele in a VCF called against the same reference). This confused me until I realized that, for these genes, the bases reported by degenotate are the reverse complement of the reference genome. Here is an example:

$ head Degeneracy_ASIP/degeneracy-all-sites.bed
chr20   2283028 2283029 rna-XM_038159009.1:419  2       A       *       T:Y;C:Y
chr20   2283029 2283030 rna-XM_038159009.1:418  2       A       *       T:L;C:S
chr20   2283030 2283031 rna-XM_038159009.1:417  0       T       *       A:K;C:Q;G:E
chr20   2283031 2283032 rna-XM_038159009.1:416  2       T       C       A:*;G:W
chr20   2283032 2283033 rna-XM_038159009.1:415  0       G       C       A:Y;T:F;C:S
chr20   2283033 2283034 rna-XM_038159009.1:414  0       T       C       A:S;C:R;G:G
chr20   2283034 2283035 rna-XM_038159009.1:413  2       G       K       T:N;C:N
chr20   2283035 2283036 rna-XM_038159009.1:412  0       A       K       T:M;C:T;G:R
chr20   2283036 2283037 rna-XM_038159009.1:411  0       A       K       T:*;C:Q;G:E
chr20   2283037 2283038 rna-XM_038159009.1:410  4       C       P

$ samtools faidx ref.fasta chr20:2283029-2283038
>chr20:2283029-2283038
TTAACACTTG

This is not always the case, as for other genes the output of degenotate matches the reference genome:

$ head Degeneracy_BCO2/degeneracy-all-sites.bed
chr24   5872641 5872642 rna-XM_038161555.1:0    0       A       M       T:L;C:L;G:V
chr24   5872642 5872643 rna-XM_038161555.1:1    0       T       M       A:K;C:T;G:R
chr24   5872643 5872644 rna-XM_038161555.1:2    0       G       M       A:I;T:I;C:I
chr24   5872644 5872645 rna-XM_038161555.1:3    0       A       M       T:L;C:L;G:V
chr24   5872645 5872646 rna-XM_038161555.1:4    0       T       M       A:K;C:T;G:R
chr24   5872646 5872647 rna-XM_038161555.1:5    0       G       M       A:I;T:I;C:I
chr24   5872647 5872648 rna-XM_038161555.1:6    2       A       R       T:*;G:G
chr24   5872648 5872649 rna-XM_038161555.1:7    0       G       R       A:K;T:I;C:T
chr24   5872649 5872650 rna-XM_038161555.1:8    2       A       R       T:S;C:S
chr24   5872650 5872651 rna-XM_038161555.1:9    0       G       G       A:S;T:C;C:R

$ samtools faidx ref.fasta chr24:5872642-5872651
>chr24:5872642-5872651
ATGATGAGAG
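
To make the comparison explicit, here is a minimal Python sketch that checks the two examples above site by site. The sequences are copied by hand from the outputs shown (column 6 of the BED file and the `samtools faidx` result); the `complement` and `compare` helpers are hypothetical names written for this illustration and are not part of degenotate.

```python
# Translation table for per-base complementing (illustrative helper only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement of seq, without reversing it."""
    return seq.translate(COMPLEMENT)


def compare(reported: str, reference: str) -> str:
    """Classify how the reported bases relate to the reference bases."""
    if reported == reference:
        return "match"
    if reported == complement(reference):
        return "complement"
    return "mismatch"


# Column 6 of the ten BED rows, read top to bottom (ascending coordinates).
asip_reported = "AATTGTGAAC"
bco2_reported = "ATGATGAGAG"

# Sequences returned by samtools faidx for the same intervals.
asip_reference = "TTAACACTTG"
bco2_reference = "ATGATGAGAG"

print("ASIP:", compare(asip_reported, asip_reference))
print("BCO2:", compare(bco2_reported, bco2_reference))
```

For ASIP, every reported base is the complement of the reference base at the same coordinate, while for BCO2 the two agree. The transcript positions in column 4 also decrease along the chromosome for ASIP (419 down to 410) and increase for BCO2 (0 up to 9), which would be consistent with ASIP being annotated on the minus strand, although I have not confirmed this against the annotation file.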

Is this normal behaviour? And if so, what is the reason?
Sorry if this is an obvious question. I could not find an answer in the documentation.

Best,
Loïs Rancilhac
