
Substitutions

kseniakh edited this page Mar 10, 2017 · 1 revision


Substitution - the replacement of a reference sequence region with another sequence of exactly the same length that is not present anywhere in the reference genome. SNPs can be considered a subcategory of substitutions.



Figure 1: Substitution example



If a substitution has caused alignment fragmentation, it is reported in the query_struct.gff and ref_struct.gff files; otherwise, it is reported in the query_snps.gff and ref_snps.gff files.
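The routing rule above can be restated as a tiny Python function. This is a hypothetical helper for illustration only: `substitution_output_files` and its `causes_fragmentation` flag are not part of NucDiff, which makes this decision internally.

```python
# Hypothetical helper (not part of NucDiff) restating which GFF files a substitution is reported in.
def substitution_output_files(causes_fragmentation: bool) -> tuple[str, str]:
    """Return the (query-side, reference-side) GFF file names for a substitution."""
    if causes_fragmentation:
        # Substitutions that split the alignment are treated as structural differences.
        return ("query_struct.gff", "ref_struct.gff")
    # All other substitutions, including SNPs, go to the *_snps.gff files.
    return ("query_snps.gff", "ref_snps.gff")
```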



An example of substitution entries in the query_snps.gff file:

##gff-version 3
##sequence-region	query_1	1	57855
query_1	NucDiff_v2.0	SO:1000002	501	505	.	.	.	ID=SNP_1;Name=substitution;subst_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=501-505;query_bases=ttgcg;ref_bases=gcctt;color=#42C042
query_1	NucDiff_v2.0	SO:1000002	9786	9786	.	.	.	ID=SNP_2;Name=substitution;subst_len=1;query_dir=1;ref_sequence=ref_1;ref_coord=8579-8579;query_bases=a;ref_bases=c;color=#42C042
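Because each entry is an ordinary tab-separated GFF3 line, the column 9 attributes can be read with a few lines of Python. The sketch below (the `parse_gff3_line` name is our own, not part of NucDiff) splits the nine columns, turns the `key=value` pairs into a dictionary, and checks that `subst_len` agrees with the feature span. It assumes attribute values contain no percent-encoded characters; a complete GFF3 reader would also unescape them.

```python
def parse_gff3_line(line: str) -> dict:
    """Split one tab-separated GFF3 feature line into its columns plus parsed attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, _strand, _phase, attr_field = cols
    # Column 9 holds ';'-separated key=value pairs (no percent-decoding is done here).
    attrs = dict(pair.split("=", 1) for pair in attr_field.split(";") if pair)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "attributes": attrs,
    }

# First query_snps.gff entry from the example above.
line = ("query_1\tNucDiff_v2.0\tSO:1000002\t501\t505\t.\t.\t.\t"
        "ID=SNP_1;Name=substitution;subst_len=5;query_dir=1;ref_sequence=ref_1;"
        "ref_coord=501-505;query_bases=ttgcg;ref_bases=gcctt;color=#42C042")
rec = parse_gff3_line(line)
# GFF3 coordinates are 1-based and inclusive, so the span length is end - start + 1.
assert int(rec["attributes"]["subst_len"]) == rec["end"] - rec["start"] + 1
```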



The query_snps.gff file contains the following information (see Figure 1 for the notation):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score, strand and phase fields are not used |
| col 9, ID | "SNP_1" | the ID in query_snps.gff matches the ID of the corresponding entry in ref_snps.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
| col 9, query_bases | ATGC's | |
| col 9, ref_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
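The last row implies that, when query_dir is -1, ref_bases is not the reference forward-strand sequence but its reverse complement. A minimal sketch, assuming that reading of the note, for recovering the forward-strand bases; `ref_bases_on_forward_strand` is a hypothetical helper, not a NucDiff function.

```python
# Complement table covering upper- and lowercase bases (NucDiff examples use lowercase).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def ref_bases_on_forward_strand(ref_bases: str, query_dir: int) -> str:
    """Undo the reverse complement that (by assumption) was applied when query_dir == -1."""
    return reverse_complement(ref_bases) if query_dir == -1 else ref_bases
```

Since reverse complementing is its own inverse, applying it once to a query_dir=-1 value restores the orientation, and query_dir=1 values pass through unchanged.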



An example of substitution entries in the ref_snps.gff file:

##gff-version 3
##sequence-region	ref_1	1	57855
ref_1	NucDiff_v2.0	SO:1000002	3516	3519	.	.	.	ID=SV_1;Name=substitution;subst_len=4;query_dir=1;query_sequence=query_1;query_coord=3616-3619;color=#42C042
ref_1	NucDiff_v2.0	SO:1000002	13718	13745	.	.	.	ID=SV_2;Name=substitution;subst_len=28;query_dir=1;query_sequence=query_1;query_coord=15633-15660;color=#42C042
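The query_coord (and, in query_snps.gff, ref_coord) attribute stores the other sequence's interval as a `start-end` string. A small sketch (the helper names are hypothetical) that turns it into integers and confirms that both intervals have length subst_len, using the values of the first ref_snps.gff entry above:

```python
def parse_coord(coord: str) -> tuple[int, int]:
    """Parse a 'start-end' attribute value such as query_coord=3616-3619."""
    start, end = coord.split("-")
    return int(start), int(end)

def coord_length(coord: str) -> int:
    """Length of a 1-based, inclusive 'start-end' interval."""
    start, end = parse_coord(coord)
    return end - start + 1

# Values copied from the first ref_snps.gff entry above.
st_r, end_r, subst_len, query_coord = 3516, 3519, 4, "3616-3619"
assert end_r - st_r + 1 == subst_len == coord_length(query_coord)
```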



The ref_snps.gff file contains the following information (see Figure 1 for the notation):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score, strand and phase fields are not used |
| col 9, ID | "SNP_1" | the ID in ref_snps.gff matches the ID of the corresponding entry in query_snps.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
| col 9, query_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
| col 9, ref_bases | ATGC's | |



An example of substitution entries in the query_struct.gff file:

##gff-version 3
##sequence-region	query_1	1	57855
query_1	NucDiff_v2.0	SO:1000002	52127	52205	.	.	.	ID=SV_1;Name=substitution;subst_len=79;query_dir=1;ref_sequence=ref_1;ref_coord=51357-51435;color=#42C042
query_1	NucDiff_v2.0	SO:1000002	53207	53355	.	.	.	ID=SV_2;Name=substitution;subst_len=149;query_dir=1;ref_sequence=ref_1;ref_coord=52757-52905;color=#42C042
query_1	NucDiff_v2.0	SO:1000002	55606	55955	.	.	.	ID=SV_3;Name=substitution;subst_len=350;query_dir=1;ref_sequence=ref_1;ref_coord=55556-55905;color=#42C042



The query_struct.gff file contains the following information (see Figure 1 for the notation):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score, strand and phase fields are not used |
| col 9, ID | "SV_1" | the ID in query_struct.gff matches the ID of the corresponding entry in ref_struct.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
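Unlike the entries in query_snps.gff and ref_snps.gff, the structural entries carry no query_bases or ref_bases attributes, so the substituted bases have to be read from the input sequences. A minimal sketch, assuming the sequence is already loaded into a Python string (FASTA parsing is not shown) and using a hypothetical `extract_region` helper: GFF3 coordinates are 1-based and inclusive, so St..End maps to the slice [St - 1:End].

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def extract_region(seq: str, start: int, end: int, reverse: bool = False) -> str:
    """Return seq[start..end] for 1-based, inclusive GFF3-style coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    region = seq[start - 1:end]  # convert to Python's 0-based, end-exclusive slice
    if reverse:
        # Reverse complement the extracted fragment.
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```

For a query region one might call `extract_region(query_seq, st_q, end_q, reverse=(query_dir == -1))`; tying `reverse` to query_dir is our reading of the query_dir note, not documented NucDiff behaviour.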



An example of substitution entries in the ref_struct.gff file:

##gff-version 3
##sequence-region	ref_1	1	57855
ref_1	NucDiff_v2.0	SO:1000002	51357	51435	.	.	.	ID=SV_1;Name=substitution;subst_len=79;query_dir=1;query_sequence=query_1;query_coord=52127-52205;color=#42C042
ref_1	NucDiff_v2.0	SO:1000002	52757	52905	.	.	.	ID=SV_2;Name=substitution;subst_len=149;query_dir=1;query_sequence=query_1;query_coord=53207-53355;color=#42C042
ref_1	NucDiff_v2.0	SO:1000002	55556	55905	.	.	.	ID=SV_3;Name=substitution;subst_len=350;query_dir=1;query_sequence=query_1;query_coord=55606-55955;color=#42C042
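Since an entry in query_struct.gff and its counterpart in ref_struct.gff share the same ID, the two files can be joined on that attribute and cross-checked: each side's coordinates should appear in the other side's ref_coord or query_coord attribute, and subst_len should agree. The function below is an illustrative sketch (the names are our own, not NucDiff's); it returns the IDs that are missing or inconsistent, and is shown on the SV_1 and SV_2 entries from the examples above.

```python
def parse_entry(line: str) -> tuple[int, int, dict[str, str]]:
    """Return (start, end, attributes) for one tab-separated GFF3 line."""
    cols = line.rstrip("\n").split("\t")
    attrs = dict(p.split("=", 1) for p in cols[8].split(";") if p)
    return int(cols[3]), int(cols[4]), attrs

def find_mismatches(query_lines: list[str], ref_lines: list[str]) -> list[str]:
    """IDs whose query/ref entries are missing or disagree on coordinates or subst_len."""
    ref_by_id = {}
    for line in ref_lines:
        start, end, attrs = parse_entry(line)
        ref_by_id[attrs["ID"]] = (start, end, attrs)
    bad = []
    for line in query_lines:
        q_start, q_end, qa = parse_entry(line)
        match = ref_by_id.get(qa["ID"])
        if match is None:
            bad.append(qa["ID"])  # no counterpart with the same ID
            continue
        r_start, r_end, ra = match
        consistent = (
            qa["ref_coord"] == f"{r_start}-{r_end}"
            and ra["query_coord"] == f"{q_start}-{q_end}"
            and qa["subst_len"] == ra["subst_len"]
        )
        if not consistent:
            bad.append(qa["ID"])
    return bad

query_struct = [
    "query_1\tNucDiff_v2.0\tSO:1000002\t52127\t52205\t.\t.\t.\tID=SV_1;Name=substitution;subst_len=79;query_dir=1;ref_sequence=ref_1;ref_coord=51357-51435;color=#42C042",
    "query_1\tNucDiff_v2.0\tSO:1000002\t53207\t53355\t.\t.\t.\tID=SV_2;Name=substitution;subst_len=149;query_dir=1;ref_sequence=ref_1;ref_coord=52757-52905;color=#42C042",
]
ref_struct = [
    "ref_1\tNucDiff_v2.0\tSO:1000002\t51357\t51435\t.\t.\t.\tID=SV_1;Name=substitution;subst_len=79;query_dir=1;query_sequence=query_1;query_coord=52127-52205;color=#42C042",
    "ref_1\tNucDiff_v2.0\tSO:1000002\t52757\t52905\t.\t.\t.\tID=SV_2;Name=substitution;subst_len=149;query_dir=1;query_sequence=query_1;query_coord=53207-53355;color=#42C042",
]
print(find_mismatches(query_struct, ref_struct))  # the example entries are consistent
```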



The ref_struct.gff file contains the following information (see Figure 1 for the notation):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score, strand and phase fields are not used |
| col 9, ID | "SV_1" | the ID in ref_struct.gff matches the ID of the corresponding entry in query_struct.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
