kseniakh edited this page Mar 10, 2017 · 1 revision

Gaps

Gap - a substitution in which a reference subsequence is replaced in the query by an unknown sequence (N's) of the same length. An enlarged gap in the query is classified as a combination of a gap and an inserted gap, while a shortened gap is classified as a combination of a gap and a simple deletion.
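The definition above can be illustrated with a minimal sketch. The helper name `looks_like_gap` is hypothetical and is not part of NucDiff; it only checks the two conditions stated in the definition (equal length, query made up entirely of N's), using base strings in the style of the `ref_bases` and `query_bases` attributes shown in the examples further down.

```python
def looks_like_gap(ref_bases, query_bases):
    """Return True if query_bases is a run of N's exactly as long as ref_bases.

    Hypothetical helper for illustration only; NucDiff's own classification
    logic is not shown on this page.
    """
    same_length = len(ref_bases) == len(query_bases)
    all_unknown = len(query_bases) > 0 and set(query_bases.upper()) == {"N"}
    return same_length and all_unknown


print(looks_like_gap("tgcga", "NNNNN"))    # True
print(looks_like_gap("tgcga", "NNNNNNN"))  # False: the lengths differ
print(looks_like_gap("tgcga", "NNNAN"))    # False: not all bases are unknown
```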



Figure 1: Gap example



If a gap difference has caused alignment fragmentation, it is output in the query_struct.gff and ref_struct.gff files; otherwise, it is output in the query_snps.gff and ref_snps.gff files.



An example with the gap entries in the query_snps.gff file:

##gff-version 3
##sequence-region	query_1	1	57855
query_1	NucDiff_v2.0	SO:1000002	17031	17035	.	.	.	ID=SNP_12;Name=gap;subst_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=14821-14825;query_bases=NNNNN;ref_bases=tgcga;color=#42C042
query_1	NucDiff_v2.0	SO:1000002	18036	18065	.	.	.	ID=SNP_14;Name=gap;subst_len=30;query_dir=1;ref_sequence=ref_1;ref_coord=15876-15905;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=actaactatgcgataatgcctagaacttat;color=#42C042



The query_snps.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in query_snps.gff is equal to ID in ref_snps.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
| col 9, query_bases | N's | |
| col 9, ref_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
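The field layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the records are tab-separated and that column 9 is a `;`-separated list of `key=value` pairs, as in the example entries; the function name `parse_gff_line` and the dictionary keys it returns are illustrative choices, not part of NucDiff.

```python
def parse_gff_line(line):
    """Split one tab-separated GFF3 record into columns and parse column 9."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds key=value pairs separated by semicolons.
    attributes = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return {
        "seqid": cols[0],        # col 1: Query_seq
        "source": cols[1],       # col 2: tool name and version
        "type": cols[2],         # col 3: SO accession number
        "start": int(cols[3]),   # col 4: St_q
        "end": int(cols[4]),     # col 5: End_q
        "attributes": attributes,
    }


# First gap entry from the query_snps.gff example.
line = (
    "query_1\tNucDiff_v2.0\tSO:1000002\t17031\t17035\t.\t.\t.\t"
    "ID=SNP_12;Name=gap;subst_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=14821-14825;query_bases=NNNNN;ref_bases=tgcga;color=#42C042"
)
record = parse_gff_line(line)
print(record["attributes"]["Name"], record["attributes"]["ref_coord"])
# gap 14821-14825
```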



An example with the gap entries in the ref_snps.gff file:

##gff-version 3
##sequence-region	ref_1	1	57855
ref_1	NucDiff_v2.0	SO:1000002	14821	14825	.	.	.	ID=SNP_12;Name=gap;subst_len=5;query_dir=1;query_sequence=query_1;query_coord=17031-17035;query_bases=NNNNN;ref_bases=tgcga;color=#42C042
ref_1	NucDiff_v2.0	SO:1000002	15876	15905	.	.	.	ID=SNP_14;Name=gap;subst_len=30;query_dir=1;query_sequence=query_1;query_coord=18036-18065;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=actaactatgcgataatgcctagaacttat;color=#42C042



The ref_snps.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in ref_snps.gff is equal to ID in query_snps.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
| col 9, query_bases | N's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
| col 9, ref_bases | ATGC's | |
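Because the same ID is used in query_snps.gff and ref_snps.gff, the two views of a gap can be paired and cross-checked. The sketch below is one possible way to do that, using the example entries from this page; the helper names `index_by_id` and `coords_match` are hypothetical, and the check assumes that coordinates are written as `start-end` without spaces, as in the examples.

```python
def index_by_id(lines):
    """Map each record's ID to (seqid, start, end, attributes)."""
    records = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip GFF3 directives and blank lines
        cols = line.rstrip("\n").split("\t")
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
        records[attrs["ID"]] = (cols[0], int(cols[3]), int(cols[4]), attrs)
    return records


def coords_match(query_rec, ref_rec):
    """Check that each record points at the other's sequence and coordinates."""
    q_seq, q_start, q_end, q_attrs = query_rec
    r_seq, r_start, r_end, r_attrs = ref_rec
    return (
        q_attrs["ref_sequence"] == r_seq
        and q_attrs["ref_coord"] == f"{r_start}-{r_end}"
        and r_attrs["query_sequence"] == q_seq
        and r_attrs["query_coord"] == f"{q_start}-{q_end}"
    )


query_lines = [
    "##gff-version 3",
    "query_1\tNucDiff_v2.0\tSO:1000002\t17031\t17035\t.\t.\t.\t"
    "ID=SNP_12;Name=gap;subst_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=14821-14825;query_bases=NNNNN;ref_bases=tgcga;color=#42C042",
]
ref_lines = [
    "##gff-version 3",
    "ref_1\tNucDiff_v2.0\tSO:1000002\t14821\t14825\t.\t.\t.\t"
    "ID=SNP_12;Name=gap;subst_len=5;query_dir=1;query_sequence=query_1;"
    "query_coord=17031-17035;query_bases=NNNNN;ref_bases=tgcga;color=#42C042",
]

query_records = index_by_id(query_lines)
ref_records = index_by_id(ref_lines)
for gap_id in sorted(query_records.keys() & ref_records.keys()):
    print(gap_id, coords_match(query_records[gap_id], ref_records[gap_id]))
# SNP_12 True
```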



An example with the gap entries in the query_struct.gff file:

##gff-version 3
##sequence-region	query_1	1	57855
query_1	NucDiff_v2.0	SO:1000002	54356	54605	.	.	.	ID=SV_1;Name=gap;subst_len=250;query_dir=1;ref_sequence=ref_1;ref_coord=54156-54405;color=#42C042
query_1	NucDiff_v2.0	SO:1000002	55606	55955	.	.	.	ID=SV_2;Name=gap;subst_len=350;query_dir=1;ref_sequence=ref_1;ref_coord=55556-55905;color=#42C042
query_1	NucDiff_v2.0	SO:1000002	56956	57355	.	.	.	ID=SV_3;Name=gap;subst_len=400;query_dir=1;ref_sequence=ref_1;ref_coord=56956-57355;color=#42C042



The query_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
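Since a gap replaces a reference subsequence with N's of the same length, the `subst_len` value, the query span (col 4 to col 5), and the `ref_coord` span should all agree. The sketch below checks this for the query_struct.gff example entries. It assumes 1-based inclusive coordinates, so a span's length is `end - start + 1`; that convention is inferred from the example values on this page. The function names are hypothetical.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate span."""
    return end - start + 1


def gap_lengths_agree(line):
    """Check subst_len against the query span and the ref_coord span."""
    cols = line.rstrip("\n").split("\t")
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    subst_len = int(attrs["subst_len"])
    query_len = span_length(int(cols[3]), int(cols[4]))
    ref_start, ref_end = (int(x) for x in attrs["ref_coord"].split("-"))
    return subst_len == query_len == span_length(ref_start, ref_end)


entries = [
    "query_1\tNucDiff_v2.0\tSO:1000002\t54356\t54605\t.\t.\t.\t"
    "ID=SV_1;Name=gap;subst_len=250;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=54156-54405;color=#42C042",
    "query_1\tNucDiff_v2.0\tSO:1000002\t55606\t55955\t.\t.\t.\t"
    "ID=SV_2;Name=gap;subst_len=350;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=55556-55905;color=#42C042",
    "query_1\tNucDiff_v2.0\tSO:1000002\t56956\t57355\t.\t.\t.\t"
    "ID=SV_3;Name=gap;subst_len=400;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=56956-57355;color=#42C042",
]
print([gap_lengths_agree(entry) for entry in entries])
# [True, True, True]
```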



An example with the gap entries in the ref_struct.gff file:

##gff-version 3
##sequence-region	ref_1	1	57855
ref_1	NucDiff_v2.0	SO:1000002	54156	54405	.	.	.	ID=SV_1;Name=gap;subst_len=250;query_dir=1;query_sequence=query_1;query_coord=54356-54605;color=#42C042
ref_1	NucDiff_v2.0	SO:1000002	55556	55905	.	.	.	ID=SV_2;Name=gap;subst_len=350;query_dir=1;query_sequence=query_1;query_coord=55606-55955;color=#42C042
ref_1	NucDiff_v2.0	SO:1000002	56956	57355	.	.	.	ID=SV_3;Name=gap;subst_len=400;query_dir=1;query_sequence=query_1;query_coord=56956-57355;color=#42C042



The ref_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before its insertion into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
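As a usage illustration, the `Name` and `subst_len` attributes are enough to summarise the gaps in a file. The sketch below sums the lengths of all entries whose `Name` is "gap", reading the ref_struct.gff example from an in-memory handle. The function name `total_gap_length` is hypothetical. Filtering on `Name` assumes that a real output file may also hold entries of other difference types, which is not stated on this page.

```python
import io


def total_gap_length(gff_handle):
    """Sum subst_len over all records whose Name attribute is 'gap'."""
    total = 0
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip GFF3 directives and blank lines
        cols = line.rstrip("\n").split("\t")
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
        if attrs.get("Name") == "gap":
            total += int(attrs["subst_len"])
    return total


example = io.StringIO(
    "##gff-version 3\n"
    "##sequence-region\tref_1\t1\t57855\n"
    "ref_1\tNucDiff_v2.0\tSO:1000002\t54156\t54405\t.\t.\t.\t"
    "ID=SV_1;Name=gap;subst_len=250;query_dir=1;query_sequence=query_1;"
    "query_coord=54356-54605;color=#42C042\n"
    "ref_1\tNucDiff_v2.0\tSO:1000002\t55556\t55905\t.\t.\t.\t"
    "ID=SV_2;Name=gap;subst_len=350;query_dir=1;query_sequence=query_1;"
    "query_coord=55606-55955;color=#42C042\n"
    "ref_1\tNucDiff_v2.0\tSO:1000002\t56956\t57355\t.\t.\t.\t"
    "ID=SV_3;Name=gap;subst_len=400;query_dir=1;query_sequence=query_1;"
    "query_coord=56956-57355;color=#42C042\n"
)
print(total_gap_length(example))
# 1000
```

For a file on disk, the same function can be given an open file object, for example `with open("ref_struct.gff") as handle: total_gap_length(handle)`; the path depends on the output directory and prefix chosen for the run.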
