Gaps
Gap - a substitution where a reference subsequence is replaced by an unknown sequence (N's) of the same length. If the gap in the query is longer than the replaced reference subsequence (an enlarged gap), it is classified as a combination of a gap and an inserted gap. If it is shorter (a shortened gap), it is classified as a combination of a gap and a simple deletion.
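
The classification rule above can be sketched as a small function. This is only an illustration, not NucDiff's own code: the function name `classify_gap` is hypothetical, and the way the lengths are split between the two events (the shared length goes to the gap, the remainder to the second event) is an assumption, since the text names the combination but not the lengths.

```python
def classify_gap(ref_len, query_gap_len):
    """Return (event, length) pairs for a run of N's replacing ref_len bases.

    Assumption: the part common to both sequences is reported as the gap and
    the length difference as the second event.
    """
    if ref_len <= 0 or query_gap_len <= 0:
        raise ValueError("both lengths must be positive")
    events = [("gap", min(ref_len, query_gap_len))]
    if query_gap_len > ref_len:
        # Enlarged gap: the extra N's count as an inserted gap.
        events.append(("inserted gap", query_gap_len - ref_len))
    elif query_gap_len < ref_len:
        # Shortened gap: the missing reference bases count as a simple deletion.
        events.append(("simple deletion", ref_len - query_gap_len))
    return events


print(classify_gap(5, 5))
print(classify_gap(5, 8))
print(classify_gap(8, 5))
```
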
Figure 1: Gap example
If a gap difference has caused alignment fragmentation, it is output in the query_struct.gff and ref_struct.gff files. Otherwise, it is output in the query_snps.gff and ref_snps.gff files.
An example with the gap entries in the query_snps.gff file:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:1000002 17031 17035 . . . ID=SNP_12;Name=gap;subst_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=14821-14825;query_bases=NNNNN;ref_bases=tgcga;color=#42C042
query_1 NucDiff_v2.0 SO:1000002 18036 18065 . . . ID=SNP_14;Name=gap;subst_len=30;query_dir=1;ref_sequence=ref_1;ref_coord=15876-15905;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=actaactatgcgataatgcctagaacttat;color=#42C042
The query_snps.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in query_snps.gff is equal to ID in ref_snps.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment must be reverse complemented before it is inserted into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
| col 9, query_bases | N's | |
| col 9, ref_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
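
As a rough illustration of the fields in the table above, the following sketch parses one query_snps.gff gap entry and checks that its length fields agree with each other. The helper names `parse_gap_line` and `is_consistent_gap` are hypothetical and not part of NucDiff. The sketch assumes 1-based inclusive coordinates, as in GFF3, and splits on any whitespace so that it also accepts the space-separated rendering of the example shown here (real GFF3 files are tab-delimited).

```python
def parse_gap_line(line):
    """Parse one feature line into its coordinates and column-9 attributes."""
    cols = line.strip().split(None, 8)
    if len(cols) != 9:
        raise ValueError(f"expected 9 GFF3 columns, got {len(cols)}")
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return {"seq": cols[0], "start": int(cols[3]), "end": int(cols[4]), "attrs": attrs}


def is_consistent_gap(rec):
    """Check that subst_len and both base strings match the feature span."""
    attrs = rec["attrs"]
    span = rec["end"] - rec["start"] + 1  # 1-based, inclusive coordinates
    return (
        attrs.get("Name") == "gap"
        and int(attrs["subst_len"]) == span
        and attrs["query_bases"].upper() == "N" * span
        and len(attrs["ref_bases"]) == span
    )


line = (
    "query_1 NucDiff_v2.0 SO:1000002 17031 17035 . . . "
    "ID=SNP_12;Name=gap;subst_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=14821-14825;query_bases=NNNNN;ref_bases=tgcga;color=#42C042"
)
record = parse_gap_line(line)
print(record["attrs"]["ID"], is_consistent_gap(record))  # SNP_12 True
```
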
An example with the gap entries in the ref_snps.gff file:
##gff-version 3
##sequence-region ref_1 1 57855
ref_1 NucDiff_v2.0 SO:1000002 14821 14825 . . . ID=SNP_12;Name=gap;subst_len=5;query_dir=1;query_sequence=query_1;query_coord=17031-17035;query_bases=NNNNN;ref_bases=tgcga;color=#42C042
ref_1 NucDiff_v2.0 SO:1000002 15876 15905 . . . ID=SNP_14;Name=gap;subst_len=30;query_dir=1;query_sequence=query_1;query_coord=18036-18065;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=actaactatgcgataatgcctagaacttat;color=#42C042
The ref_snps.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in ref_snps.gff is equal to ID in query_snps.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment must be reverse complemented before it is inserted into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
| col 9, query_bases | N's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
| col 9, ref_bases | ATGC's | |
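
Because the same ID is used in both files, an entry in query_snps.gff can be matched to its counterpart in ref_snps.gff. The sketch below checks that two such entries mirror each other, using the first entries of the two examples on this page. The function names `split_entry` and `is_mirror` are hypothetical, and the check reflects only what the two tables state (shared ID, swapped sequence names and coordinates, equal subst_len).

```python
def split_entry(line):
    """Return (sequence name, 'start-end' string, attribute dict) for one entry."""
    cols = line.strip().split(None, 8)
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return cols[0], f"{cols[3]}-{cols[4]}", attrs


def is_mirror(query_line, ref_line):
    """True if a query_snps.gff entry and a ref_snps.gff entry describe the same gap."""
    q_seq, q_span, q = split_entry(query_line)
    r_seq, r_span, r = split_entry(ref_line)
    return (
        q["ID"] == r["ID"]
        and q["subst_len"] == r["subst_len"]
        and q["ref_sequence"] == r_seq and q["ref_coord"] == r_span
        and r["query_sequence"] == q_seq and r["query_coord"] == q_span
    )


query_entry = (
    "query_1 NucDiff_v2.0 SO:1000002 17031 17035 . . . "
    "ID=SNP_12;Name=gap;subst_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=14821-14825;query_bases=NNNNN;ref_bases=tgcga;color=#42C042"
)
ref_entry = (
    "ref_1 NucDiff_v2.0 SO:1000002 14821 14825 . . . "
    "ID=SNP_12;Name=gap;subst_len=5;query_dir=1;query_sequence=query_1;"
    "query_coord=17031-17035;query_bases=NNNNN;ref_bases=tgcga;color=#42C042"
)
print(is_mirror(query_entry, ref_entry))  # True
```
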
An example with the gap entries in the query_struct.gff file:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:1000002 54356 54605 . . . ID=SV_1;Name=gap;subst_len=250;query_dir=1;ref_sequence=ref_1;ref_coord=54156-54405;color=#42C042
query_1 NucDiff_v2.0 SO:1000002 55606 55955 . . . ID=SV_2;Name=gap;subst_len=350;query_dir=1;ref_sequence=ref_1;ref_coord=55556-55905;color=#42C042
query_1 NucDiff_v2.0 SO:1000002 56956 57355 . . . ID=SV_3;Name=gap;subst_len=400;query_dir=1;ref_sequence=ref_1;ref_coord=56956-57355;color=#42C042
The query_struct.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment must be reverse complemented before it is inserted into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
An example with the gap entries in the ref_struct.gff file:
##gff-version 3
##sequence-region ref_1 1 57855
ref_1 NucDiff_v2.0 SO:1000002 54156 54405 . . . ID=SV_1;Name=gap;subst_len=250;query_dir=1;query_sequence=query_1;query_coord=54356-54605;color=#42C042
ref_1 NucDiff_v2.0 SO:1000002 55556 55905 . . . ID=SV_2;Name=gap;subst_len=350;query_dir=1;query_sequence=query_1;query_coord=55606-55955;color=#42C042
ref_1 NucDiff_v2.0 SO:1000002 56956 57355 . . . ID=SV_3;Name=gap;subst_len=400;query_dir=1;query_sequence=query_1;query_coord=56956-57355;color=#42C042
The ref_struct.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
| col 9, Name | "gap" | |
| col 9, subst_len | Length(Gap) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment must be reverse complemented before it is inserted into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
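
For a whole file, the gap entries can be collected by skipping the `##` directive lines and keeping the features whose Name is "gap". The following is a minimal sketch under those assumptions. The generator name `iter_gaps` is hypothetical, and an in-memory copy of the ref_struct.gff example above stands in for a real file handle.

```python
import io


def iter_gaps(handle):
    """Yield (ID, start, end, subst_len) for every gap entry in a GFF3 stream."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives
        cols = line.split(None, 8)
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
        if attrs.get("Name") == "gap":
            yield attrs["ID"], int(cols[3]), int(cols[4]), int(attrs["subst_len"])


example = io.StringIO(
    "##gff-version 3\n"
    "##sequence-region ref_1 1 57855\n"
    "ref_1 NucDiff_v2.0 SO:1000002 54156 54405 . . . ID=SV_1;Name=gap;subst_len=250;"
    "query_dir=1;query_sequence=query_1;query_coord=54356-54605;color=#42C042\n"
    "ref_1 NucDiff_v2.0 SO:1000002 55556 55905 . . . ID=SV_2;Name=gap;subst_len=350;"
    "query_dir=1;query_sequence=query_1;query_coord=55606-55955;color=#42C042\n"
    "ref_1 NucDiff_v2.0 SO:1000002 56956 57355 . . . ID=SV_3;Name=gap;subst_len=400;"
    "query_dir=1;query_sequence=query_1;query_coord=56956-57355;color=#42C042\n"
)
gaps = list(iter_gaps(example))
total_gap_length = sum(length for _, _, _, length in gaps)
print(len(gaps), total_gap_length)  # 3 1000
```

With a real file, `open("ref_struct.gff")` would replace the `io.StringIO` object.
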