Substitutions
Substitution: the replacement of a reference sequence region with another sequence of exactly the same length that is not present anywhere in the reference genome. SNPs can be considered a subcategory of substitutions.
Figure 1: Substitution example
If a substitution has caused alignment fragmentation, it is written to the query_struct.gff and ref_struct.gff files; otherwise it is written to the query_snps.gff and ref_snps.gff files.
An example with the substitution entries in the query_snps.gff file:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:1000002 501 505 . . . ID=SNP_1;Name=substitution;subst_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=501-505;query_bases=ttgcg;ref_bases=gcctt;color=#42C042
query_1 NucDiff_v2.0 SO:1000002 9786 9786 . . . ID=SNP_2;Name=substitution;subst_len=1;query_dir=1;ref_sequence=ref_1;ref_coord=8579-8579;query_bases=a;ref_bases=c;color=#42C042
The query_snps.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in query_snps.gff is equal to ID in ref_snps.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before it is inserted into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
| col 9, query_bases | ATGC's | |
| col 9, ref_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
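
As a minimal sketch of how the fields in the table above might be read programmatically, the following Python snippet splits one record into its fixed columns and its col 9 attributes. The function name `parse_gff_line` is hypothetical and not part of NucDiff. The sketch assumes the columns are separated by whitespace as in the example above (GFF3 files are normally tab-separated), and it checks that subst_len agrees with End_q - St_q + 1.

```python
def parse_gff_line(line):
    """Split one GFF3 record into its fixed columns and a dict of col 9 attributes.

    Hypothetical helper, not part of NucDiff.
    """
    # Split on any whitespace; at most 9 columns, so col 9 stays in one piece.
    cols = line.strip().split(None, 8)
    if len(cols) != 9:
        raise ValueError("expected 9 GFF3 columns, got %d" % len(cols))
    # col 9 is a ';'-separated list of key=value pairs.
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attrs": attrs,
    }


# First record from the query_snps.gff example above.
record = parse_gff_line(
    "query_1 NucDiff_v2.0 SO:1000002 501 505 . . . "
    "ID=SNP_1;Name=substitution;subst_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=501-505;query_bases=ttgcg;ref_bases=gcctt;color=#42C042"
)

# Coordinates are 1-based and inclusive, so the length is end - start + 1.
length_ok = record["end"] - record["start"] + 1 == int(record["attrs"]["subst_len"])
print(record["attrs"]["ID"], length_ok)
```
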
An example with the substitution entries in ref_snps.gff:
##gff-version 3
##sequence-region ref_1 1 57855
ref_1 NucDiff_v2.0 SO:1000002 3516 3519 . . . ID=SV_1;Name=substitution;subst_len=4;query_dir=1;query_sequence=query_1;query_coord=3616-3619;color=#42C042
ref_1 NucDiff_v2.0 SO:1000002 13718 13745 . . . ID=SV_2;Name=substitution;subst_len=28;query_dir=1;query_sequence=query_1;query_coord=15633-15660;color=#42C042
The ref_snps.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in ref_snps.gff is equal to ID in query_snps.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before it is inserted into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
| col 9, query_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
| col 9, ref_bases | ATGC's | |
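
The note on query_dir can be illustrated with a short sketch. It assumes that a base string stored in reverse-complemented form (query_dir equal to -1) can be brought back to its original strand by reverse complementing it once more. The helper names `reverse_complement` and `bases_on_original_strand` are hypothetical and not part of NucDiff.

```python
# Translation table for plain A/C/G/T in either case; any other symbol
# (for example N) is left unchanged by str.translate.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def bases_on_original_strand(bases, query_dir):
    """Undo the reverse complement when query_dir is -1 (assumed interpretation)."""
    return reverse_complement(bases) if int(query_dir) == -1 else bases


# With query_dir=1 the bases are returned as stored.
print(bases_on_original_strand("ttgcg", "1"))
# With query_dir=-1 the stored bases are reverse complemented.
print(bases_on_original_strand("ttgcg", "-1"))
```
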
An example with the substitution entries in query_struct.gff:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:1000002 52127 52205 . . . ID=SV_1;Name=substitution;subst_len=79;query_dir=1;ref_sequence=ref_1;ref_coord=51357-51435;color=#42C042
query_1 NucDiff_v2.0 SO:1000002 53207 53355 . . . ID=SV_2;Name=substitution;subst_len=149;query_dir=1;ref_sequence=ref_1;ref_coord=52757-52905;color=#42C042
query_1 NucDiff_v2.0 SO:1000002 55606 55955 . . . ID=SV_3;Name=substitution;subst_len=350;query_dir=1;ref_sequence=ref_1;ref_coord=55556-55905;col
The query_struct.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_q | |
| col 5 | End_q | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before it is inserted into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | St_r - End_r | |
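
Because a substitution replaces a reference region with a sequence of exactly the same length, the query span (col 4 and col 5), the ref_coord span and subst_len should all agree. The following sketch checks this for one entry; the helper names `parse_coord_range` and `spans_match` are hypothetical and not part of NucDiff, and coordinates are assumed to be 1-based and inclusive.

```python
def parse_coord_range(text):
    """Parse a 'start-end' string such as '51357-51435' into two integers."""
    start, end = text.split("-")
    return int(start), int(end)


def spans_match(start, end, other_coord, subst_len):
    """Return True if both spans have the length given by subst_len."""
    other_start, other_end = parse_coord_range(other_coord)
    own_len = end - start + 1              # inclusive coordinates
    other_len = other_end - other_start + 1
    return own_len == other_len == int(subst_len)


# Values taken from the first query_struct.gff entry above (ID=SV_1).
print(spans_match(52127, 52205, "51357-51435", "79"))
```
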
An example with the substitution entries in ref_struct.gff:
##gff-version 3
##sequence-region ref_1 1 57855
ref_1 NucDiff_v2.0 SO:1000002 51357 51435 . . . ID=SV_1;Name=substitution;subst_len=79;query_dir=1;query_sequence=query_1;query_coord=52127-52205;color=#42C042
ref_1 NucDiff_v2.0 SO:1000002 52757 52905 . . . ID=SV_2;Name=substitution;subst_len=149;query_dir=1;query_sequence=query_1;query_coord=53207-53355;color=#42C042
ref_1 NucDiff_v2.0 SO:1000002 55556 55905 . . . ID=SV_3;Name=substitution;subst_len=350;query_dir=1;query_sequence=query_1;query_coord=55606-55955;color=#42C042
The ref_struct.gff file contains the following information (see Figure 1 for notations):
| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:1000002 | Sequence Ontology accession number corresponding to the "substitution" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
| col 9, Name | "substitution" | |
| col 9, subst_len | Length(Substitution) | |
| col 9, query_dir | "1" or "-1" | -1 if the substituted fragment should be reverse complemented before it is inserted into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | St_q - End_q | |
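
Since an entry in query_struct.gff shares its ID with the matching entry in ref_struct.gff, the two records can be cross-checked against each other. The sketch below is one possible way to do this, not NucDiff's own code: the names `parse_record` and `mirrored` are hypothetical, and columns are assumed to be whitespace-separated as in the examples above.

```python
def parse_record(line):
    """Return (sequence name, 'start-end' span, col 9 attributes) for one record."""
    cols = line.strip().split(None, 8)
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
    return cols[0], "%s-%s" % (cols[3], cols[4]), attrs


def mirrored(query_line, ref_line):
    """Check that a query-side and a ref-side record describe the same substitution."""
    q_seq, q_span, q = parse_record(query_line)
    r_seq, r_span, r = parse_record(ref_line)
    return (
        q["ID"] == r["ID"]
        and q["ref_sequence"] == r_seq and q["ref_coord"] == r_span
        and r["query_sequence"] == q_seq and r["query_coord"] == q_span
    )


# First entries from the query_struct.gff and ref_struct.gff examples above.
query_line = ("query_1 NucDiff_v2.0 SO:1000002 52127 52205 . . . "
              "ID=SV_1;Name=substitution;subst_len=79;query_dir=1;"
              "ref_sequence=ref_1;ref_coord=51357-51435;color=#42C042")
ref_line = ("ref_1 NucDiff_v2.0 SO:1000002 51357 51435 . . . "
            "ID=SV_1;Name=substitution;subst_len=79;query_dir=1;"
            "query_sequence=query_1;query_coord=52127-52205;color=#42C042")

print(mirrored(query_line, ref_line))
```
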