Structural categories not coordinated with isoform IDs #265

Open
evayfang2019 opened this issue Dec 6, 2024 · 4 comments
Labels
weird results Something looks odd in the resulting files

Comments

@evayfang2019

Hello! Thank you for developing this useful tool!

The output file Samples.novel_vs_known.SQANTI-like.tsv looks strange: the structural categories are inconsistent with the isoform IDs.
For example:
transcript228.chr1.nnic: structural_category is full_splice_match.
transcript2.KI270744.1.nnic KI270744.1: structural_category is intergenic.

                   isoform chrom strand length exons  structural_category    associated_gene associated_transcript ref_length ref_exons diff_to_TSS diff_to_TTS
1   transcript11.chr1.nnic  chr1      -   2407    11 novel_not_in_catalog  ENSG00000227232.5     ENST00000488147.1       1351        11        -480           0
2   transcript47.chr1.nnic  chr1      -   2374    10 novel_not_in_catalog  ENSG00000279457.4     ENST00000623083.4       1397        10        -792        -297
3   transcript53.chr1.nnic  chr1      -   2079     9 novel_not_in_catalog  ENSG00000227232.5     ENST00000488147.1       1351        11        -442           0
4    transcript74.chr1.nic  chr1      -   4749     8 novel_not_in_catalog  ENSG00000279457.4     ENST00000623083.4       1397        10         149        -297
5  transcript114.chr1.nnic  chr1      -   1553     5 novel_not_in_catalog  ENSG00000243485.5     ENST00000473358.1        712         3         226       -1056
6   transcript173.chr1.nic  chr1      +   1808     7 novel_not_in_catalog ENSG00000228794.10     ENST00000666741.1       5483         8         -21        3696
7  transcript182.chr1.nnic  chr1      +   6731     7     novel_in_catalog ENSG00000228794.10     ENST00000445118.7       6616         5         -11           0
8   transcript215.chr1.nic  chr1      +   6833     6     novel_in_catalog ENSG00000228794.10     ENST00000445118.7       6616         5         -64           0
9  transcript228.chr1.nnic  chr1      +   6575     6    full_splice_match ENSG00000228794.10     ENST00000445118.7       6616         5          -8           0
10 transcript300.chr1.nnic  chr1      +   1556     2 novel_not_in_catalog ENSG00000228794.10     ENST00000666741.1       5483         8        1680        2222


                           isoform      chrom strand length exons  structural_category   associated_gene associated_transcript ref_length ref_exons diff_to_TSS diff_to_TTS
18600  transcript2.KI270721.1.nnic KI270721.1      +   2507     6 novel_not_in_catalog ENSG00000276345.1     ENST00000612848.1        740         5         108           7
18601   transcript9.KI270721.1.nic KI270721.1      +   4053     4     novel_in_catalog ENSG00000276345.1     ENST00000612848.1        740         5          80           7
18602 transcript20.KI270734.1.nnic KI270734.1      -   1939    12 novel_not_in_catalog ENSG00000277196.4     ENST00000615165.1       1990        14          62           0
18603 transcript30.KI270734.1.nnic KI270734.1      -   1901    10 novel_not_in_catalog ENSG00000277196.4     ENST00000621424.4       2405        15         164           0
18604 transcript52.KI270734.1.nnic KI270734.1      +   1397     4 novel_not_in_catalog ENSG00000278817.1     ENST00000613204.1       1213         5          76           0
18605  transcript2.KI270744.1.nnic KI270744.1      -   2493     4           intergenic              <NA>                  <NA>         NA        NA          NA          NA

Version: IsoQuant 3.6.1

I looked through issue #202, but a similar problem still occurs.
Please help and fix this!

Best
Yiwei

@andrewprzh
Collaborator

Dear @evayfang2019

transcript2.KI270744.1.nnic KI270744.1: structural_category is intergenic.

This one is expected: IsoQuant appends either nic (novel in catalog) or nnic (novel not in catalog) to transcript IDs.
An intergenic transcript is certainly novel not in catalog, as it probably belongs to a new gene found in the intergenic region.
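
To scan the whole table for rows where the ID suffix and the reported category disagree, something like the following could be used. It is only a minimal sketch: the `EXPECTED` mapping (nic goes with novel_in_catalog; nnic goes with novel_not_in_catalog or intergenic) is an assumption drawn from this thread, not IsoQuant's documented behaviour, and the helper names are hypothetical. The sample rows are copied from the table above.

```python
# Hypothetical consistency check. The suffix-to-category mapping is an
# assumption based on this thread, not IsoQuant's documented behaviour.
EXPECTED = {
    "nic": {"novel_in_catalog"},
    "nnic": {"novel_not_in_catalog", "intergenic"},
}


def suffix_of(isoform_id):
    """Return the text after the last dot, e.g. 'nic' or 'nnic'."""
    return isoform_id.rsplit(".", 1)[-1]


def mismatches(rows):
    """Yield (isoform, category) pairs whose category is unexpected for the suffix."""
    for isoform, category in rows:
        allowed = EXPECTED.get(suffix_of(isoform))
        # IDs without a known suffix are skipped rather than flagged.
        if allowed is not None and category not in allowed:
            yield isoform, category


# A few (isoform, structural_category) pairs copied from the table above.
rows = [
    ("transcript11.chr1.nnic", "novel_not_in_catalog"),
    ("transcript74.chr1.nic", "novel_not_in_catalog"),
    ("transcript182.chr1.nnic", "novel_in_catalog"),
    ("transcript228.chr1.nnic", "full_splice_match"),
    ("transcript2.KI270744.1.nnic", "intergenic"),
]

for isoform, category in mismatches(rows):
    print(isoform, category)
```

Under that assumed mapping, the intergenic row passes, while transcript74, transcript182 and transcript228 are flagged.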

transcript228.chr1.nnic: structural_category is full_splice_match.

This is odd, could you send me coordinates of this transcript and its exons?

Best
Andrey

@andrewprzh andrewprzh added the weird results Something looks odd in the resulting files label Dec 11, 2024
@evayfang2019
Author

Dear Andrey

Thank you for your reply!

chr1	IsoQuant	transcript	827590	859446	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; similar_reference_id "ENST00000445118.7"; alternatives "fsm,tss_match:8,exon_elongation_5:8,tes_match_precise:0,correct_polya_site_right:859446"; Canonical "False"; exons "6";
chr1	IsoQuant	exon	827590	827775	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; exon_number "1"; exon_id "16"; 
chr1	IsoQuant	exon	829003	829104	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; exon_number "2"; exon_id "ENSE00001656290.1"; 
chr1	IsoQuant	exon	851927	852110	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; exon_number "3"; exon_id "ENSE00001678509.1"; 
chr1	IsoQuant	exon	852671	852766	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; exon_number "4"; exon_id "ENSE00001778823.1"; 
chr1	IsoQuant	exon	853391	853438	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; exon_number "5"; exon_id "chr1.13"; 
chr1	IsoQuant	exon	853488	859446	.	+	.	gene_id "ENSG00000228794.10"; transcript_id "transcript228.chr1.nnic"; exon_number "6"; exon_id "chr1.14"; 

image

The gene gtf file: ENSG00000228794.10.zip

Best
Yiwei

@andrewprzh
Collaborator

Dear @evayfang2019

The novel transcript contains one more exon compared to the reference one. The novel intron coordinates are 853438-853488. So it is a legit novel transcript.

So the problem is why it is classified as FSM. I suspect there is something going on with short introns <=50bp, and this one is precisely 50bp. I'll double check the assignment procedure and get back to you.
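
For anyone who wants to reproduce the intron arithmetic, here is a minimal sketch that derives intron spans from the exon coordinates posted above. It assumes the usual GTF convention of 1-based, inclusive coordinates; the `SHORT_INTRON_CUTOFF` value is a hypothetical name for the "<=50bp" figure mentioned here, not a known IsoQuant parameter.

```python
# Exon (start, end) pairs for transcript228.chr1.nnic, copied from the GTF
# lines above. GTF coordinates are assumed to be 1-based and inclusive.
exons = [
    (827590, 827775),
    (829003, 829104),
    (851927, 852110),
    (852671, 852766),
    (853391, 853438),
    (853488, 859446),
]


def introns(exons):
    """Return (first_base, last_base, length) for each gap between sorted exons."""
    ordered = sorted(exons)
    result = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        start, end = prev_end + 1, next_start - 1
        result.append((start, end, end - start + 1))
    return result


SHORT_INTRON_CUTOFF = 50  # hypothetical threshold, taken from the "<=50bp" remark

for start, end, length in introns(exons):
    flag = "short" if length <= SHORT_INTRON_CUTOFF else ""
    print(start, end, length, flag)
```

Note that the flanking exon boundaries 853438 and 853488 differ by 50, whereas counting only the intronic bases (853439 to 853487) gives 49. Either way, the last intron falls at or under a 50 bp cutoff, and it is the only intron in this transcript that does.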

Could you also send me the isoquant.log file so I know all parameters of the run?

Best
Andrey

@evayfang2019
Author

Thank you!

isoquant.log

2 participants