Order of input contigs influences the results of combineTCR #293

Closed
michael-kotliar opened this issue Dec 20, 2023 · 4 comments

@michael-kotliar

The removeNA parameter (I use v1.11.0 installed from GitHub) does not work as expected. The combineTCR function returns different results depending on the order of the contigs in the input file. The difference is small unless we set removeNA=TRUE and/or filterMulti=TRUE.

For example, for input where TRB comes before TRA:

barcode			is_cell	contig_id			high_confidence	length	chain	v_gene	d_gene	j_gene	c_gene	full_length	productive
AACTGGTCATTAGCCA-1	true	AACTGGTCATTAGCCA-1_contig_1	true		499 	TRB	TRBV9	TRBD1	TRBJ2-1	TRBC2	true		true
AACTGGTCATTAGCCA-1	true	AACTGGTCATTAGCCA-1_contig_2	true		563	TRA	TRAV22		TRAJ17	TRAC	true		true
AACTGGTCATTAGCCA-1	true	AACTGGTCATTAGCCA-1_contig_3	true		507	TRB	TRBV6-4		TRBJ2-7	TRBC2	true		true

the output of the combineTCR function will be:

barcode			is_cell		contig_id			high_confidence		length	chain	v_gene	d_gene	j_gene	c_gene	full_length	productive
AACTGGTCATTAGCCA-1	true		AACTGGTCATTAGCCA-1_contig_1	true			499	TRB	TRBV9	TRBD1	TRBJ2-1	TRBC2	true		true 

But for input where TRA comes before TRB (the same ordering arises when we set filterMulti=TRUE, because in addition to deleting one TRB contig we group by barcode and chain):

barcode			is_cell	contig_id			high_confidence	length	chain	v_gene	d_gene	j_gene	c_gene	full_length	productive
AACTGGTCATTAGCCA-1	true	AACTGGTCATTAGCCA-1_contig_2	true		563	TRA	TRAV22		TRAJ17	TRAC	true		true
AACTGGTCATTAGCCA-1	true	AACTGGTCATTAGCCA-1_contig_1	true		499 	TRB	TRBV9	TRBD1	TRBJ2-1	TRBC2	true		true
AACTGGTCATTAGCCA-1	true	AACTGGTCATTAGCCA-1_contig_3	true		507	TRB	TRBV6-4		TRBJ2-7	TRBC2	true		true

The output of the combineTCR function will include <NA> for d_gene. As a result, this row is removed when we set removeNA=TRUE, because na.omit is applied to all columns instead of checking only the TCR1 and TCR2 columns.

barcode			is_cell		contig_id			high_confidence	length	chain	v_gene	d_gene	j_gene	c_gene	full_length	productive
AACTGGTCATTAGCCA-1	true		AACTGGTCATTAGCCA-1_contig_2	true		563	TRA	TRAV22	<NA>	TRAJ17	TRAC	true		true
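
To make the difference concrete, here is a minimal base R sketch of the two filtering strategies described above. It is not the scRepertoire source: the data frame, barcodes, and chain strings are hypothetical placeholders, and only the column names TCR1, TCR2, and d_gene are taken from this report.

```r
# Hypothetical stand-in for combined output; values are placeholders, not real calls.
combined <- data.frame(
  barcode = c("cell_A", "cell_B", "cell_C"),
  TCR1    = c("alpha_1", NA, "alpha_3"),        # cell_B has no TRA chain
  TCR2    = c("beta_1", "beta_2", "beta_3"),
  d_gene  = c(NA, "d_2", "d_3"),                # cell_A kept its TRA row, so d_gene is NA
  stringsAsFactors = FALSE
)

# Strategy 1: na.omit() over every column drops any row with an NA anywhere,
# so cell_A is lost even though both of its chains are present.
all_cols <- na.omit(combined)

# Strategy 2: only require the chain columns to be non-NA.
chain_only <- combined[complete.cases(combined[, c("TCR1", "TCR2")]), ]

print(all_cols$barcode)
print(chain_only$barcode)
```

With the second strategy, a cell is dropped only when one of its chains is actually missing, which is the behavior I would expect from removeNA=TRUE.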

michael-kotliar changed the title from "Order of contigs influences the results of combineTCR" to "Order of input contigs influences the results of combineTCR" on Dec 20, 2023

@ncborcherding
Member

Hey Michael,

Thanks for outlining the problem above. Is there a difference in the clone designations, i.e. the CTaa, CTnt, CTstrict, and CTgene columns?

The headers/columns you include above are actually ignored after combineTCR().

Nick

@michael-kotliar
Author

michael-kotliar commented Dec 21, 2023

Hi Nick,

I think all the other columns look OK, but because d_gene is NA, some cells are mistakenly removed when removeNA=TRUE is set, and almost all cells are removed when removeNA=TRUE is combined with filterMulti=TRUE. This happens because for filterMulti we group by barcode and chain, which orders the chains as TRA first, then TRB. As a result, d_gene is NA for every cell except those that have only a TRB chain.
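
The ordering effect can be reproduced with a small base R sketch. This is only an illustration under assumptions: `order()` stands in for whatever grouping the package performs, the empty d_gene fields from the example input are written as NA, and taking the first row per barcode is my reading of the behavior, not something confirmed from the source code.

```r
# The three contigs from the example above, reduced to the relevant columns.
contigs <- data.frame(
  barcode = rep("AACTGGTCATTAGCCA-1", 3),
  chain   = c("TRB", "TRA", "TRB"),
  d_gene  = c("TRBD1", NA, NA),   # empty d_gene fields represented as NA
  stringsAsFactors = FALSE
)

# Stand-in for grouping by barcode and chain: "TRA" sorts before "TRB".
sorted <- contigs[order(contigs$barcode, contigs$chain), ]

# Assumption: the row whose metadata survives is the first one per barcode.
first_row <- sorted[1, ]

print(sorted$chain)
print(first_row$d_gene)
```

Because TRA contigs usually carry no d_gene call, any cell with a TRA chain ends up with NA in that column after this kind of ordering.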

@ncborcherding
Member

Hey Michael,

Apologies for the confusion - I am knee deep working on v2 of scRepertoire (the current main branch).

You may be right that there is unintended removal of contigs using removeNA=TRUE. In scRepertoire v2, combineTCR() output is:

[Screenshot: combineTCR() output in scRepertoire v2]

Everything is concatenated and the individual gene calls are not retained, so this should no longer be an issue.

Nick

@michael-kotliar
Author

Great! Then I'll start using the latest version from the main branch.
