Aligning targets against functional genes only #1891
Hello, I'm using MiXCR to determine clones from human TCR-sequencing data generated with V-gene primers. I'm currently using the built-in set of reference libraries included in MiXCR. Everything works well, but some of the clones returned by MiXCR are assigned to pseudogenes or ORF genes (e.g. TRBV3-2). In this example a very similar sequence exists for TRBV3-1, which is functional, and I would expect the true clone to be TRBV3-1 rather than TRBV3-2. Is there any way to specify that only functional genes should be considered? I guess this would be a parameter of the alignment step that removes all non-functional genes from the alignment, but I didn't find any such parameter. I saw that the IMGT library (https://github.com/repseqio/library-imgt/releases) has an "isFunctional" descriptor for each gene, which probably also exists in the built-in MiXCR references, but I don't know how I could make use of it. If this feature is not implemented, I could of course build my own reference library without the non-functional genes, but that may not be the best solution. Thank you very much! Regards, Julien
Hi Julien,
I would first recommend checking the actual sequence of the assembled clone. V genes are very similar to each other, and it is quite possible that the primer you designed for TRBV3-1 also anneals to TRBV3-2, which may genuinely be present in the data (unless it is a synthetic library in which such genes do not exist). MiXCR assigns genes by aligning to the full reference, so if TRBV3-2 receives a higher alignment score than TRBV3-1, it is very likely that TRBV3-2 is actually present in the sample; this is a common situation.
Unfortunately, we do not have this feature, and the correct approach would be to create a custom library from a subset of genes, as you mentioned.
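If you do go the custom-library route, the filtering step could look roughly like the Python sketch below. This is not a MiXCR or repseqio feature, just a minimal sketch under assumptions: it assumes the library file (plain or gzip-compressed) is a JSON array of per-species entries, each holding a `genes` list whose items carry the `isFunctional` flag mentioned in the question. The function names `keep_functional_genes` and `filter_library_file` are hypothetical; check the schema of the release you actually download before relying on this.

```python
import gzip
import json
from typing import Any, TextIO


def _open_text(path: str, mode: str) -> TextIO:
    # Library releases may be gzip-compressed; choose the opener by extension.
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def keep_functional_genes(library: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of the library keeping only genes flagged isFunctional.

    Assumed (hypothetical) schema: a list of species entries, each with a
    "genes" list of dicts that may contain a boolean "isFunctional" key.
    Genes without the flag are dropped, i.e. the filter errs on the side
    of exclusion. Other keys of each entry are copied unchanged.
    """
    filtered = []
    for entry in library:
        new_entry = dict(entry)  # shallow copy so the input is not mutated
        new_entry["genes"] = [
            gene for gene in entry.get("genes", [])
            if gene.get("isFunctional", False) is True
        ]
        filtered.append(new_entry)
    return filtered


def filter_library_file(src_path: str, dst_path: str) -> None:
    """Read a library JSON file, drop non-functional genes, write the result."""
    with _open_text(src_path, "r") as src:
        library = json.load(src)
    with _open_text(dst_path, "w") as dst:
        json.dump(keep_functional_genes(library), dst, indent=2)


if __name__ == "__main__":
    # Toy in-memory library; the flags here are illustrative, not IMGT data.
    toy = [{"taxonId": 9606, "genes": [
        {"name": "TRBV3-1*01", "isFunctional": True},
        {"name": "TRBV3-2*01", "isFunctional": False},
    ]}]
    print([g["name"] for g in keep_functional_genes(toy)[0]["genes"]])
```

Any other fields of each entry (such as sequence fragments) are copied as-is, so genes that remain keep whatever they reference. The resulting file would then be supplied to MiXCR as a custom library, following the MiXCR documentation for your version.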