How to choose the best match preset for a group of data like this? #1895

linqy-immune · 2025-01-26T05:28:51Z

Checklist before submitting the issue:

The issue is strongly related to the MiXCR software
The issue can be reproduced with the most recent version of MiXCR
There is no answer to the question in the official documentation and there is no duplicate issue in the bug tracker
Inspection of raw alignments with exportAlignmentsPretty shows that data has the expected architecture, and sample preparation artefacts are not the reason of the problem (if this is the matter of the issue)

Hello! I recently tried to analyze a set of FASTQ data downloaded from NCBI with MiXCR, but I got stuck at the first step. I ran the command
java -Xmx4g -Xms3g -jar C:\mixcr\mixcr.jar align -s hs -p abhelix-human-rna-xcr -f "D:\SRR14516117.fastq.gz" alignments.vdjca
and the output was "Alignment: 0%
Alignment: 100% ETA: 00:00:00
====================== report: align ======================
Analysis time: 2.93s
Total sequencing reads: 6764
Successfully aligned reads: 1364 (20.17%)
Coverage (percent of successfully aligned):
CDR3: 1363 (99.93%)
FR3_TO_FR4: 0 (0%)
CDR2_TO_FR4: 0 (0%)
FR2_TO_FR4: 0 (0%)
CDR1_TO_FR4: 0 (0%)
VDJRegion: 0 (0%)
Alignment failed: no hits (not TCR/IG?): 4562 (67.45%)
Alignment failed: absence of V hits: 273 (4.04%)
Alignment failed: absence of J hits: 565 (8.35%)
Overlapped: 0 (0%)
Overlapped and aligned: 0 (0%)
Overlapped and not aligned: 0 (0%)
Alignment-aided overlaps, percent of overlapped and aligned: 0 (NaN%)
Partial aligned reads, percent of successfully aligned: 1 (0.07%)
Realigned with forced non-floating bound: 0 (0%)
Realigned with forced non-floating right bound in left read: 0 (0%)
Realigned with forced non-floating left bound in right read: 0 (0%)
TRB chains: 1364 (100%)
TRB non-functional: 27 (1.98%)
Trimming report:
R1 reads trimmed left: 0 (0%)
R1 reads trimmed right: 0 (0%)
Average R1 nucleotides trimmed left: 0.0
Average R1 nucleotides trimmed right: 0.0"
I have tried almost all the presets available in MiXCR, but the alignment success rate was still 0.
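To see where the reads are being lost, the counts in the report above can be tallied with a few lines of Python. This is only a minimal sketch: `parse_report` and the regular expression are hypothetical helpers written for this post, not part of MiXCR, and the sample text is a subset of the lines copied from the report.

```python
import re

# Matches count lines such as "Successfully aligned reads: 1364 (20.17%)".
# Lines without an integer count (timings, progress bars, averages) are skipped.
LINE = re.compile(r"^(.+):\s+(\d+)(?:\s*\((?:[\d.]+|NaN)%\))?\s*$")

def parse_report(text):
    """Return a {label: count} dict for every count line in an align report."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

# A subset of the lines from the report above.
report = """\
Total sequencing reads: 6764
Successfully aligned reads: 1364 (20.17%)
Alignment failed: no hits (not TCR/IG?): 4562 (67.45%)
Alignment failed: absence of V hits: 273 (4.04%)
Alignment failed: absence of J hits: 565 (8.35%)
"""

counts = parse_report(report)
total = counts["Total sequencing reads"]
for label, n in counts.items():
    if label.startswith("Alignment failed"):
        print(f"{label}: {n / total:.2%} of all reads")
```

In this report the aligned reads plus the three failure categories add up to the total read count, and "no hits" is by far the largest category.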
I then went back to the original article, which describes how the data were generated as follows:
"High-throughput sequencing of TCRB:
Analysis of the TCRB CDR3 regions of gastric cancer patients was performed on cryopreserved samples. Briefly, total RNA was extracted from 600 mL peripheral blood, 200 mg frozen tumor, or 200 mg healthy mucosal tissue samples using the RNeasy Mini Kit (QIAGEN) and converted to cDNA (RevertAid First Strand cDNA Synthesis Kit; Fermentas) with a constant region-specific primer (RT primer: 5'-ATCTCTGCTTCTGATGGCTCA-3'). A multiplex PCR system was introduced to amplify the CDR3 region of rearranged TCRB loci. A set of forward primers, each specific to one or a set of functional TCR Vβ segments, and a reverse primer specific to the constant region of TCRB, were used to generate amplicons that cover the entire CDR3 region. PCR products were loaded on 3% agarose gels (Sigma–Aldrich), and bands centered at ~220–240 bp were excised and purified using the QIAquick Gel Extraction kit (QIAGEN). Purified products were sequenced using the Ion Torrent PGM platform (Life Technologies).
Processing of raw reads:
Ion Torrent Suite software filters were used for data pre-processing to exclude low quality reads and erroneous sequences derived from unrecognized multiplex barcodes. Raw sequence data were converted to FASTQ format using an Ion Torrent PGM built-in plugin. The resulting FASTQ files were imported to the MATLAB software. The TCRB CDR3 region was identified according to the International ImMunoGeneTics (IMGT) collaboration, beginning with the second conserved cysteine encoded by the 3' portion of the Vβ gene segment and ending with the conserved phenylalanine encoded by the 5' portion of the Jβ gene segment. The number of NTs between these codons determines the length of the CDR3 region. A manual algorithm was used to identify which V and J segments contributed to each TCRB CDR3 sequence. Sequences with lengths shorter than 110 bp, an average Phred quality score < 25, minimum Phred score < 20, or those with no exact match to the TRBC constant region primer were discarded. In addition, sequences with out-of-frame rearrangements, ambiguous V- and J-β segment alignment, V-β segment pseudogenes, or a CDR3 AA junction lacking a 5' cysteine or 3' phenylalanine were discarded. Resulting sequences were further analyzed using MATLAB 2013b (MathWorks) via manual scripts, and graphed using Excel (Office 2013, Microsoft) and Prism 5 software (GraphPad)."
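For reference, the length and quality thresholds quoted above could be reproduced roughly as follows. This is a sketch under assumptions, not the authors' MATLAB code: `passes_filters` is a hypothetical helper, Phred+33 quality encoding is assumed, and the primer-match and in-frame CDR3 filters from the article are not implemented here.

```python
from statistics import mean

def passes_filters(seq, qual, min_len=110, min_mean_q=25, min_q=20, phred_offset=33):
    """Return True if a read passes the length/quality thresholds from the article.

    seq  -- nucleotide sequence of the read
    qual -- FASTQ quality string (Phred+33 encoding is an assumption)
    """
    if len(seq) < min_len:
        return False
    # Decode the quality string into integer Phred scores.
    scores = [ord(c) - phred_offset for c in qual]
    if not scores:
        return False
    # Discard reads with average Phred < 25 or any base with Phred < 20.
    return mean(scores) >= min_mean_q and min(scores) >= min_q

# Example with made-up reads: 'I' decodes to Phred 40, '+' to Phred 10.
print(passes_filters("A" * 120, "I" * 120))        # long, high-quality read
print(passes_filters("A" * 100, "I" * 100))        # shorter than 110 bp
print(passes_filters("A" * 120, "I" * 119 + "+"))  # one base below Phred 20
```

Applying a filter like this before alignment would only approximate the published preprocessing, since the remaining filters depend on the V/J assignment itself.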
I am a big fan of MiXCR and would really like to analyze this data with MiXCR rather than other software. Could you please give me some advice on this, or tell me whether MiXCR can produce a satisfactory result for this dataset? Thank you so much!

mizraelson · 2025-01-28T03:25:31Z

Collaborator

I would use the command below:

mixcr analyze generic-amplicon \
    --species hsa \
    --rna \
    --floating-left-alignment-boundary \
    --floating-right-alignment-boundary C \
    input_R1.fastq.gz \
    result

However, from what I can see, the data is very noisy, and there are a lot of reads that do not cover the TCRB region.

linqy-immune · 2025-01-30T03:08:40Z

Author

OK. I have given it a try and it seems helpful. Thank you soooo much!!!
