How to choose the best match preset for a group of data like this? #1895
linqy-immune started this conversation in General
Replies: 2 comments
-
I would use the command below:
However, from what I can see, the data is very noisy, and many reads do not cover the TCRB region.
-
OK. I have given it a try and it seems helpful. Thank you soooo much!!!
-
Hello! I recently tried to analyze a set of FASTQ data downloaded from NCBI with MiXCR, but I got stuck at the first step. I ran the command "java -Xmx4g -Xms3g -jar C:\mixcr\mixcr.jar align -s hs -p abhelix-human-rna-xcr -f "D:\SRR14516117.fastq.gz" alignments.vdjca", and the output was "Alignment: 0%
Alignment: 100% ETA: 00:00:00
====================== report: align ======================
Analysis time: 2.93s
Total sequencing reads: 6764
Successfully aligned reads: 1364 (20.17%)
Coverage (percent of successfully aligned):
CDR3: 1363 (99.93%)
FR3_TO_FR4: 0 (0%)
CDR2_TO_FR4: 0 (0%)
FR2_TO_FR4: 0 (0%)
CDR1_TO_FR4: 0 (0%)
VDJRegion: 0 (0%)
Alignment failed: no hits (not TCR/IG?): 4562 (67.45%)
Alignment failed: absence of V hits: 273 (4.04%)
Alignment failed: absence of J hits: 565 (8.35%)
Overlapped: 0 (0%)
Overlapped and aligned: 0 (0%)
Overlapped and not aligned: 0 (0%)
Alignment-aided overlaps, percent of overlapped and aligned: 0 (NaN%)
Partial aligned reads, percent of successfully aligned: 1 (0.07%)
Realigned with forced non-floating bound: 0 (0%)
Realigned with forced non-floating right bound in left read: 0 (0%)
Realigned with forced non-floating left bound in right read: 0 (0%)
TRB chains: 1364 (100%)
TRB non-functional: 27 (1.98%)
Trimming report:
R1 reads trimmed left: 0 (0%)
R1 reads trimmed right: 0 (0%)
Average R1 nucleotides trimmed left: 0.0
Average R1 nucleotides trimmed right: 0.0"
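As a quick sanity check on the numbers in this report, here is a small Python sketch (the variable and function names are only illustrative) that confirms the four outcome categories add up to the total read count and recomputes each share of the total:

```python
# Counts copied from the align report above.
total_reads = 6764
outcomes = {
    "successfully aligned": 1364,
    "no hits (not TCR/IG?)": 4562,
    "absence of V hits": 273,
    "absence of J hits": 565,
}


def percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Share of all sequencing reads that falls into each outcome.
shares = {name: percent(count, total_reads) for name, count in outcomes.items()}

# True when every read is accounted for by exactly one outcome.
all_reads_accounted_for = sum(outcomes.values()) == total_reads

for name, share in shares.items():
    print(f"{name}: {outcomes[name]} reads ({share}%)")
print("all reads accounted for:", all_reads_accounted_for)
```

So about two thirds of the reads have no TCR/IG hits at all, and the aligned reads cover only the CDR3.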
I have tried almost all of the presets available in MiXCR, but the successful alignment rate was still 0.
I then went back to the original article, which describes how the data were generated as follows:
"High-throughput sequencing of TCRB:
Analysis of the TCRB CDR3 regions of gastric cancer patients was performed on cryopreserved samples. Briefly, total RNA was extracted from 600 mL peripheral blood, 200 mg frozen tumor, or 200 mg healthy mucosal tissue samples using the RNeasy Mini Kit (QIAGEN) and converted to cDNA (RevertAid First Strand cDNA Synthesis Kit; Fermentas) with a constant region-specific primer (RT primer: 5'-ATCTCTGCTTCTGATGGCTCA-3'). A multiplex PCR system was introduced to amplify the CDR3 region of rearranged TCRB loci. A set of forward primers, each specific to one or a set of functional TCR Vβ segments, and a reverse primer specific to the constant region of TCRB, were used to generate amplicons that cover the entire CDR3 region. PCR products were loaded on 3% agarose gels (Sigma–Aldrich), and bands centered at ~220–240 bp were excised and purified using the QIAquick Gel Extraction kit (QIAGEN). Purified products were sequenced using the Ion Torrent PGM platform (Life Technologies).
Processing of raw reads:
Ion Torrent Suite software filters were used for data pre-processing to exclude low quality reads and erroneous sequences derived from unrecognized multiplex barcodes. Raw sequence data were converted to FASTQ format using an Ion Torrent PGM built-in plugin. The resulting FASTQ files were imported to the MATLAB software. The TCRB CDR3 region was identified according to the International ImMunoGeneTics (IMGT) collaboration, beginning with the second conserved cysteine encoded by the 3' portion of the Vβ gene segment and ending with the conserved phenylalanine encoded by the 5' portion of the Jβ gene segment. The number of NTs between these codons determines the length of the CDR3 region. A manual algorithm was used to identify which V and J segments contributed to each TCRB CDR3 sequence. Sequences with lengths shorter than 110 bp, an average Phred quality score < 25, minimum Phred score < 20, or those with no exact match to the TRBC constant region primer were discarded. In addition, sequences with out-of-frame rearrangements, ambiguous V- and J-β segment alignment, Vβ segment pseudogenes, or a CDR3 AA junction lacking a 5' cysteine or 3' phenylalanine were discarded. Resulting sequences were further analyzed using MATLAB 2013b (MathWorks) via manual scripts, and graphed using Excel (Office 2013, Microsoft) and Prism 5 software (GraphPad)."
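For reference, the read-level filters quoted above (minimum length, average Phred score, minimum Phred score) could be sketched roughly like this in Python. This is only my illustration and not the authors' MATLAB code: the function names are made up, Phred+33 quality encoding is assumed, and the primer-match, frame and V/J checks from the article are left out.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding; this is an assumption, not stated in the article.
    """
    return [ord(ch) - offset for ch in quality_line]


def passes_filters(sequence, quality_line,
                   min_length=110, min_mean_q=25, min_q=20):
    """Return True if a read survives the three thresholds quoted above.

    A read is discarded when it is shorter than 110 bp, when its average
    Phred score is below 25, or when any base has a Phred score below 20.
    """
    if len(sequence) < min_length:
        return False
    scores = phred_scores(quality_line)
    if not scores:
        return False
    mean_q = sum(scores) / len(scores)
    return mean_q >= min_mean_q and min(scores) >= min_q


if __name__ == "__main__":
    # Toy reads: 'I' encodes Phred 40 and '4' encodes Phred 19 under Phred+33.
    good = ("A" * 120, "I" * 120)
    short = ("A" * 100, "I" * 100)
    low_base = ("A" * 120, "I" * 119 + "4")
    for name, (seq, qual) in [("good", good), ("short", short),
                              ("low_base", low_base)]:
        print(name, passes_filters(seq, qual))
```

Counting how many reads in the FASTQ file pass these thresholds might help explain why so few of them align.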
I am a big fan of MiXCR and would really like to analyze the data with MiXCR rather than other software. Could you please give me some advice on this, or tell me whether MiXCR can give a satisfactory result for this kind of data? Thank you soooo much!!!