The definition of a complete chloroplast comprises the following requirements:
- The subgraph needs to have between MINNODES (3) and MAXNODES (100) nodes
    next if (@{$wcc} < $MINNODES || @{$wcc} > $MAXNODES);
- It needs to be a cyclic subgraph with a total sequence length between MINSEQLEN (25 kbp) and MAXSEQLEN (1 Mbp)
    next unless ($c->is_cyclic && $seqlen >= $MINSEQLEN && $seqlen <= $MAXSEQLEN);
- The subgraph needs to have at least one BLAST hit against the reference database
    my $output = qx(tblastx -db $blastdbfile -query $filename -evalue 1e-10 -outfmt 6 -num_alignments 1 -num_threads 4);
    if (length($output) > 0)
    {
        $L->debug("Found hits for cyclic graph: ".$c);
        push(@cyclic_contigs_with_blast_hits, $c);
    }
- Only one subgraph with BLAST hits is allowed
    if (@cyclic_contigs_with_blast_hits == 1)
- The node with the highest connectivity is assigned as the IR (inverted repeat)
    my $inverted_repeat = "$degree[0]{v}";
- After removing the IR nodes, only two other nodes are allowed
- LSC and SSC are simply assigned by sequence length: the longer of the two remaining nodes becomes the LSC
    if (length($seq[$lsc]) < length($seq[$ssc]))
    {
        ($lsc, $ssc) = ($ssc, $lsc);
    }
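To make the requirements easier to discuss, here is a minimal, self-contained sketch of the node-count, length, IR, and LSC/SSC rules on a toy component. It is not the code from import.pl: the subroutine names `passes_size_filters` and `assign_regions`, the hash layout, and the toy numbers are all hypothetical, and the sketch assumes that the total sequence length is the plain sum of the node lengths. The cyclicity test and the BLAST search are left out on purpose.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Thresholds as stated in the requirement list.
my $MINNODES  = 3;
my $MAXNODES  = 100;
my $MINSEQLEN = 25_000;       # 25 kbp
my $MAXSEQLEN = 1_000_000;    # 1 Mbp

# Node-count and total-length filters.
# $len is a hashref: node name => sequence length.
# Assumption: total length = sum of all node lengths, each counted once.
sub passes_size_filters {
    my ($len) = @_;
    my $n = keys %$len;
    return 0 if $n < $MINNODES || $n > $MAXNODES;
    my $seqlen = 0;
    $seqlen += $_ for values %$len;
    return ($seqlen >= $MINSEQLEN && $seqlen <= $MAXSEQLEN) ? 1 : 0;
}

# IR = node with the highest degree; the two remaining nodes are
# assigned as LSC (longer) and SSC (shorter).
# Returns undef if anything other than two nodes remain besides the IR.
sub assign_regions {
    my ($len, $degree) = @_;    # hashrefs keyed by node name
    my @by_degree = sort { $degree->{$b} <=> $degree->{$a} || $a cmp $b }
                    keys %$degree;
    my $ir   = $by_degree[0];
    my @rest = grep { $_ ne $ir } keys %$len;
    return unless @rest == 2;
    my ($lsc, $ssc) = sort { $len->{$b} <=> $len->{$a} || $a cmp $b } @rest;
    return { IR => $ir, LSC => $lsc, SSC => $ssc };
}

# Toy component (made-up values): three nodes, "b" is the best connected.
my %len    = (a => 86_000, b => 26_000, c => 18_000);
my %degree = (a => 2,      b => 4,      c => 2);

if (passes_size_filters(\%len)) {
    my $r = assign_regions(\%len, \%degree);
    print $r ? "IR=$r->{IR} LSC=$r->{LSC} SSC=$r->{SSC}\n"
             : "rejected: not exactly two nodes besides the IR\n";
}
# prints "IR=b LSC=a SSC=c"
```

In this sketch a component with a fourth node would be rejected by the `@rest == 2` test, which is the rule that "only two other nodes are allowed" expresses.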
I think we can improve our detection by relaxing some of those requirements, e.g. no. 6 (only two other nodes allowed after removing the IR).
Any ideas are welcome!
The corresponding code is in fastg-parser/import.pl at commit be99085: lines 200, 213, 233 to 239, 246, 270, 281, and 286 to 289.