8 changes: 4 additions & 4 deletions jgi_data_results/README.md
@@ -68,10 +68,10 @@ Results show six categories of taxonomic prediction accuracy:
|----------|-------|-----------------|
| **EXACT MATCH** | 15 | IMG and GTDB predictions match at species level |
| **MATCH - Phylum level** | 45 | Predictions align at phylum level (most common case) |
-| **PARTIAL - Same genus** | 14 | Same genus but different species prediction |
+| **PARTIAL - Same genus** | 16 | Same genus but different species prediction |
| **UNCLASSIFIED** | 13 | GTDB assigned "Unclassified Bacteria" (insufficient confidence) |
| **MISMATCH - Different genus** | 6 | IMG and GTDB predictions differ at genus level |
-| **MISSING GTDB DATA** | 7 | GTDB data unavailable for this IMG genome |
+| **MISSING GTDB DATA** | 5 | GTDB data unavailable for this IMG genome |

### `img_llm_annotations.tsv` (Supplementary File)

@@ -83,11 +83,11 @@ Contains the same 100 samples without GTDB annotations, showing only the origina

2. **Species-Level Exact Matches**: Only 15% show exact species-level matches, reflecting both taxonomic annotation methodology differences and potential genuinely different organism identifications.

-3. **Genus-Level Partial Matches**: 14% of samples remain at the same genus but with different species predictions, suggesting fine-grained taxonomic differences.
+3. **Genus-Level Partial Matches**: 16% of samples remain at the same genus but with different species predictions, suggesting fine-grained taxonomic differences.

4. **Unclassified Cases**: 13% received "Unclassified Bacteria" from GTDB, often indicating novel organisms or sequences with limited reference data.

-5. **Missing Data**: 7% of IMG genomes lack GTDB coverage, highlighting coverage gaps in the GTDB database.
+5. **Missing Data**: 5% of IMG genomes lack GTDB coverage, highlighting coverage gaps in the GTDB database.

6. **Mismatches**: 6% show true genus-level mismatches, potentially indicating annotation errors or novel taxonomy.
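
The percentages in the findings above only equal the raw counts because the six categories sum to 100 samples. A minimal sketch of that consistency check follows, using the post-change counts from the table in this diff. The `counts` dictionary and the `percentages` helper are illustrative names and are not part of the repository.

```python
# Category counts copied from the updated README table (post-change values).
counts = {
    "EXACT MATCH": 15,
    "MATCH - Phylum level": 45,
    "PARTIAL - Same genus": 16,
    "UNCLASSIFIED": 13,
    "MISMATCH - Different genus": 6,
    "MISSING GTDB DATA": 5,
}


def percentages(category_counts):
    """Return each category's share of the total, in percent."""
    total = sum(category_counts.values())
    return {name: 100 * n / total for name, n in category_counts.items()}


total = sum(counts.values())
pct = percentages(counts)

# With 100 samples, each percentage should equal its raw count.
for name, n in counts.items():
    print(f"{name}: {n} samples, {pct[name]:.0f}%")
print(f"total: {total}")
```

Rerunning a check like this after any edit to the table helps keep the table and the prose percentages in step.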

4 changes: 2 additions & 2 deletions jgi_data_results/img_llm_annotations_with_gtdb.tsv
@@ -92,10 +92,10 @@ JGI sequencing project id IMG genome id File id File name Original upa Generated
1352409 2947692693 6226423ae99caa81935a9684 52655.2.412973.CAAGTGCA-CAAGTGCA.fastq.gz 239038/186/1 240032 https://narrative.kbase.us/narrative/240032 Actinobacteria bacterium 20805-2 Streptomyces althioticus MATCH - Phylum level (Matches at phylum level)
1214714 2816332240 5c3a97ec46d1e66b9ba89490 12804.1.287316.AATACGCG-CGCGTATT.fastq.gz 239038/187/1 240034 https://narrative.kbase.us/narrative/240034 Streptomyces albus J1074 VWB-mCherry-12 Streptomyces albidoflavus PARTIAL - Same genus
1340370 2944903374 61e8ce15b0df7a8c0db81a10 52640.1.404944.ACGATGAC-ACGATGAC.fastq.gz 239038/188/1 240028 https://narrative.kbase.us/narrative/240028 Actinomycetota bacterium 44427 Streptomyces sp900105755 MATCH - Phylum level (Matches at phylum level)
-1248680 8130845799 67ae064768dd0c5de8e12ada 53090.2.581637.CCACTCGAGC-AGGACTCTTC.fastq.gz 239038/194/1 240223 https://narrative.kbase.us/narrative/240223 Streptosporangium nanhuense DSM 46674 Not found in log MISSING GTDB DATA
+1248680 8130845799 67ae064768dd0c5de8e12ada 53090.2.581637.CCACTCGAGC-AGGACTCTTC.fastq.gz 239038/194/1 240223 https://narrative.kbase.us/narrative/240223 Streptosporangium nanhuense DSM 46674 Not found in log PARTIAL - Same genus
1053055 2596583657 545d5e010d87855284890b40 8465.8.102013.GACGAC.fastq.gz 239038/195/1 240224 https://narrative.kbase.us/narrative/240224 Xanthobacter autotrophicus DSM 432 Xanthobacter autotrophicus_A EXACT MATCH
1030857 2574179732 5329352d49607a1be00599a1 7779.3.83550.AAGCGA.fastq.gz 239038/198/1 240175 https://narrative.kbase.us/narrative/240175 Vibrio porteresiae DSM 19223 Vibrio porteresiae EXACT MATCH
-1186048 8130828393 67d0972fd72b25923552b524 53102.8.586047.TCGTTCGTAA-CTCGAACCGG.fastq.gz 239038/199/1 240226 https://narrative.kbase.us/narrative/240226 Streptosporangium shengliense DSM 45881 (version 2) Not found in log MISSING GTDB DATA
+1186048 8130828393 67d0972fd72b25923552b524 53102.8.586047.TCGTTCGTAA-CTCGAACCGG.fastq.gz 239038/199/1 240226 https://narrative.kbase.us/narrative/240226 Streptosporangium shengliense DSM 45881 (version 2) Not found in log PARTIAL - Same genus
1248546 8130794579 67d0973ed72b25923552b58d 53102.8.586047.ACGGGTGAGC-GTTTCCGTAC.fastq.gz 239038/202/1 240174 https://narrative.kbase.us/narrative/240174 Streptosporangium longisporum DSM 43180 (version 2) Unclassified Bacteria UNCLASSIFIED
1347351 2990676009 6267f6a1f21e5a14d08d81ce 52684.1.418857.GGTATAGG-GGTATAGG.fastq.gz 239038/203/1 240171 https://narrative.kbase.us/narrative/240171 Bradyrhizobium elkanii USDA 319 Unclassified Bacteria UNCLASSIFIED
1352244 2953085618 622f645fe99caa81935b477e 52664.1.415157.TTGGACGT-TTGGACGT.fastq.gz 239038/2/1 239695 https://narrative.kbase.us/narrative/239695 Brevundimonas sp. 003966 Brevundimonas EXACT MATCH
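
Because the README counts are derived from this TSV, one way to confirm that the two files agree is to tally the category column directly. The sketch below rests on several assumptions that this diff does not confirm: the file is tab-delimited, it has a single header row, and the category is the last column. The header shown in the hunk is truncated, so the column is taken by position rather than by name. The `tally_categories` function is a hypothetical helper, not part of the repository. It drops trailing parenthetical notes such as "(Matches at phylum level)" so that labels line up with the README table.

```python
import csv
import io
from collections import Counter


def tally_categories(tsv_text, has_header=True):
    """Count rows per category, assuming the category is the last tab-separated field."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    if has_header:
        rows = rows[1:]
    counts = Counter()
    for row in rows:
        # Strip a trailing parenthetical note, e.g. "MATCH - Phylum level (Matches at phylum level)".
        label = row[-1].split(" (", 1)[0].strip()
        counts[label] += 1
    return counts


# Tiny made-up sample with placeholder values; real rows carry more columns.
sample = (
    "id\timg_name\tgtdb_name\tcategory\n"
    "1\tA\tB\tPARTIAL - Same genus\n"
    "2\tC\tD\tMATCH - Phylum level (Matches at phylum level)\n"
    "3\tE\tNot found in log\tPARTIAL - Same genus\n"
)
sample_counts = tally_categories(sample)
print(sample_counts.most_common())
```

To use it on the real file, read `img_llm_annotations_with_gtdb.tsv` as text, pass it to `tally_categories`, and compare the result with the README table.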