The following describes the code documents used in the project, as well as the data sets obtained during it.
In this document, the annotation matrices corresponding to the data collected by Bakta and PPanGGolin are already merged for both Thermaceae (SupplementaryFile1_1) and Deinococcaceae (SupplementaryFile1_2) for better understanding.
This file contains the text documents output by CheckM, giving the completeness and contamination values for each genome among other parameters for each family.
Results of merging the general annotation matrices (Supplementary Files 1 and 2) with the KO identifiers associated with the KEGG database.
Presence-absence matrices generated by PPanGGolin. These are binary matrices in which the clusters obtained are listed against the total set of genomes: a value of 1 indicates that the cluster is present in the genome, and a value of 0 indicates that it is absent. For the family Thermaceae (SupplementaryFile5_1_1) we also add the annotation of the two reference strains of the genus Thermus, T. thermophilus HB27 and HB8.
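As a minimal illustration of how such a binary matrix can be read, the sketch below counts the clusters present in each genome. It assumes a tab-separated layout with a first column named `Gene` holding the cluster identifier and one column per genome. The column name, the toy data, and the helper `clusters_per_genome` are assumptions made for illustration and are not taken from the supplementary files.

```python
import csv
import io

# Hypothetical toy matrix: rows are clusters, columns are genomes.
SAMPLE = (
    "Gene\tgenomeA\tgenomeB\tgenomeC\n"
    "clu1\t1\t1\t1\n"
    "clu2\t1\t0\t0\n"
    "clu3\t0\t1\t0\n"
)

def clusters_per_genome(handle, id_column="Gene"):
    """Count how many clusters are present (value 1) in each genome."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        for genome, value in row.items():
            if genome == id_column:
                continue  # skip the cluster identifier column
            counts[genome] = counts.get(genome, 0) + int(value)
    return counts

counts = clusters_per_genome(io.StringIO(SAMPLE))
print(counts)
```

To use it on a real file, replace the `io.StringIO(SAMPLE)` handle with an opened file and adjust `id_column` if the header differs.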
CSV files containing the results obtained with SAPPHIRE, listing the sequences used, each followed by its thermophilic value: Supplementary File 8.3 (Thermaceae) and Supplementary File 8.4 (Deinococcaceae).
This script uses Bakta to detect the 16S rRNA genes of the different genomes and sets a minimum length threshold, in base pairs, for these genes. Finally, the contamination of the samples is analyzed using one of the functionalities of the Biopython package.
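The length-threshold step alone can be sketched as follows. This is not the project script: it assumes the detected 16S rRNA sequences are already available in FASTA format, uses only the standard library rather than Bakta or Biopython, and the threshold value `MIN_LENGTH_BP`, the sequence names, and the helper functions are hypothetical.

```python
import io

MIN_LENGTH_BP = 1200  # hypothetical threshold, not the value used in the project

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def passing_16s(handle, min_length=MIN_LENGTH_BP):
    """Return the headers of sequences at least min_length bp long."""
    return [h for h, seq in read_fasta(handle) if len(seq) >= min_length]

# Toy input: one 1400 bp sequence and one 400 bp sequence.
SAMPLE = ">genomeA_16S\n" + "ACGT" * 350 + "\n>genomeB_16S\n" + "ACGT" * 100 + "\n"
kept = passing_16s(io.StringIO(SAMPLE))
print(kept)
```

Only sequences that meet the threshold are retained, so genomes with truncated 16S rRNA genes can be flagged before any downstream analysis.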
This document explains how to merge the presence-absence matrix provided by PPanGGolin with the annotation of the reference genomes obtained from GenBank.
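A merge of this kind can be sketched as a left join keyed on the cluster identifier. The sketch assumes that a lookup from cluster to reference gene has already been built. How clusters are matched to reference genes is not described here, and the column names, the locus tag `REF_0001`, and the helper `merge_with_reference` are all hypothetical.

```python
import csv
import io

# Hypothetical toy inputs, both tab-separated.
MATRIX = "Gene\tgenomeA\tgenomeB\nclu1\t1\t1\nclu2\t1\t0\n"
REFERENCE = "cluster\tlocus_tag\tproduct\nclu1\tREF_0001\tDNA polymerase I\n"

def merge_with_reference(matrix_handle, reference_handle):
    """Attach reference annotation to each matrix row (left join on cluster id)."""
    ref = {
        row["cluster"]: row
        for row in csv.DictReader(reference_handle, delimiter="\t")
    }
    merged = []
    for row in csv.DictReader(matrix_handle, delimiter="\t"):
        hit = ref.get(row["Gene"], {})
        # Clusters without a reference match keep empty annotation fields.
        row["ref_locus_tag"] = hit.get("locus_tag", "")
        row["ref_product"] = hit.get("product", "")
        merged.append(row)
    return merged

merged = merge_with_reference(io.StringIO(MATRIX), io.StringIO(REFERENCE))
for row in merged:
    print(row)
```

A left join keeps every cluster from the presence-absence matrix, so clusters absent from the reference genomes remain in the merged table with blank annotation.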
The following code uses the general annotation matrices (Supplementary File 1) to plot the genomes of both families according to the genome type assigned by PPanGGolin. Associated with Fig. 2 of the article.
Graphical representation of the genomes based on the Supplementary File 4 annotation. Associated with Fig. 4A of the article.
The results obtained by SAPPHIRE are represented as a barplot. Associated with Fig. 5 of the article.