
rnrlab/TFM_Pablo_2023

Comparative analysis of the pangenome of Thermaceae and Deinococcaceae families

Final Master's thesis

Author: Pablo Fernandez

Director: Modesto Redrejo Rodríguez

This README describes the scripts used in the project and the datasets obtained from it.

Datasets

Supplementary File 1

This document contains the annotation matrices that merge the data produced by Bakta and PPanGGolin, for both Thermaceae (SupplementaryFile1_1) and Deinococcaceae (SupplementaryFile1_2).

Supplementary File 3

This file contains the text output of CheckM for each family, which reports the completeness and contamination of each genome, among other parameters.
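As an illustration of how such completeness and contamination values might be used, here is a minimal sketch that filters genomes by quality. The table excerpt, the genome names, and the two thresholds are assumptions made for the example, not values taken from the project.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated CheckM table (invented genomes and values).
CHECKM_TABLE = (
    "Bin Id\tCompleteness\tContamination\n"
    "genome_A\t99.1\t0.4\n"
    "genome_B\t87.5\t6.2\n"
    "genome_C\t95.0\t1.8\n"
)


def filter_genomes(table_text, min_completeness=90.0, max_contamination=5.0):
    """Return the IDs of genomes that pass both (assumed) quality thresholds."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        row["Bin Id"]
        for row in reader
        if float(row["Completeness"]) >= min_completeness
        and float(row["Contamination"]) <= max_contamination
    ]


passing = filter_genomes(CHECKM_TABLE)
print(passing)
```

The thresholds are exposed as parameters so they can be adjusted to whatever cut-offs a given analysis requires.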

Supplementary File 4

Results of merging the general annotation matrices (Supplementary File 1) with the KO identifiers from the KEGG database.

Supplementary File 5

Presence-absence matrices generated by PPanGGolin. These are binary matrices in which the clusters obtained are represented against the total set of genomes: the value 1 indicates that the cluster is present in the genome, and the value 0 indicates that it is absent. For the family Thermaceae (SupplementaryFile5_1_1) we also add the annotation of the two reference strains of the genus Thermus, T. thermophilus HB27 and HB8.
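A minimal sketch of how a binary presence-absence matrix of this kind can be read and summarized. The CSV layout (clusters as rows, genomes as columns) and all names are assumptions for illustration and may not match the exact PPanGGolin output.

```python
import csv
import io

# Hypothetical matrix: one row per cluster, one column per genome.
MATRIX = (
    "cluster,genome_1,genome_2,genome_3\n"
    "cluster_A,1,1,1\n"
    "cluster_B,1,0,0\n"
    "cluster_C,0,1,1\n"
)


def cluster_counts(matrix_text):
    """Count in how many genomes each cluster is present."""
    reader = csv.reader(io.StringIO(matrix_text))
    header = next(reader)
    n_genomes = len(header) - 1  # first column holds the cluster ID
    counts = {row[0]: sum(int(v) for v in row[1:]) for row in reader}
    return counts, n_genomes


counts, n_genomes = cluster_counts(MATRIX)
# Clusters present in every genome of the set.
in_all_genomes = [c for c, n in counts.items() if n == n_genomes]
print(counts, in_all_genomes)
```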

Supplementary File 8.3 and 8.4

.csv files with the results obtained through SAPPHIRE, listing each sequence used followed by its thermophilic value. Supplementary File 8.3 corresponds to Thermaceae and Supplementary File 8.4 to Deinococcaceae.

Scripts

Supplementary File 2

This script uses Bakta to detect the 16S rRNA genes of the different genomes and sets a minimum length threshold, in base pairs, for these genes. Finally, the contamination of the samples is analyzed using the Biopython package.
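The length-threshold step can be sketched as follows. This is not the project's script: the feature records, the field names, and the 1000 bp threshold are hypothetical, and the example works on already-parsed annotation rows rather than calling Bakta.

```python
MIN_LENGTH_BP = 1000  # hypothetical threshold, not the value used in the project

# Hypothetical annotation records (1-based, inclusive coordinates).
features = [
    {"genome": "genome_A", "product": "16S ribosomal RNA", "start": 1200, "stop": 2720},
    {"genome": "genome_B", "product": "16S ribosomal RNA", "start": 50, "stop": 600},
    {"genome": "genome_B", "product": "23S ribosomal RNA", "start": 900, "stop": 3800},
]


def long_16s(records, min_length=MIN_LENGTH_BP):
    """Keep 16S rRNA genes whose length reaches the minimum threshold."""
    kept = []
    for rec in records:
        length = rec["stop"] - rec["start"] + 1
        if "16S" in rec["product"] and length >= min_length:
            kept.append((rec["genome"], length))
    return kept


kept = long_16s(features)
print(kept)
```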

Supplementary File 5.1.2

This document explains how to merge the presence-absence matrix provided by PPanGGolin with the annotation of the reference genomes obtained from GenBank.
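A minimal sketch of such a merge, assuming that each cluster can be linked to at most one reference annotation entry through a shared cluster ID. The keys, locus tags, and products below are hypothetical; the actual document may join on a different field.

```python
# Hypothetical presence-absence rows keyed by cluster ID.
presence_absence = {
    "cluster_A": {"genome_1": 1, "genome_2": 1},
    "cluster_B": {"genome_1": 0, "genome_2": 1},
}

# Hypothetical reference annotation; cluster_B has no reference entry.
reference_annotation = {
    "cluster_A": {"locus_tag": "REF_0001", "product": "DNA polymerase I"},
}


def merge_with_reference(matrix, annotation):
    """Left join: keep every cluster, add reference fields where available."""
    merged = {}
    for cluster, row in matrix.items():
        ann = annotation.get(cluster, {})
        merged[cluster] = {
            **row,
            "locus_tag": ann.get("locus_tag", ""),
            "product": ann.get("product", ""),
        }
    return merged


merged = merge_with_reference(presence_absence, reference_annotation)
print(merged)
```

A left join keeps clusters without a reference match, so no row of the presence-absence matrix is lost in the merged table.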

Supplementary File 6

This code uses the general annotation matrices (Supplementary File 1) to plot the genomes of both families according to the genome type assigned by PPanGGolin. Associated with Fig. 2 of the article.
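Before plotting, the genes of each genome have to be tallied by category. The sketch below assumes that "genome type" refers to PPanGGolin's persistent, shell, and cloud partitions, and uses invented rows in place of the annotation matrix.

```python
from collections import Counter

# Hypothetical (genome, partition) pairs, one per annotated gene.
rows = [
    ("genome_1", "persistent"),
    ("genome_1", "persistent"),
    ("genome_1", "cloud"),
    ("genome_2", "persistent"),
    ("genome_2", "shell"),
]


def partition_counts(pairs):
    """Count genes per partition for each genome."""
    counts = {}
    for genome, partition in pairs:
        counts.setdefault(genome, Counter())[partition] += 1
    return counts


gene_counts = partition_counts(rows)
print(gene_counts)
```

The resulting per-genome counts are the kind of table that a stacked bar chart could be drawn from.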

Supplementary File 7

Graphical representation of the genomes based on the Supplementary File 4 annotation. Associated with Fig. 4A of the article.

Supplementary File 8

The results obtained by SAPPHIRE are represented as a barplot. Associated with Fig. 5 of the article.
