This script must be run while connected to the EBI VPN.
The dashboard in this repository is built with the Plotly Dash framework to track SARS-CoV-2 submissions to ENA by country.
The tool includes:
- SQL.Reads_fetching.py: Fetches read data from the ERAREAD database.
- SQL_Analysis_fetching.py: Fetches analysis and sequence data from ENAREAD and ERAREAD.
- Seq_Analysis_grouping.py: Processes and groups the analyses and sequences from ENAREAD and ERAREAD.
- APIReads_fetch_process.py: Fetches read and sequence data through the ENA Portal API, then processes and groups the read data (from the Portal API and ERAREAD) and the NCBI/DDBJ data (reads and sequences).
- dashboard_v2.py: The dashboard script. It performs the final processing and data grouping and defines the dashboard layout and callbacks.
- dashboard_workflow.sh: Bash script that runs the workflow to retrieve and process the data.
- Install a Conda-based Python 3 distribution. Miniconda is recommended: https://docs.conda.io/en/latest/miniconda.html
- Setting up the Oracle database environment
The ERA database is an Oracle database. To query it, this script uses the cx_Oracle Python module, which requires a little setup.
- Install the module using:
    pip install cx_Oracle
- The Oracle Instant Client is a requirement of this module. The 'Basic Light' package is sufficient for our needs.
- Once the Instant Client is downloaded, set the location of this library using the $ORACLE_CLIENT_LIB environment variable before using this script.
Setting up the environment
No root access is needed.
- Unzip the instantclient archive.
- Find the path of the unzipped instantclient directory and note it.
- Edit the .bashrc file to set the Oracle environment.
- Add the following lines to the end of the .bashrc file:
    export ORACLE_HOME=/path/to/oracle/instantclient
    export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
    export PATH=$ORACLE_HOME:$PATH
    export ORACLE_CLIENT_LIB=$ORACLE_HOME
- Source the .bashrc file:
    source $HOME/.bashrc
For more details, see: https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html
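A quick way to confirm the Oracle environment is set up before running the scripts is sketched below. It is a minimal check, not the repository's own code: the helper names `resolve_client_lib` and `init_oracle_client` are hypothetical, and whether the scripts in this repository call `cx_Oracle.init_oracle_client` themselves is an assumption. That function is available in cx_Oracle 8.0 and later.

```python
import os
from pathlib import Path


def resolve_client_lib(env=None):
    """Return the Instant Client directory named by ORACLE_CLIENT_LIB.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching real variables.
    """
    env = os.environ if env is None else env
    lib_dir = env.get("ORACLE_CLIENT_LIB")
    if not lib_dir:
        raise RuntimeError("ORACLE_CLIENT_LIB is not set; source your .bashrc first")
    path = Path(lib_dir).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"ORACLE_CLIENT_LIB does not point to a directory: {path}")
    return path


def init_oracle_client(env=None):
    """Point cx_Oracle at the Instant Client (hypothetical helper)."""
    # Imported lazily so resolve_client_lib() can be used without cx_Oracle.
    import cx_Oracle

    cx_Oracle.init_oracle_client(lib_dir=str(resolve_client_lib(env)))
```

Calling `resolve_client_lib()` in a fresh shell should return the instantclient path. A `RuntimeError` usually means the .bashrc changes have not been sourced yet.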
- Clone the repository:
    git clone <repository>
- Activate the conda environment:
    source path/to/conda/bin/activate
- Setting up the scripts environment
Modify the config file (config.yaml) by setting the appropriate values for each variable below:
- ERAPRO_DETAILS: The credentials for the ERAREAD database, from which runs and analyses are retrieved and processed.
- ENAPRO_DETAILS: The credentials for the ENAREAD database, from which sequences are retrieved and processed.
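Before running the workflow, it can help to confirm that both variables are filled in. The sketch below assumes that ERAPRO_DETAILS and ENAPRO_DETAILS are top-level keys in config.yaml and that PyYAML is installed. The function names are hypothetical. The layout of the values under each key is not described here, so it is not checked.

```python
REQUIRED_KEYS = ("ERAPRO_DETAILS", "ENAPRO_DETAILS")


def missing_config_keys(config):
    """Return the required top-level keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path="config.yaml"):
    """Load config.yaml and fail early if a required key has no value."""
    # PyYAML is imported lazily so missing_config_keys() works without it.
    import yaml

    with open(path) as handle:
        config = yaml.safe_load(handle) or {}
    missing = missing_config_keys(config)
    if missing:
        raise ValueError("config.yaml is missing values for: " + ", ".join(missing))
    return config
```

For example, `missing_config_keys({"ERAPRO_DETAILS": "..."})` reports that ENAPRO_DETAILS still needs a value.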
Modify the data fetching and processing workflow file (dashboard_workflow.sh) by adding the absolute file path to each script.
Note: The workflow file (dashboard_workflow.sh) writes its output as .csv files. Make sure the output directory is the same for all the scripts (-o/--output flag).
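One way to check that every script writes to the same directory is to scan the workflow file for the -o/--output values. The sketch below assumes each script is invoked on a single line with the flag and its value written out literally (no shell variables or line continuations). The function name and the sample paths are hypothetical.

```python
import shlex


def output_dirs(workflow_text):
    """Collect every value passed to -o/--output in the workflow script text."""
    found = set()
    for line in workflow_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and the shebang.
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line, comments=True)
        for i, token in enumerate(tokens):
            if token in ("-o", "--output") and i + 1 < len(tokens):
                found.add(tokens[i + 1])
            elif token.startswith("--output="):
                found.add(token.split("=", 1)[1])
    return found


# Hypothetical workflow content; the real script lines may differ.
sample = """#!/bin/bash
python3 /abs/path/SQL.Reads_fetching.py -o /data/out
python3 /abs/path/Seq_Analysis_grouping.py --output /data/out
"""
dirs = output_dirs(sample)
if len(dirs) > 1:
    print("Scripts write to different directories:", sorted(dirs))
```

If the returned set has more than one entry, at least one script is pointed at a different output directory.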
To run the data fetching and processing workflow, run the following command:
    sh <path/to>/dashboard_workflow.sh
To run the dashboard, run the following command:
    python3 <path/to>/dashboard_v2.py -f <path/to/workflow_output_directory>
Note: You can view the dashboard at the following link in your browser (make sure you are connected to the EBI VPN): http://10.42.28.202:8080/