Merge branch 'main' of https://github.com/sanger-tol/ear
DLBPointon committed Jan 21, 2025
2 parents 844c575 + d4b0715 commit aa7a65d
Showing 64 changed files with 2,354 additions and 614 deletions.
18 changes: 13 additions & 5 deletions .github/workflows/ci.yml
@@ -10,6 +10,8 @@ on:

env:
NXF_ANSI_LOG: false
NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity
NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity

concurrency:
group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"
@@ -24,23 +26,29 @@ jobs:
strategy:
matrix:
NXF_VER:
- "23.04.0"
- "24.04.0"
- "latest-everything"
steps:
- name: Check out pipeline code
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
uses: actions/checkout@v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
with:
version: "${{ matrix.NXF_VER }}"

- name: Disk space cleanup
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
- name: Download Tiny test data
# Download A fungal test data set that is full enough to show some real output.
# Needs a kmer db for merqury
run: |
curl https://tolit.cog.sanger.ac.uk/test-data/resources/treeval/TreeValTinyData.tar.gz | tar xzf -
cp TreeValTinyData/assembly/draft/grTriPseu1.fa TreeValTinyData/assembly/draft/grTriPseu1-hap.fa
cp TreeValTinyData/assembly/draft/grTriPseu1.fa TreeValTinyData/assembly/draft/grTriPseu1-all_hap.fa
- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
# Skip BTK and CPRETEXT as they are already tested on their repos.
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results --steps btk,cpretext,merquryfk
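The unchanged `concurrency.group` line in ci.yml builds the group name from the workflow name plus the pull-request number. Because the `||` operator in GitHub expressions yields its first truthy operand, push events, which have no pull-request number, fall back to `github.ref`. Below is a small Python sketch of how that string resolves. The helper name and the workflow name in the example are hypothetical.

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Mimic "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}".

    Like GitHub's `||`, Python's `or` returns the first truthy operand, so a pull
    request groups by its number and a push (no PR number) groups by its ref.
    """
    return f"{workflow}-{pr_number or ref}"


# Hypothetical workflow name, for illustration only.
print(concurrency_group("nf-core CI", 7, "refs/pull/7/merge"))
print(concurrency_group("nf-core CI", None, "refs/heads/dev"))
```

Successive pushes to one pull request therefore share a group, while runs for different branches do not.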
6 changes: 3 additions & 3 deletions .github/workflows/linting.yml
@@ -14,7 +14,7 @@ jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
- uses: actions/checkout@v4

- name: Set up Python 3.12
uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
@@ -31,7 +31,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Check out pipeline code
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
uses: actions/checkout@v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
@@ -44,7 +44,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install nf-core
pip install nf-core==2.8.0
- name: Run nf-core lint
env:
15 changes: 15 additions & 0 deletions .nf-core.yml
@@ -4,21 +4,36 @@ lint:
- assets/nf-core-ear_logo_light.png
- docs/images/nf-core-ear_logo_light.png
- docs/images/nf-core-ear_logo_dark.png
- lib/nfcore_external_java_deps.jar
- lib/NfcoreSchema.groovy
- lib/NfcoreTemplate.groovy
- lib/Utils.groovy
- lib/WorkflowMain.groovy
- .github/ISSUE_TEMPLATE/config.yml
- .github/workflows/awstest.yml
- .github/workflows/awsfulltest.yml
- conf/igenomes.config
files_unchanged:
- LICENSE
- CODE_OF_CONDUCT.md
- assets/nf-core-ear_logo_light.png
- docs/images/nf-core-ear_logo_light.png
- docs/images/nf-core-ear_logo_dark.png
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/CONTRIBUTING.md
- .github/PULL_REQUEST_TEMPLATE.md
- .github/workflows/branch.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
- assets/email_template.html
multiqc_config:
- report_comment
nextflow_config:
- manifest.name
- manifest.homePage
- params.show_hidden_params
- params.schema_ignore_params
- params.validationSchemaIgnoreParams
nf_core_version: 2.14.1
repository_type: pipeline
template:
58 changes: 41 additions & 17 deletions CHANGELOG.md
@@ -2,39 +2,63 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Naming based on: [Mythical creatures](https://en.wikipedia.org/wiki/List_of_legendary_creatures_by_type).
Naming based on: [Audiologists](https://en.wikipedia.org/wiki/Category:Audiologists).

## v1.0.0 - Aquatic Bahamut [21/08/2024]
## v0.6.2 - Robert Beiny H2 [09/01/2025]

- Modules have been updated to remove conda defaults.

### Software dependencies

| Dependency | Old version | New version |
| ---------------------------- | ------------------- | ------------------- |
| sanger-tol/blobtoolkit\* | | 0.6.0 (Bellsprout) |
| sanger-tol/curationpretext\* | 1.0.0 (UNSC Cradle) | 1.1.0 (UNSC Delphi) |
| GFASTATS | | 1.3.6--hdcf5f25_3 |
| MERQUERY_FK | | 1.2 |
| MINIMAP2_ALIGN | | 2.28 |
| SAMTOOLS_MERGE | 1.20--h50ea8bc_0 | 1.21--h50ea8bc_0 |
| SAMTOOLS_SORT | 1.21--h50ea8bc_0 | 1.21--h50ea8bc_0 |

## v0.6.1 - Robert Beiny H1 [08/10/2024]

- Blobtookit version was specified in the wrong location, so defaulted to a development branch "draft_assemblies", this has now been updated to v0.6.0.
- Zenodo DOI has now been added to the repo.

## v0.6.0 - Robert Beiny [20/09/2024]

Initial release of sanger-tol/ear, created with the [nf-core](https://nf-co.re/) template.
The current pipeline means the MVP for ear.

### Added

GFASTATS to generate statistics on the input primary genome.
MERQURY_FK to generate kmer graphs and analyses of the primary, haplotype and merged assembly.
MAIN_MAPPING which is a small mapping subworkflow, that can work with single and paired reads.
BLOBTOOLKIT to generate busco files and blobtoolkit dataset/plots.
CURATIONPRETEXT to generate pretext plots and pngs.

### Parameters

| Old parameter | New parameter |
| --------------- | ------------- |
| | --mapped |
| Old parameter | New parameter |
| ------------- | ------------- |
| | --mapped |
| | --steps |

### Software dependencies

| Dependency | Old version | New version |
| ----------- | ------------- | ------------- |
| sanger-tol/blobtoolkit* | | draft_assemblies |
| sanger-tol/curationpretext* | | 1.0.0 (UNSC Cradle) |
| GFASTATS | | 1.3.6--hdcf5f25_3 |
| MERQUERY_FK | | 1.2 |
| MINIMAP2_ALIGN | | 2.28 |
| SAMTOOLS_MERGE | | 1.20--h50ea8bc_0 |
| SAMTOOLS_SORT | | 1.20--h50ea8bc_0 |
|
| Dependency | Old version | New version |
| ---------------------------- | ----------- | ------------------- |
| sanger-tol/blobtoolkit\* | | 0.6.0 (Bellsprout) |
| sanger-tol/curationpretext\* | | 1.0.0 (UNSC Cradle) |
| GFASTATS | | 1.3.6--hdcf5f25_3 |
| MERQUERY_FK | | 1.2 |
| MINIMAP2_ALIGN | | 2.28 |
| SAMTOOLS_MERGE | | 1.20--h50ea8bc_0 |
| SAMTOOLS_SORT | | 1.20--h50ea8bc_0 |

- Note: for pipelines, please check their own CHANGELOG file for a full list of software dependencies.
\* for pipelines, please check their own CHANGELOG file for a full list of software dependencies.

### Dependencies
The pipeline depends on a number of databases which are noted in [README](README.md) and [USAGE](docs/usage.md).

The pipeline depends on a number of databases which are noted in [README](README.md) and [USAGE](docs/usage.md).
28 changes: 22 additions & 6 deletions CITATIONS.md
@@ -10,19 +10,35 @@
## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [GFastar/GFastats](https://www.biorxiv.org/content/10.1101/2022.03.24.485682v1)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
> Formenti, G., Abueg, L., Brajuka, N., Gallardo, C., Giani, A., Fedrigo, O., Jarvis, ED. (2022). Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. bioRxiv. doi: https://doi.org/10.1101/2022.03.24.485682
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
- [Merqury_FK](https://github.com/thegenemyers/MERQURY.FK)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Myers, G., Rhie, A. (2024). MerquryFK & KatFK. [online]. https://github.com/thegenemyers/MERQURY.FK. (Accessed on 20 September 2024).
- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/34623391/)

> Li, H. 2021. ‘New strategies to improve MINIMAP2 alignment accuracy’, Bioinformatics, 37(23), pp. 4572–4574. doi:10.1093/bioinformatics/btab705.
- [Samtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)

> Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819.
- [sanger-tol/blobtoolkit](https://zenodo.org/records/13758882)

> Muffato, M., Butt, Z., Challis, R., Kumar, S., Qi, G., Ramos Díaz, A., Surana, P., & Yates, B. (2024). sanger-tol/blobtoolkit: v0.6.0 – Bellsprout (0.6.0). Zenodo. https://doi.org/10.5281/zenodo.13758882
- [sanger-tol/curationpretext](https://zenodo.org/records/13758882)

> Pointon, DLB. (2024). sanger-tol/curationpretext: v1.0.0 (UNSC Cradle). [online]. https://github.com/sanger-tol/curationpretext/releases/tag/1.0.0. (Accessed on 20 September 2024).
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
- [Conda](https://conda.org/)

> Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
> conda contributors. conda: A system-level, binary package and environment manager running on all major operating systems and platforms. Computer software. https://github.com/conda/conda
- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) DLBPointon
Copyright (c) 2022 - 2023 Genome Research Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
28 changes: 14 additions & 14 deletions README.md
@@ -1,9 +1,7 @@
[![GitHub Actions CI Status](https://github.com/sanger-tol/ear/actions/workflows/ci.yml/badge.svg)](https://github.com/sanger-tol/ear/actions/workflows/ci.yml)
[![GitHub Actions Linting Status](https://github.com/sanger-tol/ear/actions/workflows/linting.yml/badge.svg)](https://github.com/sanger-tol/ear/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
[![GitHub Actions Linting Status](https://github.com/sanger-tol/ear/actions/workflows/linting.yml/badge.svg)](https://github.com/sanger-tol/ear/actions/workflows/linting.yml)[![DOI](https://zenodo.org/badge/833605808.svg)](https://doi.org/10.5281/zenodo.13819520)
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.0-23aa62.svg)](https://www.nextflow.io/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/sanger-tol/ear)
@@ -15,7 +13,7 @@
1. Read the input yaml file (YAML_INPUT)
2. Run GFASTATS (GFASTARS)
3. Run MERQURYFK_MERQURYFK (MERQURYFK)
4. Run MAIN_MAPPING, longread single-end/paired-end mapping
4. Run MAIN_MAPPING, longread single-end/paired-end mapping
5. Run GENERATE_SAMPLESHEET, generate a csv file required for SANGER_TOL_BTK.
6. Run SANGER_TOL_BTK, also known as SANGER-TOL/BLOBTOOLKIT a subpipline for SANGER-TOL/EAR
7. Run SANGER_TOL_CPRETEXT, also known as SANGER-TOL/CURATIONPRETEXT a subpipeline for SANGER-TOL/EAR.
@@ -27,11 +25,12 @@
The sanger-tol/ear pipeline requires a number of databases in place in order to run the blobtoolkit pipeline.
These include:
- A blast nt database
- A Diamond blast uniprot database
- A Diamond blast nr database
- An NCBI taxdump
- An NCBI rankedlineage.dmp

- A blast nt database
- A Diamond blast uniprot database
- A Diamond blast nr database
- An NCBI taxdump
- An NCBI rankedlineage.dmp

Next, a yaml file containing the following should then be completed:

@@ -59,8 +58,9 @@ curationpretext:
hic_dir: <DIRECTORY OF HIC READ FILES .CRAM AND .CRAI>
btk:
taxid: 1464561
lineages: <CSV LIST OF DATABASES TO USE: "insecta_odb10,diptera_odb10">
gca_accession: GCA_0001 <DEFAULT, DO NOT CHANGE UNLESS YOU HAVE A GCA_ACCESSION FOR YOUR SPECIES>
lineages: < CSV LIST OF DATABASES TO USE: "insecta_odb10,diptera_odb10">
gca_accession: GCA_0001 <DEFAULT, DO NOT CHANGE UNLESS YOU HAVE A GCA_ACCESSION FOR YOUR SPECIES >

nt_database: <DIRECTORY CONTAINING BLAST DB>
nt_database_prefix: <BLASTDB PREFIX>
diamond_uniprot_database_path: <PATH TO reference_proteomes.dmnd FROM UNIPROT>
@@ -70,14 +70,14 @@ btk:
config: <PATH TO ear/conf/sanger-tol-btk.config TO OVERWRITE PROCESS LIMITS>
```
Now, you can run the pipeline using:
```bash
nextflow run sanger-tol/ear -profile <singularity,docker> \\
--input assets/idCulLati1.yaml \\
--mapped TRUE \\ # OPTIONAL
--outdir test-truth
--steps ["", "btk", "cpretext", "merquryfk"] # OPTIONAL CSV LIST OF STEPS TO EXCLUDE FROM EXECUTION
--outdir test
```

> [!WARNING]
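The usage command above combines required options with optional ones annotated inline. Below is a minimal Python sketch that assembles the same kind of invocation as an argument list and quotes it with `shlex.join`. It uses only options that appear in this commit (`-profile`, `--input`, `--mapped`, `--outdir`, `--steps`). The helper `build_ear_command` is hypothetical. Two further details are assumptions taken from the README comment and the CI run `--steps btk,cpretext,merquryfk`, not from the pipeline's parameter schema: the set of excludable steps (`btk`, `cpretext`, `merquryfk`) and the comma-separated form of `--steps`.

```python
import shlex
from typing import Sequence

# Assumed from the README comment and the CI invocation; not read from the pipeline schema.
EXCLUDABLE_STEPS = {"btk", "cpretext", "merquryfk"}


def build_ear_command(
    profile: str,
    input_yaml: str,
    outdir: str,
    mapped: bool = False,
    skip_steps: Sequence[str] = (),
) -> str:
    """Return a shell-quoted `nextflow run sanger-tol/ear` command (hypothetical helper)."""
    unknown = set(skip_steps) - EXCLUDABLE_STEPS
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(sorted(unknown))}")

    args = [
        "nextflow", "run", "sanger-tol/ear",
        "-profile", profile,
        "--input", input_yaml,
        "--outdir", outdir,
    ]
    if mapped:
        # Only when the input YAML already points at a mapped BAM (see mapped_bam).
        args += ["--mapped", "TRUE"]
    if skip_steps:
        # One comma-separated value, as in `--steps btk,cpretext,merquryfk`.
        args += ["--steps", ",".join(skip_steps)]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)


print(build_ear_command("singularity", "assets/idCulLati1.yaml", "test", skip_steps=["btk"]))
```

Returning `args` instead of `shlex.join(args)` would let the list go straight to `subprocess.run` without involving a shell.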
2 changes: 1 addition & 1 deletion assets/idCulLati1.yaml
@@ -2,7 +2,7 @@
assembly_id: idCulLati1_ear
reference_hap1: /nfs/treeoflife-01/teams/tola/users/dp24/ear/idCulLati1/primary.fa
reference_hap2: /nfs/treeoflife-01/teams/tola/users/dp24/ear/idCulLati1/hap2.fa
reference_haplotigs: /
reference_haplotigs: /nfs/treeoflife-01/teams/tola/users/dp24/ear/haplotigs.fa

# If a mapped bam already exists use the below + --mapped TRUE on the nextflow command else ignore.
mapped_bam: /nfs/treeoflife-01/teams/tola/users/dp24/ear/idCulLati1/mapped_bam.bam
4 changes: 2 additions & 2 deletions assets/real_pdf.yaml
@@ -20,14 +20,14 @@ PROFILING:
# ASSEMBLY DATA
ASSEMBLIES:
Pre-curation:
pipeline: [hifiasm_v0.19.8-r603|--primary, purge_dups_v1.2.5|-e, yahs_v1.2a.2|]
pipeline: [hifiasm_v0.19.8-r603|--primary, purge_dups_v1.2.5|-e, yahs_v1.2a.2|]
pri:
gfastats--nstar-report_txt: /lustre/scratch123/tol/tolqc/data/erga-bge/insects/Culex_laticinctus/working/idCulLati1.hifiasm.20240430/scaffolding/yahs/out.break.yahs/out_scaffolds_final.fa.gz.gfastats
busco_short_summary_txt: /lustre/scratch123/tol/tolqc/data/erga-bge/insects/Culex_laticinctus/working/idCulLati1.hifiasm.20240430/scaffolding/yahs/out.break.yahs/out_scaffolds_final.insecta_odb10.busco/short_summary.specific.insecta_odb10.out_scaffolds_final.insecta_odb10.busco.txt
merqury_folder: /lustre/scratch123/tol/tolqc/data/erga-bge/insects/Culex_laticinctus/working/idCulLati1.hifiasm.20240430/scaffolding/yahs/out.break.yahs/out_scaffolds_final.ccs.merquryk/

Curated:
pipeline: [hifiasm_v0.19.8-r603|--primary, purge_dups_v1.2.5|-e, yahs_v1.2a.2|, TreeVal_v1.1]
pipeline: [hifiasm_v0.19.8-r603|--primary, purge_dups_v1.2.5|-e, yahs_v1.2a.2|, TreeVal_v1.1]
pri:
gfastats--nstar-report_txt: /lustre/scratch123/tol/tolqc/data/erga-bge/insects/Culex_laticinctus/assembly/curated/idCulLati1.1/ear/idCulLati1.1.primary.curated.fa.gfastats
busco_short_summary_txt: /lustre/scratch123/tol/tolqc/data/erga-bge/insects/Culex_laticinctus/assembly/curated/idCulLati1.1/ear/idCulLati1.1.primary.curated.insecta_odb10.busco/short_summary.specific.insecta_odb10.idCulLati1.1.primary.curated.insecta_odb10.busco.txt
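The `pipeline:` lists in `assets/real_pdf.yaml` appear to record each assembly step as `<tool>_v<version>|<arguments>`. The arguments may be empty (`yahs_v1.2a.2|`), or the `|` may be absent altogether (`TreeVal_v1.1`). The minimal Python sketch below splits each entry into tool, version and arguments. It assumes exactly that convention, and it assumes a YAML loader has already turned the bracketed flow sequence into a list of strings. `parse_pipeline_step` is a hypothetical name and is not part of the EAR code base.

```python
from typing import List, Tuple


def parse_pipeline_step(entry: str) -> Tuple[str, str, str]:
    """Split 'tool_vVERSION|ARGS' into (tool, version, args).

    Assumes the convention seen in assets/real_pdf.yaml: the version follows the
    last '_v' in the name, and arguments (possibly empty) follow an optional '|'.
    """
    tool_version, _, args = entry.strip().partition("|")
    tool, sep, version = tool_version.rpartition("_v")
    if not sep:
        # No '_v' marker: treat the whole token as the tool name with no version.
        return tool_version, "", args.strip()
    return tool, version, args.strip()


curated: List[str] = [
    "hifiasm_v0.19.8-r603|--primary",
    "purge_dups_v1.2.5|-e",
    "yahs_v1.2a.2|",
    "TreeVal_v1.1",
]
for step in curated:
    print(parse_pipeline_step(step))
```

Splitting on the last `_v` keeps underscores inside tool names such as `purge_dups` intact. A tool whose own name contained `_v` would need a stricter rule.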
3 changes: 0 additions & 3 deletions assets/samplesheet.csv

This file was deleted.
