This repository has been archived by the owner on May 21, 2024. It is now read-only.

ENH: Semantic Type for fetch-busco-db output #75

Closed
wants to merge 9 commits into from

@Sann5 Sann5 commented Jan 23, 2024

About this repo

  • q2-types-genomics is a QIIME 2 plugin that defines semantic types (STs) for use by other plugins.

What's new

Set up an environment

# For Linux:
# export MY_OS="linux"
# For macOS:
export MY_OS="osx"
wget "https://data.qiime2.org/distro/shotgun/qiime2-shotgun-2023.9-py38-${MY_OS}-conda.yml"
conda env create -n q2-shotgun --file "qiime2-shotgun-2023.9-py38-${MY_OS}-conda.yml"
rm "qiime2-shotgun-2023.9-py38-${MY_OS}-conda.yml"

Run it locally

  1. First, activate the environment, install the dependencies, clone the repo, and check out the PR branch:
conda activate q2-shotgun
conda install -c conda-forge -c bioconda busco=5.6.1
conda remove q2-types-genomics q2-types
pip install git+https://github.com/qiime2/q2-types.git
git clone git@github.com:bokulich-lab/q2-types-genomics.git
cd q2-types-genomics
gh pr checkout <PR_num_here>
pip install -e .
  2. Let's get you some data to play with:
cd wherever_you_want_to_download_the_data_to
busco --download "virus"
cd ..

FYI: I can't run busco from the Visual Studio Code terminal (it can't find it), only from iTerm.

  3. Test it out!
qiime tools import --input-path wherever_you_want_to_download_the_data_to --output-path busco_db.qza --type "ReferenceDB[BuscoDB]"

Running the tests

pytest -W ignore -vv --pyargs q2_types_genomics

@Sann5 Sann5 added the enhancement New feature or request label Jan 23, 2024
@Sann5 Sann5 self-assigned this Jan 23, 2024
@Sann5 Sann5 requested a review from misialq January 23, 2024 13:59
@Sann5 Sann5 marked this pull request as draft January 23, 2024 13:59
Comment on lines 310 to 338
# File collections for text files
(
    ancestral,
    dataset,
    lengths_cutoff,
    scores_cutoff,
    links_to_ODB10,
    ancestral_variants,
    ogs_id,
    species,
    prfls,
    hmms,
    refseq_db_md5
) = [
    model.FileCollection(pattern, format=BuscoGenericTextFileFmt)
    for pattern in [
        r'.+ancestral$',
        r'.+dataset\.cfg$',
        r'.+lengths_cutoff$',
        r'.+scores_cutoff$',
        r'.+links_to_ODB10\.txt$',
        r'.+ancestral_variants$',
        r'.+ogs\.id\.info$',
        r'.+species\.info$',
        r'.+\.prfl$',
        r'.+\.hmm$',
        r'.+refseq_db\.faa\.gz\.md5'
    ]
]
Contributor Author

Hey @misialq. I wanted to get your input on this first draft of the BuscoDatabaseDirFmt. I defined all file collections as BuscoGenericTextFileFmt but I was wondering if you'd like me to create more specific formats for some of these (e.g. hmms which are the HMMER files).
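
One way to sanity-check which collection a given file would land in is to run the patterns against a few relative paths with the standard `re` module. The sketch below is only an illustration: the paths are made up, only a subset of the patterns is included, and it assumes matching is anchored at the start of the relative path (as `re.match` does), which may differ from how QIIME 2 applies the patterns internally.

```python
import re

# Subset of the patterns from the snippet above, keyed by attribute name.
PATTERNS = {
    "ancestral": r'.+ancestral$',
    "dataset": r'.+dataset\.cfg$',
    "ancestral_variants": r'.+ancestral_variants$',
    "hmms": r'.+\.hmm$',
    "refseq_db_md5": r'.+refseq_db\.faa\.gz\.md5',
}


def matching_collections(relpath):
    """Return the names of all collections whose pattern matches relpath."""
    # Assumption: matching is anchored at the start of the path (re.match).
    return [name for name, pat in PATTERNS.items() if re.match(pat, relpath)]


# Hypothetical relative paths, made up for illustration.
print(matching_collections("lineages/example_odb10/ancestral"))
print(matching_collections("lineages/example_odb10/ancestral_variants"))
print(matching_collections("lineages/example_odb10/hmms/0001.hmm"))
# '.+' needs at least one character before the file name, so a file
# sitting directly in the root directory matches nothing.
print(matching_collections("ancestral"))
# The last pattern has no trailing '$', so under re.match it also
# accepts names with extra trailing characters.
print(matching_collections("placement_files/refseq_db.faa.gz.md5.bak"))
```

Under these assumptions, each well-formed path maps to exactly one collection, which is a useful property to check before deciding whether some collections deserve a more specific format.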

Comment on lines 350 to 397
# Define path maker methods for each
@ancestral.set_path_maker
def ancestral_path_maker(self, name):
    return str(name)

@dataset.set_path_maker
def dataset_path_maker(self, name):
    return str(name)

@lengths_cutoff.set_path_maker
def lengths_cutoff_path_maker(self, name):
    return str(name)

@scores_cutoff.set_path_maker
def scores_cutoff_path_maker(self, name):
    return str(name)

@links_to_ODB10.set_path_maker
def links_to_ODB10_path_maker(self, name):
    return str(name)

@ancestral_variants.set_path_maker
def ancestral_variants_path_maker(self, name):
    return str(name)

@ogs_id.set_path_maker
def ogs_id_path_maker(self, name):
    return str(name)

@species.set_path_maker
def species_path_maker(self, name):
    return str(name)

@prfls.set_path_maker
def prfls_path_maker(self, name):
    return str(name)

@hmms.set_path_maker
def hmms_path_maker(self, name):
    return str(name)

@refseq_db.set_path_maker
def refseq_db_path_maker(self, name):
    return str(name)

@refseq_db_md5.set_path_maker
def refseq_db_md5_path_maker(self, name):
    return str(name)
Contributor Author

@Sann5 Sann5 Jan 23, 2024

@misialq

I was also wondering if you know a less verbose way of defining all these path makers. I guess I could add the methods to the class in a loop, but outside the class definition. I also don't know exactly what this method is for, so I'm not sure whether returning str(name) is enough for our use case.

BTW you were right, qiime complains if file collections don't have this path_maker method overwritten.

Contributor Author

I managed to do this inside __init__ so the lines above are outdated :)

codecov bot commented Jan 23, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (3b993ed) 96.77% compared to head (15615ed) 96.78%.

Files                                       Patch %   Lines
q2_types_genomics/reference_db/_format.py   95.65%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   96.77%   96.78%   +0.01%     
==========================================
  Files          42       42              
  Lines        1548     1586      +38     
==========================================
+ Hits         1498     1535      +37     
- Misses         50       51       +1     

Contributor Author

Sann5 commented Jan 24, 2024

I added some test data with the expected file structure and empty files, but there are a lot of them. Do you think I should follow another approach for testing the BuscoDatabaseDirFmt, @misialq?
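
As one alternative to committing many empty files, the expected tree could be generated in a temporary directory at test time. The sketch below is only an illustration using the standard library: `make_empty_tree` is a hypothetical helper, and the relative paths in `EXAMPLE_LAYOUT` are made up and would need to be replaced with the layout of a real BUSCO download.

```python
import tempfile
from pathlib import Path

# Hypothetical relative paths; the real layout should mirror a BUSCO download.
EXAMPLE_LAYOUT = [
    "lineages/example_odb10/ancestral",
    "lineages/example_odb10/ancestral_variants",
    "lineages/example_odb10/dataset.cfg",
    "lineages/example_odb10/hmms/0001.hmm",
    "lineages/example_odb10/prfl/0001.prfl",
]


def make_empty_tree(root, relpaths):
    """Create an empty file for every relative path under root.

    Returns the sorted relative paths of all files found afterwards.
    """
    root = Path(root)
    for rel in relpaths:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    created = make_empty_tree(tmp, EXAMPLE_LAYOUT)
    print(created)
```

In a test, the temporary directory could then be passed to the directory format for validation, so the repository only needs to carry the list of expected paths.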

@Sann5 Sann5 marked this pull request as ready for review January 29, 2024 13:08
@Sann5 Sann5 closed this May 16, 2024
Successfully merging this pull request may close these issues.

ENH: Semantic Type for fetch-busco-db's output