
@yesimon (Contributor) commented Apr 22, 2020

Eventually make all executables subcommands of `classify`. Disperse functionality from top-level
executable scripts like metagenomics.py and taxon_filter.py into the `classify` library.

Restructure tests to remove the `unit/` folder and place unit and integration tests together for each file, with pytest marks to distinguish between unit and integration tests.
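As a sketch of what that separation could look like (the mark name and tests here are illustrative, not final):

```python
# test_taxon_filter.py -- unit and integration tests for one module,
# side by side, separated only by pytest marks.
import pytest

def test_parse_read_id():
    # Plain unit test: fast, no external tools; runs by default.
    assert "read1/1".split("/")[0] == "read1"

@pytest.mark.integration
def test_deplete_end_to_end():
    # Integration test: needs conda tools and databases on the PATH.
    # Select with `pytest -m integration`; exclude with
    # `pytest -m "not integration"`.
    pytest.skip("placeholder: requires external databases")
```

The `integration` mark would need to be registered under `markers` in pytest.ini (or setup.cfg) so that runs with `--strict-markers` stay clean.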
@dpark01 (Member) commented Apr 22, 2020

You know, I was just thinking about this recently, and it warrants a conversation among us. I haven't had a chance to dive into this, but I'm certain that the right place to start with this restructure is not viral-classify, but viral-core.

My thought is, for each modular GitHub repo:

  • completely delink the python code from everything conda. Remove the tool stuff that calls conda. Just have the python code totally assume that things are on the PATH properly.
  • the two pieces (conda binaries and python code) shall coexist in the same repos, but not interact
  • each repo will build a conda package for itself in Travis (which we haven't done for a while) that is nothing more than a list of dependencies on other conda packages (no de novo content at all; see the sketch after this list)
  • each repo will build a traditional python package containing its own python code, and nothing beyond the python code
  • it will continue to build its own Docker image, which will contain both (the python code and the conda packages), and will continue to unit test the python code within the Docker container
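For the conda piece, I'm picturing a dependencies-only recipe, something like this (a sketch; the name, version, and pins are illustrative):

```yaml
# meta.yaml -- viral-core as a pure meta-package: no de novo content,
# just run-time pins on the conda packages the python code assumes
# are already on the PATH.
package:
  name: viral-core
  version: "0.0.1"      # placeholder

build:
  number: 0
  noarch: generic       # nothing to compile; it's only a dependency list

requirements:
  run:
    - samtools >=1.9    # illustrative pins, not a vetted list
    - bwa
    - picard
```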

Then viral-classify, -assemble, -phylo, etc. will list viral-core as a formal dependency, in both the conda world and the PyPI world.

viral-pipelines remains nothing more than WDLs

In this world, viral-core has to be restructured first.

Thoughts?

@yesimon (Contributor, Author) commented Apr 22, 2020

Agreed. I'm not sure if we should actually upload PyPI packages, because they have many other dependencies, but my effort here is just to restructure it to be more similar to a traditional pip-installable python package, where we can use setup.py directives to install the executable scripts.
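Something along these lines (a sketch; the module paths and command names are placeholders, not decided):

```python
# setup.py -- console_scripts entry points standing in for the old
# top-level executables like metagenomics.py and taxon_filter.py.
from setuptools import setup, find_packages

setup(
    name="viral-classify",
    version="0.0.1",  # placeholder
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # hypothetical module:function targets inside the classify library
            "metagenomics=classify.metagenomics:main",
            "taxon_filter=classify.taxon_filter:main",
        ],
    },
)
```

With that in place, `pip install .` would put `metagenomics` and `taxon_filter` on the PATH without any top-level scripts.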

I think restructuring viral-x can be done independently from restructuring viral-core, especially when restructuring within the package as I'm doing here, or when removing the tool classes, since we can always assume conda dependencies are available.

@tomkinsc (Contributor) commented

Agreed, and I like your plan, @dpark01. We should probably all talk about making broadinstitute/viral-ngs read-only at some point as well.

@dpark01 (Member) commented May 1, 2020

I took a high-level skim here (haven't really dived into the details) and I really like the overall shape of this PR. Two big-picture things at the moment:

  1. I'd want to verify that the various WDLs in viral-pipelines still work as before if we manually set their docker strings to quay.io/broadinstitute/viral-classify:sy-restructure-classify (and if not, figure out how we can tweak the docker build / path / whatever to remain backwards compatible); see the snippet after this list.
  2. Can you read through DEVELOPMENT_NOTES.md and include any necessary changes there in this PR?
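For reference, the manual override in item 1 would be something like this in each task (a fragment, with hypothetical task and input names):

```wdl
task classify_reads {
  input {
    File reads_bam
    # the string we'd pin manually for this verification:
    String docker = "quay.io/broadinstitute/viral-classify:sy-restructure-classify"
  }
  command {
    metagenomics.py --version
  }
  runtime {
    docker: docker
  }
}
```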
