Description
Is your feature request related to a problem?
The documentation contains a couple of sections where the project structure is explained.
- https://pytask-dev.readthedocs.io/en/stable/tutorials/set_up_a_project.html
- https://pytask-dev.readthedocs.io/en/stable/how_to_guides/bp_structure_of_a_research_project.html
- https://pytask-dev.readthedocs.io/en/stable/how_to_guides/bp_templates_and_projects.html
All of them propose structuring the project with an src layout (good) where the tasks live inside the project folder (bad).
Why is this bad?
- You cannot use `pip install .` to install the project but must use the editable mode.

  Why? If you use the normal installation, the paths `SRC` and `BLD` defined in `config.py` will be relative to the installed package path (like `/mambaforge/envs/my_project/lib/python-3.11/site-packages/my_project/`). It means the data is assumed to lie somewhere there.

  Of course, you could add the data to your Python project via `MANIFEST.in`, but then the data would be copied over to the environment directory on every install, which can be very expensive.
- The data should not be part of the application.
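To make the first point concrete, here is a minimal sketch. It assumes `config.py` derives `SRC` and `BLD` from its own file location, with `BLD` two levels above the package folder; the helper `derive_paths` and the `/home/user/...` checkout path are hypothetical and only serve the illustration.

```python
from pathlib import PurePosixPath


def derive_paths(config_file):
    """Mimic a config.py that anchors SRC and BLD at its own location (assumed)."""
    # SRC is the folder that contains config.py.
    src = PurePosixPath(config_file).parent
    # BLD is assumed to sit two levels above the package folder.
    bld = src.parent.parent / "bld"
    return src, bld


# Editable install: config.py is imported from the repository checkout.
editable = derive_paths("/home/user/my_project/src/my_project/config.py")

# Normal install: config.py is imported from site-packages.
installed = derive_paths(
    "/mambaforge/envs/my_project/lib/python-3.11/site-packages/my_project/config.py"
)

print(editable[1])   # build folder inside the repository
print(installed[0])  # SRC now points into the environment
print(installed[1])  # BLD now points into the environment as well
```

With the editable install, `BLD` resolves next to `src` in the repository. With the normal install, both paths end up inside the environment directory, where no data exists.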
Describe the solution you'd like
The new structure I propose is this one.
my_project
│
├───.pytask
│
├───bld
│   └────...
│
├───data
│   └────...
│
├───src
│   └───my_project
│       ├────__init__.py
│       └────data_preparation.py
│
├───tasks
│   ├────config.py
│   └───data_preparation
│       └────task_data_preparation.py
│
└───pyproject.toml
- Tasks are moved to a separate folder, `tasks`, just like tests.
- Data is moved to `data`, out of `src`.
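As a rough sketch of what `tasks/config.py` could compute under this structure: since `tasks` is not part of the installed package, its paths stay anchored at the repository root regardless of how `my_project` is installed. The function `proposed_paths`, the key names, and the example checkout path are assumptions for illustration, not existing pytask API.

```python
from pathlib import PurePosixPath


def proposed_paths(config_file):
    """Derive project folders from the location of tasks/config.py (hypothetical)."""
    # tasks/config.py lives one level below the project root.
    root = PurePosixPath(config_file).parent.parent
    return {
        "ROOT": root,
        "DATA": root / "data",  # raw data, kept out of src
        "BLD": root / "bld",    # build artifacts
    }


paths = proposed_paths("/home/user/my_project/tasks/config.py")
print(paths["DATA"])
print(paths["BLD"])
```

In a real `config.py`, the argument would be `__file__`, and `pathlib.Path` with `.resolve()` would replace `PurePosixPath`.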
API breaking implications
None.
Describe alternatives you've considered
None.
Additional Context
Popular templates for data science projects also keep