
ENH: Changing the recommendation on how to set up a project. #570

Open
@tobiasraabe


Is your feature request related to a problem?

The documentation contains a couple of sections where the project structure is explained.

All of them propose structuring the project with a src layout (good) in which the tasks live inside the package folder (bad).

Why is this bad?

  • You cannot install the project with pip install . but must use editable mode (pip install -e .) instead.

    Why? With a regular installation, the paths SRC and BLD defined in config.py are resolved relative to the installed package path (like /mambaforge/envs/my_project/lib/python3.11/site-packages/my_project/). The data is then expected to lie somewhere inside that directory.

    Of course, you could add the data to your Python project via MANIFEST.in, but then the data would be copied over to the environment directory on every install, which can be very expensive.

  • The data should not be part of the application.
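
To illustrate the first point, here is a minimal sketch. It assumes that config.py derives SRC and BLD from its own location (__file__), which is what makes the paths depend on where the package ends up. The helper derive_paths and the checkout path under /home/user are hypothetical and only serve the illustration.

```python
from pathlib import PurePosixPath


def derive_paths(config_file):
    """Mimic a config.py that anchors SRC and BLD on its own location."""
    src = config_file.parent          # the folder containing config.py
    bld = src.parent.parent / "bld"   # two levels up, then into bld
    return src, bld


# Hypothetical location of config.py in a source checkout (editable install).
checkout = PurePosixPath("/home/user/my_project/src/my_project/config.py")

# Location of the same file after a regular `pip install .`.
installed = PurePosixPath(
    "/mambaforge/envs/my_project/lib/python3.11/site-packages/my_project/config.py"
)

editable_src, editable_bld = derive_paths(checkout)
installed_src, installed_bld = derive_paths(installed)

print(editable_bld)   # points into the project checkout
print(installed_bld)  # points into the environment directory
```

With the editable install, BLD stays inside the project folder. With the regular install, both SRC and BLD end up inside the environment directory, where no data exists.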

Describe the solution you'd like

The new structure I propose is this one.

my_project
│
├───.pytask
│
├───bld
│   └────...
│
├───data
│   └────...
│
├───src
│   └───my_project
│       ├────__init__.py
│       └────data_preparation.py
│
├───tasks
│   ├────config.py
│   └───data_preparation
│       └────task_data_preparation.py
│
└───pyproject.toml

  1. Tasks are moved to a separate folder, tasks, just like tests.
  2. Data is moved to data, out of src.
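
A possible tasks/config.py for this layout could look like the sketch below. The issue does not specify how the paths would be defined, so the helper project_paths, the names ROOT and DATA, and the example path are assumptions made for illustration only.

```python
from pathlib import Path


def project_paths(config_file: Path) -> dict[str, Path]:
    """Anchor all paths on the project root, the folder that contains tasks/."""
    root = config_file.parent.parent
    return {"ROOT": root, "DATA": root / "data", "BLD": root / "bld"}


# In a real tasks/config.py one would pass Path(__file__).resolve().
# A hypothetical path is used here so that the sketch runs anywhere.
paths = project_paths(Path("/home/user/my_project/tasks/config.py"))

print(paths["DATA"])
print(paths["BLD"])
```

Because the tasks folder is not part of the installed package, these paths do not depend on the installation mode, so the package in src could be installed with a regular pip install . as well.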

API breaking implications

None.

Describe alternatives you've considered

None.

Additional Context

Popular templates for data science projects also keep

Labels: blocked, enhancement, feedback-wanted
