|
| 1 | +# Provisional nodes and task generators |
| 2 | + |
| 3 | +pytask's execution model can usually be separated into three phases. |
| 4 | + |
| 5 | +1. Collection of tasks, dependencies, and products. |
| 6 | +1. Building the DAG. |
| 7 | +1. Executing the tasks. |
| 8 | + |
| 9 | +But, in some situations, pytask needs to be more flexible. |
| 10 | + |
| 11 | +Imagine you want to download a folder with files from an online storage. Before the task |
| 12 | +is completed, you know neither the total number of files nor their filenames. How can you |
| 13 | +still describe the files as products of the task? |
| 14 | + |
| 15 | +And how would you define another task that depends on these files? |
| 16 | + |
| 17 | +The following sections will explain how you use pytask in these situations. |
| 18 | + |
| 19 | +## Producing provisional nodes |
| 20 | + |
| 21 | +As an example of the aforementioned scenario, let us write a task that downloads all |
| 22 | +files without a file extension from the root folder of the pytask GitHub repository. The |
| 23 | +files are downloaded to a folder called `downloads`. `downloads` is in the same folder |
| 24 | +as the task module because it is a relative path. |
| 25 | + |
| 26 | +```{literalinclude} ../../../docs_src/how_to_guides/provisional_products.py |
| 27 | +--- |
| 28 | +emphasize-lines: 4, 22 |
| 29 | +--- |
| 30 | +``` |
| 31 | + |
| 32 | +Since the names of the files are not known when pytask is started, we need to use a |
| 33 | +{class}`~pytask.DirectoryNode` to define the task's product. With a |
| 34 | +{class}`~pytask.DirectoryNode` we can specify where pytask can find the files. The files |
| 35 | +are described with a root path (default is the directory of the task module) and a glob |
| 36 | +pattern (default is `*`). |
| 37 | + |
| 38 | +When we use the {class}`~pytask.DirectoryNode` as a product annotation, we get access to |
| 39 | +the `root_dir` as a {class}`~pathlib.Path` object inside the function, which allows us |
| 40 | +to store the files. |
| 41 | + |
| 42 | +```{note} |
| 43 | +The {class}`~pytask.DirectoryNode` is a provisional node that implements |
| 44 | +{class}`~pytask.PProvisionalNode`. A provisional node is not a {class}`~pytask.PNode`, |
| 45 | +but when its {meth}`~pytask.PProvisionalNode.collect` method is called, it returns |
| 46 | +actual nodes. A {class}`~pytask.DirectoryNode`, for example, returns |
| 47 | +{class}`~pytask.PathNode`. |
| 48 | +``` |
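
The idea of describing files by a root path and a glob pattern can be illustrated with the standard library alone. The following is a minimal sketch, not pytask's implementation: `resolve_directory` and `demo` are hypothetical helpers, and the choice to skip subfolders and sort the result is an assumption made for the example.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_directory(root_dir: Path, pattern: str = "*") -> list[Path]:
    """Return the files under ``root_dir`` that match ``pattern``, sorted by path."""
    # Hypothetical helper, not part of pytask: it only mimics the idea that a
    # root path plus a glob pattern is turned into concrete file paths.
    return sorted(p for p in root_dir.glob(pattern) if p.is_file())


def demo() -> list[str]:
    """Create a temporary ``downloads`` folder and resolve its files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp, "downloads")
        root.mkdir()
        # Files without an extension, like the ones in the example above.
        for name in ("LICENSE", "CITATION"):
            root.joinpath(name).write_text(name)
        # Directories match the pattern too, but are filtered out here.
        root.joinpath("subfolder").mkdir()
        return [p.name for p in resolve_directory(root)]
```

The important point is that the concrete paths can only be computed once the folder has been filled, which is why the node stays provisional until the producing task has run.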
| 49 | + |
| 50 | +## Depending on provisional nodes |
| 51 | + |
| 52 | +In the next step, we want to define a task that consumes and merges all previously |
| 53 | +downloaded files into one file. |
| 54 | + |
| 55 | +The difficulty here is how to reference the downloaded files before they have been |
| 56 | +downloaded. |
| 57 | + |
| 58 | +```{literalinclude} ../../../docs_src/how_to_guides/provisional_task.py |
| 59 | +--- |
| 60 | +emphasize-lines: 9 |
| 61 | +--- |
| 62 | +``` |
| 63 | + |
| 64 | +To reference the files that will be downloaded, we use the |
| 65 | +{class}`~pytask.DirectoryNode` as a dependency. Before the task is executed, the list of |
| 66 | +files in the folder defined by the root path and the pattern is automatically collected |
| 67 | +and passed to the task. |
| 68 | + |
| 69 | +If we use a {class}`~pytask.DirectoryNode` with the same `root_dir` and `pattern` in |
| 70 | +both tasks, pytask will automatically recognize that the second task depends on the |
| 71 | +first. If the two nodes do not match, you might need to make this dependency explicit by |
| 72 | +using {func}`@task(after=...) <pytask.task>`, which is explained {ref}`here <after>`. |
| 73 | + |
| 74 | +## Task generators |
| 75 | + |
| 76 | +What if we wanted to process each downloaded file separately instead of dealing with |
| 77 | +them in one task? |
| 78 | + |
| 79 | +For that, we have to write a task generator to define an unknown number of tasks for an |
| 80 | +unknown number of downloaded files. |
| 81 | + |
| 82 | +A task generator is a task function in which we define more tasks, just as if we were |
| 83 | +writing functions in a task module. |
| 84 | + |
| 85 | +In the following code snippet, each generated task takes one of the downloaded files |
| 86 | +and copies its content to a `.txt` file. |
| 87 | + |
| 88 | +```{literalinclude} ../../../docs_src/how_to_guides/provisional_task_generator.py |
| 89 | +``` |
| 90 | + |
| 91 | +```{important} |
| 92 | +The generated tasks need to be decorated with {func}`@task <pytask.task>` to be |
| 93 | +collected. |
| 94 | +``` |
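
The pattern behind a task generator, defining one function per file inside a loop, can be sketched in plain Python. This is only an illustration under stated assumptions: `generate_copy_tasks` and `demo` are hypothetical names, the functions are returned in a list instead of being decorated with `@task`, and the paths are passed in by hand rather than collected by pytask.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def generate_copy_tasks(paths: list[Path]) -> list[Callable[[], Path]]:
    """Create one copy function per input file (hypothetical helper)."""
    tasks = []
    for path in paths:
        target = path.with_suffix(".txt")

        # Bind the loop variables as default arguments so that each function
        # keeps its own source and target instead of the last loop values.
        def copy_file(source: Path = path, destination: Path = target) -> Path:
            destination.write_text(source.read_text())
            return destination

        tasks.append(copy_file)
    return tasks


def demo() -> list[str]:
    """Generate and run the copy functions for two temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        sources = []
        for name in ("CITATION", "LICENSE"):
            source = Path(tmp, name)
            source.write_text(f"content of {name}")
            sources.append(source)
        copies = [run() for run in generate_copy_tasks(sources)]
        return [f"{copy.name}: {copy.read_text()}" for copy in copies]
```

Binding `path` and `target` as default arguments avoids Python's late-binding behavior for closures, which would otherwise make every generated function operate on the last file of the loop.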