@@ -8,29 +8,32 @@ pytask's execution model can usually be separated into three phases.
8
8
9
9
But, in some situations, pytask needs to be more flexible.
10
10
11
- Imagine you want to download files from an online storage, but the total number of files
12
- and their filenames is unknown before the task has started. How can you still describe
13
- the files as products of the task?
11
+ Imagine you want to download a folder with files from an online storage. Before the task
12
+ is completed, you do not know the total number of files or their filenames. How can you
13
+ still describe the files as products of the task?
14
14
15
15
And how would you define another task that depends on these files?
16
16
17
17
The following sections will explain how you use pytask in these situations.
18
18
19
19
## Producing provisional nodes
20
20
21
- Let us start with a task that downloads all files without an extension from the root
22
- folder of the pytask repository and stores them on disk in a folder called `downloads`.
21
+ As an example for the aforementioned scenario, let us write a task that downloads all
22
+ files without a file extension from the root folder of the pytask GitHub repository. The
23
+ files are downloaded to a folder called `downloads`. `downloads` is in the same folder
24
+ as the task module because it is a relative path.
23
25
24
26
```{literalinclude} ../../../docs_src/how_to_guides/provisional_products.py
25
27
---
26
- emphasize-lines: 4, 11
28
+ emphasize-lines: 4, 22
27
29
---
28
30
```
29
31
30
32
Since the names of the files are not known when pytask is started, we need to use a
31
- {class}`~pytask.DirectoryNode`. With a {class}`~pytask.DirectoryNode` we can specify
32
- where pytask can find the files. The files are described with a path (default is the
33
- directory of the task module) and a glob pattern (default is `*`).
33
+ {class}`~pytask.DirectoryNode` to define the task's product. With a
34
+ {class}`~pytask.DirectoryNode` we can specify where pytask can find the files. The files
35
+ are described with a root path (default is the directory of the task module) and a glob
36
+ pattern (default is `*`).
34
37
35
38
When we use the {class}`~pytask.DirectoryNode` as a product annotation, we get access to
36
39
the `root_dir` as a {class}`~pathlib.Path` object inside the function, which allows us
@@ -49,16 +52,19 @@ actual nodes. A {class}`~pytask.DirectoryNode`, for example, returns
49
52
In the next step, we want to define a task that consumes and merges all previously
50
53
downloaded files into one file.
51
54
55
+ The difficulty here is how we can reference the downloaded files before they have been
56
+ downloaded.
57
+
52
58
```{literalinclude} ../../../docs_src/how_to_guides/provisional_task.py
53
59
---
54
60
emphasize-lines: 9
55
61
---
56
62
```
57
63
58
- Here, the {class}`~pytask.DirectoryNode` is a dependency because we do not know the
59
- names of the downloaded files. Before the task is executed, the list of files in the
60
- folder defined by the root path and the pattern are automatically collected and passed
61
- to the task.
64
+ To reference the files that will be downloaded, we use the
65
+ {class}`~pytask.DirectoryNode` as a dependency. Before the task is executed, the list of
66
+ files in the folder defined by the root path and the pattern is automatically collected
67
+ and passed to the task.
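
The collection step described here can be pictured with plain `pathlib`. The following is a minimal sketch, not pytask's implementation: the helper `collect_paths` is a hypothetical name, and the only assumption is that a root path plus a glob pattern is resolved to concrete files at the moment the helper is called.

```python
import tempfile
from pathlib import Path


def collect_paths(root_dir: Path, pattern: str = "*") -> list[Path]:
    """Hypothetical helper: resolve a root path and a glob pattern to files.

    It mimics the idea of a directory node being turned into concrete paths
    shortly before a task runs. It is not pytask's own code.
    """
    return sorted(p for p in root_dir.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads"
    downloads.mkdir()

    # Before the "download" has happened, nothing can be collected.
    before = collect_paths(downloads)

    # Simulate the first task writing files whose names were unknown upfront.
    for name in ("LICENSE", "CITATION"):
        (downloads / name).write_text(f"content of {name}\n")

    # Resolving the same root path and pattern now yields the new files,
    # which a second task could merge into one document.
    after = [p.name for p in collect_paths(downloads)]
    merged = "".join(p.read_text() for p in collect_paths(downloads))
```

Because the pattern is only resolved after the files exist, the consuming step never needs to know the filenames in advance. That is the property the dependency annotation relies on.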
62
68
63
69
If we use a {class}`~pytask.DirectoryNode` with the same `root_dir` and `pattern` in
64
70
both tasks, pytask will automatically recognize that the second task depends on the