@@ -32,27 +32,35 @@ are growing over time and you run into these problems.
 ## Solution
 
 The main idea for the solution is quickly explained. We will, first, formalize
-dimensions into objects and, secondly, combine them in one object such that we only have
-to iterate over instances of this object in a single loop.
-
-We will start by defining the dimensions using {class}`~typing.NamedTuple` or
+dimensions into objects using {class}`~typing.NamedTuple` or
 {func}`~dataclasses.dataclass`.
 
-Then, we will define the object that holds both pieces of information together and for
-the lack of a better name, we will call it an experiment.
+Secondly, we will combine dimensions in multi-dimensional objects such that we only have
+to iterate over instances of this object in a single loop. Here and for the lack of a
+better name, we will call the object an experiment.
+
+Lastly, we will also use the {class}`~pytask.DataCatalog` to not be bothered with
+defining paths.
 
-```{literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/experiment.py
+```{seealso}
+If you have not learned about the {class}`~pytask.DataCatalog` yet, start with the
+{doc}`tutorial <../tutorials/using_a_data_catalog>` and continue with the
+{doc}`how-to guide <the_data_catalog>`.
+```
+
+```{literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/config.py
 ---
 caption: config.py
 ---
 ```
 
 There are some things to be said.
 
-- The names on each dimension need to be unique and ensure that by combining them for
-  the name of the experiment, we get a unique and descriptive id.
-- Dimensions might need more attributes than just a name, like paths, or other arguments
-  for the task. Add them.
+- The `.name` attributes on each dimension need to return unique names and to ensure
+  that by combining them for the name of the experiment, we get a unique and descriptive
+  id.
+- Dimensions might need more attributes than just a name, like paths, keys for the data
+  catalog, or other arguments for the task.
 
 Next, we will use these newly defined data structures and see how our tasks change when
 we use them.
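
The `config.py` included by this hunk is not shown in the diff. As a rough illustration of the idea described above, the following minimal sketch formalizes two dimensions with `typing.NamedTuple` and combines them into an experiment with a unique name. The class names `Dataset` and `Experiment`, the dataset names, and the name format are assumptions made for illustration; only `Model`, `MODELS`, and the model names appear in the diff.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    """One level of the (hypothetical) dataset dimension."""

    name: str


class Model(NamedTuple):
    """One level of the model dimension."""

    name: str


class Experiment(NamedTuple):
    """A multi-dimensional object that combines one dataset with one model."""

    dataset: Dataset
    model: Model

    @property
    def name(self) -> str:
        # Combining the unique names of the dimensions yields a unique,
        # descriptive id. The format is an assumption.
        return f"{self.model.name}-{self.dataset.name}"


# Dataset names are placeholders; the model names are taken from the diff.
DATASETS = [Dataset("a"), Dataset("b")]
MODELS = [Model("ols"), Model("logit"), Model("linear_prob")]

# A single flat list replaces nested loops over datasets and models.
EXPERIMENTS = [Experiment(dataset, model) for dataset in DATASETS for model in MODELS]
```

Tasks can then iterate over `EXPERIMENTS` in a single loop and use `experiment.name` wherever an id, a path component, or a data catalog key is needed.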
@@ -63,21 +71,55 @@ caption: task_example.py
 ---
 ```
 
-As you see, we replaced
+As you see, we lost a level of indentation and we moved all the generations of names and
+paths to the dimensions and multi-dimensional objects.
 
-## Using the `DataCatalog`
+## Adding another level
 
-## Adding another dimension
+Extending a dimension by another level is usually quickly done. For example, if we have
+another model that we want to fit to the data, we extend `MODELS` which will
+automatically lead to all downstream tasks being created.
 
-## Adding another level
+```{code-block} python
+---
+caption: config.py
+---
+...
+MODELS = [Model("ols"), Model("logit"), Model("linear_prob"), Model("new_model")]
+...
+```
+
+Of course, you might need to alter `task_fit_model` because the task needs to handle the
+new model as well as the others. Here is where it pays off if you are using high-level
+interfaces in your code that handle all of the models with a simple
+`fitted_model = fit_model(data=data, model_name=model_name)` call and also return fitted
+models that are similar objects.
 
 ## Executing a subset
 
-## Grouping and aggregating
+What if you want to execute a subset of tasks, for example, all tasks related to a model
+or a dataset?
+
+When you are using the `.name` attributes of the dimensions and multi-dimensional
+objects like in the example above, you ensure that the names of dimensions are included
+in all downstream tasks.
+
+Thus, you can simply call pytask with the following expression to execute all tasks
+related to the logit model.
+
+```console
+pytask -k logit
+```
+
+```{seealso}
+Expressions and markers for selecting tasks are explained in
+{doc}`../tutorials/selecting_tasks`.
+```
 
 ## Extending repetitions
 
-Some parametrized tasks are costly to run - costly in terms of computing power, memory,
-or time. Users often extend repetitions triggering all repetitions to be rerun. Thus,
-use the {func}`@pytask.mark.persist <pytask.mark.persist>` decorator, which is explained
-in more detail in this {doc}`tutorial <../tutorials/making_tasks_persist>`.
+Some repeated tasks are costly to run - costly in terms of computing power, memory, or
+runtime. If you change a task module, you might accidentally trigger all other tasks in
+the module to be rerun. Use the {func}`@pytask.mark.persist <pytask.mark.persist>`
+decorator, which is explained in more detail in this
+{doc}`tutorial <../tutorials/making_tasks_persist>`.
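
The "Adding another level" section in this hunk relies on a high-level `fit_model(data=..., model_name=...)` interface but does not show it. One way such an interface could look is sketched below, using a registry that maps model names to fitting functions. Everything except the call signature is an assumption: `FittedModel`, `_FITTERS`, and the two placeholder fitters are hypothetical, and the placeholders compute a mean and a median rather than fitting real estimators.

```python
import statistics
from typing import Callable, NamedTuple, Sequence


class FittedModel(NamedTuple):
    """Hypothetical common return type so downstream tasks can treat all models alike."""

    model_name: str
    estimate: float


# Placeholder fitters keyed by model name. They are NOT real estimators; they
# only stand in for model-specific fitting code.
_FITTERS: dict[str, Callable[[Sequence[float]], float]] = {
    "ols": statistics.mean,
    "new_model": statistics.median,
}


def fit_model(data: Sequence[float], model_name: str) -> FittedModel:
    """Dispatch to the fitter registered under ``model_name``."""
    try:
        fitter = _FITTERS[model_name]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name!r}") from None
    return FittedModel(model_name=model_name, estimate=fitter(data))


fitted_model = fit_model(data=[1.0, 2.0, 6.0], model_name="new_model")
```

With a design like this, supporting a new entry in `MODELS` means registering one more fitter, while the body of `task_fit_model` can stay unchanged.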