Commit 74ad361

Spell and grammar checking for more docs. (#281)
1 parent d8d3506 commit 74ad361

12 files changed, +190 -195 lines changed

docs/source/developers_guide.md

+12
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,17 @@
11
# Developer's Guide
22

3+
## Testing
4+
5+
Run pytest to execute the test suite.
6+
7+
The test suite creates many temporary directories. There is usually a limit on the
8+
number of open file descriptors on Unix systems which causes some tests and the end of
9+
the test suite to fail. If that happens, increase the limit with the following command.
10+
11+
```console
12+
$ ulimit -n 4096
13+
```
14+
315
## How to release
416

517
The following list covers all steps of a release cycle.
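
The file-descriptor limit described in the added Testing section can also be inspected from Python. The following is a minimal sketch using the standard library `resource` module (Unix only). The helper names `current_open_file_limit` and `raise_soft_limit` are hypothetical and not part of pytask, and the target of 4096 simply mirrors the `ulimit -n 4096` command in the diff.

```python
import resource


def current_open_file_limit():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target=4096):
    """Raise the soft limit towards ``target`` without exceeding the hard limit.

    Returns the soft limit that is in effect afterwards. This only affects the
    current process and its children, similar to ``ulimit -n`` in a shell.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # nothing to do, the limit is already high enough
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


soft, hard = current_open_file_limit()
print(f"soft limit: {soft}, hard limit: {hard}")
```

Calling `raise_soft_limit()` before starting the test run from the same process would be one way to apply the workaround programmatically; whether that is preferable to the shell command depends on how the tests are launched.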

docs/source/explanations/pluggy.md

+8 -8
@@ -5,15 +5,15 @@
55
pluggy ([^id4], [^id5], [^id6]) is at the heart of pytask and enables its plugin system.
66
The mechanism to achieve extensibility is called {term}`hooking`.
77

8-
At certain points, pytask, or more generally the host, implements entry-points which are
9-
called hook specifications. At these entry-points the host sends a message to all
10-
plugins which target this entry-point. The recipient of the message is implemented by
11-
the plugin and called a hook implementation. The hook implementation receives the
12-
message and can decide whether to send a response or not. Then, the host receives the
13-
responses and can decide whether to process all or just the first valid return.
8+
At specific points, pytask, or more generally the host, implements entry-points called
9+
hook specifications. At these entry-points, the host sends a message to all plugins
10+
which target this entry-point. The message's recipient is implemented by the plugin and
11+
called a hook implementation. The hook implementation receives the message and can
12+
decide whether to send a response or not. Then, the host gets the responses and can
13+
choose whether to process all or just the first valid return.
1414

1515
In contrast to some other mechanisms to change the behavior of a program (like method
16-
overriding, monkey patching), hooking excels at allowing multiple plugins to work
16+
overriding and monkey patching), hooking excels at allowing multiple plugins to work
1717
alongside each other.
1818

1919
It is the host's responsibility to design the entry-points in a way such that
@@ -22,7 +22,7 @@ It is the host's responsibility to design the entry-points in a way such that
2222
goal efficiently.
2323
- many plugins can work alongside each other.
2424
- the necessary knowledge about pytask to implement a plugin is somewhat proportional to
25-
the complexity of plugin's provided functionality.
25+
the complexity of the plugin's provided functionality.
2626

2727
## References
2828

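The hooking mechanism described in this file can be illustrated with a small, standard-library-only toy. This is a sketch of the general idea, not pluggy's API: the `Host` class, its `register` and `call` methods, and the three plugin functions are all hypothetical names. Hook implementations are called in registration order here, which is a simplification.

```python
class Host:
    """Toy host that owns a single entry-point (a hook specification)."""

    def __init__(self):
        self._implementations = []

    def register(self, hook_implementation):
        """Let a plugin target the entry-point with its implementation."""
        self._implementations.append(hook_implementation)

    def call(self, message, first_result=False):
        """Send ``message`` to all implementations and gather the responses.

        With ``first_result=True`` only the first non-``None`` response is
        returned; otherwise all non-``None`` responses are returned as a list.
        """
        responses = []
        for implementation in self._implementations:
            response = implementation(message)
            if response is None:
                continue  # the plugin decided not to respond
            if first_result:
                return response
            responses.append(response)
        return None if first_result else responses


def silent_plugin(message):
    return None


def upper_plugin(message):
    return message.upper()


def length_plugin(message):
    return len(message)


host = Host()
for plugin in (silent_plugin, upper_plugin, length_plugin):
    host.register(plugin)

all_responses = host.call("hello")
first_response = host.call("hello", first_result=True)
```

The point of the sketch is that several plugins answer the same message without knowing about each other, and the host alone decides whether to use every response or only the first valid one.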
+14 -13
@@ -1,6 +1,6 @@
11
# Why pytask?
22

3-
There are a lot of workflow management systems out there with existing communities who
3+
There are a lot of workflow management systems out there with existing communities that
44
accumulated a lot of experience over time. So why bother creating another workflow
55
management system?
66

@@ -11,23 +11,24 @@ provide a [steep learning curve](https://english.stackexchange.com/a/6226).
1111

1212
pytask tries to address this point in many ways.
1313

14-
1. pytask is written in Python which is one of the most popular and fastest growing
15-
languages in the realm of scientific computing.
14+
1. pytask is written in Python, one of the most popular and fastest growing languages in
15+
scientific computing.
1616

17-
1. For those who know pytest, the main testing framework in Python, pytask will look
18-
extremely familiar and you will feel productive quickly. If you do not know pytest,
19-
you will learn two tools at the same time.
17+
1. For those who know pytest, the primary testing framework in Python, pytask will look
18+
highly familiar, and you will feel productive quickly. If you do not know pytest, you
19+
will learn two tools simultaneously.
2020

2121
1. pytask tries to improve your productivity by offering a couple of features like
2222
{doc}`repeating tasks <../tutorials/repeating_tasks_with_different_inputs>`,
2323
{doc}`debugging of tasks <../tutorials/debugging>` and
2424
{doc}`selecting subsets of tasks <../tutorials/selecting_tasks>`.
2525

26-
1. pytask integrates with other tools which are used in the scientific community such as
27-
R and Julia and offers solutions to bridge the gap between a
28-
{term}`workflow management system` written in Python and scripts in another language,
29-
for example, by making paths to dependencies and products usable in the scripts.
26+
1. pytask integrates with other tools used in the scientific community, such as R and
27+
Julia, and offers solutions to bridge the gap between a
28+
{term}`workflow management system` written in Python and scripts in another language.
29+
For example, pytask makes paths to dependencies and products available in the
30+
scripts.
3031

31-
1. The plugin system let's power users tailor pytask to their needs by adding additional
32-
functionality. It makes pytask extremely versatile and offers people from different
33-
backgrounds to collaborate on the same software.
32+
1. The plugin system lets power users tailor pytask to their needs by adding additional
33+
functionality. It makes pytask extraordinarily versatile and offers people from
34+
different backgrounds to collaborate on the same software.

docs/source/how_to_guides/bp_scalable_repititions_of_tasks.md → docs/source/how_to_guides/bp_scalable_repetitions_of_tasks.md

+26 -30
@@ -1,28 +1,27 @@
1-
# Scalable repititions of tasks
1+
# Scalable repetitions of tasks
22

3-
This section gives advice on how to use repitions to quickly scale your project.
3+
This section advises on how to use repetitions to scale your project quickly.
44

55
## TL;DR
66

7-
- Loop over dictionaries which map ids to `kwargs` to create multiple tasks.
7+
- Loop over dictionaries that map ids to `kwargs` to create multiple tasks.
88
- Create the dictionary with a separate function.
99
- Create functions to build intermediate objects like output paths which can be shared
1010
more easily across tasks than the generated values.
1111

1212
## Scalability
1313

14-
Parametrizations allow to scale tasks from $1$ to $N$ in a simple way. What is easily
14+
Parametrizations allow scaling tasks from $1$ to $N$ in a simple way. What is easily
1515
overlooked is that parametrizations usually trigger other parametrizations and the
1616
growth in tasks is more $1$ to $N \cdot M \cdot \dots$ or $1$ to $N^{M \cdot \dots}$.
1717

18-
To keep the resulting complexity as manageable as possible, this guide lays out a
19-
structure which is simple, modular, and scalable.
18+
This guide lays out a simple, modular, and scalable structure to fight complexity.
2019

21-
As an example, assume we have four datasets with one binary dependent variables and some
22-
independent variables. On each of the data sets, we fit three models, a linear model, a
23-
logistic model, and a decision tree. In total, we have $4 \cdot 3 = 12$ tasks.
20+
For example, assume we have four datasets with one binary dependent variable and some
21+
independent variables. We fit three models on each data set: a linear model, a logistic
22+
model, and a decision tree. In total, we have $4 \cdot 3 = 12$ tasks.
2423

25-
First, let us take a look at the folder and file structure of such a project.
24+
First, let us look at the folder and file structure of such a project.
2625

2726
```
2827
my_project
@@ -56,12 +55,12 @@ my_project
5655
└───bld
5756
```
5857

59-
The folder structure, the main `config.py` which holds `SRC` and `BLD` and the tasks
60-
follow the same structure which is advocated for throughout the tutorials.
58+
The folder structure, the main `config.py` which holds `SRC` and `BLD`, and the tasks
59+
follow the same structure advocated throughout the tutorials.
6160

62-
What is new are the local configuration files in each of the subfolders of `my_project`
63-
which contain objects which are shared across tasks. For example, `config.py` holds the
64-
paths to the processed data and the names of the data sets.
61+
What is new are the local configuration files in each subfolder of `my_project`, which
62+
contain objects shared across tasks. For example, `config.py` holds the paths to the
63+
processed data and the names of the data sets.
6564

6665
```python
6766
# Content of config.py
@@ -81,8 +80,7 @@ def path_to_processed_data(name):
8180
return BLD / "data" / f"processed_{name}.pkl"
8281
```
8382

84-
In the task file `task_prepare_data.py`, these objects are used to build the
85-
parametrization.
83+
The task file `task_prepare_data.py` uses these objects to build the parametrization.
8684

8785
```python
8886
# Content of task_prepare_data.py
@@ -115,8 +113,8 @@ for id_, kwargs in _ID_TO_KWARGS.items():
115113
```
116114

117115
All arguments for the loop and the {func}`@pytask.mark.task <pytask.mark.task>`
118-
decorator are built within a function to keep the logic in one place and the namespace
119-
of the module clean.
116+
decorator is built within a function to keep the logic in one place and the module's
117+
namespace clean.
120118

121119
Ids are used to make the task {ref}`ids <ids>` more descriptive and to simplify their
122120
selection with {ref}`expressions <expressions>`. Here is an example of the task ids with
@@ -152,15 +150,15 @@ def path_to_estimation_result(name):
152150
```
153151

154152
In the local configuration, we define `ESTIMATIONS` which combines the information on
155-
data and model. The key of the dictionary can be used as a task id whenever the
156-
estimation is involved. This allows to trigger all tasks related to one estimation -
157-
estimation, figures, tables - with one command
153+
data and model. The dictionary's key can be used as a task id whenever the estimation is
154+
involved. It allows triggering all tasks related to one estimation - estimation,
155+
figures, tables - with one command.
158156

159157
```console
160158
pytask -k linear_probability_data_0
161159
```
162160

163-
And, here is the task file.
161+
And here is the task file.
164162

165163
```python
166164
# Content of task_estimate_models.py
@@ -198,13 +196,11 @@ for id_, kwargs in _ID_TO_KWARGS.items():
198196
...
199197
```
200198

201-
Replicating this pattern across a project allows for a clean way to define
202-
parametrizations.
199+
Replicating this pattern across a project allows a clean way to define parametrizations.
203200

204201
## Extending parametrizations
205202

206-
Some parametrized tasks are extremely expensive to run - be it in terms of computing
207-
power, memory or time. On the other hand, parametrizations are often extended which
208-
could also trigger all parametrizations to be rerun. Thus, use the
209-
{func}`@pytask.mark.persist <pytask.mark.persist>` decorator which is explained in more
210-
detail in this {doc}`tutorial <../tutorials/making_tasks_persist>`.
203+
Some parametrized tasks are costly to run - costly in terms of computing power, memory,
204+
or time. Users often extend parametrizations triggering all parametrizations to be
205+
rerun. Thus, use the {func}`@pytask.mark.persist <pytask.mark.persist>` decorator, which
206+
is explained in more detail in this {doc}`tutorial <../tutorials/making_tasks_persist>`.

docs/source/how_to_guides/index.md

+1 -1
@@ -36,5 +36,5 @@ maxdepth: 1
3636
bp_structure_of_a_research_project
3737
bp_structure_of_task_files
3838
bp_templates_and_projects
39-
bp_scalable_repititions_of_tasks
39+
bp_scalable_repetitions_of_tasks
4040
```

docs/source/how_to_guides/repeating_tasks_with_different_inputs_the_pytest_way.md

+32 -35
@@ -1,31 +1,28 @@
11
# Repeating tasks with different inputs - The pytest way
22

3-
You want to define a task which should be repeated over a range of inputs? Loop over
4-
your task function!
5-
6-
:::{hint}
7-
The process of repeating a function with different inputs is called parametrizations.
8-
:::
9-
103
:::{important}
114
This guide shows you how to parametrize tasks with the pytest approach. For the new and
125
preferred approach, see this
136
{doc}`tutorial <../tutorials/repeating_tasks_with_different_inputs>`.
147
:::
158

16-
You want to define a task which should be repeated over a range of inputs? Parametrize
9+
Do you want to define a task repeating an action over a range of inputs? Parametrize
1710
your task function!
1811

12+
:::{hint}
13+
The process of repeating a function with different inputs is called parametrizations.
14+
:::
15+
1916
:::{seealso}
2017
If you want to know more about best practices for parametrizations, check out this
21-
{doc}`guide <../how_to_guides/bp_scalable_repititions_of_tasks>` after you made yourself
22-
familiar this tutorial.
18+
{doc}`guide <../how_to_guides/bp_scalable_repititions_of_tasks>` after you have made
19+
yourself familiar with this tutorial.
2320
:::
2421

2522
## An example
2623

27-
We reuse the previous example of a task which generates random data and repeat the same
28-
operation over a number of seeds to receive multiple, reproducible samples.
24+
We reuse the previous example of a task that generates random data and repeat the same
25+
operation over some seeds to receive multiple, reproducible samples.
2926

3027
First, we write the task for one seed.
3128

@@ -61,12 +58,12 @@ specifies the name of a task function argument.
6158
The signature is explained in detail {ref}`below <parametrize-signature>`.
6259
:::
6360

64-
The second argument of the parametrize decorator is a list (or any iterable) which has
65-
as many elements as there are iterations over the task function. Each element has to
66-
provide one value for each argument name in the signature - two in this case.
61+
The second argument of the parametrize decorator is a list with one element per
62+
iteration. Each element must provide one value for each argument name in the signature -
63+
two in this case.
6764

68-
Putting all together, the task is executed three times and each run the path from the
69-
list is mapped to the argument `produces` and `seed` receives the seed.
65+
pytask executes the task function three times and passes the path from the list to the
66+
argument `produces` and the seed to `seed`.
7067

7168
:::{note}
7269
If you use `produces` or `depends_on` in the signature of the parametrize decorator, the
@@ -77,7 +74,7 @@ values are handled as if they were attached to the function with
7774

7875
## Un-parametrized dependencies
7976

80-
To specify a dependency which is the same for all parametrizations, add it with
77+
To specify a dependency that is the same for all parametrizations, add it with
8178
{func}`@pytask.mark.depends_on <pytask.mark.depends_on>`.
8279

8380
```python
@@ -95,10 +92,10 @@ def task_create_random_data(seed, produces):
9592

9693
## The signature
9794

98-
The signature can be passed in three different formats.
95+
pytask allows for three different kinds of formats for the signature.
9996

100-
1. The signature can be a comma-separated string like an entry in a csv table. Note that
101-
white-space is stripped from each name which you can use to separate the names for
97+
1. The signature can be a comma-separated string like an entry in a CSV table. Note that
98+
white space is stripped from each name which you can use to separate the names for
10299
readability. Here are some examples:
103100

104101
```python
@@ -114,41 +111,41 @@ The signature can be passed in three different formats.
114111
("first_argument", "second_argument")
115112
```
116113

117-
1. Finally, it is also possible to use a list of strings.
114+
1. Finally, using a list of strings is also possible.
118115

119116
```python
120117
["first_argument", "second_argument"]
121118
```
122119

123120
## The id
124121

125-
Every task has a unique id which can be used to
126-
{doc}`select it <../tutorials/selecting_tasks>`. The normal id combines the path to
127-
the module where the task is defined, a double colon, and the name of the task function.
122+
Every task has a unique id that can be used to
123+
{doc}`select it <../tutorials/selecting_tasks>`. The normal id combines the path to the
124+
module where the task is defined, a double colon, and the name of the task function.
128125
Here is an example.
129126

130127
```
131128
../task_example.py::task_example
132129
```
133130

134131
This behavior would produce duplicate ids for parametrized tasks. Therefore, there exist
135-
multiple mechanisms to produce unique ids.
132+
multiple mechanisms to have unique ids.
136133

137134
(auto-generated-ids)=
138135

139136
### Auto-generated ids
140137

141-
To avoid duplicate task ids, the ids of parametrized tasks are extended with
142-
descriptions of the values they are parametrized with. Booleans, floats, integers and
143-
strings enter the task id directly. For example, a task function which receives four
144-
arguments, `True`, `1.0`, `2`, and `"hello"`, one of each dtype, has the following id.
138+
pytask construct ids by extending the task name with representations of the values used
139+
for each iteration. Booleans, floats, integers, and strings enter the task id directly.
140+
For example, a task function that receives four arguments, `True`, `1.0`, `2`, and
141+
`"hello"`, one of each dtype, has the following id.
145142

146143
```
147144
task_example.py::task_example[True-1.0-2-hello]
148145
```
149146

150-
Arguments with other dtypes cannot be easily converted to strings and, thus, are
151-
replaced with a combination of the argument name and the iteration counter.
147+
Arguments with other dtypes cannot be converted to strings and, thus, are replaced with
148+
a combination of the argument name and the iteration counter.
152149

153150
For example, the following function is parametrized with tuples.
154151

@@ -192,10 +189,10 @@ task_example.py::task_example[second] # (1,)
192189
To change the representation of tuples and other objects, you can pass a function to the
193190
`ids` argument of the {func}`@pytask.mark.parametrize <pytask.mark.parametrize>`
194191
decorator. The function is called for every argument and may return a boolean, number,
195-
or string which will be integrated into the id. For every other return, the
192+
or string, which will be integrated into the id. For every other return, the
196193
auto-generated value is used.
197194

198-
To get a unique representation of a tuple, we can use the hash value.
195+
We can use the hash value to get a unique representation of a tuple.
199196

200197
```python
201198
def tuple_to_hash(value):
@@ -208,7 +205,7 @@ def task_example(i):
208205
pass
209206
```
210207

211-
This produces the following ids:
208+
The tasks have the following ids:
212209

213210
```
214211
task_example.py::task_example[3430018387555] # (0,)
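
The id rules described in this file can be summarized in a small sketch. It is not pytask's implementation: `describe_value` and `build_task_id` are hypothetical helpers, and the exact fallback format (argument name directly followed by the iteration counter) is an assumption based on the prose, since the hunks do not show the generated ids for that case.

```python
_SIMPLE_TYPES = (bool, int, float, str)


def describe_value(arg_name, value, iteration, ids=None):
    """Return the id fragment for one argument of one iteration."""
    if ids is not None:
        custom = ids(value)
        if isinstance(custom, _SIMPLE_TYPES):
            return str(custom)  # the user-supplied function decided
    if isinstance(value, _SIMPLE_TYPES):
        return str(value)  # booleans, numbers, and strings enter directly
    return f"{arg_name}{iteration}"  # assumed fallback format


def build_task_id(task_name, arg_names, values, iteration, ids=None):
    """Extend the task name with one fragment per parametrized argument."""
    fragments = [
        describe_value(arg_name, value, iteration, ids)
        for arg_name, value in zip(arg_names, values)
    ]
    return f"{task_name}[{'-'.join(fragments)}]"


simple_id = build_task_id(
    "task_example", ["a", "b", "c", "d"], [True, 1.0, 2, "hello"], 0
)
fallback_id = build_task_id("task_example", ["i"], [(0,)], 0)
custom_id = build_task_id(
    "task_example", ["i"], [(0,)], 0, ids=lambda value: f"tuple{len(value)}"
)
```

Under these assumptions, `simple_id` reproduces the `[True-1.0-2-hello]` suffix shown in the diff, while the tuple falls back to a name-plus-counter fragment unless an `ids` function returns a boolean, number, or string.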
