
Commit b05a7d6

Authored May 25, 2024
Polish the documentation. (#109)
1 parent e99e530 · commit b05a7d6

12 files changed: +123 −46 lines
 

.pre-commit-config.yaml (+4)

@@ -22,6 +22,10 @@ repos:
       - id: python-no-log-warn
       - id: python-use-type-annotations
       - id: text-unicode-replacement-char
+  - repo: https://github.com/aio-libs/sort-all
+    rev: v1.2.0
+    hooks:
+      - id: sort-all
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.4.4
     hooks:
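
The added hook keeps `__all__` definitions sorted (see the change to
`src/pytask_parallel/__init__.py` below). To try it outside of a commit, the standard
pre-commit invocation should work; this is a sketch assuming pre-commit is installed:

```console
$ pre-commit run sort-all --all-files
```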

docs/source/api.md (+15, new file)

@@ -0,0 +1,15 @@
+# API
+
+```{eval-rst}
+.. currentmodule:: pytask_parallel
+
+.. autoclass:: ParallelBackend
+.. autoclass:: ParallelBackendRegistry
+    :members:
+.. autoclass:: WorkerType
+.. autodata:: registry
+
+    An instantiated :class:`~pytask_parallel.ParallelBackendRegistry` to register or
+    overwrite parallel backends.
+
+```
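
The `registry` object documented here is the hook the guides below use to plug in
executors. A minimal, runnable sketch of a registration call; `ThreadPoolExecutor` is
only an illustrative stand-in for a real backend:

```python
from concurrent.futures import Executor
from concurrent.futures import ThreadPoolExecutor

from pytask_parallel import ParallelBackend, registry


def build_executor(n_workers: int) -> Executor:
    # Any callable that accepts n_workers and returns an Executor works.
    return ThreadPoolExecutor(max_workers=n_workers)


registry.register_parallel_backend(ParallelBackend.CUSTOM, build_executor)
```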

docs/source/changes.md (+1)

@@ -28,6 +28,7 @@ releases are available on [PyPI](https://pypi.org/project/pytask-parallel) and
 - {pull}`106` fixes {pull}`99` such that only when there are coiled functions, all ready
   tasks are submitted.
 - {pull}`107` removes status from `pytask_execute_task_log_start` hook call.
+- {pull}`109` improves the documentation.

 ## 0.4.1 - 2024-01-12


docs/source/coiled.md (+23 −4)

@@ -58,8 +58,13 @@ pytask -n 2
 pytask --parallel-backend loky
 ```

+```{note}
+When you build a project using coiled, you will see a message after pytask's startup
+that coiled is creating the remote software environment, which takes 1-2 minutes.
+```
+
 When you apply the {func}`@task <pytask.task>` decorator to the task, make sure the
-{func}`@coiled.function <coiled.function>` decorator is applied first, or is closer to
+{func}`@coiled.function <coiled.function>` decorator is applied first or is closer to
 the function. Otherwise, it will be ignored. Add more arguments to the decorator to
 configure the hardware and software environment.

@@ -71,7 +76,7 @@ By default, {func}`@coiled.function <coiled.function>`
 to the workload. It means that coiled infers from the number of submitted tasks and
 previous runtimes, how many additional remote workers it should deploy to handle the
 workload. It provides a convenient mechanism to scale without intervention. Also,
-workers launched by {func}`@coiled.function <coiled.function>` will shutdown quicker
+workers launched by {func}`@coiled.function <coiled.function>` will shut down quicker
 than a cluster.

 ```{seealso}
@@ -88,7 +93,8 @@ coiled. Usually, it is not necessary and you are better off using coiled's serve
 functions.

 If you want to launch a cluster managed by coiled, register a function that builds an
-executor using {class}`coiled.Cluster`.
+executor using {class}`coiled.Cluster`. Assign a name to the cluster to reuse it when
+you build your project again and the cluster has not been shut down.

 ```python
 import coiled
@@ -98,7 +104,11 @@ from concurrent.futures import Executor


 def _build_coiled_executor(n_workers: int) -> Executor:
-    return coiled.Cluster(n_workers=n_workers).get_client().get_executor()
+    return (
+        coiled.Cluster(n_workers=n_workers, name="coiled-project")
+        .get_client()
+        .get_executor()
+    )


 registry.register_parallel_backend(ParallelBackend.CUSTOM, _build_coiled_executor)
@@ -109,3 +119,12 @@ Then, execute your workflow with
 ```console
 pytask --parallel-backend custom
 ```
+
+## Tips
+
+When you change your project between executions and your cluster is still up and
+running, the local and the remote software environments can get out of sync. Then, you
+see errors in remote workers that you have already fixed locally.
+
+A quick solution is to stop the cluster in the coiled dashboard and create a new one
+with the next `pytask build`.
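
A minimal sketch of the decorator order the text above insists on:
{func}`@coiled.function <coiled.function>` sits closer to the function than
{func}`@task <pytask.task>`, so it is applied first. The `memory` argument only
illustrates the hardware options mentioned in the docs:

```python
import coiled
from pytask import task


@task
@coiled.function(memory="8 GiB")  # Applied first because it is closest to the function.
def task_example() -> None:
    pass
```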

docs/source/custom_executors.md (+19 −17)

@@ -1,33 +1,35 @@
 # Custom Executors

-```{caution}
-The interface for custom executors is rudimentary right now. Please, give some feedback
-if you managed to implement a custom executor or have suggestions for improvement.
-
-Please, also consider contributing your executor to pytask-parallel if you believe it
+```{note}
+Please consider contributing your executor to pytask-parallel if you believe it
 could be helpful to other people. Start by creating an issue or a draft PR.
 ```

-pytask-parallel allows you to use your parallel backend as long as it follows the
-interface defined by {class}`~concurrent.futures.Executor`.
+pytask-parallel allows you to use any parallel backend as long as it follows the
+interface defined by {class}`concurrent.futures.Executor`.

 In some cases, adding a new backend can be as easy as registering a builder function
-that receives some arguments (currently only `n_workers`) and returns the instantiated
-executor.
+that receives `n_workers` and returns the instantiated executor.
+
+```{important}
+Place the following code in any module that will be imported when you are executing
+pytask. For example, use `src/project/config.py`, `src/project/__init__.py`, or the
+task module directly.
+```

 ```{literalinclude} ../../docs_src/custom_executors.py
 ```

-Given {class}`pytask_parallel.WorkerType` pytask applies automatic wrappers around the
-task function to collect tracebacks, capture stdout/stderr and their like. The `remote`
-keyword allows pytask to handle local paths automatically for remote clusters.
+Given the optional {class}`~pytask_parallel.WorkerType`, pytask applies automatic
+wrappers around the task function to collect tracebacks, capture stdout/stderr, and the
+like. Possible values are `WorkerType.PROCESSES` (default) or `WorkerType.THREADS`.

-Now, build the project requesting your custom backend.
+The `remote` keyword signals pytask that tasks are executed in remote workers without
+access to the local filesystem. pytask will then automatically sync local files to the
+workers. By default, pytask assumes workers have access to the local filesystem.
+
+Now, build the project with your custom backend.

 ```console
 pytask --parallel-backend custom
 ```
-
-```{important}
-pytask applies automatic wrappers
-```

docs/source/dask.md (+14 −11)

@@ -8,15 +8,15 @@ package due to how pytask imports your code and dask serializes task functions

 Dask is a flexible library for parallel and distributed computing. You probably know it
 from its {class}`dask.dataframe` that allows lazy processing of big data. Here, we use
-{mod}`distributed` that provides an interface similar to
-{class}`~concurrent.futures.Executor` to parallelize our execution.
+distributed, which provides an interface similar to {class}`concurrent.futures.Executor`
+to parallelize our execution.

-There are a couple of ways in how we can use dask.
+There are a couple of ways in which we can use dask.

 ## Local

-By default, using dask as the parallel backend will launch a
-{class}`distributed.LocalCluster` with processes on your local machine.
+Using dask as the parallel backend will launch a {class}`distributed.LocalCluster` with
+processes on your local machine.

 `````{tab-set}
 ````{tab-item} CLI
@@ -53,10 +53,13 @@ terminals to launch as many dask workers as you like with
 dask worker <scheduler-ip>
 ```

-Finally, write a function to build the dask client and register it as the dask backend.
-Place the code somewhere in your codebase, preferably, where you store the main
-configuration of your project in `config.py` or another module that will be imported
-during execution.
+Finally, write a function to build the dask client and register it as the backend.
+
+```{important}
+Place the following code in any module that will be imported when you are executing
+pytask. For example, use `src/project/config.py`, `src/project/__init__.py`, or the
+task module directly.
+```

 ```python
 from pytask_parallel import ParallelBackend
@@ -73,7 +76,7 @@ registry.register_parallel_backend(ParallelBackend.DASK, _build_dask_executor)
 ```

 You can also register it as the custom executor using
-{class}`pytask_parallel.ParallelBackend.CUSTOM` to switch back to the default dask
+{obj}`pytask_parallel.ParallelBackend.CUSTOM` to switch back to the default dask
 executor quickly.

 ```{seealso}
@@ -84,7 +87,7 @@ You can find more information in the documentation for
 ## Remote

 You can learn how to deploy your tasks to a remote dask cluster in
-[this guide](https://docs.dask.org/en/stable/deploying.html). They recommend to use
+[this guide](https://docs.dask.org/en/stable/deploying.html). They recommend using
 coiled for deployment to cloud providers.

 [coiled](https://www.coiled.io/) is a product built on top of dask that eases the
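
The body of `_build_dask_executor` lies outside the hunks shown above. A possible
completion, assuming a scheduler already running as described earlier and using
{meth}`distributed.Client.get_executor`; the address is a placeholder:

```python
from concurrent.futures import Executor

from distributed import Client

from pytask_parallel import ParallelBackend, registry


def _build_dask_executor(n_workers: int) -> Executor:
    # Workers are the `dask worker` processes attached to the scheduler,
    # so n_workers is not needed here. Replace the address with your own.
    return Client(address="tcp://<scheduler-ip>:8786").get_executor()


registry.register_parallel_backend(ParallelBackend.DASK, _build_dask_executor)
```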

docs/source/index.md (+1)

@@ -27,6 +27,7 @@ coiled
 dask
 custom_executors
 remote_backends
+api
 developers_guide
 changes
 On Github <https://github.com/pytask-dev/pytask-parallel>

docs/source/remote_backends.md (−6)

@@ -21,12 +21,6 @@ to run their projects.

 ## Local files

-Avoid using local files with remote backends and use storages like S3 for dependencies
-and products. The reason is that every local file needs to be send to the remote workers
-and when your internet connection is slow you will face a hefty penalty on runtime.
-
-## Local paths
-
 In most projects you are using local paths to refer to dependencies and products of your
 tasks. This becomes an interesting problem with remote workers since your local files
 are not necessarily available in the remote machine.

docs_src/custom_executors.py (+8 −4)

@@ -7,9 +7,13 @@


 def build_custom_executor(n_workers: int) -> Executor:
-    return CustomExecutor(
-        max_workers=n_workers, worker_type=WorkerType.PROCESSES, remote=False
-    )
+    return CustomExecutor(max_workers=n_workers)


-registry.register_parallel_backend(ParallelBackend.CUSTOM, build_custom_executor)
+registry.register_parallel_backend(
+    ParallelBackend.CUSTOM,
+    build_custom_executor,
+    # Optional defaults.
+    worker_type=WorkerType.PROCESSES,
+    remote=False,
+)
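
Because the hunk starts at line 7, the module's imports are not shown. A sketch of how
the complete example file might read; `ProcessPoolExecutor` standing in for the unshown
`CustomExecutor` is an assumption, while the registration call matches the diff:

```python
from __future__ import annotations

from concurrent.futures import Executor
from concurrent.futures import ProcessPoolExecutor as CustomExecutor

from pytask_parallel import ParallelBackend, WorkerType, registry


def build_custom_executor(n_workers: int) -> Executor:
    return CustomExecutor(max_workers=n_workers)


registry.register_parallel_backend(
    ParallelBackend.CUSTOM,
    build_custom_executor,
    # Optional defaults pytask-parallel uses when wrapping tasks.
    worker_type=WorkerType.PROCESSES,
    remote=False,
)
```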

src/pytask_parallel/__init__.py (+8 −1)

@@ -3,6 +3,7 @@
 from __future__ import annotations

 from pytask_parallel.backends import ParallelBackend
+from pytask_parallel.backends import ParallelBackendRegistry
 from pytask_parallel.backends import WorkerType
 from pytask_parallel.backends import registry

@@ -14,4 +15,10 @@
     __version__ = "unknown"


-__all__ = ["ParallelBackend", "__version__", "registry", "WorkerType"]
+__all__ = [
+    "ParallelBackend",
+    "ParallelBackendRegistry",
+    "WorkerType",
+    "__version__",
+    "registry",
+]

src/pytask_parallel/backends.py (+28 −3)

@@ -83,10 +83,26 @@ def _get_thread_pool_executor(n_workers: int) -> Executor:


 class ParallelBackend(Enum):
-    """Choices for parallel backends."""
+    """Choices for parallel backends.
+
+    Attributes
+    ----------
+    NONE
+        No parallel backend.
+    CUSTOM
+        A custom parallel backend.
+    DASK
+        A dask parallel backend.
+    LOKY
+        A loky parallel backend.
+    PROCESSES
+        A process pool parallel backend.
+    THREADS
+        A thread pool parallel backend.
+
+    """

     NONE = "none"
-
     CUSTOM = "custom"
     DASK = "dask"
     LOKY = "loky"
@@ -95,7 +111,16 @@ class ParallelBackend(Enum):


 class WorkerType(Enum):
-    """A type for workers that either spawned as threads or processes."""
+    """A type for workers that are either spawned as threads or processes.
+
+    Attributes
+    ----------
+    THREADS
+        Workers are threads.
+    PROCESSES
+        Workers are processes.
+
+    """

     THREADS = "threads"
     PROCESSES = "processes"
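
Both enums carry plain string values, so standard {class}`enum.Enum` lookup by value
maps CLI strings such as `loky` to the members. A small runnable sketch:

```python
from pytask_parallel import ParallelBackend, WorkerType

# The enum values are the strings passed to `pytask --parallel-backend`.
assert ParallelBackend("loky") is ParallelBackend.LOKY
assert ParallelBackend.THREADS.value == "threads"
assert WorkerType("processes") is WorkerType.PROCESSES
```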

src/pytask_parallel/execute.py (+2)

@@ -239,13 +239,15 @@ def pytask_execute_task(session: Session, task: PTask) -> Future[WrapperResult]:
         show_locals=session.config["show_locals"],
         task_filterwarnings=get_marks(task, "filterwarnings"),
     )
+
     if worker_type == WorkerType.THREADS:
         # Prevent circular import for loky backend.
         from pytask_parallel.wrappers import wrap_task_in_thread

         return session.config["_parallel_executor"].submit(
             wrap_task_in_thread, task=task, remote=False, **kwargs
         )
+
     msg = f"Unknown worker type {worker_type}"
     raise ValueError(msg)
