Commit 930c6b0

Merge branch 'main' into uv
2 parents 9479410 + ccf3873 commit 930c6b0

59 files changed (+1501, -315 lines)

docs/source/changes.md

+16
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,21 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
2424
- {pull}`571` removes redundant calls to `PNode.state()` which causes a high penalty for
2525
remote files.
2626
- {pull}`573` removes the `pytask_execute_create_scheduler` hook.
27+
- {pull}`579` fixes an interaction of `--pdb` and `--trace` with tasks that return values. The
28+
debugging modes swallowed the return and `None` was returned. Closes {issue}`574`.
29+
- {pull}`581` simplifies the code for tracebacks and unpublishes some utility functions.
30+
- {pull}`586` improves linting.
31+
- {pull}`587` improves typing of `capture.py`.
32+
- {pull}`588` resets class variables of `ExecutionReport` and `Traceback`.
33+
- {pull}`590` fixes an error introduced in {pull}`588`.
34+
- {pull}`591` invalidates the cache of fsspec when checking whether a remote file
35+
exists. Otherwise, a remote file might be reported as missing although it was just
36+
created. See https://github.com/fsspec/s3fs/issues/851 for more info.
37+
38+
## 0.4.6 - 2024-03-13
39+
40+
- {pull}`576` fixes accidentally collecting `pytask.MarkGenerator` when using
41+
`from pytask import mark`.
2742

2843
## 0.4.5 - 2024-01-09
2944

@@ -66,6 +81,7 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
6681
- {pull}`485` adds missing steps to unconfigure pytask after the job is done, which
6782
caused flaky tests.
6883
- {pull}`486` adds default names to {class}`~pytask.PPathNode`.
84+
- {pull}`487` implements task generators and provisional nodes.
6985
- {pull}`488` raises an error when an invalid value is used in a return annotation.
7086
- {pull}`489` and {pull}`491` simplifies parsing products and does not raise an error
7187
when a product annotation is used with the argument name `produces`. And allow

docs/source/how_to_guides/index.md

+1
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ capture_warnings
1919
how_to_influence_build_order
2020
hashing_inputs_of_tasks
2121
using_task_returns
22+
provisional_nodes_and_task_generators
2223
writing_custom_nodes
2324
extending_pytask
2425
the_data_catalog
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
# Provisional nodes and task generators
2+
3+
pytask's execution model can usually be separated into three phases.
4+
5+
1. Collection of tasks, dependencies, and products.
6+
1. Building the DAG.
7+
1. Executing the tasks.
8+
9+
But, in some situations, pytask needs to be more flexible.
10+
11+
Imagine you want to download a folder with files from an online storage. Before the task
12+
is completed you do not know the total number of files or their filenames. How can you
13+
still describe the files as products of the task?
14+
15+
And how would you define another task that depends on these files?
16+
17+
The following sections will explain how you use pytask in these situations.
18+
19+
## Producing provisional nodes
20+
21+
As an example for the aforementioned scenario, let us write a task that downloads all
22+
files without a file extension from the root folder of the pytask GitHub repository. The
23+
files are downloaded to a folder called `downloads`. `downloads` is in the same folder
24+
as the task module because it is a relative path.
25+
26+
```{literalinclude} ../../../docs_src/how_to_guides/provisional_products.py
27+
---
28+
emphasize-lines: 4, 22
29+
---
30+
```
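
The filtering step of the example can be tried without network access. The following is a minimal sketch, assuming a payload shaped like the `tree` list that the example reads from the GitHub API. The helper name `select_files_without_extension` and the sample entries are hypothetical and only serve as an illustration.

```python
from pathlib import Path


def select_files_without_extension(elements: list[dict[str, str]]) -> list[str]:
    """Keep blobs (files) whose path has no suffix; trees (folders) are dropped."""
    return [
        e["path"]
        for e in elements
        if e["type"] == "blob" and Path(e["path"]).suffix == ""
    ]


# Hypothetical sample payload, not real API output.
sample_tree = [
    {"path": "LICENSE", "type": "blob"},
    {"path": "CITATION", "type": "blob"},
    {"path": "README.md", "type": "blob"},
    {"path": "docs", "type": "tree"},
    {"path": ".gitignore", "type": "blob"},
]

selected = select_files_without_extension(sample_tree)
print(selected)
```

Dotfiles such as `.gitignore` also pass the filter because `pathlib` treats a leading dot as part of the name and not as a suffix.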
31+
32+
Since the names of the files are not known when pytask is started, we need to use a
33+
{class}`~pytask.DirectoryNode` to define the task's product. With a
34+
{class}`~pytask.DirectoryNode` we can specify where pytask can find the files. The files
35+
are described with a root path (default is the directory of the task module) and a glob
36+
pattern (default is `*`).
37+
38+
When we use the {class}`~pytask.DirectoryNode` as a product annotation, we get access to
39+
the `root_dir` as a {class}`~pathlib.Path` object inside the function, which allows us
40+
to store the files.
41+
42+
```{note}
43+
The {class}`~pytask.DirectoryNode` is a provisional node that implements
44+
{class}`~pytask.PProvisionalNode`. A provisional node is not a {class}`~pytask.PNode`,
45+
but when its {meth}`~pytask.PProvisionalNode.collect` method is called, it returns
46+
actual nodes. A {class}`~pytask.DirectoryNode`, for example, returns
47+
{class}`~pytask.PathNode`.
48+
```
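
The idea of resolving a provisional node late can be illustrated with a simplified stand-in. This is a sketch and not pytask's implementation: `SketchDirectoryNode` is a hypothetical class that only mimics the behavior described above, and its `collect` method returns plain paths where pytask would return nodes.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchDirectoryNode:
    """Hypothetical stand-in for a provisional node (not pytask's class)."""

    root_dir: Path
    pattern: str = "*"

    def collect(self) -> list[Path]:
        # Resolve the pattern only when asked, i.e. after the files were created.
        return sorted(p for p in self.root_dir.glob(self.pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    node = SketchDirectoryNode(root_dir=Path(tmp))
    before = node.collect()  # Nothing has been "downloaded" yet.

    for name in ("LICENSE", "CITATION"):
        Path(tmp, name).write_text(f"content of {name}")

    after = [p.name for p in node.collect()]

print(before, after)
```

The same node object describes the folder before and after the files exist; only the result of `collect` changes.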
49+
50+
## Depending on provisional nodes
51+
52+
In the next step, we want to define a task that consumes and merges all previously
53+
downloaded files into one file.
54+
55+
The difficulty here is how to reference the downloaded files before they have been
56+
downloaded.
57+
58+
```{literalinclude} ../../../docs_src/how_to_guides/provisional_task.py
59+
---
60+
emphasize-lines: 9
61+
---
62+
```
63+
64+
To reference the files that will be downloaded, we use the
65+
{class}`~pytask.DirectoryNode` as a dependency. Before the task is executed, the list of
66+
files in the folder defined by the root path and the pattern is automatically collected
67+
and passed to the task.
68+
69+
If we use a {class}`~pytask.DirectoryNode` with the same `root_dir` and `pattern` in
70+
both tasks, pytask will automatically recognize that the second task depends on the
71+
first. If that is not true, you might need to make this dependency more explicit by
72+
using {func}`@task(after=...) <pytask.task>`, which is explained {ref}`here <after>`.
73+
74+
## Task generators
75+
76+
What if we wanted to process each downloaded file separately instead of dealing with
77+
them in one task?
78+
79+
For that, we have to write a task generator to define an unknown number of tasks for an
80+
unknown number of downloaded files.
81+
82+
A task generator is a task function in which we define more tasks, just as if we were
83+
writing functions in a task module.
84+
85+
In the code snippet, each task takes one of the downloaded files and copies its
86+
content to a `.txt` file.
87+
88+
```{literalinclude} ../../../docs_src/how_to_guides/provisional_task_generator.py
89+
```
90+
91+
```{important}
92+
The generated tasks need to be decorated with {func}`@task <pytask.task>` to be
93+
collected.
94+
```
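
What a single generated task does can be sketched without pytask. The following is a minimal standalone illustration under that assumption: the function name `copy_to_txt` and the temporary file are hypothetical, and the function writes the copy itself instead of returning the content to a product node.

```python
import tempfile
from pathlib import Path


def copy_to_txt(path: Path) -> Path:
    """Write the content of ``path`` to a sibling file with a ``.txt`` suffix."""
    path_to_copy = path.with_suffix(".txt")
    path_to_copy.write_text(path.read_text())
    return path_to_copy


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "CITATION")
    source.write_text("cite me")
    copy = copy_to_txt(source)
    copy_name, copy_content = copy.name, copy.read_text()

print(copy_name, copy_content)
```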

docs/source/reference_guides/api.md

+4-1
Original file line numberDiff line numberDiff line change
@@ -190,6 +190,8 @@ Protocols define how tasks and nodes for dependencies and products have to be se
190190
:show-inheritance:
191191
.. autoprotocol:: pytask.PTaskWithPath
192192
:show-inheritance:
193+
.. autoprotocol:: pytask.PProvisionalNode
194+
:show-inheritance:
193195
```
194196

195197
## Nodes
@@ -203,6 +205,8 @@ Nodes are the interface for different kinds of dependencies or products.
203205
:members:
204206
.. autoclass:: pytask.PythonNode
205207
:members:
208+
.. autoclass:: pytask.DirectoryNode
209+
:members:
206210
```
207211

208212
To parse dependencies and products from nodes, use the following functions.
@@ -330,7 +334,6 @@ resolution and execution.
330334
## Tracebacks
331335

332336
```{eval-rst}
333-
.. autofunction:: pytask.remove_internal_traceback_frames_from_exc_info
334337
.. autoclass:: pytask.Traceback
335338
```
336339

docs/source/tutorials/skipping_tasks.md

+3-3
Original file line numberDiff line numberDiff line change
@@ -13,18 +13,18 @@ skip tasks during development that take too much time to compute right now.
1313
```{literalinclude} ../../../docs_src/tutorials/skipping_tasks_example_1.py
1414
```
1515

16-
Not only will this task be skipped, but all tasks that depend on
16+
Not only will this task be skipped, but all tasks depending on
1717
`time_intensive_product.pkl`.
1818

1919
## Conditional skipping
2020

2121
In large projects, you may have many long-running tasks that you only want to execute on
22-
a remote server but not when you are not working locally.
22+
a remote server, but not when you are working locally.
2323

2424
In this case, use the {func}`@pytask.mark.skipif <pytask.mark.skipif>` decorator, which
2525
requires a condition and a reason as arguments.
2626

27-
Place the condition variable in a different module than the task, so you can change it
27+
Place the condition variable in a module different from the task so you can change it
2828
without causing a rerun of the task.
2929

3030
```python
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
+from pathlib import Path
+
+import httpx
+from pytask import DirectoryNode
+from pytask import Product
+from typing_extensions import Annotated
+
+
+def get_files_without_file_extensions_from_repo() -> list[str]:
+    url = "https://api.github.com/repos/pytask-dev/pytask/git/trees/main"
+    response = httpx.get(url)
+    elements = response.json()["tree"]
+    return [
+        e["path"]
+        for e in elements
+        if e["type"] == "blob" and Path(e["path"]).suffix == ""
+    ]
+
+
+def task_download_files(
+    download_folder: Annotated[
+        Path, DirectoryNode(root_dir=Path("downloads"), pattern="*"), Product
+    ],
+) -> None:
+    """Download files."""
+    # Contains names like CITATION or LICENSE.
+    files_to_download = get_files_without_file_extensions_from_repo()
+
+    for file_ in files_to_download:
+        url = "https://raw.githubusercontent.com/pytask-dev/pytask/main"
+        response = httpx.get(url=f"{url}/{file_}", timeout=5)
+        content = response.text
+        download_folder.joinpath(file_).write_text(content)
+14
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
+from pathlib import Path
+
+from pytask import DirectoryNode
+from typing_extensions import Annotated
+
+
+def task_merge_files(
+    paths: Annotated[
+        list[Path], DirectoryNode(root_dir=Path("downloads"), pattern="*")
+    ],
+) -> Annotated[str, Path("all_text.txt")]:
+    """Merge files."""
+    contents = [path.read_text() for path in paths]
+    return "\n".join(contents)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
+from pathlib import Path
+
+from pytask import DirectoryNode
+from pytask import task
+from typing_extensions import Annotated
+
+
+@task(is_generator=True)
+def task_copy_files(
+    paths: Annotated[
+        list[Path], DirectoryNode(root_dir=Path("downloads"), pattern="*")
+    ],
+) -> None:
+    """Create tasks to copy each file to a ``.txt`` file."""
+    for path in paths:
+        # The path of the copy will be CITATION.txt, for example.
+        path_to_copy = path.with_suffix(".txt")
+
+        @task
+        def copy_file(path: Annotated[Path, path]) -> Annotated[str, path_to_copy]:
+            return path.read_text()

pyproject.toml

+3-5
Original file line numberDiff line numberDiff line change
@@ -112,14 +112,12 @@ extend-include = ["*.ipynb"]
112112
[tool.ruff.lint]
113113
select = ["ALL"]
114114
ignore = [
115-
"FBT", # flake8-boolean-trap
116-
"TRY", # ignore tryceratops.
117-
# Others.
118-
"ANN101", # type annotating self
119-
"ANN102", # type annotating cls
115+
"ANN101",
116+
"ANN102",
120117
"ANN401", # flake8-annotate typing.Any
121118
"COM812", # Comply with ruff-format.
122119
"ISC001", # Comply with ruff-format.
120+
"FBT",
123121
"PD901", # Avoid generic df for dataframes.
124122
"S101", # raise errors for asserts.
125123
"S603", # Call check with subprocess.run.

scripts/update_plugin_list.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@
7070
)
7171

7272

73-
_EXCLUDED_PACKAGES = ["pytask-io"]
73+
_EXCLUDED_PACKAGES = ["pytask-io", "pytask-list"]
7474

7575

7676
def _escape_rst(text: str) -> str:

src/_pytask/_inspect.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -106,7 +106,7 @@ def get_annotations( # noqa: C901, PLR0912, PLR0915
106106

107107
if not isinstance(ann, dict):
108108
msg = f"{obj!r}.__annotations__ is neither a dict nor None"
109-
raise ValueError(msg)
109+
raise ValueError(msg) # noqa: TRY004
110110

111111
if not ann:
112112
return {}

src/_pytask/build.py

+4-9
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,6 @@
1313
from typing import Literal
1414

1515
import click
16-
from rich.traceback import Traceback
1716

1817
from _pytask.capture_utils import CaptureMethod
1918
from _pytask.capture_utils import ShowCapture
@@ -34,7 +33,7 @@
3433
from _pytask.session import Session
3534
from _pytask.shared import parse_paths
3635
from _pytask.shared import to_list
37-
from _pytask.traceback import remove_internal_traceback_frames_from_exc_info
36+
from _pytask.traceback import Traceback
3837

3938
if TYPE_CHECKING:
4039
from typing import NoReturn
@@ -66,7 +65,7 @@ def pytask_unconfigure(session: Session) -> None:
6665
path.write_text(json.dumps(HashPathCache._cache))
6766

6867

69-
def build( # noqa: C901, PLR0912, PLR0913, PLR0915
68+
def build( # noqa: C901, PLR0912, PLR0913
7069
*,
7170
capture: Literal["fd", "no", "sys", "tee-sys"] | CaptureMethod = CaptureMethod.FD,
7271
check_casing_of_paths: bool = True,
@@ -257,9 +256,7 @@ def build( # noqa: C901, PLR0912, PLR0913, PLR0915
257256
session = Session.from_config(config_)
258257

259258
except (ConfigurationError, Exception):
260-
exc_info = remove_internal_traceback_frames_from_exc_info(sys.exc_info())
261-
traceback = Traceback.from_exception(*exc_info)
262-
console.print(traceback)
259+
console.print(Traceback(sys.exc_info()))
263260
session = Session(exit_code=ExitCode.CONFIGURATION_FAILED)
264261

265262
else:
@@ -279,9 +276,7 @@ def build( # noqa: C901, PLR0912, PLR0913, PLR0915
279276
session.exit_code = ExitCode.FAILED
280277

281278
except Exception: # noqa: BLE001
282-
exc_info = remove_internal_traceback_frames_from_exc_info(sys.exc_info())
283-
traceback = Traceback.from_exception(*exc_info)
284-
console.print(traceback)
279+
console.print(Traceback(sys.exc_info()))
285280
session.exit_code = ExitCode.FAILED
286281

287282
session.hook.pytask_unconfigure(session=session)
