A new interface for pytask #361
Sounds basically good to me.
Python Inputs
I often encountered the problem of tasks not running after changing some config values, ended up in adding a
Get Rid of
Introduction
pytask will become three years old soon 🥳. What better way to celebrate this birthday than with an open discussion about pytask's interface and maybe a revision of significant components?
We should agree on some principles to guide the design decisions in this discussion.
Each section will discuss one detail, and sections might build on each other.
The problem with depends_on and produces
- depends_on and produces are magic keywords that users need to know about. They must learn to use decorators that inject the correct values at runtime.
- depends_on and produces obfuscate what's behind the values. It can be a path, a dictionary of paths, or a list of paths. Just look at these two signatures; which example is clearer?
Getting rid of the decorators
We can eliminate the decorators by parsing depends_on and produces from the default arguments of a task function. This is already possible if a task is marked with @pytask.mark.task.
Getting rid of depends_on
As you saw in the example before and here, depends_on is neither necessary nor clearer. The values are better attached as default arguments.
This interface is not new. The task decorator already allows exploiting default arguments and offers some extra features (no task_ prefix necessary, passing kwargs).
However, pytask currently only looks for path dependencies in depends_on. The implementation could probably be changed without causing many problems so that every task input except produces is treated as a pytree, and whenever a pathlib.Path is encountered, it is parsed as a dependency. str should probably not be supported outside of depends_on. Using str in depends_on should be deprecated as well to keep everything simpler.
Since pathlib.Paths are always assumed to be PathNodes, how can I pass a pathlib.Path as a normal function argument? Just wrap the path in a PythonNode, the neutral element for pytask. A PythonNode is explained in the next section.
Possible changes
The following are non-breaking changes.
- If depends_on is present, do not change behavior. Also, strings are still allowed instead of paths.
- If depends_on is not present, try to parse dependencies from all arguments of the task function that are not produces, and only if the value in the pytree is a pathlib.Path. Strings as paths are not allowed anymore.
Allowing for more dependency and product types
Currently, pytask builds the DAG only from paths, and it cannot handle different objects, significantly limiting users.
To add a new type like a PythonNode, the user creates a new class that inherits from MetaNode.
MetaNode requires a new type to implement
- a state property. The state is a hash or some almost unique value signaling changes in the node's value. It is None if the node does not exist.
- a value property to retrieve the value from the node and pass it into the task.
The Python node could then be
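A minimal sketch of what such a node might look like. The MetaNode below is a stand-in defined only for this example (an assumption, not pytask's real class), and the hash flag, the constant "0" state, and hashing via repr are illustrative choices rather than a settled design:

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Any, Optional


class MetaNode(ABC):
    """Stand-in base class for this sketch (assumption: not pytask's real MetaNode)."""

    @property
    @abstractmethod
    def state(self) -> Optional[str]:
        """A hash or near-unique value; None if the node does not exist."""

    @property
    @abstractmethod
    def value(self) -> Any:
        """The object that is passed into the task function."""


class PythonNode(MetaNode):
    """Wraps an arbitrary Python object so it can take part in the DAG."""

    def __init__(self, value: Any = None, hash: bool = False) -> None:
        self._value = value
        self.hash = hash  # opt-in, because hashing large inputs can be costly

    @property
    def state(self) -> Optional[str]:
        if self.hash:
            # Assumption: repr() is stable enough for the wrapped object.
            return hashlib.sha256(repr(self._value).encode()).hexdigest()
        # A constant state: the node exists but never signals a change.
        return "0"

    @property
    def value(self) -> Any:
        return self._value
```

With hashing enabled, two nodes holding equal values report the same state, and a changed value yields a different state. Note that repr of a dictionary depends on insertion order, so a real implementation would likely need a more careful hashing strategy.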
Hashing Python inputs
What is a use case for this new PythonNode?
Currently, Python inputs are not processed as dependencies. If they alter the signature of the task, changing them might trigger a rerun of the task, but it is not a given.
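A minimal, self-contained sketch of the idea. PythonNode here is a hypothetical stand-in, and collect_states is a helper invented for this example to mimic how a build system might read node states from default arguments; neither is pytask's actual API:

```python
import hashlib
import inspect
from typing import Any, Callable, Dict


class PythonNode:
    """Hypothetical stand-in for the proposed node; not pytask's real class."""

    def __init__(self, value: Any, hash: bool = False) -> None:
        self.value = value
        self.hash = hash

    @property
    def state(self) -> str:
        if not self.hash:
            return "0"  # constant state: changes are never detected
        return hashlib.sha256(repr(self.value).encode()).hexdigest()


def task_example(additional_kwargs=PythonNode({"a": 1}, hash=True)):
    ...


def collect_states(func: Callable) -> Dict[str, str]:
    """Return {argument name: state} for every PythonNode default of func."""
    return {
        name: param.default.state
        for name, param in inspect.signature(func).parameters.items()
        if isinstance(param.default, PythonNode)
    }


# Comparing a stored state with the current one tells us whether to rerun.
stored = collect_states(task_example)
changed = PythonNode({"a": 2}, hash=True).state
needs_rerun = stored["additional_kwargs"] != changed
```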
Here, we declare that the task receives additional_kwargs. Internally, pytask will wrap them in a PythonNode container. We say the dictionary should be hashed to detect changes.
(We do not want hashing by default since some inputs can be substantial and hashing them too costly.)
Clearer types
The previous example introduced PythonNodes and how they can be used to declare and hash Python dependencies.
A problem with the previous example is that the types of function inputs are obfuscated by pytask's node types and no longer reveal the original types. Especially for testing and other purposes, it would be advantageous if you could call the function without any knowledge of pytask.
Luckily, a new Python feature, typing.Annotated, was recently introduced with PEP 593. It allows adding arbitrary metadata to type hints.
The previous example can be rewritten to
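A minimal sketch of this style, assuming Python 3.9+ for typing.Annotated. PythonNode is again a hypothetical stand-in used purely as metadata, and find_nodes is a helper invented for this example to show how the metadata could be read back:

```python
from typing import Annotated, Callable, Dict, get_type_hints


class PythonNode:
    """Hypothetical marker carrying node options; not pytask's real class."""

    def __init__(self, hash: bool = False) -> None:
        self.hash = hash


def task_example(
    # The default dict is only read, never mutated.
    additional_kwargs: Annotated[dict, PythonNode(hash=True)] = {"a": 1},
) -> int:
    return len(additional_kwargs)


def find_nodes(func: Callable) -> Dict[str, PythonNode]:
    """Map argument names to the PythonNode found in their Annotated metadata."""
    hints = get_type_hints(func, include_extras=True)
    return {
        name: meta
        for name, hint in hints.items()
        for meta in getattr(hint, "__metadata__", ())
        if isinstance(meta, PythonNode)
    }
```

The function keeps its plain signature, so a test can call task_example({"x": 1}) directly with an ordinary dictionary, while a tool can still discover the node options through find_nodes(task_example).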
How to get rid of (path) products?
The solution to replace products is less straightforward than for dependencies. Currently, there are two ways to declare products: either continue using @pytask.mark.produces, or use only produces as a magic argument of the task function, which already removes the decorator.
If we want to declare products independently of the produces argument name, we could use our nodes with a new argument.
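One way this could look, as a rough sketch only. The name of the new argument is not fixed anywhere in this post, so is_product is a placeholder, and both PathNode and split_nodes are hypothetical stand-ins written for this example:

```python
import inspect
from pathlib import Path
from typing import Callable, Dict, Tuple


class PathNode:
    """Hypothetical stand-in; is_product is a placeholder argument name."""

    def __init__(self, path: Path, is_product: bool = False) -> None:
        self.path = path
        self.is_product = is_product


def task_copy(
    source=PathNode(Path("in.txt")),
    target=PathNode(Path("out.txt"), is_product=True),
):
    ...


def split_nodes(func: Callable) -> Tuple[Dict[str, Path], Dict[str, Path]]:
    """Sort PathNode defaults into (dependencies, products) by their flag."""
    dependencies: Dict[str, Path] = {}
    products: Dict[str, Path] = {}
    for name, param in inspect.signature(func).parameters.items():
        node = param.default
        if isinstance(node, PathNode):
            bucket = products if node.is_product else dependencies
            bucket[name] = node.path
    return dependencies, products
```

In this sketch the argument name no longer matters: target is recognized as a product only because of the flag on its node.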