Currently, the coiled backend can only be used if your workflow code is organized as a
package due to how pytask imports your code and dask serializes task functions
([issue](https://github.com/dask/distributed/issues/8607)).
coiled is a product built on top of dask that eases the deployment of your workflow to many cloud providers like AWS, GCP, and Azure.
Note that coiled is a paid service. It offers a free monthly tier where you only pay the costs of your cloud provider, and you can get started without a credit card.
coiled provides the following benefits, which are especially helpful to people who are not familiar with cloud providers or remote computing:
- coiled manages your resources by spawning workers when you need them and shutting them down when they are idle.
- It synchronizes your local environment to the remote workers.
- It scales adaptively if your workflow takes a long time to finish.
There are two ways to use coiled with pytask and pytask-parallel:
- Run individual tasks in the cloud.
- Run your whole workflow in the cloud.
Both approaches are explained below after the setup.
Follow coiled's short four-step process to set up your local environment and configure your cloud provider.
In most projects, there are just a couple of tasks that require a lot of resources and that you would like to run in a virtual machine in the cloud.
With coiled's serverless functions, you can define the hardware and software environment
for your task. Just decorate the task function with a
{func}`@coiled.function <coiled.function>` decorator.
To execute the workflow, you need to turn on parallelization by requesting two or more workers or specifying one of the parallel backends. Otherwise, the decorated task is run locally.
pytask -n 2
pytask --parallel-backend loky
When you also apply the {func}`@task <pytask.task>` decorator to the task, make sure the
{func}`@coiled.function <coiled.function>` decorator is applied first, that is, it is
placed closer to the function. Otherwise, it will be ignored. Add more arguments to the
decorator to configure the hardware and software environment.
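
The ordering rule follows from how Python applies stacked decorators: bottom-up, so the decorator nearest the function runs first. The sketch below uses two hypothetical stand-ins, `remote` (playing the role of `@coiled.function`) and `register` (playing the role of `@task`), to show which callable a registering decorator ends up holding in each order. It only illustrates plain Python semantics and is not how pytask or coiled are implemented.

```python
REGISTRY = []  # hypothetical stand-in for a collection of tasks


def remote(func):
    """Hypothetical stand-in for a decorator that runs a function elsewhere."""

    def wrapper(*args, **kwargs):
        return ("remote", func(*args, **kwargs))

    return wrapper


def register(func):
    """Hypothetical stand-in for a decorator that records a task."""
    REGISTRY.append(func)
    return func


@register
@remote  # closest to the function, applied first: the wrapper gets registered
def task_good():
    return 1


@remote
@register  # applied first: the plain function is registered, the wrapper is not
def task_bad():
    return 2


results = [func() for func in REGISTRY]
print(results)  # [('remote', 1), 2]
```

In the second ordering, the registry holds the undecorated function, so the remote wrapper never takes part when the registered callable is run.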
By default, {func}`@coiled.function <coiled.function>` scales adaptively to the
workload. This means that coiled infers from the number of submitted tasks and previous
runtimes how many additional remote workers it should deploy to handle the workload. It
provides a convenient mechanism to scale without intervention. Also, workers launched by
{func}`@coiled.function <coiled.function>` shut down more quickly than a cluster.
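
To make the idea of adaptive scaling concrete, here is a toy estimate of how many workers a scheduler might request from the backlog and past runtimes. The function `estimate_workers`, its parameters, and the `target_seconds` heuristic are assumptions made for illustration only; coiled's actual scaling logic may differ.

```python
import math


def estimate_workers(
    n_pending: int,
    past_runtimes: list[float],
    target_seconds: float = 60.0,
    max_workers: int = 10,
) -> int:
    """Toy heuristic: enough workers to clear the backlog in ``target_seconds``."""
    if n_pending == 0:
        return 0
    # Without any history, assume each task takes the full target duration.
    if past_runtimes:
        mean_runtime = sum(past_runtimes) / len(past_runtimes)
    else:
        mean_runtime = target_seconds
    expected_work = n_pending * mean_runtime  # total seconds of queued work
    wanted = math.ceil(expected_work / target_seconds)
    # Never request more workers than tasks or than the configured ceiling.
    return max(1, min(wanted, n_pending, max_workers))


print(estimate_workers(n_pending=20, past_runtimes=[30.0, 90.0]))  # 10
print(estimate_workers(n_pending=3, past_runtimes=[600.0]))  # 3
```

The first call is capped by `max_workers`, the second by the number of pending tasks, which mirrors the intuition that extra workers are only useful while there is queued work for them.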
Serverless functions are more thoroughly explained in
[coiled's guide](https://docs.coiled.io/user_guide/usage/functions/index.html).
(coiled-clusters)=
It is also possible to launch a cluster and run each task in a worker provided by coiled. Usually, it is not necessary and you are better off using coiled's serverless functions.
If you want to launch a cluster managed by coiled, register a function that builds an
executor using {class}`coiled.Cluster`.
from concurrent.futures import Executor

import coiled
from pytask_parallel import ParallelBackend
from pytask_parallel import registry


def _build_coiled_executor(n_workers: int) -> Executor:
    # Launch a coiled cluster and expose its dask client as an executor.
    return coiled.Cluster(n_workers=n_workers).get_client().get_executor()


registry.register_parallel_backend(ParallelBackend.CUSTOM, _build_coiled_executor)
Then, execute your workflow with
pytask --parallel-backend custom