# coiled

Currently, the coiled backend can only be used if your workflow code is organized as a
package due to how pytask imports your code and dask serializes task functions
([issue](https://github.com/dask/distributed/issues/8607)).

coiled is a product built on top of dask that eases the deployment of your workflow to many cloud providers like AWS, GCP, and Azure.

Note that coiled is a paid service. They offer a free monthly tier where you only pay the costs of your cloud provider, and you can get started without a credit card.

They provide the following benefits, which are especially helpful to people who are not familiar with cloud providers or remote computing.

- coiled manages your resources, spawning workers when you need them and shutting them down when they are idle.
- It synchronizes your local environment to the remote workers.
- It scales adaptively if your workflow takes a long time to finish.

There are two ways to use coiled with pytask and pytask-parallel.

1. Run individual tasks in the cloud.
2. Run your whole workflow in the cloud.

Both approaches are explained below after the setup.

## Setup

Follow coiled's short four-step process to set up your local environment and configure your cloud provider.
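On the command line, the process looks roughly like the following sketch; the exact commands, especially the cloud provider argument, depend on your setup, so treat coiled's documentation as authoritative.

```console
pip install coiled
coiled login        # authenticate with your coiled account
coiled setup aws    # grant coiled access to your cloud account (aws is an example)
```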

## Running individual tasks

In most projects, there are just a couple of tasks that require a lot of resources and that you would like to run in a virtual machine in the cloud.

With coiled's serverless functions, you can define the hardware and software environment for your task. Just decorate the task function with a {func}`@coiled.function <coiled.function>` decorator.
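For example, a task module could look like the following sketch (the module and function names are illustrative, and the decorator is used with its defaults):

```python
# content of task_example.py -- a minimal sketch with illustrative names.
import coiled


@coiled.function()
def task_example() -> None:
    # The body of the task runs on a remote machine provisioned by coiled; by
    # default, your local software environment is synchronized to it.
    ...
```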

To execute the workflow, you need to turn on parallelization by requesting two or more workers or specifying one of the parallel backends. Otherwise, the decorated task is run locally.

```console
pytask -n 2
pytask --parallel-backend loky
```

When you apply the {func}`@task <pytask.task>` decorator to the task, make sure the {func}`@coiled.function <coiled.function>` decorator is applied first, i.e., closer to the function. Otherwise, it will be ignored. Add more arguments to the decorator to configure the hardware and software environment.
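A sketch of the ordering, with an illustrative resource request (the `memory` value and the function name are assumptions for the example):

```python
import coiled
from pytask import task


@task
@coiled.function(memory="16 GiB")  # applied first, i.e., closer to the function
def task_heavy() -> None:
    ...
```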

By default, {func}`@coiled.function <coiled.function>` scales adaptively to the workload. This means that coiled infers from the number of submitted tasks and previous runtimes how many additional remote workers it should deploy to handle the workload. It provides a convenient mechanism to scale without intervention. Also, workers launched by {func}`@coiled.function <coiled.function>` shut down more quickly than a cluster.

Serverless functions are more thoroughly explained in
[coiled's guide](https://docs.coiled.io/user_guide/usage/functions/index.html).

(coiled-clusters)=

## Running a cluster

It is also possible to launch a cluster and run each task in a worker provided by coiled. Usually, this is not necessary, and you are better off using coiled's serverless functions.

If you want to launch a cluster managed by coiled, register a function that builds an executor using {class}`coiled.Cluster`.

```python
from concurrent.futures import Executor

import coiled
from pytask_parallel import ParallelBackend
from pytask_parallel import registry


def _build_coiled_executor(n_workers: int) -> Executor:
    # Launch a coiled cluster with the requested number of workers and return
    # an executor interface to its dask client.
    return coiled.Cluster(n_workers=n_workers).get_client().get_executor()


# Register the builder as the custom parallel backend.
registry.register_parallel_backend(ParallelBackend.CUSTOM, _build_coiled_executor)
```

Then, execute your workflow with

```console
pytask --parallel-backend custom
```
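Since the builder function receives the number of workers requested on the command line, you can combine the custom backend with `-n`; for instance (two workers being an arbitrary choice):

```console
pytask --parallel-backend custom -n 2
```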