Using spot policy on batch runs #315
Hi, @betolink. That's not supported just yet, but it's something we're planning to add pretty soon, most likely in the next week or two. For your use case, is it okay if a task running on a spot instance that gets interrupted/preempted is marked as failed and not retried? Or would retries (and potentially replacing the spot instance) be important to you for using batch?
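To make the two behaviors in that question concrete, here is a minimal, purely hypothetical sketch (not Coiled's implementation or API): `Preempted` stands in for "the spot VM was reclaimed", and `max_retries=0` corresponds to the fail-fast option where a preempted task is simply marked as failed.

```python
class Preempted(Exception):
    """Hypothetical signal that the spot VM running a task was reclaimed."""


def run_with_retries(task, max_retries=0):
    """Run `task`, re-running it up to `max_retries` times if it is preempted.

    Returns (status, attempts, result). max_retries=0 is the fail-fast case.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return ("succeeded", attempts, task())
        except Preempted:
            if attempts > max_retries:
                return ("failed", attempts, None)


def flaky(n_preemptions):
    """Build a hypothetical task that is preempted on its first n runs."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= n_preemptions:
            raise Preempted()
        return "done"

    return task


print(run_with_retries(flaky(1)))                 # ('failed', 1, None)
print(run_with_retries(flaky(1), max_retries=2))  # ('succeeded', 2, 'done')
```

With long-running tasks, each retry repeats all the work done before the preemption, which is why task duration matters for this choice.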
Hi @ntabris, yeah, it's OK if the cluster isn't provisioned, or fails, when spot VMs aren't available. A
We'll definitely support on-demand, spot, and spot with fallback as options for what happens when the cluster that runs your batch job is first created. It's less clear to me right now what we'll do about spot VMs that have been running for a while and then get reclaimed (as can happen with any spot VM). Do you expect your individual batch tasks to be fast (say, seconds to a few minutes) or long-running (say, many minutes to hours)? How many batch tasks per job do you expect (a few, tens, hundreds)?
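The three provisioning options can be sketched as a small decision function. This is a hypothetical model of the behavior described above, not Coiled's code; the enum values are only illustrative labels for the three options.

```python
from enum import Enum


class SpotPolicy(Enum):
    ON_DEMAND = "on-demand"
    SPOT = "spot"
    SPOT_WITH_FALLBACK = "spot_with_fallback"


class NoCapacity(Exception):
    """Raised when spot capacity is unavailable and fallback is not allowed."""


def provision(policy, spot_available):
    """Return the pricing model a new VM would get under `policy` (hypothetical)."""
    if policy is SpotPolicy.ON_DEMAND:
        return "on-demand"
    if spot_available:
        return "spot"
    if policy is SpotPolicy.SPOT_WITH_FALLBACK:
        return "on-demand"
    # Plain "spot": the cluster is not provisioned at all.
    raise NoCapacity("no spot capacity and fallback not allowed")


print(provision(SpotPolicy.SPOT_WITH_FALLBACK, spot_available=False))  # on-demand
```

Under this model, plain `SPOT` with no capacity raises instead of falling back, which matches the "it's OK if the cluster isn't provisioned" answer in this thread. Note that it only covers cluster creation, not VMs reclaimed mid-run.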
In my case (processing many files) the run could take minutes to hours. I'm still testing, but I expect the tasks to be in the "tens", with each VM processing a few thousand files.
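With tens of tasks and a few thousand files per VM, each task needs a deterministic share of the file list. Below is a minimal sketch, assuming the task index (0-based) and task count are available to each task. How the batch runner exposes them is not covered here, so they are plain arguments, and the file names are hypothetical.

```python
def files_for_task(files, task_id, n_tasks):
    """Return the contiguous slice of `files` assigned to task `task_id` (0-based).

    Remainder files go one each to the lowest-numbered tasks, so chunk sizes
    differ by at most one.
    """
    if not 0 <= task_id < n_tasks:
        raise ValueError(f"task_id must be in [0, {n_tasks}), got {task_id}")
    base, extra = divmod(len(files), n_tasks)
    start = task_id * base + min(task_id, extra)
    stop = start + base + (1 if task_id < extra else 0)
    return files[start:stop]


# Hypothetical example: 50,000 files spread over 20 tasks.
files = [f"file_{i:05d}.nc" for i in range(50_000)]
chunk = files_for_task(files, task_id=3, n_tasks=20)
print(len(chunk), chunk[0])  # 2500 file_07500.nc
```

Because each task's slice depends only on its index, a failed or preempted task can be rerun on its own without reprocessing the other chunks.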
I'm not sure whether it's possible to force a spot VM policy when running a batch job (https://docs.coiled.io/user_guide/batch.html).
Something like