-
Notifications
You must be signed in to change notification settings - Fork 34
feat: respect maxRunTime specified in task payload #716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Instead of a single task_max_timeout value for all tasks, let task payloads include a maxRunTime value, and kill the task after that time has passed. Going over the timeout now returns failure instead of intermittent-task, to avoid unwanted retries. This is backwards-incompatible, since maxRunTime would previously have been ignored.
return max_timeout | ||
if not isinstance(max_run_time, int): | ||
raise TaskVerificationError("maxRunTime must be an integer") | ||
return min(max_timeout, max_run_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see this being a bit confusing. Eg: if we set maxRunTime
to 10800
in a task, but task_max_runtime
is 7200
, the payload value is ignore. This could be solved by always preferring the value in the payload (if it exists), leaving task_max_runtime
(or perhaps a renamed version of it?) as a default fallback value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of task_max_runtime in the config as "the maximum (and default) allowed value for maxRunTime".
generic-worker has a maxTaskRunTime
config value that kind of plays that role (although it raises malformed-payload when maxRunTime exceeds it instead of ignoring the payload value; that could be an option here as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Thinking about it through that lens means it makes a lot of sense. It's still a bit confusing, but I'm not sure that a different name (eg: max_max_run_time
) would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit indifferent on whether or not to throw an error if the payload max runtime exceeds the max in the config. It's probably ideal to...but it may also cause problems in practice.
max_run_time = claim_task["task"]["payload"].get("maxRunTime") | ||
if max_run_time is None: | ||
return max_timeout | ||
if not isinstance(max_run_time, int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can/should this get verified earlier? eg: when loading the config? It's unfortunate that bad values can possibly get through...but maybe that's just life with Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's coming from the task payload, not the config?
I guess what we could do is in get_task_definition
, instead of if "payload" not in task_defn
, validate the payload against a minimal schema that includes maxRunTime as an optional integer key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right...sorry. I guess we're in scriptworker land here, not a specific script. Presumably the individual scripts will validate their max run times, but we probably shouldn't assume that here?
Instead of a single task_max_timeout value for all tasks, let task payloads include a maxRunTime value, and kill the task after that time has passed.
Going over the timeout now returns failure instead of intermittent-task, to avoid unwanted retries.
This is backwards-incompatible, since maxRunTime would previously have been ignored.