
Conversation

jcristau
Contributor

Instead of a single task_max_timeout value for all tasks, let task payloads include a maxRunTime value, and kill the task after that time has passed.

Going over the timeout now returns failure instead of intermittent-task, to avoid unwanted retries.

This is backwards-incompatible, since maxRunTime would previously have been ignored.
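
In rough terms, the clamping logic under review looks like the sketch below (function name and the exact config key are assumptions; the config key is written as task_max_runtime because that is the name used in the review comments):

from scriptworker.exceptions import TaskVerificationError  # assumed import path

def run_time_limit(config, claim_task):
    """Clamp the payload's maxRunTime to the configured ceiling."""
    max_timeout = config["task_max_runtime"]  # assumed key name, per the review comments
    max_run_time = claim_task["task"]["payload"].get("maxRunTime")
    if max_run_time is None:
        return max_timeout
    if not isinstance(max_run_time, int):
        raise TaskVerificationError("maxRunTime must be an integer")
    return min(max_timeout, max_run_time)

With this shape, a payload value larger than the configured ceiling is silently clamped, which is what the first review comment picks up on.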

@jcristau jcristau marked this pull request as ready for review September 30, 2025 16:04
@jcristau jcristau requested a review from a team as a code owner September 30, 2025 16:04
    return max_timeout
if not isinstance(max_run_time, int):
    raise TaskVerificationError("maxRunTime must be an integer")
return min(max_timeout, max_run_time)
Contributor

I can see this being a bit confusing. E.g.: if we set maxRunTime to 10800 in a task, but task_max_runtime is 7200, the payload value is ignored. This could be solved by always preferring the value in the payload (if it exists), leaving task_max_runtime (or perhaps a renamed version of it?) as a default fallback value.
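
A hedged sketch of the alternative described here, reusing the assumed names and import from the sketch above (the payload value wins when present; the config value is only a default):

from scriptworker.exceptions import TaskVerificationError  # assumed import path

def run_time_limit_payload_wins(config, claim_task):
    # Alternative from this comment: a valid payload value is used as-is;
    # task_max_runtime only supplies the default when the payload says nothing.
    max_run_time = claim_task["task"]["payload"].get("maxRunTime")
    if max_run_time is None:
        return config["task_max_runtime"]
    if not isinstance(max_run_time, int):
        raise TaskVerificationError("maxRunTime must be an integer")
    return max_run_time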

Contributor Author

I'm thinking of task_max_runtime in the config as "the maximum (and default) allowed value for maxRunTime".
generic-worker has a maxTaskRunTime config value that kind of plays that role (although it raises malformed-payload when maxRunTime exceeds it instead of ignoring the payload value; that could be an option here as well).
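
For comparison, a sketch of the stricter generic-worker-style option mentioned here, where an out-of-range payload value fails task verification instead of being clamped (names and the error message are illustrative):

from scriptworker.exceptions import TaskVerificationError  # assumed import path

def run_time_limit_strict(config, claim_task):
    max_timeout = config["task_max_runtime"]
    max_run_time = claim_task["task"]["payload"].get("maxRunTime")
    if max_run_time is None:
        return max_timeout
    if not isinstance(max_run_time, int):
        raise TaskVerificationError("maxRunTime must be an integer")
    if max_run_time > max_timeout:
        # generic-worker reports malformed-payload in this case; the rough
        # equivalent here would be failing task verification rather than clamping.
        raise TaskVerificationError(
            f"maxRunTime {max_run_time} exceeds the configured maximum {max_timeout}"
        )
    return max_run_time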

Contributor

Ah. Thinking about it through that lens, it makes a lot of sense. It's still a bit confusing, but I'm not sure that a different name (e.g. max_max_run_time) would help.

Contributor

I'm a bit indifferent on whether or not to throw an error if the payload max runtime exceeds the max in the config. It's probably ideal to...but it may also cause problems in practice.

max_run_time = claim_task["task"]["payload"].get("maxRunTime")
if max_run_time is None:
    return max_timeout
if not isinstance(max_run_time, int):
Contributor

Can/should this get verified earlier, e.g. when loading the config? It's unfortunate that bad values can possibly get through... but maybe that's just life with Python.

Contributor Author

It's coming from the task payload, not the config?

I guess what we could do is, in get_task_definition, instead of just checking if "payload" not in task_defn, validate the payload against a minimal schema that includes maxRunTime as an optional integer key.
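
A minimal sketch of what that schema check could look like with jsonschema; the schema contents, the function name, and where it would hook into get_task_definition are assumptions:

import jsonschema
from scriptworker.exceptions import TaskVerificationError  # assumed import path

# Minimal schema: the task must have an object "payload", and maxRunTime,
# if present, must be an integer. Everything else is left to the scripts.
MINIMAL_PAYLOAD_SCHEMA = {
    "type": "object",
    "required": ["payload"],
    "properties": {
        "payload": {
            "type": "object",
            "properties": {"maxRunTime": {"type": "integer"}},
        },
    },
}

def verify_task_definition(task_defn):
    try:
        jsonschema.validate(task_defn, MINIMAL_PAYLOAD_SCHEMA)
    except jsonschema.ValidationError as e:
        raise TaskVerificationError(f"invalid task definition: {e.message}") from e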

Contributor

Right...sorry. I guess we're in scriptworker land here, not a specific script. Presumably the individual scripts will validate their max run times, but we probably shouldn't assume that here?
