
Conversation

@adamlazik1 (Contributor) commented Jan 30, 2025

This PR proposes to mark as cancelled every sub plan which finishes
unsuccessfully after the parent execution plan receives the cancel
directive. This is consistent with how a task is evaluated as cancelled
in the foreman project.

This PR has an additional effect: the parent execution plan now always
finishes with warning if at least one sub plan is cancelled in response
to the cancel event. Until now, it could either finish with success, if
all sub plans either finished with success or were not yet queued for
execution, or finish with warning, if an already pending sub plan was
cancelled and finished with the error result. In summary, this
additional change unifies the behavior of a cancelled execution plan
whenever there is anything left to cancel.
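
To illustrate the proposed rule, here is a minimal sketch (not the actual diff). It assumes the parent action records the time of the cancel event in its output hash; sub_plans_count and sub_plans_count_after are the counting helpers used in this PR, everything else is illustrative:

```ruby
# Minimal sketch, assuming a bulk action with an `output` hash.
def cancel!(force = false)
  # Remember when the parent plan received the cancel directive.
  output[:cancelled_timestamp] = Time.now.utc
  super
end

def recalculate_counts
  failed = sub_plans_count('state' => %w(paused stopped), 'result' => %w(error warning))
  if output[:cancelled_timestamp]
    # Sub plans that finished unsuccessfully after the cancel event
    # count as cancelled rather than failed.
    cancelled = sub_plans_count_after(output[:cancelled_timestamp],
                                      'state' => %w(paused stopped),
                                      'result' => %w(error warning))
    failed -= cancelled
  end
  # ... store failed, cancelled and the other counts ...
end
```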

@adamruzicka (Contributor) left a comment

The general approach looks good; I left a couple of comments. Also, rubocop is red and I'm not sure what the tests will say.

@adamlazik1 force-pushed the edit-cancelled-count branch from 50f7a38 to 16e5cee on February 3, 2025 18:52
@adamlazik1 (Contributor Author) commented

Thanks, I applied the suggestions. I noticed in the dynflow console that, for whatever reason, a cancelled RunHostsJob shows result success even though the counts are correct. On master it shows result = error if a job is cancelled. Is this a problem?

@adamlazik1 force-pushed the edit-cancelled-count branch from 16e5cee to 036ad60 on February 6, 2025 10:44
@adamruzicka (Contributor) commented

> Is this a problem?

Sounds like one. Ideally this should remain an "internal" change, meaning there should be no observable difference anywhere except for the action's output.

@adamlazik1 (Contributor Author) commented

Alright, unless I am missing something, I have fixed the issue with the differing result state. Now the only change should be in the cancelled and failed counts of a cancelled job.

@adamlazik1 (Contributor Author) commented Feb 20, 2025

Ugh, I see a problem when cancelling a job invocation on more than one host and I wonder what the correct implementation should be. The table below displays the result of RunHostsJob in different scenarios: the job invocation on one host is cancelled before finishing, and the job invocation on two hosts is run with concurrency level 1 and is cancelled after the command successfully completes on the first host.

| n. of hosts / branch | master | PR |
| --- | --- | --- |
| 1 | warning | warning |
| 2 | success | warning |

Under what circumstances should the result of RunHostsJob be warning, and under what circumstances success, when cancelling a job invocation?

@adamlazik1 (Contributor Author) commented Feb 25, 2025

> Ugh, I see a problem when cancelling a job invocation on more than one host and I wonder what the correct implementation should be. The table below displays the result of RunHostsJob in different scenarios: the job invocation on one host is cancelled before finishing, and the job invocation on two hosts is run with concurrency level 1 and is cancelled after the command successfully completes on the first host.
>
> | n. of hosts / branch | master | PR |
> | --- | --- | --- |
> | 1 | warning | warning |
> | 2 | success | warning |
>
> Under what circumstances should the result of RunHostsJob be warning, and under what circumstances success, when cancelling a job invocation?

After a discussion with @adamruzicka we concluded that the behavior on master could be considered an inconsistency, and thus the side effect of this PR in its current form is actually a plus, so I am keeping the behavior for now. I updated the commit message accordingly.

I introduced some additional errors in the last version and I hope I have fixed them in the current one, but I need to test more scenarios, so I will flip this to draft for now.

Also, I need to transfer a value (cancelled_unqueued_sub_plans_count) between different methods and I don't know what the correct way of doing that here is, so I used the output hash for now. Suggestions are welcome.

@adamlazik1 marked this pull request as draft on February 25, 2025 17:54
@adamruzicka (Contributor) commented

> Also, I need to transfer a value (cancelled_unqueued_sub_plans_count) between different methods and I don't know what the correct way of doing that here is, so I used the output hash for now. Suggestions are welcome.

Can't you calculate that from the other counts that you have?

@adamlazik1 (Contributor Author) commented Feb 26, 2025

I am not sure how I would do that. Those unqueued sub plans aren't even in the database, as far as I understand, no?

@adamlazik1 (Contributor Author) commented

> I am not sure how I would do that. Those unqueued sub plans aren't even in the database, as far as I understand, no?

Looking at the code, I could calculate it from total_count and planned_count, but I don't know where total_count even comes from, as it appears to be a virtual method. Can I count on it always returning the same value during one job invocation run? The same question goes for planned_count. From the code and from the output when running execution plans it seems to stay the same, but is there any scenario in which this value could change between the occurrence of the cancel event and the next call of recalculate_counts?

@adamlazik1 (Contributor Author) commented

On a separate note: I retested the current version and it appears to be working as expected; the errors I saw earlier seem to have been resolved. I am flipping this back to Ready for review, even though it is likely that more changes are pending based on the above comments.

@adamlazik1 marked this pull request as ready for review on February 26, 2025 14:01
@adamlazik1 (Contributor Author) commented

> I am not sure how I would do that. Those unqueued sub plans aren't even in the database, as far as I understand, no?

> Looking at the code, I could calculate it from total_count and planned_count, but I don't know where total_count even comes from, as it appears to be a virtual method. Can I count on it always returning the same value during one job invocation run? The same question goes for planned_count. From the code and from the output when running execution plans it seems to stay the same, but is there any scenario in which this value could change between the occurrence of the cancel event and the next call of recalculate_counts?

After another discussion it turns out that relying on the dynamic calculation of unqueued sub plans from total_count and planned_count should be safe, so I no longer need to transfer this value between methods. PR updated and ready for review.

I hope I was able to edit the remaining_count method correctly. In the previous implementation, cancelled_count could be either 0 or total_count - planned_count at the time of the cancel event, which means that at the time of cancellation this method should return 0.
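
For illustration, the dynamic calculation might look roughly like this sketch, assuming total_count and planned_count indeed stay constant during one run; success_count and failed_count are hypothetical names standing in for the other counts:

```ruby
# Minimal sketch, not the PR's actual implementation.

# Sub plans cancelled before ever being queued can be derived on the
# fly, so the value no longer needs to travel through the output hash.
def cancelled_unqueued_sub_plans_count
  total_count - planned_count
end

# Once every sub plan is accounted for as success, failed or cancelled
# (with cancelled including the unqueued ones), nothing remains, so at
# the time of cancellation this returns 0.
def remaining_count
  total_count - success_count - failed_count - cancelled_count
end
```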

@adamlazik1 (Contributor Author) commented

The test failure seems unrelated. Or at least I hope so.

@adamruzicka (Contributor) commented

Yeah, these do happen from time to time.

@adamruzicka merged commit 09b331c into Dynflow:master on Mar 17, 2025
6 of 7 checks passed
@adamruzicka (Contributor) commented

Thank you @adamlazik1 !

@adamlazik1 deleted the edit-cancelled-count branch on March 17, 2025 11:06
```ruby
failed = sub_plans_count('state' => %w(paused stopped), 'result' => %w(error warning))
total = total_count
if output[:cancelled_timestamp]
  cancelled_scheduled_plans = sub_plans_count_after(output[:cancelled_timestamp], { 'state' => %w(paused stopped), 'result' => %w(error warning) })
```


Is this bulletproof enough? Can the following happen?

  1. I run a job, it gets scheduled
  2. At time T, I cancel it and that time becomes cancelled_timestamp
  3. Before the run is actually cancelled, at time T+1, an error occurs so the task becomes stopped/error
  4. At T+2, an already failed run gets cancelled and nothing happens
    => The run failed but is counted as cancelled

@adamlazik1 (Contributor Author) replied

If it failed after being marked as cancelled, then I would assume it is ok to count it as cancelled.
