Edit how failed_count and cancelled_count is calculated #447
Conversation
adamruzicka left a comment
The general approach looks good, left a couple of comments. Also rubocop is red and I'm not sure what the tests will say
Force-pushed from 50f7a38 to 16e5cee
Thanks, I applied the suggestions. I noticed in the dynflow console that, for whatever reason, a cancelled RunHostsJob shows result success even though the counts are correct. On master it shows result = error if a job is cancelled. Is this a problem?
Force-pushed from 16e5cee to 036ad60
Sounds like one; this should ideally remain an "internal" change, meaning there should be no observable difference anywhere except for the action's output.
Force-pushed from 036ad60 to 5acac92
Alright, unless I am missing something, I have fixed the issue with the differing result state. Right now the only change should be in the cancelled and failed counts of a cancelled job.
Ugh, I see a problem when cancelling a job invocation on more than one host, and I wonder what the correct implementation should be. The table below displays the result of RunHostsJob in different scenarios: the job invocation on one host is cancelled before finishing, and the job invocation on two hosts is run with concurrency level 1 and cancelled after the command successfully completes on the first host.
Under what circumstances should the result of RunHostsJob be warning, and under what circumstances success, when cancelling a job invocation?
Force-pushed from 5acac92 to 28d239b
After a discussion with @adamruzicka we concluded that the behavior on master could be considered an inconsistency, and thus the side effect of this PR in its current form is actually a plus, so I am keeping the behavior for now. I updated the commit message accordingly. I made some additional errors in the last version and I hope I fixed them in the current one, but I need to test more scenarios, so I will flip this to draft for now. Also, I need to transfer a value (the number of unscheduled sub plans) between functions.
Can't you calculate that from the other counts that you have? |
I am not sure how I would do that. Those unscheduled sub plans aren't even in the database as far as I understand, no? |
Looking at the code, I could calculate it from total_count and planned_count, but I don't know where total_count even comes from, as it appears to be (?) a virtual method. Can I count on it always returning the same value during one job invocation run? The same question goes for planned_count. From the code and the output when running execution plans it seems that it always stays the same, but is there any scenario in which this value could change between the cancel event occurrence and the next call of recalculate_counts?
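For reference, a minimal sketch of what the derivation discussed above could look like. It assumes, as questioned here, that total_count and planned_count stay stable once planning is done; recalculate_counts, total_count and planned_count are the methods named in this thread, and the rest is illustrative, not the actual PR code:

```ruby
# Hedged sketch: instead of passing the number of unscheduled sub plans
# between methods, derive it on demand from the totals. This relies on
# total_count and planned_count not changing between the cancel event
# and the next recalculate_counts call, which is exactly the open
# question above.
def unscheduled_count
  total_count - planned_count
end
```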
On a separate note: I retested the current version and it appears to be working as expected, and the errors I had seem to have been resolved. I am flipping this back to Ready for review, even though it is likely that more changes are pending based on the comments above.
Force-pushed from 28d239b to 2238d24
This PR proposes to mark as cancelled every sub plan which finishes as not successful after the parent execution plan receives the cancel directive. This is to be consistent with how a task is evaluated as cancelled in the foreman project. This PR has an additional effect which makes the parent execution plan always finish with warning if at least one sub plan is cancelled as a response to the `cancel` event. Until now, it could either finish with success, if all sub plans either finished with success or were not yet queued for execution, or it could finish with warning, if an already pending sub plan was cancelled and finished with the `error` result. In summary, this additional change unifies the behavior of a cancelled execution plan if there is anything left to cancel.
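As a rough illustration of the counting behavior described above, here is a hedged sketch based on the snippet quoted further down in the review, not the actual diff; the final reassignment of the counts is an assumption:

```ruby
# Sketch: sub plans that stop unsuccessfully after the cancel directive
# was received are attributed to the cancelled count rather than the
# failed count.
failed = sub_plans_count('state' => %w(paused stopped), 'result' => %w(error warning))
cancelled = 0
if output[:cancelled_timestamp]
  # Assumed helper semantics: count matching sub plans that finished
  # after the given timestamp.
  cancelled = sub_plans_count_after(output[:cancelled_timestamp],
                                    'state' => %w(paused stopped),
                                    'result' => %w(error warning))
  failed -= cancelled
end
```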
Force-pushed from 2238d24 to efb9f89
After another discussion it turns out that relying on a dynamic calculation of unscheduled sub plans from total_count and planned_count should be safe, thus I no longer need to transfer this value between functions. PR updated and ready for review. I hope I was able to edit the
The test failure seems unrelated. Or at least I hope so.
Yeah, these do happen from time to time. |
Thank you @adamlazik1 ! |
```ruby
failed = sub_plans_count('state' => %w(paused stopped), 'result' => %w(error warning))
total = total_count
if output[:cancelled_timestamp]
  cancelled_scheduled_plans = sub_plans_count_after(output[:cancelled_timestamp], { 'state' => %w(paused stopped), 'result' => %w(error warning) })
```
Is this bulletproof enough? Can the following happen?
- I run a job, it gets scheduled
- At time T, I cancel it and that time becomes cancelled_timestamp
- Before the run is actually cancelled, at time T+1, an error occurs so the task becomes stopped/error
- At T+2, an already failed run gets cancelled and nothing happens
=> The run failed but is counted as cancelled
If it failed after being marked as cancelled, then I would assume it should be OK to count it as cancelled.