Add execution plan chaining #446
Changes from all commits
A new 56-line example script demonstrating the chaining API:

```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative 'example_helper'

class DelayedAction < Dynflow::Action
  def plan(should_fail = false)
    plan_self :should_fail => should_fail
  end

  def run
    sleep 5
    raise "Controlled failure" if input[:should_fail]
  end

  def rescue_strategy
    Dynflow::Action::Rescue::Fail
  end
end

if $PROGRAM_NAME == __FILE__
  world = ExampleHelper.create_world do |config|
    config.auto_rescue = true
  end
  world.action_logger.level = 1
  world.logger.level = 0

  plan1 = world.trigger(DelayedAction)
  plan2 = world.chain(plan1.execution_plan_id, DelayedAction)
  plan3 = world.chain(plan2.execution_plan_id, DelayedAction)
  plan4 = world.chain(plan2.execution_plan_id, DelayedAction)

  plan5 = world.trigger(DelayedAction, true)
  plan6 = world.chain(plan5.execution_plan_id, DelayedAction)

  puts <<-MSG.gsub(/^.*\|/, '')
    |
    | Execution Plan Chaining example
    | ===============================
    |
    | This example shows the execution plan chaining functionality of Dynflow,
    | which allows execution plans to wait until another execution plan finishes.
    |
    | Execution plans:
    |   #{plan1.id} runs immediately and should run successfully.
    |   #{plan2.id} is delayed and should run once #{plan1.id} finishes.
    |   #{plan3.id} and #{plan4.id} are delayed and should run once #{plan2.id} finishes.
    |
    |   #{plan5.id} runs immediately and is expected to fail.
    |   #{plan6.id} should not run at all as its prerequisite failed.
    |
    | Visit #{ExampleHelper::DYNFLOW_URL} to see their status.
    |
  MSG

  ExampleHelper.run_web_console(world)
end
```
Changes to `lib/dynflow/persistence_adapters/sequel.rb`:

```diff
 @@ -39,7 +39,8 @@ class action_class execution_plan_uuid queue),
              envelope: %w(receiver_id),
              coordinator_record: %w(id owner_id class),
              delayed: %w(execution_plan_uuid start_at start_before args_serializer frozen),
-             output_chunk: %w(execution_plan_uuid action_id kind timestamp) }
+             output_chunk: %w(execution_plan_uuid action_id kind timestamp),
+             execution_plan_dependency: %w(execution_plan_uuid blocked_by_uuid) }

   SERIALIZABLE_COLUMNS = { action: %w(input output),
                            delayed: %w(serialized_args),

 @@ -153,12 +154,31 @@ def find_old_execution_plans(age)
     records.map { |plan| execution_plan_column_map(load_data plan, table_name) }
   end

-  def find_past_delayed_plans(time)
+  def find_execution_plan_dependencies(execution_plan_id)
+    table(:execution_plan_dependency)
+      .where(execution_plan_uuid: execution_plan_id)
+      .select_map(:blocked_by_uuid)
+  end
+
+  def find_blocked_execution_plans(execution_plan_id)
+    table(:execution_plan_dependency)
+      .where(blocked_by_uuid: execution_plan_id)
+      .select_map(:execution_plan_uuid)
+  end
+
+  def find_ready_delayed_plans(time)
```
Comment (on `find_ready_delayed_plans`): I had an issue where one composite CV publish was waiting on two child component CV publishes. However, the task would still start after the quicker child finished. I don't know Dynflow super well, so I employed some AI tool help:

/begin robot
Issue: The find_ready_delayed_plans query in lib/dynflow/persistence_adapters/sequel.rb had a bug when handling execution plans with multiple dependencies. It would return a delayed plan as "ready" if ANY dependency was stopped, instead of waiting for ALL dependencies to stop.
Root cause: The original query used LEFT JOINs. With multiple dependencies (e.g., plan D depends on A and B), if A is 'running' and B is 'stopped', the LEFT JOIN produces two rows, and the row for B already satisfies the "stopped" condition.
Fix: Changed to a NOT EXISTS subquery to ensure NO dependencies are in a non-stopped state.
Result: Chained execution plans now correctly wait for ALL dependencies to complete before running, as documented in the original PR description.
/end robot

I tested this out, and afterwards the publish did indeed wait properly for the slower child to finish. It's possible I'm using this chaining method incorrectly in my development branch, but let me know what you think of the above.

Contributor (Author): Oh yeah, this was put together rather quickly, I'll have to take a look at this again.

Contributor (Author): And the suggestion looks reasonable, although I'll try to reduce raw SQL as much as possible.

Comment: It seems to work well for being a prototype!
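The ANY-versus-ALL distinction described in that comment can be sketched with a tiny in-memory model (the names here are illustrative, not Dynflow API):

```ruby
# Illustrative model of the readiness bug: plan D is blocked by A and B.
dependencies = { 'D' => %w[A B] }
states       = { 'A' => 'running', 'B' => 'stopped' }

# Buggy behaviour (mirrors the LEFT JOIN): D looks ready as soon as
# ANY of its dependencies reaches the 'stopped' state.
buggy_ready = dependencies['D'].any? { |dep| states[dep] == 'stopped' }

# Fixed behaviour (mirrors NOT EXISTS): D is ready only when there is
# NO dependency left in a non-stopped state.
fixed_ready = dependencies['D'].none? { |dep| states[dep] != 'stopped' }

puts "buggy: #{buggy_ready}, fixed: #{fixed_ready}" # buggy: true, fixed: false
```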
```diff
     table_name = :delayed
+    # Subquery to find delayed plans that have at least one non-stopped dependency
+    plans_with_unfinished_deps = table(:execution_plan_dependency)
+                                 .join(TABLES[:execution_plan], uuid: :blocked_by_uuid)
+                                 .where(::Sequel.~(state: 'stopped'))
```
Comment: Just to check - when you force unlock a task, does it go to the 'stopped' state? If it doesn't, we might need a workflow for unlinking the scheduled task from the one that was force unlocked.

Comment: Actually, my comment above says it goes to the stopped state. In which case, since it doesn't go to

Contributor (Author): It does.

It would probably be on the more difficult end of the spectrum, so I'd prefer to not go down that path.
```diff
+                                 .select(:execution_plan_uuid)

     records = with_retry do
       table(table_name)
-        .where(::Sequel.lit('start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?)', time, time))
+        .where(::Sequel.lit('start_at IS NULL OR (start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?))', time, time))
         .where(:frozen => false)
+        .exclude(execution_plan_uuid: plans_with_unfinished_deps)
         .order_by(:start_at)
         .all
     end

 @@ -175,6 +195,10 @@ def save_delayed_plan(execution_plan_id, value)
     save :delayed, { execution_plan_uuid: execution_plan_id }, value, with_data: false
   end

+  def chain_execution_plan(first, second)
```
Comment: This patch caused multiple dependencies to start showing up for me. I'm unsure if it's safe to do the inserts here like this, but it worked around the upserting causing trouble.

Contributor (Author): I took a different approach, but it should be fixed.
```diff
+    save :execution_plan_dependency, {}, { execution_plan_uuid: second, blocked_by_uuid: first }, with_data: false
+  end
+
   def load_step(execution_plan_id, step_id)
     load :step, execution_plan_uuid: execution_plan_id, id: step_id
   end

 @@ -319,7 +343,8 @@ def abort_if_pending_migrations!
              envelope: :dynflow_envelopes,
              coordinator_record: :dynflow_coordinator_records,
              delayed: :dynflow_delayed_plans,
-             output_chunk: :dynflow_output_chunks }
+             output_chunk: :dynflow_output_chunks,
+             execution_plan_dependency: :dynflow_execution_plan_dependencies }

   def table(which)
     db[TABLES.fetch(which)]
```
A new 22-line migration creating the `dynflow_execution_plan_dependencies` table:

```ruby
# frozen_string_literal: true

Sequel.migration do
  up do
    type = database_type
    create_table(:dynflow_execution_plan_dependencies) do
      column_properties = if type.to_s.include?('postgres')
                            { type: :uuid }
                          else
                            { type: String, size: 36, fixed: true, null: false }
                          end
      foreign_key :execution_plan_uuid, :dynflow_execution_plans, on_delete: :cascade, **column_properties
      foreign_key :blocked_by_uuid, :dynflow_execution_plans, on_delete: :cascade, **column_properties
      index :blocked_by_uuid
```
Comment: Maybe it's worth to also

Contributor (Author): Added
```ruby
      index :execution_plan_uuid
    end
  end

  down do
    drop_table(:dynflow_execution_plan_dependencies)
  end
end
```
Changes adding the `chain` method to the world, alongside `delay_with_options`:

```diff
 @@ -202,6 +202,16 @@ def delay_with_options(action_class:, args:, delay_options:, id: nil, caller_act
     Scheduled[execution_plan.id]
   end

+  def chain(plan_uuids, action_class, *args)
```
Comment (on `chain`): I think I'm seeing the chaining only keeping one of the chained tasks instead of them all. I tested publishing 3 content views. The first takes a really long time. After publishing them in order of speed, I sometimes see that only one of the faster content views was made a dependency of the composite content view publish. I thought it could be a Katello PR issue, but after adding debug logging, I found I was passing in two children, yet the slower child was not waited on (and the new Dynflow chaining UI showed this). Claude dug around in the code a bit, and looked at:

```ruby
def save(what, condition, value, with_data: true, update_conditions: {})
  table = table(what)
  existing_record = with_retry { table.first condition } unless condition.empty?
  if value
    record = prepare_record(what, value, (existing_record || condition), with_data)
    if existing_record
      record = prune_unchanged(what, existing_record, record)
      return value if record.empty?
      condition = update_conditions.merge(condition)
      return with_retry { table.where(condition).update(record) } # <-- upsert path
    else
      with_retry { table.insert record }
    end
  else
    existing_record and with_retry { table.where(condition).delete }
  end
  value
end
```

It's suggesting that the upsert logic in here is causing the other chained dependencies to be overwritten. I tried getting around the upsert logic and only then did I see multiple dependencies in the Dynflow UI for the composite task. See my other comment for the patch.
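A stripped-down model of that upsert behaviour (`fake_save` below is a hypothetical stand-in, not the real adapter) shows why a condition keyed on the blocked plan keeps only a single dependency row, while an empty condition always inserts:

```ruby
# Hypothetical in-memory stand-in for the adapter's save: look up a row by
# `condition`; update it if found, insert otherwise (skip lookup when empty).
table = []

fake_save = lambda do |condition, value|
  existing = condition.empty? ? nil : table.find { |row| condition.all? { |k, v| row[k] == v } }
  existing ? existing.merge!(value) : table << value.dup
end

# Upsert keyed on the blocked plan: the second dependency overwrites the first.
fake_save.call({ execution_plan_uuid: 'D' }, { execution_plan_uuid: 'D', blocked_by_uuid: 'A' })
fake_save.call({ execution_plan_uuid: 'D' }, { execution_plan_uuid: 'D', blocked_by_uuid: 'B' })
upserted = table.map { |row| row[:blocked_by_uuid] } # only 'B' remains

# Empty condition (as chain_execution_plan ends up passing): both rows survive.
table = []
fake_save.call({}, { execution_plan_uuid: 'D', blocked_by_uuid: 'A' })
fake_save.call({}, { execution_plan_uuid: 'D', blocked_by_uuid: 'B' })
inserted = table.map { |row| row[:blocked_by_uuid] } # 'A' and 'B'
```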
```diff
+    plan_uuids = [plan_uuids] unless plan_uuids.is_a? Array
+    result = delay_with_options(action_class: action_class, args: args, delay_options: { frozen: true })
+    plan_uuids.each do |plan_uuid|
+      persistence.chain_execution_plan(plan_uuid, result.execution_plan_id)
+    end
+    persistence.set_delayed_plan_frozen(result.execution_plan_id, false)
+    result
+  end
+
   def plan_elsewhere(action_class, *args)
     execution_plan = ExecutionPlan.new(self, nil)
     execution_plan.delay(nil, action_class, {}, *args)
```
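The normalization at the top of `chain` means a single uuid and an array of uuids are handled uniformly; a minimal model of that bookkeeping (illustrative names only, not the real persistence layer):

```ruby
# Minimal model of chain's dependency bookkeeping: each new plan records
# every prerequisite uuid, whether passed as one uuid or as an array.
dependencies = Hash.new { |h, k| h[k] = [] }

chain = lambda do |plan_uuids, new_plan_uuid|
  plan_uuids = [plan_uuids] unless plan_uuids.is_a?(Array) # same normalization as above
  plan_uuids.each { |uuid| dependencies[new_plan_uuid] << uuid }
  new_plan_uuid
end

chain.call('plan-a', 'plan-d')          # a single prerequisite
chain.call(%w[plan-b plan-c], 'plan-d') # several at once
# dependencies['plan-d'] now lists all three prerequisites
```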
Comment: To test this I tried the following:

After that, I noticed the chained task actually started running - I thought it would halt itself with an error. Is my test here flawed somehow?

Comment: Actually - @sjha4 in your testing, this might be good to try to reproduce. Maybe I just had a timing issue.

Comment: This is weird. Updating the ForemanTasks::Task object should have no impact on anything as that's completely external to dynflow.