Merged
56 changes: 56 additions & 0 deletions examples/execution_plan_chaining.rb
@@ -0,0 +1,56 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative 'example_helper'

class DelayedAction < Dynflow::Action
def plan(should_fail = false)
plan_self :should_fail => should_fail
end

def run
sleep 5
raise "Controlled failure" if input[:should_fail]
end

def rescue_strategy
Dynflow::Action::Rescue::Fail
end
end

if $PROGRAM_NAME == __FILE__
world = ExampleHelper.create_world do |config|
config.auto_rescue = true
end
world.action_logger.level = 1
world.logger.level = 0

plan1 = world.trigger(DelayedAction)
plan2 = world.chain(plan1.execution_plan_id, DelayedAction)
plan3 = world.chain(plan2.execution_plan_id, DelayedAction)
plan4 = world.chain(plan2.execution_plan_id, DelayedAction)

plan5 = world.trigger(DelayedAction, true)
plan6 = world.chain(plan5.execution_plan_id, DelayedAction)

puts <<-MSG.gsub(/^.*\|/, '')
|
| Execution Plan Chaining example
| ========================
|
| This example shows the execution plan chaining functionality of Dynflow, which allows execution plans to wait until another execution plan finishes.
|
| Execution plans:
| #{plan1.id} runs immediately and should run successfully.
| #{plan2.id} is delayed and should run once #{plan1.id} finishes.
| #{plan3.id} and #{plan4.id} are delayed and should run once #{plan2.id} finishes.
|
| #{plan5.id} runs immediately and is expected to fail.
| #{plan6.id} should not run at all as its prerequisite failed.
|
| Visit #{ExampleHelper::DYNFLOW_URL} to see their status.
|
MSG

ExampleHelper.run_web_console(world)
end
2 changes: 1 addition & 1 deletion lib/dynflow/debug/telemetry/persistence.rb
@@ -19,7 +19,7 @@ module Persistence
:load_execution_plan,
:save_execution_plan,
:find_old_execution_plans,
:find_past_delayed_plans,
:find_ready_delayed_plans,
:delete_delayed_plans,
:save_delayed_plan,
:set_delayed_plan_frozen,
2 changes: 1 addition & 1 deletion lib/dynflow/delayed_executors/abstract_core.rb
@@ -32,7 +32,7 @@ def time

def delayed_execution_plans(time)
with_error_handling([]) do
world.persistence.find_past_delayed_plans(time)
world.persistence.find_ready_delayed_plans(time)
end
end

6 changes: 6 additions & 0 deletions lib/dynflow/delayed_plan.rb
@@ -31,6 +31,12 @@ def timeout
error("Execution plan could not be started before set time (#{@start_before})", 'timeout')
end

def failed_dependencies(uuids)
bullets = uuids.map { |u| "- #{u}" }.join("\n")
msg = "Execution plan could not be started because some of its prerequisite execution plans failed:\n#{bullets}"
error(msg, 'failed-dependency')
end

def error(message, history_entry = nil)
execution_plan.root_plan_step.state = :error
execution_plan.root_plan_step.error = ::Dynflow::ExecutionPlan::Steps::Error.new(message)
10 changes: 9 additions & 1 deletion lib/dynflow/director.rb
@@ -114,7 +114,15 @@ def execute
plan = world.persistence.load_delayed_plan(execution_plan_id)
return if plan.nil? || plan.execution_plan.state != :scheduled

if !plan.start_before.nil? && plan.start_before < Time.now.utc()
if plan.start_before.nil?
blocker_ids = world.persistence.find_execution_plan_dependencies(execution_plan_id)
statuses = world.persistence.find_execution_plan_statuses({ filters: { uuid: blocker_ids } })
failed = statuses.select { |_uuid, status| status[:state] == 'stopped' && status[:result] == 'error' }

To test this I tried the following:

  1. Create a chained composite CV task by publishing two child content views
  2. Force cancel one of the dependencies, which left the task in a funny state "stopped - pending"
  3. Force canceling didn't seem to trigger the chained task to run, so I did the following on the Force cancelled task:
ForemanTasks::Task.where(id: '63fcef24-bc37-4445-9221-44382f216442').update(result: 'error')

After that, I noticed the chained task actually started running. I thought it would halt itself with an error.

Is my test here flawed somehow?

Actually - @sjha4 in your testing, this might be good to try to reproduce. Maybe I just had a timing issue.

Contributor Author

This is weird. Updating the ForemanTasks::Task object should have no impact on anything as that's completely external to dynflow.
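
For reference, a minimal sketch of what the `failed` filter in this hunk matches. It assumes the statuses come back as a hash of UUID to `{ state:, result: }` with string values, as the block arguments and string comparisons in the diff suggest. The UUIDs and the `'pending'` result for a force-cancelled plan are made up for illustration, and `failed_dependencies` here is a hypothetical helper, not the `DelayedPlan` method of the same name.

```ruby
# Same predicate as the director change: a dependency counts as failed only
# when it is stopped AND its result is error.
def failed_dependencies(statuses)
  statuses.select { |_uuid, status| status[:state] == 'stopped' && status[:result] == 'error' }
end

# Hypothetical statuses keyed by execution plan UUID.
statuses = {
  'plan-a' => { state: 'stopped', result: 'success' },
  'plan-b' => { state: 'stopped', result: 'error' },
  'plan-c' => { state: 'stopped', result: 'pending' }, # assumed shape of "stopped - pending"
  'plan-d' => { state: 'running', result: 'pending' }
}

puts failed_dependencies(statuses).keys.inspect # ["plan-b"]
```

Under these assumptions a dependency that is stopped with any result other than error does not block the chained plan, so the chained plan would be allowed to start.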

if failed.any?
plan.failed_dependencies(failed.keys)
return
end
elsif plan.start_before < Time.now.utc()
plan.timeout
return
end
16 changes: 14 additions & 2 deletions lib/dynflow/persistence.rb
@@ -101,8 +101,16 @@ def find_old_execution_plans(age)
end
end

def find_past_delayed_plans(time)
adapter.find_past_delayed_plans(time).map do |plan|
def find_execution_plan_dependencies(execution_plan_id)
adapter.find_execution_plan_dependencies(execution_plan_id)
end

def find_blocked_execution_plans(execution_plan_id)
adapter.find_blocked_execution_plans(execution_plan_id)
end

def find_ready_delayed_plans(time)
adapter.find_ready_delayed_plans(time).map do |plan|
DelayedPlan.new_from_hash(@world, plan)
end
end
@@ -163,5 +171,9 @@ def prune_envelopes(receiver_ids)
def prune_undeliverable_envelopes
adapter.prune_undeliverable_envelopes
end

def chain_execution_plan(first, second)
adapter.chain_execution_plan(first, second)
end
end
end
10 changes: 9 additions & 1 deletion lib/dynflow/persistence_adapters/abstract.rb
@@ -72,7 +72,15 @@ def save_execution_plan(execution_plan_id, value)
raise NotImplementedError
end

def find_past_delayed_plans(options = {})
def find_execution_plan_dependencies(execution_plan_id)
raise NotImplementedError
end

def find_blocked_execution_plans(execution_plan_id)
raise NotImplementedError
end

def find_ready_delayed_plans(options = {})
raise NotImplementedError
end

33 changes: 29 additions & 4 deletions lib/dynflow/persistence_adapters/sequel.rb
@@ -39,7 +39,8 @@ class action_class execution_plan_uuid queue),
envelope: %w(receiver_id),
coordinator_record: %w(id owner_id class),
delayed: %w(execution_plan_uuid start_at start_before args_serializer frozen),
output_chunk: %w(execution_plan_uuid action_id kind timestamp) }
output_chunk: %w(execution_plan_uuid action_id kind timestamp),
execution_plan_dependency: %w(execution_plan_uuid blocked_by_uuid) }

SERIALIZABLE_COLUMNS = { action: %w(input output),
delayed: %w(serialized_args),
@@ -153,12 +154,31 @@ def find_old_execution_plans(age)
records.map { |plan| execution_plan_column_map(load_data plan, table_name) }
end

def find_past_delayed_plans(time)
def find_execution_plan_dependencies(execution_plan_id)
table(:execution_plan_dependency)
.where(execution_plan_uuid: execution_plan_id)
.select_map(:blocked_by_uuid)
end

def find_blocked_execution_plans(execution_plan_id)
table(:execution_plan_dependency)
.where(blocked_by_uuid: execution_plan_id)
.select_map(:execution_plan_uuid)
end

def find_ready_delayed_plans(time)

I had an issue where one composite CV publish was waiting on two child component CV publishes. However, the task would still start after the quicker child finished.

I don't know Dynflow super well, so I employed some AI tool help:


/begin robot

Issue: The find_ready_delayed_plans query in lib/dynflow/persistence_adapters/sequel.rb had a bug when handling execution plans with multiple dependencies.

It would return a delayed plan as "ready" if ANY dependency was stopped, instead of waiting for ALL dependencies to stop.

Root Cause: The original query used LEFT JOINs:

LEFT JOIN dependencies ON delayed.uuid = dependencies.execution_plan_uuid
LEFT JOIN execution_plans ON dependencies.blocked_by_uuid = execution_plans.uuid
WHERE (state IS NULL OR state = 'stopped')

With multiple dependencies (e.g., plan D depends on A and B):

- If A is 'running' and B is 'stopped', the LEFT JOIN produces 2 rows
- The WHERE clause filters out the row with A ('running')
- But keeps the row with B ('stopped')
- Result: D is returned as "ready" even though A is still running

Fix: Changed to NOT EXISTS subquery to ensure NO dependencies are in a non-stopped state:

WHERE NOT EXISTS (
  SELECT 1 FROM dependencies
  LEFT JOIN execution_plans ON dependencies.blocked_by_uuid = execution_plans.uuid
  WHERE dependencies.execution_plan_uuid = delayed.execution_plan_uuid
  AND execution_plans.state IS NOT NULL
  AND execution_plans.state != 'stopped'
)

Result: Chained execution plans now correctly wait for ALL dependencies to complete before running, as documented in the original PR description.

/end robot


I tested this out, and afterwards the publish did indeed wait properly for the slower child to finish.

It's possible I'm using this chaining method incorrectly in my development branch, but let me know what you think of the above.

So:

diff --git a/lib/dynflow/persistence_adapters/sequel.rb b/lib/dynflow/persistence_adapters/sequel.rb
index b36298b..502aadf 100644
--- a/lib/dynflow/persistence_adapters/sequel.rb
+++ b/lib/dynflow/persistence_adapters/sequel.rb
@@ -146,14 +146,22 @@ module Dynflow
 
       def find_ready_delayed_plans(time)
         table_name = :delayed
+        # Find delayed plans where ALL dependencies (if any) are either non-existent or stopped
+        # We use NOT EXISTS to ensure no dependency is in a non-stopped state
         table(table_name)
-          .left_join(TABLES[:execution_plan_dependency], execution_plan_uuid: :execution_plan_uuid)
-          .left_join(TABLES[:execution_plan], uuid: :blocked_by_uuid)
           .where(::Sequel.lit('start_at IS NULL OR (start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?))', time, time))
-          .where(::Sequel[{ state: nil }] | ::Sequel[{ state: 'stopped' }])
           .where(:frozen => false)
+          .where(::Sequel.lit(
+            "NOT EXISTS (
+              SELECT 1
+              FROM #{TABLES[:execution_plan_dependency]} dep
+              LEFT JOIN #{TABLES[:execution_plan]} ep ON dep.blocked_by_uuid = ep.uuid
+              WHERE dep.execution_plan_uuid = #{TABLES[table_name]}.execution_plan_uuid
+              AND ep.state IS NOT NULL
+              AND ep.state != 'stopped'
+            )"
+          ))
           .order_by(:start_at)
-          .select_all(TABLES[table_name])
           .all
           .map { |plan| load_data(plan, table_name) }
       end
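
The difference between the two query shapes can be illustrated without a database. The following is a rough in-memory sketch, not the adapter code: `DEPENDENCIES`, `ready_by_join?` and `ready_by_exclusion?` are hypothetical names, and plain arrays and hashes stand in for the dependency and execution plan tables.

```ruby
# Hypothetical stand-in for the dependency table: D is blocked by A and B.
DEPENDENCIES = [
  { execution_plan_uuid: 'D', blocked_by_uuid: 'A' },
  { execution_plan_uuid: 'D', blocked_by_uuid: 'B' }
].freeze

# Join-and-filter shape: the plan is returned if ANY joined row survives the
# "state IS NULL OR state = 'stopped'" filter.
def ready_by_join?(plan_uuid, dependencies, states)
  rows = dependencies.select { |d| d[:execution_plan_uuid] == plan_uuid }
  return true if rows.empty? # a LEFT JOIN still yields one row, with a NULL state

  rows.any? do |d|
    state = states[d[:blocked_by_uuid]]
    state.nil? || state == 'stopped'
  end
end

# Exclusion shape: the plan is returned only if NO dependency is in a known,
# non-stopped state.
def ready_by_exclusion?(plan_uuid, dependencies, states)
  dependencies.none? do |d|
    state = states[d[:blocked_by_uuid]]
    d[:execution_plan_uuid] == plan_uuid && !state.nil? && state != 'stopped'
  end
end

states = { 'A' => 'running', 'B' => 'stopped' }
puts ready_by_join?('D', DEPENDENCIES, states)      # true  (D starts too early)
puts ready_by_exclusion?('D', DEPENDENCIES, states) # false (D keeps waiting for A)
```

Once both A and B are stopped, the two predicates agree again, which is why the problem only shows up with more than one dependency.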

Contributor Author

Oh yeah, this was put together rather quickly, I'll have to take a look at this again

Contributor Author

And the suggestion looks reasonable, although I'll try to reduce raw SQL as much as possible.

@ianballou Oct 23, 2025

> Oh yeah, this was put together rather quickly, I'll have to take a look at this again

It seems to work well for being a prototype!

table_name = :delayed
# Subquery to find delayed plans that have at least one non-stopped dependency
plans_with_unfinished_deps = table(:execution_plan_dependency)
.join(TABLES[:execution_plan], uuid: :blocked_by_uuid)
.where(::Sequel.~(state: 'stopped'))

Just to check - when you force unlock a task, does it go to the 'stopped' state? If it doesn't, we might need a workflow for unlinking the scheduled task from the one that was force unlocked.

Actually, my comment above says it goes to the stopped state. In which case, since it doesn't go to stopped - error, I believe the parent chained task should start running. I'm not sure how feasible it would be to cause force unlock to unschedule chained parents.
I'd be okay with force unlock continuing to run the parent tasks since it's pretty much a debug action.

Contributor Author

> Actually, my comment above says it goes to the stopped state.

It does

> I'm not sure how feasible it would be to cause force unlock to unschedule chained parents.

It would probably be on the more difficult end of the spectrum, so I'd prefer to not go down that path.

.select(:execution_plan_uuid)

records = with_retry do
table(table_name)
.where(::Sequel.lit('start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?)', time, time))
.where(::Sequel.lit('start_at IS NULL OR (start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?))', time, time))
.where(:frozen => false)
.exclude(execution_plan_uuid: plans_with_unfinished_deps)
.order_by(:start_at)
.all
end
@@ -175,6 +195,10 @@ def save_delayed_plan(execution_plan_id, value)
save :delayed, { execution_plan_uuid: execution_plan_id }, value, with_data: false
end

def chain_execution_plan(first, second)

This patch:

diff --git a/lib/dynflow/persistence_adapters/sequel.rb b/lib/dynflow/persistence_adapters/sequel.rb
index c673090..0de3ea7 100644
--- a/lib/dynflow/persistence_adapters/sequel.rb
+++ b/lib/dynflow/persistence_adapters/sequel.rb
@@ -196,7 +196,11 @@ module Dynflow
       end
 
       def chain_execution_plan(first, second)
-        save :execution_plan_dependency, { execution_plan_uuid: second }, { execution_plan_uuid: second, blocked_by_uuid: first }, with_data: false
+        # Insert dependency directly without checking for existing records.
+        # The table is designed to allow multiple dependencies per execution plan.
+        # Using save() causes upsert behavior that overwrites existing dependencies.
+        record = { execution_plan_uuid: second, blocked_by_uuid: first }
+        with_retry { table(:execution_plan_dependency).insert(record) }
       end
 
       def load_step(execution_plan_id, step_id)

Caused multiple dependencies to start showing up for me. I'm unsure if it's safe to do the inserts here like this, but it worked around the upserting causing trouble.

Contributor Author

I took a different approach, but it should be fixed.

save :execution_plan_dependency, {}, { execution_plan_uuid: second, blocked_by_uuid: first }, with_data: false
end

def load_step(execution_plan_id, step_id)
load :step, execution_plan_uuid: execution_plan_id, id: step_id
end
@@ -319,7 +343,8 @@ def abort_if_pending_migrations!
envelope: :dynflow_envelopes,
coordinator_record: :dynflow_coordinator_records,
delayed: :dynflow_delayed_plans,
output_chunk: :dynflow_output_chunks }
output_chunk: :dynflow_output_chunks,
execution_plan_dependency: :dynflow_execution_plan_dependencies }

def table(which)
db[TABLES.fetch(which)]
@@ -0,0 +1,22 @@
# frozen_string_literal: true

Sequel.migration do
up do
type = database_type
create_table(:dynflow_execution_plan_dependencies) do
column_properties = if type.to_s.include?('postgres')
{ type: :uuid }
else
{ type: String, size: 36, fixed: true, null: false }
end
foreign_key :execution_plan_uuid, :dynflow_execution_plans, on_delete: :cascade, **column_properties
foreign_key :blocked_by_uuid, :dynflow_execution_plans, on_delete: :cascade, **column_properties
index :blocked_by_uuid

Maybe it's also worth indexing :execution_plan_uuid?

Contributor Author

Added

index :execution_plan_uuid
end
end

down do
drop_table(:dynflow_execution_plan_dependencies)
end
end
10 changes: 10 additions & 0 deletions lib/dynflow/world.rb
@@ -202,6 +202,16 @@ def delay_with_options(action_class:, args:, delay_options:, id: nil, caller_act
Scheduled[execution_plan.id]
end

def chain(plan_uuids, action_class, *args)

I think I'm seeing the chaining only keeping one of the chained tasks instead of them all.

I tested publishing 3 content views. The first takes a really long time. After publishing them in order of speed, I sometimes see that only one of the faster content views was made a dependency of the composite content view publish.

I thought it could be a Katello PR issue, but after adding debug logging, I found I was passing in two children, yet the slower child was not waited on (and the new Dynflow chaining UI showed this).

Claude dug around in the code a bit, and looked at:

https://github.com/Dynflow/dynflow/blob/3dea62325aacc96f8df8117f88657970c61f2836/lib/dynflow/persistence_adapters/sequel.rb#L367C1-L386C10

      def save(what, condition, value, with_data: true, update_conditions: {})
        table           = table(what)
        existing_record = with_retry { table.first condition } unless condition.empty?

        if value
          record = prepare_record(what, value, (existing_record || condition), with_data)
          if existing_record
            record = prune_unchanged(what, existing_record, record)
            return value if record.empty?
            condition = update_conditions.merge(condition)
            return with_retry { table.where(condition).update(record) }  <--------
          else
            with_retry { table.insert record }
          end

        else
          existing_record and with_retry { table.where(condition).delete }
        end
        value
      end

It's suggesting that the upsert logic in here is causing the other chained methods to be overwritten. I tried getting around the upsert logic and only then did I see multiple dependencies in the Dynflow UI for the composite task. See my other comment for the patch.
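
The overwrite described above can be reproduced with a simplified in-memory model of `save`. This is only a sketch under assumptions: `save_row`, `chain_with_condition` and `chain_without_condition` are hypothetical names, an array of hashes stands in for the table, and the real method's pruning, retries and serialization are left out.

```ruby
# Simplified stand-in for the adapter's save: look up an existing row by
# `condition` (skipped when the condition is empty), update it if found,
# otherwise insert a new row.
def save_row(table, condition, value)
  existing = condition.empty? ? nil : table.find { |row| condition.all? { |k, v| row[k] == v } }
  if existing
    existing.merge!(value) # update path: the previous blocked_by_uuid is lost
  else
    table << value.dup     # insert path
  end
  table
end

# Condition on the blocked plan only: the second call finds the first row.
def chain_with_condition(table, first, second)
  save_row(table, { execution_plan_uuid: second }, { execution_plan_uuid: second, blocked_by_uuid: first })
end

# Empty condition: no lookup happens, so every call inserts.
def chain_without_condition(table, first, second)
  save_row(table, {}, { execution_plan_uuid: second, blocked_by_uuid: first })
end

overwritten = []
chain_with_condition(overwritten, 'A', 'D')
chain_with_condition(overwritten, 'B', 'D')
puts overwritten.length # 1

kept = []
chain_without_condition(kept, 'A', 'D')
chain_without_condition(kept, 'B', 'D')
puts kept.length # 2
```

In this model, keying the lookup on `execution_plan_uuid` alone leaves a single row pointing at whichever dependency was saved last, while an empty condition keeps one row per dependency.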

plan_uuids = [plan_uuids] unless plan_uuids.is_a? Array
result = delay_with_options(action_class: action_class, args: args, delay_options: { frozen: true })
plan_uuids.each do |plan_uuid|
persistence.chain_execution_plan(plan_uuid, result.execution_plan_id)
end
persistence.set_delayed_plan_frozen(result.execution_plan_id, false)
result
end

def plan_elsewhere(action_class, *args)
execution_plan = ExecutionPlan.new(self, nil)
execution_plan.delay(nil, action_class, {}, *args)