Scheduler crashloops with ValidationError: UUID input should be a string when task_instance.dag_version_id is NULL #68248

@seanmuth

What happened

The scheduler crashloops on deployments with historical task_instance records where dag_version_id IS NULL. These records exist on any deployment that was running before the dag_version table was introduced (migration 0047_3_0_0_add_dag_versioning).

The scheduler fails when it attempts to construct a DagRunContext using one of these historical TIs as last_ti:

pydantic_core._pydantic_core.ValidationError: 1 validation error for DagRunContext
last_ti.dag_version_id
  UUID input should be a string, bytes or UUID object [type=uuid_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.13/v/uuid_type
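The error above can be reproduced outside Airflow with a minimal sketch. The model names `TaskInstanceSketch` and `DagRunContextSketch` are hypothetical stand-ins, not Airflow's actual classes; the only assumption carried over from the traceback is that `dag_version_id` is declared as a required UUID field on the nested `last_ti` model (pydantic v2).

```python
import uuid
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class TaskInstanceSketch(BaseModel):
    """Hypothetical stand-in for the TI model nested inside DagRunContext."""

    dag_id: str
    dag_version_id: uuid.UUID  # required UUID: None is rejected


class DagRunContextSketch(BaseModel):
    """Hypothetical stand-in for DagRunContext; only last_ti is modelled."""

    last_ti: Optional[TaskInstanceSketch] = None


def validation_error_types(raw_ti: dict) -> List[str]:
    """Return the pydantic error types raised while building the context."""
    try:
        DagRunContextSketch(last_ti=raw_ti)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


if __name__ == "__main__":
    # A historical TI row with a NULL dag_version_id fails validation.
    print(validation_error_types({"dag_id": "example", "dag_version_id": None}))
    # A TI row with a populated dag_version_id validates cleanly.
    print(validation_error_types({"dag_id": "example", "dag_version_id": str(uuid.uuid4())}))
```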

Airflow Version

3.1.x (Astro Runtime 3.1-15)

Steps to Reproduce

  1. Have a deployment with historical TI records predating dag_version (i.e. task_instance.dag_version_id IS NULL)
  2. Upgrade to Airflow 3.1.x
  3. Scheduler begins processing a DAG run whose last_ti is one of these historical records
  4. Scheduler crashloops

Expected Behavior

The scheduler should not crash when it encounters a historical TI with dag_version_id=None, nor should it silently skip or ignore the associated DAG run. A reasonable fallback would be to substitute the most recent dag_version_id for the given dag_id when constructing DagRunContext, which keeps the run in flight while avoiding the validation error. Open to other approaches from the community.
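The proposed fallback could look roughly like the sketch below. It is not Airflow code: `resolve_dag_version_id` and the `latest_version_by_dag` mapping are hypothetical names, and how the scheduler would actually look up the latest version for a dag_id is left open.

```python
import uuid
from typing import Mapping, Optional


def resolve_dag_version_id(
    dag_id: str,
    ti_dag_version_id: Optional[uuid.UUID],
    latest_version_by_dag: Mapping[str, uuid.UUID],
) -> Optional[uuid.UUID]:
    """Pick the dag_version_id to use when building the run context.

    Keeps the TI's own value when present; otherwise falls back to the most
    recent known version for the same dag_id. Returns None when neither is
    available, so the caller still has to decide how to handle that case.
    """
    if ti_dag_version_id is not None:
        return ti_dag_version_id
    return latest_version_by_dag.get(dag_id)


if __name__ == "__main__":
    latest = {"example_dag": uuid.uuid4()}
    # Historical TI (NULL dag_version_id) picks up the latest version.
    print(resolve_dag_version_id("example_dag", None, latest))
```

Returning None for a dag_id with no dag_version row keeps the sketch honest: the fallback only helps when at least one version exists for that DAG.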

Actual Behavior

Scheduler crashloops continuously. The only workaround is to backfill all historical TIs with a valid dag_version_id:

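-- PostgreSQL syntax (DISTINCT ON). Run in batches due to volume (can be 100M+ rows on long-running deployments);
-- as written, this statement updates every matching row at once.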
WITH latest_version AS (
    SELECT DISTINCT ON (dag_id) id, dag_id
    FROM dag_version
    ORDER BY dag_id, version_number DESC
)
UPDATE task_instance ti
SET dag_version_id = lv.id
FROM latest_version lv
WHERE ti.dag_id = lv.dag_id
  AND ti.dag_version_id IS NULL;
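The statement above touches every matching row in one transaction. A batched variant might be structured like the sketch below. It uses an in-memory SQLite database purely as a stand-in so the loop is runnable; the simplified schema, the `backfill_in_batches` and `make_demo_db` helpers, and the index name are all assumptions, and the SQL would need adapting for the real PostgreSQL metadata database.

```python
import sqlite3

# One batch: fill NULL dag_version_id with the latest version for the same
# dag_id, limited to rows that actually have a dag_version to fall back on.
BATCH_UPDATE = """
UPDATE task_instance
SET dag_version_id = (
    SELECT dv.id FROM dag_version dv
    WHERE dv.dag_id = task_instance.dag_id
    ORDER BY dv.version_number DESC
    LIMIT 1
)
WHERE id IN (
    SELECT ti.id FROM task_instance ti
    WHERE ti.dag_version_id IS NULL
      AND EXISTS (SELECT 1 FROM dag_version dv WHERE dv.dag_id = ti.dag_id)
    LIMIT ?
)
"""


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Backfill NULL dag_version_id values batch by batch; return rows updated."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    while True:
        updated = conn.execute(BATCH_UPDATE, (batch_size,)).rowcount
        conn.commit()  # commit per batch so each transaction stays short
        if updated == 0:
            return total
        total += updated


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny stand-in schema with some historical (NULL) TI rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_version (
            id TEXT PRIMARY KEY, dag_id TEXT NOT NULL, version_number INTEGER NOT NULL
        );
        CREATE TABLE task_instance (
            id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL, dag_version_id TEXT
        );
        -- Partial index covering only the rows still waiting for a backfill.
        CREATE INDEX idx_ti_null_dag_version
            ON task_instance (dag_id) WHERE dag_version_id IS NULL;
        """
    )
    conn.executemany(
        "INSERT INTO dag_version VALUES (?, ?, ?)", [("v1", "a", 1), ("v2", "a", 2)]
    )
    rows = [("a", None)] * 5 + [("orphan", None), ("a", "v1")]
    conn.executemany(
        "INSERT INTO task_instance (dag_id, dag_version_id) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(backfill_in_batches(make_demo_db(), batch_size=2))
```

The EXISTS guard matters for termination: TIs whose dag_id has no dag_version row would otherwise be selected on every pass, stay NULL, and keep the loop running. Those rows remain NULL after the backfill and still need separate handling.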

Additional Context

  • The dag_version_id FK constraint was changed from ON DELETE CASCADE to ON DELETE RESTRICT in migration 0072_3_1_0. Tightening the relationship between TIs and dag_version rows makes this NULL scenario more impactful.
  • On large deployments this backfill can affect 100M+ rows; creating a partial index on (dag_id) WHERE dag_version_id IS NULL is recommended before running it.
