What happened
The scheduler crashloops on deployments with historical task_instance records where dag_version_id IS NULL. These records exist on any deployment that was running before the dag_version table was introduced (migration 0047_3_0_0_add_dag_versioning).
The scheduler fails when it attempts to construct a DagRunContext using one of these historical TIs as last_ti:
pydantic_core._pydantic_core.ValidationError: 1 validation error for DagRunContext
last_ti.dag_version_id
UUID input should be a string, bytes or UUID object [type=uuid_type, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.13/v/uuid_type
Airflow Version
3.1.x (Astro Runtime 3.1-15)
Steps to Reproduce
- Have a deployment with historical TI records predating dag_version (i.e. task_instance.dag_version_id IS NULL)
- Upgrade to Airflow 3.1.x
- Scheduler begins processing a DAG run whose last_ti is one of these historical records
- Scheduler crashloops
Expected Behavior
The scheduler should not crash when encountering a historical TI with dag_version_id=None, nor should it silently skip or ignore the associated DAG run. A reasonable fallback would be to substitute the most recent dag_version_id for the given dag_id when constructing DagRunContext — keeping the run in-flight while avoiding the validation error. Open to other approaches from the community.
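The proposed fallback can be sketched in plain Python. This is only an illustration of the idea, not Airflow code: `HistoricalTI`, `resolve_dag_version_id`, and the `latest_by_dag_id` mapping are hypothetical names, and the sketch assumes the caller has already looked up the most recent dag_version id per dag_id.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class HistoricalTI:
    """Hypothetical stand-in for a task_instance row; not an Airflow class."""
    dag_id: str
    dag_version_id: Optional[UUID]


def resolve_dag_version_id(ti: HistoricalTI, latest_by_dag_id: Mapping[str, UUID]) -> UUID:
    """Return the TI's own version id, or the latest known one for its dag_id."""
    if ti.dag_version_id is not None:
        return ti.dag_version_id
    try:
        # Fallback: substitute the most recent dag_version id for this dag_id.
        return latest_by_dag_id[ti.dag_id]
    except KeyError:
        # Assumption: with no version at all, fail loudly rather than skip the run.
        raise LookupError(f"no dag_version found for dag_id={ti.dag_id!r}") from None


latest = {"etl": uuid4()}
old_ti = HistoricalTI(dag_id="etl", dag_version_id=None)
new_ti = HistoricalTI(dag_id="etl", dag_version_id=uuid4())

resolved_old = resolve_dag_version_id(old_ti, latest)  # falls back to latest["etl"]
resolved_new = resolve_dag_version_id(new_ti, latest)  # keeps its own id
```

Resolving the id before the context object is built would keep the UUID validation strict while still letting the historical run proceed. Whether a missing dag_version should raise or be handled some other way is left open here.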
Actual Behavior
Scheduler crashloops continuously. The only workaround is to backfill all historical TIs with a valid dag_version_id:
-- PostgreSQL syntax (DISTINCT ON, UPDATE ... FROM). Shown as a single statement;
-- run in batches due to volume (can be 100M+ rows on long-running deployments)
WITH latest_version AS (
SELECT DISTINCT ON (dag_id) id, dag_id
FROM dag_version
ORDER BY dag_id, version_number DESC
)
UPDATE task_instance ti
SET dag_version_id = lv.id
FROM latest_version lv
WHERE ti.dag_id = lv.dag_id
AND ti.dag_version_id IS NULL;
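The statement above updates every matching row at once. A batched variant could look like the following sketch, which uses the standard-library sqlite3 module as a stand-in for the real metadata database so that it runs anywhere. The table layouts are reduced to the columns used in the workaround, `backfill_in_batches` is a hypothetical helper, and the SQL would need adapting for PostgreSQL.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Set dag_version_id on NULL rows to the latest version, one batch per commit."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE task_instance
            SET dag_version_id = (
                SELECT dv.id FROM dag_version dv
                WHERE dv.dag_id = task_instance.dag_id
                ORDER BY dv.version_number DESC
                LIMIT 1
            )
            WHERE rowid IN (
                SELECT ti.rowid FROM task_instance ti
                WHERE ti.dag_version_id IS NULL
                  -- Skip dag_ids with no dag_version row, otherwise the loop never ends.
                  AND EXISTS (SELECT 1 FROM dag_version dv WHERE dv.dag_id = ti.dag_id)
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Tiny in-memory demo: five historical TIs for "etl", two for a DAG with no version.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_version (id TEXT PRIMARY KEY, dag_id TEXT, version_number INTEGER);
    CREATE TABLE task_instance (id INTEGER PRIMARY KEY, dag_id TEXT, dag_version_id TEXT);
    INSERT INTO dag_version VALUES ('v1', 'etl', 1), ('v2', 'etl', 2);
    """
)
conn.executemany(
    "INSERT INTO task_instance (dag_id, dag_version_id) VALUES (?, NULL)",
    [("etl",)] * 5 + [("orphan",)] * 2,
)
updated = backfill_in_batches(conn, batch_size=2)
```

Committing after each batch keeps lock duration and transaction size bounded. Rows whose dag_id has no dag_version entry are left untouched and would still need separate handling.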
Additional Context
- dag_version_id FK constraint was changed from ON DELETE CASCADE to ON DELETE RESTRICT in migration 0072_3_1_0. Tightening the relationship between TIs and dag_version rows makes this null scenario more impactful
- On large deployments this backfill can affect 100M+ rows; a partial index on (dag_id) WHERE dag_version_id IS NULL is recommended before running
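For reference, the partial index mentioned above could be created as in the sketch below. It again uses sqlite3 purely for illustration; the index name `idx_ti_null_dag_version` is made up, and the table is reduced to the relevant columns. PostgreSQL accepts the same `CREATE INDEX ... WHERE` form and additionally supports `CREATE INDEX CONCURRENTLY`, which avoids blocking writes while the index builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, dag_id TEXT, dag_version_id TEXT)"
)

# Partial index: only rows still missing a dag_version_id are indexed,
# so the index shrinks as the backfill progresses.
conn.execute(
    "CREATE INDEX idx_ti_null_dag_version "
    "ON task_instance (dag_id) WHERE dag_version_id IS NULL"
)

# Read back the stored definition to confirm the predicate was kept.
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'index' AND name = 'idx_ti_null_dag_version'"
).fetchone()[0]
```

Once the backfill is complete the index covers no rows and can be dropped.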