docs/website/docs/reference/troubleshooting.md (+33 −2)
@@ -181,7 +181,7 @@ Timestamp issues occur when formats are incompatible with the destination or inc

 - Standardize timestamp formats across all runs to maintain consistent schema inference and avoid the creation of variant columns.

-3. Inconsistent formats for incremental loading
+3. **Inconsistent formats for incremental loading**

 **Scenario:**
@@ -402,6 +402,37 @@ Failures in the **Load** stage often relate to authentication issues, schema cha

 - Use schema evolution to handle column renaming. [Read more about schema evolution.](../general-usage/schema-evolution#evolving-the-schema)
### **`FileNotFoundError` for 'schema_updates.json' in parallel runs**

**Scenario:**

When running the same pipeline name multiple times in parallel (e.g., via Airflow), `dlt` may fail at the load stage with an error like:

> `FileNotFoundError: schema_updates.json not found`

This happens because `schema_updates.json` is generated during normalization. Concurrent runs using the same pipeline name may overwrite or lock access to this file, causing failures.

**Possible Solutions:**

1. **Use unique pipeline names for each parallel run**

   If you call `pipeline.run()` multiple times within the same workflow (e.g., once per resource), assign a unique `pipeline_name` to each run. This ensures each run gets a separate working directory, preventing file conflicts.
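   A minimal sketch of this pattern (the resource names, destination, and dataset are illustrative placeholders):

   ```py
   import dlt

   # Hypothetical per-resource data; replace with your own resources.
   resources = {"orders": [{"id": 1}], "customers": [{"id": 2}]}

   for name, data in resources.items():
       pipeline = dlt.pipeline(
           pipeline_name=f"my_pipeline_{name}",  # unique name => separate working directory
           destination="duckdb",
           dataset_name="my_dataset",
       )
       pipeline.run(data, table_name=name)
   ```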
2. **Leverage dlt’s concurrency management or Airflow helpers**

   dlt’s Airflow integration “serializes” resources into separate tasks while safely handling concurrency. To parallelize resource extraction without file conflicts, use:

   ```py
   decompose="serialize"
   ```

   More details are available in the [Airflow documentation](../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-valueerror-can-only-decompose-dlt-source).
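   As a sketch of where that option goes, based on dlt’s Airflow helper described in the linked walkthrough (the DAG settings and the toy source below are placeholders):

   ```py
   import dlt
   import pendulum
   from airflow.decorators import dag
   from dlt.helpers.airflow_helper import PipelineTasksGroup

   @dlt.source
   def my_source():
       # Tiny illustrative resource; replace with your real resources.
       @dlt.resource
       def numbers():
           yield from ({"n": i} for i in range(3))
       return numbers

   @dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
   def load_my_source():
       # Task group that manages working directories and concurrency for dlt tasks.
       tasks = PipelineTasksGroup("my_pipeline", use_data_folder=False, wipe_local_data=True)

       pipeline = dlt.pipeline(
           pipeline_name="my_pipeline", destination="duckdb", dataset_name="my_dataset"
       )
       # decompose="serialize" turns the source's resources into sequential tasks,
       # so concurrent tasks never fight over the same working directory.
       tasks.add_run(pipeline, my_source(), decompose="serialize", trigger_rule="all_done")

   load_my_source()
   ```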
3. **Disable dev mode to prevent multiple destination datasets**

   When `dev_mode=True`, dlt generates unique dataset names (`<dataset_name>_<timestamp>`) for each run. To maintain a consistent dataset, set:

   ```py
   dev_mode=False
   ```

   Read more about this in the [dev mode documentation](../general-usage/pipeline#do-experiments-with-dev-mode).
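   For example, in the pipeline definition (a sketch; the pipeline name, destination, and dataset are placeholders):

   ```py
   import dlt

   pipeline = dlt.pipeline(
       pipeline_name="my_pipeline",
       destination="duckdb",
       dataset_name="my_dataset",
       dev_mode=False,  # the default: every run loads into the same dataset
   )
   ```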

### Memory management issues

**Scenario:**

@@ -412,7 +443,7 @@ Failures in the **Load** stage often relate to authentication issues, schema cha

 - Pipeline failures due to out-of-memory errors.

-**Solution:**
+**Possible Solution:**

 - Enable file rotation. [Read more about it here.](./performance#controlling-intermediary-file-size-and-rotation)
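  As a rough sketch, file rotation can be configured before the run through dlt’s data writer settings; the option names below are assumptions based on the linked performance page, so verify them there:

  ```py
  import os

  # Assumed data writer settings controlling when intermediary files are rotated.
  os.environ["DATA_WRITER__FILE_MAX_ITEMS"] = "100000"                 # rotate after ~100k items
  os.environ["DATA_WRITER__FILE_MAX_BYTES"] = str(100 * 1024 * 1024)   # or after ~100 MB
  ```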