
fix(ingest/deltalake): pin upperbound for delta lake #14083


Open: wants to merge 1 commit into `master`


@yoonhyejin (Collaborator) commented Jul 15, 2025

Fixes a bug in Delta Lake source ingestion: ingestion fails when the Delta table contains deletion vectors (DVs).
Resolves #14051
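Why a Delta-native reader rejects such a table: enabling DVs adds `deletionVectors` to the `readerFeatures` list of the table's `protocol` action in `_delta_log`, and the Delta protocol requires a reader to refuse any table that lists a reader feature it does not support. The sketch below is a hypothetical stdlib-only helper (`latest_protocol` and `requires_deletion_vectors` are illustrative names, not DataHub or `deltalake` APIs) for checking whether a table needs DV support. It assumes the JSON commit files are still present; it does not read checkpoint Parquet files.

```python
import json
from pathlib import Path
from typing import Optional


def latest_protocol(table_path: str) -> Optional[dict]:
    """Return the most recent `protocol` action found in the table's JSON commits.

    Sketch only: reads `_delta_log/*.json` and ignores checkpoint files, so it
    assumes early commits have not been removed by log retention.
    """
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return None
    protocol = None
    # Commit files use zero-padded 20-digit version names
    # (e.g. 00000000000000000000.json), so a lexical sort is a version sort.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "protocol" in action:
                # A later protocol action supersedes earlier ones.
                protocol = action["protocol"]
    return protocol


def requires_deletion_vectors(table_path: str) -> bool:
    """True if the table's protocol lists `deletionVectors` as a reader feature."""
    protocol = latest_protocol(table_path) or {}
    return "deletionVectors" in protocol.get("readerFeatures", [])
```

For the repro table below, `requires_deletion_vectors("tmp/dv_test_table")` is expected to return `True`.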

Repro Steps

1. Create a Delta table with DVs enabled (Delta 2.4.0+, Spark 3.4.4), e.g. with `dv_test.py`:
```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType

table_path = "tmp/dv_test_table"

# Delta-enabled Spark session; new Delta tables get deletion vectors enabled by default.
spark = SparkSession.builder \
    .appName("DeltaTableWithDV") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.databricks.delta.properties.defaults.enableDeletionVectors", "true") \
    .master("local[*]") \
    .getOrCreate()

schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

data = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
df = spark.createDataFrame(data, schema)

# Write the initial three-row table.
df.write.format("delta").mode("overwrite").save(table_path)

df_loaded = spark.read.format("delta").load(table_path)
df_loaded.createOrReplaceTempView("dv_table")

# Delete one row; with DVs enabled, the deletion is expected to be recorded
# in a deletion vector rather than by rewriting the data file.
spark.sql("DELETE FROM dv_table WHERE id = 2")
spark.stop()
```

Run it with `spark-submit --packages io.delta:delta-core_2.12:2.4.0 dv_test.py`.

2. Install `deltalake` with `pip install deltalake==1.0.2` (or another version under test).

3. Ingest the local DV table with `datahub ingest -c delta-lake.yaml`, using this recipe:

```yaml
source:
  type: "delta-lake"
  config:
    base_path: "tmp/dv_test_table/"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
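The same recipe can also be run from Python through DataHub's programmatic `Pipeline` API (`datahub.ingestion.run.pipeline.Pipeline`), which may be handier when repeating the repro across several `deltalake` versions. This is a sketch: `build_recipe` and `run_recipe` are hypothetical helpers, and running it assumes `acryl-datahub[delta-lake]` is installed and a DataHub GMS is reachable at `http://localhost:8080`.

```python
from typing import Any, Dict


def build_recipe(base_path: str, server: str = "http://localhost:8080") -> Dict[str, Any]:
    """Build the same recipe as delta-lake.yaml, as a Python dict (hypothetical helper)."""
    return {
        "source": {"type": "delta-lake", "config": {"base_path": base_path}},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run a recipe via DataHub's Pipeline API (hypothetical helper).

    The import is deferred so build_recipe() works without acryl-datahub installed.
    """
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.pretty_print_summary()
    # Raises if the run reported failures, such as the DeletionVectors error.
    pipeline.raise_from_status()
```

Usage: `run_recipe(build_recipe("tmp/dv_test_table/"))`. With an affected `deltalake` version, `raise_from_status()` should raise once the source reports the failure.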
Results by `deltalake` version:

  • deltalake==0.25.5 → works
  • deltalake==1.0.0 to 1.1.0 → fails with `<class '_internal.CommitFailedError'>: Unsupported reader features required: [DeletionVectors]`
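These observations could be encoded as a small guard, for instance to flag a test run against an affected `deltalake` release. This is a hypothetical helper, not part of DataHub: the failing range comes only from the versions reported above, so a version outside it is merely "not reported", not known to work. The simplified parser ignores pre-release suffixes, so `1.0.0rc1` is treated as `1.0.0`.

```python
import re
from typing import Tuple

# Inclusive range observed failing on DV tables in this PR (1.0.0 through 1.1.0).
FAILING_RANGE = ((1, 0, 0), (1, 1, 0))


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading MAJOR.MINOR[.PATCH] of a version string such as '1.0.2'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    # A missing PATCH component defaults to 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def reported_as_failing(version: str) -> bool:
    """True if `version` falls inside the range this PR observed failing."""
    low, high = FAILING_RANGE
    return low <= parse_version(version) <= high
```

For example, `reported_as_failing(importlib.metadata.version("deltalake"))` checks the installed package.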

github-actions bot added the `ingestion` label Jul 15, 2025

codecov bot commented Jul 15, 2025

❌ 3 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 3075 | 3 | 3072 | 29 |
3 ❄️ flaky tests:
search cypress/e2e/search/searchFilters.js::cypress/e2e/search/searchFilters.js

Flake rate in main: 18.82% (Passed 69 times, Failed 16 times)

Stack trace (16.8s run time):
2025-07-15T05:44:28.008Z
Timed out retrying after 10000ms: Expected to find element: `[data-testid=update-filters`, but never found it.
glossary cypress/e2e/glossaryV2/v2_glossary.js::cypress/e2e/glossaryV2/v2_glossary.js

Flake rate in main: 60.00% (Passed 12 times, Failed 18 times)

Stack trace (64.6s run time):
2025-07-15T05:41:38.289Z
Timed out retrying after 10000ms: Expected to find content: 'CypressGlossaryTerm' but never did.
glossary sidebar navigation test cypress/e2e/glossaryV2/v2_glossary_navigation.js::cypress/e2e/glossaryV2/v2_glossary_navigation.js

Flake rate in main: 90.32% (Passed 3 times, Failed 28 times)

Stack trace (103s run time):
2025-07-15T07:41:04.088Z
Timed out retrying after 10000ms: Expected to find content: 'CypressGlosssaryNavigationTerm' but never did.
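Codecov's flake rate appears to be failed runs divided by all recorded runs on main; the three percentages above are consistent with that reading (e.g. 16 / (69 + 16) ≈ 18.82%). A minimal check, assuming that definition:

```python
def flake_rate(passed: int, failed: int) -> float:
    """Percentage of recorded runs that failed, rounded to two decimals."""
    total = passed + failed
    if total == 0:
        raise ValueError("no runs recorded")
    return round(100 * failed / total, 2)
```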


datahub-cyborg bot added the `needs-review` label Jul 15, 2025
@yoonhyejin requested a review from pedro93 July 15, 2025 07:27
datahub-cyborg bot added the `pending-submitter-merge` label and removed the `needs-review` label Jul 15, 2025
Labels: `ingestion`, `pending-submitter-merge`
Projects: None yet
Development: successfully merging this pull request may close these issues:
Delta lake ingest error Unsupported reader features required: [DeletionVectors]
2 participants