[Feature Request][Spark] Allow USING INVENTORY table identifier to resolve non-Delta sources #7036

@awbarbeau

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

VACUUM ... USING INVENTORY <table_identifier> currently requires the inventory source to be a Delta table.

This is stricter than necessary: the inventory is ultimately consumed as a DataFrame and validated against the required inventory schema (path, length, isDir, modificationTime), so the storage format of its source does not matter.

Motivation

The feature announcement describes inventory-based vacuum as a way to pass inventory "as a delta table or as a spark sql query":
https://delta.io/blog/efficient-delta-vacuum/

This request closes the gap between that documented usage and the current behavior of the table identifier form.

The command has two inventory paths with inconsistent behavior:

  1. USING INVENTORY <table_identifier> fails unless the source is Delta.
  2. USING INVENTORY (<subquery>) works with any SQL source.

Both paths feed the same VACUUM flow, where the inventory schema is validated.
Because the identifier path rejects non-Delta sources, users are forced to either:

  1. Materialize inventory data as Delta just to satisfy the identifier path, or
  2. Rewrite to subquery syntax as a workaround.

Further details

Scope clarification:

  1. This request only changes how the USING INVENTORY <table_identifier> source is resolved.
  2. It does not change any controls for the VACUUM target table.
  3. The target remains Delta-only and still goes through existing Delta VACUUM safety and protocol checks.

Requested behavior:

  1. Allow <table_identifier> inventory sources to resolve to any analyzable relation (table/view/temp view), not only Delta.
  2. Keep the existing inventory schema validation as the safety check.

Suggested implementation:

  1. For the identifier branch, resolve to a DataFrame directly from the analyzed plan.
  2. Do not require getDeltaTable() for the inventory source.
  3. Continue using the existing strict schema validation to reject invalid inventory input.

This keeps safety unchanged while removing an unnecessary restriction and aligning identifier behavior with subquery behavior.
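
To illustrate the requested behavior, here is a minimal, Spark-free sketch in Python. It is a model of the idea, not Delta's code: the `Relation` class, the `catalog` dictionary, the `require_delta` flag, and both function names are hypothetical, and the column types are assumed (the issue only names the columns). "Strict" is taken here to mean an exact match on column names and types.

```python
from dataclasses import dataclass

# Required inventory columns. Names come from the issue; the types are assumed.
INVENTORY_SCHEMA = {
    "path": "string",
    "length": "long",
    "isDir": "boolean",
    "modificationTime": "long",
}


@dataclass
class Relation:
    """Hypothetical stand-in for an analyzed table, view, or temp view."""
    name: str
    provider: str  # e.g. "delta", "parquet", "view"
    schema: dict   # column name -> type name


def validate_inventory_schema(relation: Relation) -> None:
    """Reject any inventory whose columns or types differ from INVENTORY_SCHEMA."""
    if relation.schema != INVENTORY_SCHEMA:
        raise ValueError(
            f"invalid inventory schema for {relation.name}: {relation.schema}"
        )


def resolve_inventory(catalog: dict, identifier: str,
                      require_delta: bool = False) -> Relation:
    """Resolve an inventory identifier to a relation.

    require_delta=True models the current identifier path;
    require_delta=False models the behavior requested in this issue.
    """
    relation = catalog[identifier]
    if require_delta and relation.provider != "delta":
        raise ValueError(f"{identifier} is not a Delta table")
    # The schema check is the only gate in the requested behavior.
    validate_inventory_schema(relation)
    return relation


# Hypothetical catalog with one Delta source, one Parquet source, one bad view.
catalog = {
    "inv_delta": Relation("inv_delta", "delta", dict(INVENTORY_SCHEMA)),
    "inv_parquet": Relation("inv_parquet", "parquet", dict(INVENTORY_SCHEMA)),
    "bad_view": Relation("bad_view", "view", {"path": "string"}),
}
```

In this model, `resolve_inventory(catalog, "inv_parquet")` succeeds, the same call with `require_delta=True` fails on the format check, and `resolve_inventory(catalog, "bad_view")` is still rejected by the schema check in both modes.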

Willingness to contribute

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.

Labels

enhancement
