
yingsu00 commented on Aug 12, 2025

This PR is the first step of refactoring Velox’s built-in Iceberg support into its own standalone connector under a new connectors/lakehouse/iceberg hierarchy. By decoupling Iceberg from the Hive connector, we lay the groundwork for a true plugin architecture that:

  • Modularizes Iceberg logic by removing it from the monolithic Hive code path
  • Enables independent releases of the Iceberg connector (semantic versioning, separate CI)
  • Facilitates community adoption by exposing a clean, stable API surface

What’s changed

  • New connectors/lakehouse folder: a top-level plugin namespace for “lakehouse” formats, containing:
    • storage_adapters/: shared filesystem abstractions (S3, HDFS, ABFS, etc.) used by all lakehouse connectors
    • common/: generic connector interfaces and data types:
      • TableHandle, ColumnHandle, etc.
      • Base classes: DataSource, DataSink, SplitReader
    • iceberg/: Iceberg-specific code.

Currently, the new code is only a plain copy of the existing Hive-based implementations. It is not included in the build, so it will not break CI. In follow-up PRs we will update the headers and namespaces, make the changes needed to get the code building, remove the Hive dependencies, clean up the implementations, and add more features.

We will be moving the Iceberg connector to connectors/lakehouse, which will host all lakehouse connectors such as Hudi, Delta Lake, etc. These connectors require the file system support in the storage_adapters folder, so this commit copies that folder over for all future lakehouse connectors to use.

Most of the Iceberg code was built on top of the Hive code, so this commit copies the code from connectors/hive to connectors/lakehouse/common. Future commits will adapt it into base classes for the upcoming lakehouse connectors.

yingsu00 changed the title from "Connector refactor 5" to "Connector refactor part 1: copy files" on Aug 12, 2025
yingsu00 changed the title from "Connector refactor part 1: copy files" to "Connector refactor part 1: Copy Hive and Iceberg code to connectors/lakehouse" on Aug 12, 2025
yingsu00 requested a review from majetideepak and removed the request on Aug 12, 2025 21:53
yingsu00 marked this pull request as ready for review on Aug 15, 2025 21:22
yingsu00 requested a review from majetideepak as a code owner on Aug 15, 2025 21:22