@Li0k (Contributor) commented Nov 14, 2025

Which issue does this PR close?

  • Closes #.

What changes are included in this PR?

risingwavelabs#89

Summary

Enable append mode for Azure Data Lake Storage (AZDLS) write operations.

Problem

AZDLS has specific requirements for write operations that necessitate enabling append mode. Previously, there was no clean way to pass this parameter through the FileIO interface without affecting read operations.

https://github.com/apache/opendal/blob/9746efca6aaa95776d467e7e5e88c5ec93dfd00d/core/src/services/azfile/backend.rs#L328

When writing a Parquet file, every RowGroup switch triggers another write call, so a file with more than one RowGroup issues multiple IOs and hits the single-write limit of OneShotWriter.

Solution

  • Add an append_file: bool field to OutputFile struct
  • Determine the append mode based on storage backend type in new_output()
  • Use conditional compilation to set append_file = true for AZDLS, false otherwise
  • Keep create_operator() signature unchanged to avoid polluting read-only interfaces
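
The steps above can be sketched roughly as follows. This is a simplified model, not the actual iceberg-rust code: the `Storage` enum and its variant names are hypothetical stand-ins, `new_output` is reduced to a free function, and the feature-flag gating on the AZDLS arm is left out so that the sketch compiles on its own.

```rust
/// Hypothetical stand-in for the storage backend type; variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Storage {
    Memory,
    S3,
    Azdls,
}

/// Simplified OutputFile: the path plus the two fields shown in the diff.
#[derive(Debug)]
pub struct OutputFile {
    // Absolute path of the file.
    path: String,
    // Relative path of file to uri, starts at `relative_path_pos`.
    relative_path_pos: usize,
    // Whether to use append mode for writes.
    append_file: bool,
}

impl OutputFile {
    /// Returns true when writes should go through the append path.
    pub fn append_file(&self) -> bool {
        self.append_file
    }

    /// Returns the part of the path after the scheme/container prefix.
    pub fn relative_path(&self) -> &str {
        &self.path[self.relative_path_pos..]
    }
}

/// Decides the append mode at the point where the output file is created,
/// so nothing on the read path has to know about it.
/// In the PR the AZDLS arm would sit behind a cargo feature (omitted here).
pub fn new_output(storage: Storage, path: &str, relative_path_pos: usize) -> OutputFile {
    let append_file = match storage {
        Storage::Azdls => true,
        Storage::Memory | Storage::S3 => false,
    };
    OutputFile {
        path: path.to_string(),
        relative_path_pos,
        append_file,
    }
}
```

Because the flag is derived from the backend type, callers never set it themselves; only the writer that consumes `OutputFile` would need to read it.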

Design Rationale

This approach was chosen because:

  1. It doesn't modify the create_operator() return signature, keeping read operations clean
  2. It makes the append mode decision at the point where it's needed (new_output)
  3. It uses pattern matching on the Storage type, leveraging existing path parsing logic
  4. It adds zero overhead when the AZDLS feature is not enabled (via conditional compilation)

Are these changes tested?

// Relative path of file to uri, starts at [`relative_path_pos`]
relative_path_pos: usize,
// Whether to use append mode for writes (required for some storage backends like AZDLS)
append_file: bool,

@mbutrovich (Collaborator) commented Nov 14, 2025

I'm generally not a fan of adding bools as arguments, since it limits extension and API changes in the future and sets a precedent for exponentially complex input arguments that might not all be valid configurations. Can we envision anything else that we might want to encode in a WriteMode enum, with one value representing the current behavior and maybe an Append value? If the answer is no, then I'll accept the bool; otherwise we might consider an enum.
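
For illustration, the enum suggested here might look something like the sketch below. Only the name `WriteMode` and the `Append` value come from the comment; the `Standard` variant name, the `is_append` helper, and the `From<bool>` bridge are assumptions made for this example.

```rust
/// Sketch of a write-mode enum replacing the `append_file: bool` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WriteMode {
    /// The current behavior (hypothetical variant name).
    #[default]
    Standard,
    /// Writes go through the backend's append path.
    Append,
}

impl WriteMode {
    /// Returns true when the append path should be used.
    pub fn is_append(self) -> bool {
        matches!(self, WriteMode::Append)
    }
}

/// Hypothetical bridge from the bool used in the first version of the PR.
impl From<bool> for WriteMode {
    fn from(append_file: bool) -> Self {
        if append_file {
            WriteMode::Append
        } else {
            WriteMode::Standard
        }
    }
}
```

A further mode could later be added as a new variant without changing any function signature, which is the extensibility argument made above.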

Contributor Author

Thanks a lot for your comment. Using a bool is indeed not good practice, so let me refactor it into an enum. However, I'm not sure we should expose this parameter externally for now: in the bug I encountered, only AZDLS needs it, and the need comes from OpenDAL's implementation.

@Xuanwo (Member) commented Nov 17, 2025

Hi, I'm a bit confused about this PR. What does it mean that "AZDLS has specific requirements for write operations that necessitate enabling append mode"? We can't write azdls with the native block API?

@Li0k (Contributor Author) commented Nov 17, 2025

> Hi, I'm a bit confused about this PR. What does it mean that "AZDLS has specific requirements for write operations that necessitate enabling append mode"? We can't write azdls with the native block API?

If I understand correctly, azfile constructs different writers based on the append flag. Repeatedly calling write on OneShotWriter will cause a panic, which is the issue we are encountering.

https://github.com/apache/opendal/blob/9746efca6aaa95776d467e7e5e88c5ec93dfd00d/core/src/services/azfile/backend.rs#L328
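
The failure mode described above can be modeled in a few lines. This is a toy model only: none of the types below are OpenDAL types, all names are hypothetical, and the one-shot writer returns an error on the second call where the real one is reported to panic.

```rust
/// Error used by the toy model instead of a panic.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    AlreadyWritten,
}

/// Minimal writer interface for the model (not an OpenDAL trait).
pub trait ChunkWriter {
    fn write(&mut self, chunk: &[u8]) -> Result<(), WriteError>;
}

/// Accepts exactly one write call, like a one-shot upload.
#[derive(Default)]
pub struct OneShotModel {
    body: Option<Vec<u8>>,
}

impl ChunkWriter for OneShotModel {
    fn write(&mut self, chunk: &[u8]) -> Result<(), WriteError> {
        if self.body.is_some() {
            // A second call is the situation a multi-RowGroup Parquet write runs into.
            return Err(WriteError::AlreadyWritten);
        }
        self.body = Some(chunk.to_vec());
        Ok(())
    }
}

/// Accepts any number of write calls and concatenates them.
#[derive(Default)]
pub struct AppendModel {
    body: Vec<u8>,
}

impl AppendModel {
    /// Returns everything written so far.
    pub fn bytes(&self) -> &[u8] {
        &self.body
    }
}

impl ChunkWriter for AppendModel {
    fn write(&mut self, chunk: &[u8]) -> Result<(), WriteError> {
        self.body.extend_from_slice(chunk);
        Ok(())
    }
}

/// Issues one write per row group and returns the total number of bytes written.
pub fn write_row_groups<W: ChunkWriter>(
    writer: &mut W,
    row_groups: &[&[u8]],
) -> Result<usize, WriteError> {
    let mut total = 0;
    for group in row_groups {
        writer.write(group)?;
        total += group.len();
    }
    Ok(total)
}
```

With two row groups, the one-shot model fails on the second call while the append model accepts both, which is why the choice of writer matters for Parquet output.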

@Xuanwo (Member) commented Nov 17, 2025

I got it, it's more of a missing feature in OpenDAL since it doesn't support streaming write for azdls. I've created apache/opendal#6799 to track this. Can you convert this PR into a feature request for iceberg-rust too?

@liurenjie1024 (Contributor) left a comment

Thanks @Li0k for this PR, but I don't think we should resolve this issue with such a hacky workaround. We should fix it in OpenDAL.


4 participants