-
Notifications
You must be signed in to change notification settings - Fork 354
fix(azdls): enable append mode for AZDLS write operations #1856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
crates/iceberg/src/io/file_io.rs
Outdated
| // Relative path of file to uri, starts at [`relative_path_pos`] | ||
| relative_path_pos: usize, | ||
| // Whether to use append mode for writes (required for some storage backends like AZDLS) | ||
| append_file: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally not a fan of adding bools as args since it limits extension/API changes in the future and sets a precedent for exponentially complex input arguments that might not all be valid configs. Can we envision any other things that we might want to encode in a WriteMode enum with one value representing the current behavior and maybe an Append value. If the answer is no then I'll accept the bool, otherwise we might consider an enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your comment. Using bool is indeed not a good practice. Let me refactor it to Enum. However, I'm not sure if we should expose this parameter externally for now, because in the bug I encountered, only AZDLS uses it, and this is related to opendal's implementation.
|
Hi, I'm a bit confused about this PR. What does it mean that "AZDLS has specific requirements for write operations that necessitate enabling append mode"? We can't write azdls with the native block API? |
If I understand correctly, azfile constructs different writers based on the append flag. Repeatedly calling write on OneShotWriter will cause a panic, which is the issue we are encountering. |
|
I got it, it's more of a missing feature in OpenDAL since it doesn't support streaming write for azdls. I've created apache/opendal#6799 to track this. Can you convert this PR into a feature request for iceberg-rust too? |
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Li0k for this pr, but I don't think we should use such a hack way to resolve this issue. We should do it in opendal.
Which issue does this PR close?
What changes are included in this PR?
risingwavelabs#89
Summary
Enable append mode for Azure Data Lake Storage (AZDLS) write operations.
Problem
AZDLS has specific requirements for write operations that necessitate enabling append mode. Previously, there was no clean way to pass this parameter through the
FileIOinterface without affecting read operations.https://github.com/apache/opendal/blob/9746efca6aaa95776d467e7e5e88c5ec93dfd00d/core/src/services/azfile/backend.rs#L328
When writing to Parquet, triggering a RowGroup switch will cause multiple IOs, thus hitting the OneShot limit.
Solution
append_file: boolfield toOutputFilestructnew_output()append_file = truefor AZDLS,falseotherwisecreate_operator()signature unchanged to avoid polluting read-only interfacesDesign Rationale
This approach was chosen because:
create_operator()return signature, keeping read operations cleannew_output)Storagetype, leveraging existing path parsing logicAre these changes tested?