Parquet Modular decryption support #6637

Open: wants to merge 70 commits into base: main

Commits (70):
9f240bd first commit (ggershinsky, Mar 21, 2024)
3c1ca4f Use ParquetMetaDataReader (rok, Nov 23, 2024)
a1bf0ea Fix CI (rok, Nov 23, 2024)
d752073 test (rok, Dec 3, 2024)
8e2e118 save progress (rok, Dec 11, 2024)
be10eb3 work (rok, Dec 16, 2024)
46910b2 Review feedback (rok, Dec 17, 2024)
e8e5df2 page decompression issue (rok, Dec 17, 2024)
8501cf9 add update_aad (rok, Dec 17, 2024)
1a8fd94 Change encrypt and decrypt to return Results (adamreeve, Dec 17, 2024)
b5a7336 Use correct page ordinal and module type in AADs (adamreeve, Dec 18, 2024)
968209f Tidy up ordinal types (adamreeve, Dec 18, 2024)
a2507c5 Lint (rok, Dec 18, 2024)
3a5e8bb Fix regular deserialization path (rok, Dec 18, 2024)
ffd4a7e cleaning (rok, Dec 18, 2024)
22b2abb Update data checks in test (adamreeve, Dec 19, 2024)
9f58792 start non-uniform decryption (rok, Dec 19, 2024)
e6e056a Add missing doc comments (adamreeve, Dec 19, 2024)
cca1155 Make encryption an optional feature (adamreeve, Dec 20, 2024)
952892d Handle when a file is encrypted but encryption is disabled or no decr… (adamreeve, Dec 20, 2024)
8aa8ba4 Allow for plaintext footer (rok, Dec 22, 2024)
490e153 work (rok, Dec 23, 2024)
6763ee9 Fix method name (adamreeve, Dec 22, 2024)
af6c589 work (rok, Jan 4, 2025)
9cf130d Minor (rok, Jan 6, 2025)
6014acd work (rok, Jan 7, 2025)
2f09a88 work (rok, Jan 9, 2025)
9104ab5 work (rok, Jan 20, 2025)
3b3b75a Fix reading to end of file (adamreeve, Jan 21, 2025)
40d3c21 Refactor tests (adamreeve, Jan 21, 2025)
3f7e841 Fix non-uniform encryption configuration (adamreeve, Jan 21, 2025)
bf4df8a Don't use footer key for non-encrypted columns (adamreeve, Jan 21, 2025)
135eef2 Rebase and cleanup (rok, Jan 21, 2025)
4617870 Cleanup (rok, Jan 21, 2025)
397d37b Cleanup (rok, Jan 21, 2025)
3d3bfd8 Cleanup (rok, Jan 21, 2025)
e53306e Cleanup (rok, Jan 21, 2025)
95888bc Cleanup (rok, Jan 21, 2025)
ed1bb3c Cleanup (rok, Jan 21, 2025)
ee2cbed lint (rok, Jan 21, 2025)
fbd23cb Remove encryption setup (rok, Jan 22, 2025)
e050751 Fix building with ring on wasm (rok, Jan 22, 2025)
62ac361 file_decryptor into a seperate module (rok, Jan 22, 2025)
8ab9a43 lint (rok, Jan 22, 2025)
febbe83 FileDecryptionProperties should have at least one key (rok, Jan 22, 2025)
e5a788e Move cyphertext reading into decryptor (rok, Jan 23, 2025)
423411d More tidy up of footer key handling (adamreeve, Jan 23, 2025)
187e7de Get column decryptors as RingGcmBlockDecryptor (adamreeve, Jan 23, 2025)
65cebbe Use Arc<dyn BlockDecryptor> (adamreeve, Jan 24, 2025)
53e554e Fix file metadata tests (adamreeve, Jan 24, 2025)
55e55ce Handle reading plaintext footer files without decryption properties (adamreeve, Jan 24, 2025)
98cc63e Split up encryption modules further (adamreeve, Jan 24, 2025)
0ff9404 Error instead of panic for AES-GCM-CTR (adamreeve, Jan 24, 2025)
c6d4dca load_async (rok, Feb 5, 2025)
dc19abc new_with_options (rok, Feb 5, 2025)
329a613 Add tests (rok, Feb 5, 2025)
00aa47a get_metadata (rok, Feb 5, 2025)
fb6cdbc Add CryptoContext to async_reader (rok, Feb 7, 2025)
aa44408 Add row_group_ordinal to InMemoryRowGroup (rok, Feb 7, 2025)
497abb3 Adjust docstrings (rok, Feb 7, 2025)
16e9efe Apply suggestions from code review (rok, Feb 10, 2025)
95a3097 Review feedback (rok, Feb 10, 2025)
1e73b25 move file_decryption_properties into ArrowReaderOptions (rok, Feb 10, 2025)
e3e3163 make create_page_aad method of CryptoContext (rok, Feb 10, 2025)
105c3e9 Review feedback (rok, Feb 10, 2025)
a8a204e Infer ModuleType in create_page_aad (rok, Feb 10, 2025)
fb3b6b0 add create_page_header_aad (rok, Feb 10, 2025)
11c4e7a Review feedback (rok, Feb 14, 2025)
815e35d Update parquet/src/arrow/async_reader/store.rs (rok, Feb 14, 2025)
d3df0ab Review feedback (rok, Feb 16, 2025)
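Several commits above (b5a7336, e3e3163, a8a204e, fb3b6b0) concern building AADs (additional authenticated data) for encrypted modules from a module type and page/row group/column ordinals. The sketch below assumes the module AAD layout described in the Parquet modular encryption specification: the file AAD (AAD prefix, if any, followed by aad_file_unique), then a one-byte module type, then the row group, column and page ordinals as 2-byte little-endian integers. The footer AAD stops after the module type, and only data pages and data page headers carry a page ordinal. `ModuleType` and `module_aad` are hypothetical names for illustration, not the PR's `CryptoContext::create_page_aad`.

```rust
/// Module types as numbered in the Parquet modular encryption spec (1 byte each).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ModuleType {
    Footer = 0,
    ColumnMetaData = 1,
    DataPage = 2,
    DictionaryPage = 3,
    DataPageHeader = 4,
    DictionaryPageHeader = 5,
    ColumnIndex = 6,
    OffsetIndex = 7,
    BloomFilterHeader = 8,
    BloomFilterBitset = 9,
}

/// Hypothetical helper building a module AAD:
/// file AAD || module type || row group ordinal || column ordinal || page ordinal,
/// with every ordinal encoded as a 2-byte little-endian integer.
pub fn module_aad(
    file_aad: &[u8],
    module_type: ModuleType,
    row_group_ordinal: i16,
    column_ordinal: i16,
    page_ordinal: Option<i16>,
) -> Result<Vec<u8>, String> {
    let mut aad = Vec::with_capacity(file_aad.len() + 7);
    aad.extend_from_slice(file_aad);
    aad.push(module_type as u8);
    // The footer AAD ends right after the module type byte.
    if module_type == ModuleType::Footer {
        return Ok(aad);
    }
    aad.extend_from_slice(&row_group_ordinal.to_le_bytes());
    aad.extend_from_slice(&column_ordinal.to_le_bytes());
    // Only data pages and data page headers are distinguished by a page ordinal.
    let needs_page = matches!(module_type, ModuleType::DataPage | ModuleType::DataPageHeader);
    match (needs_page, page_ordinal) {
        (true, Some(page)) => aad.extend_from_slice(&page.to_le_bytes()),
        (true, None) => return Err(format!("{module_type:?} requires a page ordinal")),
        (false, _) => {}
    }
    Ok(aad)
}
```

Binding the ordinals into the AAD means a ciphertext copied to a different page, column or row group fails GCM authentication instead of silently decrypting, which is why getting the page ordinal and module type right matters.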
5 changes: 5 additions & 0 deletions parquet/Cargo.toml

```diff
@@ -30,6 +30,8 @@ rust-version = { workspace = true }

 [target.'cfg(target_arch = "wasm32")'.dependencies]
 ahash = { version = "0.8", default-features = false, features = ["compile-time-rng"] }
+# See https://github.com/briansmith/ring/issues/918#issuecomment-2077788925
+ring = { version = "0.17", default-features = false, features = ["wasm32_unknown_unknown_js", "std"], optional = true }

 [target.'cfg(not(target_arch = "wasm32"))'.dependencies]
 ahash = { version = "0.8", default-features = false, features = ["runtime-rng"] }
@@ -70,6 +72,7 @@ half = { version = "2.1", default-features = false, features = ["num-traits"] }
 sysinfo = { version = "0.33.0", optional = true, default-features = false, features = ["system"] }
 crc32fast = { version = "1.4.2", optional = true, default-features = false }
 simdutf8 = { version = "0.1.5", optional = true, default-features = false }
+ring = { version = "0.17", default-features = false, features = ["std"], optional = true }

 [dev-dependencies]
 base64 = { version = "0.22", default-features = false, features = ["std"] }
@@ -125,6 +128,8 @@ sysinfo = ["dep:sysinfo"]
 crc = ["dep:crc32fast"]
 # Enable SIMD UTF-8 validation
 simdutf8 = ["dep:simdutf8"]
+# Enable Parquet modular encryption support
```

Review comment (Contributor) on the line `# Enable Parquet modular encryption support`:

Can we please also document this new flag here: https://github.com/apache/arrow-rs/tree/main/parquet#feature-flags

Maybe we should update the feature support matrix as well.

```diff
+encryption = ["dep:ring"]


 [[example]]
```
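With `encryption = ["dep:ring"]`, `ring` is only compiled when the `encryption` feature is enabled, and the `dep:` prefix stops Cargo from also exposing `ring` as an implicitly named feature. Code can then branch on the feature at compile time. The sketch below uses hypothetical names (`FooterKind`, `footer_kind`, `check_footer_readable`) and assumes the trailing magic bytes from the Parquet encryption spec (`PARE` for an encrypted footer, `PAR1` otherwise) to reject encrypted-footer files when support is compiled out, similar in spirit to commit 952892d. It is not the crate's actual check.

```rust
/// Kind of footer, inferred from the 4-byte magic at the end of the file.
#[derive(Debug, PartialEq)]
pub enum FooterKind {
    /// Ends in `PAR1`: plaintext footer (individual columns may still be encrypted).
    Plaintext,
    /// Ends in `PARE`: the footer itself is encrypted.
    Encrypted,
}

/// Hypothetical helper: classify a file by its trailing magic bytes.
pub fn footer_kind(file: &[u8]) -> Result<FooterKind, String> {
    // The tail of a Parquet file is a 4-byte footer length followed by the 4-byte magic.
    if file.len() < 8 {
        return Err("file too short to contain a Parquet footer".to_string());
    }
    let magic = &file[file.len() - 4..];
    if magic == b"PAR1" {
        Ok(FooterKind::Plaintext)
    } else if magic == b"PARE" {
        Ok(FooterKind::Encrypted)
    } else {
        Err("invalid Parquet magic".to_string())
    }
}

/// Hypothetical check: `cfg!(feature = "encryption")` is `true` only when the
/// crate is built with the `encryption` feature enabled.
pub fn check_footer_readable(file: &[u8]) -> Result<FooterKind, String> {
    let kind = footer_kind(file)?;
    if kind == FooterKind::Encrypted && !cfg!(feature = "encryption") {
        return Err("file has an encrypted footer but the `encryption` feature is disabled".to_string());
    }
    Ok(kind)
}
```

A file written in plaintext-footer mode still ends in `PAR1` even when some columns are encrypted, so a magic-byte check alone cannot detect it. The column metadata has to be inspected as well.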
9 changes: 8 additions & 1 deletion parquet/examples/read_with_rowgroup.rs

```diff
@@ -35,7 +35,12 @@ async fn main() -> Result<()> {
     let mut file = File::open(&path).await.unwrap();

     // The metadata could be cached in other places, this example only shows how to read
-    let metadata = file.get_metadata().await?;
+    let metadata = file
+        .get_metadata(
+            #[cfg(feature = "encryption")]
+            None,
+        )
+        .await?;

     for rg in metadata.row_groups() {
         let mut rowgroup = InMemoryRowGroup::create(rg.clone(), ProjectionMask::all());
@@ -121,6 +126,8 @@ impl RowGroups for InMemoryRowGroup {
                     self.metadata.column(i),
                     self.num_rows(),
                     None,
+                    #[cfg(feature = "encryption")]
+                    None,
                 )?);

                 Ok(Box::new(ColumnChunkIterator {
```
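The call-site changes in read_with_rowgroup.rs rely on Rust accepting `#[cfg]` attributes on individual function parameters and call arguments, so the extra decryption argument exists only when the `encryption` feature is on and callers still compile without it. A minimal sketch of that pattern, using a hypothetical `load_metadata` function and `DecryptionProps` type as stand-ins (not the crate's `get_metadata` or `FileDecryptionProperties`):

```rust
/// Hypothetical stand-in for decryption settings; only exists with the feature.
#[cfg(feature = "encryption")]
pub struct DecryptionProps {
    pub footer_key: Vec<u8>,
}

/// Hypothetical loader whose second parameter exists only when the
/// `encryption` feature is enabled.
pub fn load_metadata(
    path: &str,
    #[cfg(feature = "encryption")] decryption: Option<&DecryptionProps>,
) -> String {
    // This block is compiled out entirely when the feature is disabled.
    #[cfg(feature = "encryption")]
    {
        if let Some(props) = decryption {
            return format!("decrypting {path} with a {}-byte footer key", props.footer_key.len());
        }
    }
    format!("reading plaintext metadata from {path}")
}

/// Caller that builds with or without the feature, mirroring the
/// `get_metadata(#[cfg(feature = "encryption")] None)` call in the diff.
pub fn read_plaintext(path: &str) -> String {
    load_metadata(
        path,
        #[cfg(feature = "encryption")]
        None,
    )
}
```

The trade-off is that every caller has to repeat the `#[cfg]` argument, which is what the example diff does; an alternative is to carry the optional settings inside an options struct (as commit 1e73b25 does by moving `file_decryption_properties` into `ArrowReaderOptions`), so that call signatures stay the same across feature sets.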