Description
What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark aes_decrypt function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The AesDecrypt expression provides AES (Advanced Encryption Standard) decryption functionality for encrypted binary data. It supports multiple AES modes including GCM (Galois/Counter Mode) with optional Additional Authenticated Data (AAD) for authenticated encryption scenarios.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
```sql
AES_DECRYPT(input, key [, mode [, padding [, aad]]])
```

DataFrame API:

```scala
aes_decrypt(col("encrypted_data"), col("key"), col("mode"), col("padding"), col("aad"))
```

Arguments:
| Argument | Type | Description |
|---|---|---|
| input | Binary | The encrypted binary data to decrypt |
| key | Binary | The encryption key used for decryption |
| mode | String | The AES mode (default: "GCM") |
| padding | String | The padding scheme (default: "DEFAULT") |
| aad | Binary | Additional Authenticated Data for authenticated modes (default: empty) |
Return Type: Returns BinaryType - the decrypted data as a binary array.
Supported Data Types:
- input: Binary data only
- key: Binary data only
- mode: String (collation-aware, including trim collation)
- padding: String (collation-aware, including trim collation)
- aad: Binary data only
Edge Cases:
- Null input: Returns null if any required parameter (input, key) is null
- Invalid key length: Throws exception for keys that don't match AES requirements (128, 192, or 256 bits)
- Corrupted ciphertext: Throws an exception for malformed encrypted data (the try_aes_decrypt variant returns null instead)
- Authentication failure: For authenticated modes, throws if the AAD doesn't match or the authentication tag is invalid (null under try_aes_decrypt)
- Empty AAD: Treated as valid input (empty byte array) for modes that support AAD
- Unsupported mode/padding: Throws exception for invalid combinations
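The edge-case rules above can be sketched as a small validation model. This is a hypothetical Python sketch of the argument checks only (null propagation, key length, mode/padding combinations); it performs no real AES decryption, and the function name and the set of supported mode/padding pairs are illustrative assumptions, not Spark or Comet APIs.

```python
# Hypothetical model of the aes_decrypt edge-case semantics described above.
# Validation only; no real decryption happens.

VALID_KEY_LENGTHS = {16, 24, 32}  # 128-, 192-, and 256-bit keys

# Assumed supported combinations; verify against Spark before relying on this.
SUPPORTED = {("GCM", "DEFAULT"), ("GCM", "NONE"),
             ("CBC", "DEFAULT"), ("CBC", "PKCS"),
             ("ECB", "DEFAULT"), ("ECB", "PKCS")}

def check_aes_decrypt_args(input_bytes, key, mode="GCM", padding="DEFAULT", aad=b""):
    # Null input or key: the expression evaluates to null.
    if input_bytes is None or key is None:
        return None
    # Invalid key length: raises, mirroring Spark's exception.
    if len(key) not in VALID_KEY_LENGTHS:
        raise ValueError(f"invalid AES key length: {len(key)} bytes")
    # Unsupported mode/padding combination: raises.
    if (mode.upper(), padding.upper()) not in SUPPORTED:
        raise ValueError(f"unsupported mode/padding: {mode}/{padding}")
    # AAD only applies to authenticated modes such as GCM; empty AAD is valid.
    if aad and mode.upper() != "GCM":
        raise ValueError("AAD is only supported in GCM mode")
    return b"<decrypted>"  # placeholder for the real decryption result
```

A native implementation would need to reproduce exactly this throw-vs-null behavior to stay compatible with Spark's results.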
Examples:
```sql
-- Basic AES decryption with default GCM mode
SELECT AES_DECRYPT(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'),
                   'abcdefghijklmnop12345678ABCDEFGH');

-- AES decryption with specific mode, padding and AAD
SELECT AES_DECRYPT(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'),
                   'abcdefghijklmnop12345678ABCDEFGH',
                   'GCM',
                   'DEFAULT',
                   'This is an AAD mixed into the input');
```

DataFrame API usage:

```scala
import org.apache.spark.sql.functions._

df.select(
  aes_decrypt(
    col("encrypted_data"),
    col("encryption_key"),
    lit("GCM"),
    lit("DEFAULT"),
    col("additional_auth_data")
  ).alias("decrypted_data")
)
```

Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in `spark/src/main/scala/org/apache/comet/serde/`
- Register: Add to the appropriate map in `QueryPlanSerde.scala`
- Protobuf: Add a message type in `native/proto/src/proto/expr.proto` if needed
- Rust: Implement in `native/spark-expr/src/` (check if DataFusion has built-in support first)
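One detail relevant to the native implementation is the payload layout Spark uses for GCM: the example ciphertext above begins with sixteen base64 `A` characters, i.e. a 12-byte all-zero IV prefix, and GCM output ends with a 16-byte authentication tag. The following Python sketch splits that assumed layout (IV || ciphertext || tag); it is an illustration based on Spark's observable output, not Comet code, and does no decryption.

```python
import base64

GCM_IV_LEN = 12   # assumed: Spark prepends a 12-byte IV to GCM ciphertexts
GCM_TAG_LEN = 16  # GCM appends a 16-byte authentication tag

def split_gcm_payload(payload: bytes):
    """Split the assumed Spark GCM layout: IV || ciphertext || tag."""
    iv = payload[:GCM_IV_LEN]
    ct = payload[GCM_IV_LEN:-GCM_TAG_LEN]
    tag = payload[-GCM_TAG_LEN:]
    return iv, ct, tag

# The payload from the SQL example above (unbase64'd):
payload = base64.b64decode("AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4")
iv, ct, tag = split_gcm_payload(payload)
```

A Rust implementation would perform the same split before handing the pieces to an AES-GCM primitive.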
Additional context
Difficulty: Large
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.AesDecrypt
Related:
- AesEncrypt - Corresponding AES encryption function
- Base64 / UnBase64 - For encoding/decoding binary data to/from strings
- Sha2 - For generating cryptographic hashes
This issue was auto-generated from Spark reference documentation.