
[Feature] Support Spark expression: aes_decrypt #3188

@andygrove

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark aes_decrypt function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The AesDecrypt expression decrypts binary data that was encrypted with AES (Advanced Encryption Standard), such as the output of aes_encrypt. It supports the GCM (Galois/Counter Mode), CBC, and ECB modes. In GCM mode, optional Additional Authenticated Data (AAD) is verified together with the ciphertext, which makes GCM an authenticated encryption mode.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

AES_DECRYPT(input, key [, mode [, padding [, aad]]])
// DataFrame API (aes_decrypt is a function in org.apache.spark.sql.functions, not a Column method)
aes_decrypt(col("encrypted_data"), col("key"), col("mode"), col("padding"), col("aad"))

Arguments:

Argument  Type    Description
input     Binary  The encrypted binary data to decrypt
key       Binary  The key that was used for encryption (16, 24, or 32 bytes)
mode      String  The AES mode: "GCM", "CBC", or "ECB" (default: "GCM")
padding   String  The padding scheme: "NONE", "PKCS", or "DEFAULT" (default: "DEFAULT")
aad       Binary  Additional Authenticated Data, GCM mode only (default: empty)
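The mode and padding arguments are not independent: "DEFAULT" padding resolves to a mode-specific value, and only ECB with PKCS, CBC with PKCS, and GCM with NONE are accepted. The sketch below models that resolution so a native implementation can reject the same combinations Spark does. The object and method names are hypothetical, and case-insensitive matching is an assumption to verify against Spark's JVM implementation.

```scala
import java.util.Locale

// Hypothetical helper modelling how aes_decrypt resolves its mode/padding pair.
// Assumption: matching is case-insensitive, mirroring Spark's JVM implementation.
object AesModeResolution {
  /** A validated mode/padding pair and the javax.crypto transformation it maps to. */
  final case class Resolved(mode: String, padding: String, transformation: String)

  def resolve(mode: String, padding: String): Either[String, Resolved] =
    (mode.toUpperCase(Locale.ROOT), padding.toUpperCase(Locale.ROOT)) match {
      // Block modes use PKCS padding; DEFAULT resolves to PKCS.
      case ("ECB", "PKCS" | "DEFAULT") => Right(Resolved("ECB", "PKCS", "AES/ECB/PKCS5Padding"))
      case ("CBC", "PKCS" | "DEFAULT") => Right(Resolved("CBC", "PKCS", "AES/CBC/PKCS5Padding"))
      // GCM needs no block padding; DEFAULT resolves to NONE.
      case ("GCM", "NONE" | "DEFAULT") => Right(Resolved("GCM", "NONE", "AES/GCM/NoPadding"))
      // Anything else (e.g. GCM with PKCS) is rejected, as Spark raises an error.
      case (m, p) => Left(s"unsupported AES mode/padding combination: $m/$p")
    }
}
```

Because mode and padding are usually literals, one option (a suggestion, not existing Comet behavior) is to run this check once in the Scala serde and fall back to Spark for unsupported pairs, rather than validating per row in the native kernel.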

Return Type: BinaryType, the decrypted bytes. Cast the result to STRING to read text plaintext.

Supported Data Types:

  • input: Binary data only
  • key: Binary data only
  • mode: String with collation support (supports trim collation)
  • padding: String with collation support (supports trim collation)
  • aad: Binary data only

Edge Cases:

  • Null input: Returns NULL if any argument is NULL
  • Invalid key length: Raises an error unless the key is 16, 24, or 32 bytes long (AES-128, AES-192, or AES-256)
  • Corrupted ciphertext: Raises an error when decryption fails (e.g. truncated input or invalid padding); in ECB and CBC modes, altered ciphertext can also decrypt to garbage without an error. try_aes_decrypt returns NULL instead of raising
  • Authentication failure: In GCM mode, raises an error if the AAD does not match the value used for encryption or the authentication tag is invalid (try_aes_decrypt returns NULL)
  • Empty AAD: Treated as no AAD (an empty byte array)
  • Unsupported mode/padding: Raises an error for unsupported combinations, and for a non-empty AAD with any mode other than GCM
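Most of these edge cases follow from the layout of the encrypted input. The sketch below assumes the layout that Spark's aes_encrypt produces: for GCM, a 12-byte IV followed by the ciphertext and a 16-byte authentication tag; for CBC, a 16-byte IV followed by the ciphertext; for ECB, the bare ciphertext. It decrypts one value with the JDK's javax.crypto API. `decrypt` throws on a bad key, malformed input, or AAD/tag mismatch, which corresponds to aes_decrypt raising an error, while `tryDecrypt` returns None, which corresponds to try_aes_decrypt. The object and method names are hypothetical; this is a reference model for tests, not Spark's or Comet's code.

```scala
import java.util.Locale
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, IvParameterSpec, SecretKeySpec}
import scala.util.Try

// Hypothetical single-row reference model of aes_decrypt.
// Assumed input layouts (matching what aes_encrypt produces):
//   GCM: 12-byte IV ++ ciphertext ++ 16-byte tag
//   CBC: 16-byte IV ++ ciphertext
//   ECB: ciphertext only
object AesDecryptModel {
  private val GcmIvLength = 12
  private val GcmTagLengthBits = 128
  private val CbcIvLength = 16

  def decrypt(input: Array[Byte], key: Array[Byte], mode: String,
              aad: Array[Byte] = Array.emptyByteArray): Array[Byte] = {
    // AES accepts only 128-, 192- or 256-bit keys.
    require(Set(16, 24, 32).contains(key.length), s"invalid AES key length: ${key.length} bytes")
    val keySpec = new SecretKeySpec(key, "AES")
    mode.toUpperCase(Locale.ROOT) match {
      case "GCM" =>
        val cipher = Cipher.getInstance("AES/GCM/NoPadding")
        // The IV is read from the first 12 bytes of the input.
        cipher.init(Cipher.DECRYPT_MODE, keySpec,
          new GCMParameterSpec(GcmTagLengthBits, input, 0, GcmIvLength))
        // The AAD must equal the value used at encryption time, or doFinal fails tag verification.
        if (aad.nonEmpty) cipher.updateAAD(aad)
        cipher.doFinal(input, GcmIvLength, input.length - GcmIvLength)
      case "CBC" =>
        val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
        cipher.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(input, 0, CbcIvLength))
        cipher.doFinal(input, CbcIvLength, input.length - CbcIvLength)
      case "ECB" =>
        val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
        cipher.init(Cipher.DECRYPT_MODE, keySpec)
        cipher.doFinal(input)
      case other =>
        throw new IllegalArgumentException(s"unsupported AES mode: $other")
    }
  }

  /** Like decrypt, but returns None on any non-fatal failure (try_aes_decrypt semantics). */
  def tryDecrypt(input: Array[Byte], key: Array[Byte], mode: String,
                 aad: Array[Byte] = Array.emptyByteArray): Option[Array[Byte]] =
    Try(decrypt(input, key, mode, aad)).toOption
}
```

A native port would need to reproduce the same byte layout exactly, since Comet and Spark must be able to decrypt each other's output.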

Examples:

-- Basic round trip with the default GCM mode; the result is BINARY, so cast it to read the text
SELECT CAST(AES_DECRYPT(AES_ENCRYPT('Spark', 'abcdefghijklmnop12345678ABCDEFGH'),
                        'abcdefghijklmnop12345678ABCDEFGH') AS STRING);

-- AES decryption with specific mode, padding and AAD
SELECT AES_DECRYPT(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'), 
                   'abcdefghijklmnop12345678ABCDEFGH', 
                   'GCM', 
                   'DEFAULT', 
                   'This is an AAD mixed into the input');
// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(
  aes_decrypt(
    col("encrypted_data"),
    col("encryption_key"),
    lit("GCM"),
    lit("DEFAULT"),
    col("additional_auth_data")
  ).alias("decrypted_data")
)

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
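Whatever shape the native kernel takes, it has to reproduce Spark's NULL handling across a whole batch. The sketch below is a hypothetical Scala oracle for differential tests: it applies a per-row decrypt function and yields None (SQL NULL) whenever any argument is NULL, which assumes Spark returns NULL if any argument is NULL. The names `AesDecryptOracle`, `Row5`, and `evaluate` are illustrative and not part of Comet's test utilities.

```scala
// Hypothetical row-wise oracle for differential tests of a native aes_decrypt.
// Option models a nullable column value; None stands for SQL NULL.
object AesDecryptOracle {
  /** (input, key, mode, padding, aad), each possibly NULL. */
  type Row5 = (Option[Array[Byte]], Option[Array[Byte]], Option[String], Option[String], Option[Array[Byte]])

  /** Applies decryptRow to every row, returning NULL when any argument is NULL (assumed Spark behavior). */
  def evaluate(
      rows: Seq[Row5],
      decryptRow: (Array[Byte], Array[Byte], String, String, Array[Byte]) => Array[Byte]
  ): Seq[Option[Array[Byte]]] =
    rows.map {
      case (Some(input), Some(key), Some(mode), Some(padding), Some(aad)) =>
        Some(decryptRow(input, key, mode, padding, aad))
      case _ => None
    }
}
```

In a test, `decryptRow` would wrap the JVM reference decryption, and the oracle's output would be compared element by element with the column produced by the native expression.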

Additional context

Difficulty: Large
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.AesDecrypt

Related:

  • AesEncrypt - Corresponding AES encryption function
  • Base64 / UnBase64 - For encoding/decoding binary data to/from strings
  • Sha2 - For generating cryptographic hashes

This issue was auto-generated from Spark reference documentation.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests