Skip to content

Conversation

nikhilmehta16
Copy link

@nikhilmehta16 nikhilmehta16 commented Aug 1, 2025

Summary

Enables native Arrow decimal type support by removing unnecessary blocking condition in ScanRow and enabling native decimal support by default. The decimal128Container infrastructure was already complete - these changes allow decimal columns to return native Arrow decimal128 types instead of raw bytes.

Changes

  • Enable native decimal by default: Set UseArrowNativeDecimal = true in ArrowConfig.WithDefaults()
  • Remove ScanRow blocking: Remove decimal type check from ScanRow error condition
  • Update test expectations: Allow decimal columns to scan successfully
  • Interval types remain blocked (still unsupported)

Why Native Arrow Decimal128?

This provides significant advantages over the previous raw bytes approach:

  • Native Arrow types: Arrow records now contain proper decimal128 types with precision/scale metadata
  • Type safety: Schema correctly reflects arrow.DECIMAL128 instead of arrow.STRING
  • No manual parsing: Direct access to decimal values without byte-level parsing
  • Precision preservation: Full decimal precision maintained through Arrow's native decimal128 format

Impact

Before:

  • UseArrowNativeDecimal = false by default
  • Decimal columns → raw bytes ([]byte) in Arrow records requiring manual parsing
  • Schema shows arrow.STRING type for decimal columns

After:

  • UseArrowNativeDecimal = true by default
  • Decimal columns → native arrow.Decimal128 types in Arrow records
  • Schema correctly shows arrow.DECIMAL128 with precision/scale metadata
  • Direct conversion to float64 available via decimal128Value.ToFloat64(scale)

For Direct Arrow Usage

Users working with Arrow records directly now get:

// Schema reflects true types
field.Type.ID() == arrow.DECIMAL128  // ✅ Instead of arrow.STRING

Configuration Note

The UseArrowNativeDecimal option is currently internal and not exposed through the public API. Users cannot currently override this setting, but the new default behavior provides proper Arrow type semantics.

Testing

✅ Updated test expectations to reflect new default behavior
✅ All tests pass locally
✅ Arrow records now contain native decimal128 types instead of strings

Breaking Change

This is a minor breaking change for users directly accessing Arrow records, as decimal columns now return arrow.Decimal128Array instead of arrow.StringArray. However, this provides the correct Arrow type semantics and eliminates the need for manual parsing.

Remove blocking condition for decimal types in ScanRow method.
The decimal128Container infrastructure is complete and functional,
so native decimal scanning now works properly.

Only interval types remain unsupported and continue to be blocked.

Signed-off-by: Nikhil <[email protected]>
Update 'Error on retrieving not implemented native arrow types' test
to expect success for decimal columns (index < 4 instead of < 3).
Decimal types now work, only interval types should still error.

Signed-off-by: Nikhil <[email protected]>
@@ -886,7 +886,7 @@ func TestArrowRowScanner(t *testing.T) {

err := ars.ScanRow(dest, 0)

if i < 3 {
if i < 4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a separate test to verify changes in ScanRow? That tests ScanRow no longer blocks decimal types

Copy link
Contributor

@shivam2680 shivam2680 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these changes! LGTM for code changes
can you update doc.go to indicate data type change.
DECIMAL(p,s) --> float64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants