Contributor

surister commented Dec 5, 2025

Fixes #459

With this PR, we now support creating timestamp arrays with a given precision (time unit) and timezone:

from datetime import datetime, timezone
import zoneinfo
from arro3.core import Array, DataType

a = Array([datetime.now()], type=DataType.timestamp("ns"))
b = Array([datetime.now(tz=timezone.utc)], type=DataType.timestamp("s", tz="UTC"))
tz_name = "America/Chicago"
tzinfo = zoneinfo.ZoneInfo(tz_name)
expected: datetime = datetime.now().astimezone(timezone(tzinfo.utcoffset(datetime.now())))
c = Array([expected], type=DataType.timestamp("ms", tz=tz_name))
print(a, b, c)
arro3.core.Array<Timestamp(ns)>
[
  2025-12-05T16:47:26.094894,
]
 arro3.core.Array<Timestamp(s, "UTC")>
[
  2025-12-05T15:47:26Z,
]
 arro3.core.Array<Timestamp(ms, "America/Chicago")>
[
  2025-12-05T09:47:26.095-06:00,
]
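For reference, here is a rough stdlib-only model of what the time unit controls. It assumes the Arrow layout, where a timestamp value is a signed 64-bit count of s/ms/us/ns since the Unix epoch (UTC) and the tz is type metadata. `to_epoch_units` is a hypothetical helper, not part of arro3:

```python
from datetime import datetime, timezone

# Ticks per second for each Arrow time unit.
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_units(dt: datetime, unit: str) -> int:
    """Hypothetical helper: whole `unit` ticks between the Unix epoch (UTC) and `dt`.

    `dt` must be tz-aware. Precision finer than `unit` is floored away, so
    "s" drops the microseconds and "ms" keeps only the first 3 sub-second digits.
    """
    delta = dt - EPOCH
    # A datetime carries at most microsecond resolution, so count whole microseconds first.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * UNITS_PER_SECOND[unit] // 1_000_000


dt = datetime(1999, 8, 7, 11, 12, 13, 141516, tzinfo=timezone.utc)
for unit in ("s", "ms", "us", "ns"):
    print(unit, to_epoch_units(dt, unit))
```

Because a Python datetime stops at microseconds, the "ns" value computed from a datetime input always ends in 000.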

github-actions bot added the feat label Dec 5, 2025
surister force-pushed the feat/support_timestamp_arrays branch 7 times, most recently from d6ca76a to b61a580, December 5, 2025 16:08
surister force-pushed the feat/support_timestamp_arrays branch from b61a580 to 533b90e, December 5, 2025 16:11
Owner

kylebarron left a comment

Thanks for the PR! I think there are a few more test cases I'd like to see:

  • Passing an array of tz-aware datetimes, each with a different timezone
  • Passing an array of partially tz-aware datetimes, where some are naive (no tzinfo)
  • Passing an array of tz-aware datetimes where the passed type has no timezone (not sure what happens here, maybe check pyarrow)
  • Passing an array of tz-aware datetimes where the passed type's tz differs from the tz of the input datetimes
  • Input arrays that are partially null and entirely null

Perhaps each of these test cases could also be run through the pyarrow constructor, so we know we're consistent with it.

.collect()
}
None => {
let vs: Vec<Option<chrono::NaiveDateTime>> = obj.extract()?;
Owner

This says that if the passed type doesn't have a tz, any tz attached to the input data is ignored. That seems easy to get wrong if the input timestamps carry a variety of timezones; I'm curious how pyarrow handles this.

Contributor Author

It does not ignore it: everything is extracted as chrono::NaiveDateTime, so if any datetime has a tz, it errors out:

from datetime import datetime, timezone
import zoneinfo
from arro3.core import Array, DataType

dt = datetime(1999, 8, 7, 11, 12, 13, 141516)
tzinfo = zoneinfo.ZoneInfo('Europe/Madrid')
dt2: datetime = dt.astimezone(timezone(tzinfo.utcoffset(dt)))

arr = Array([dt, None, dt2], type=DataType.timestamp("ms"))
# E       TypeError: expected a datetime without tzinfo

In pyarrow it seems to work this way:

Same datetime, one naive and one with a tz; the array type has a tz:

from datetime import datetime, timezone
import zoneinfo
import pyarrow
dt = datetime(1999, 8, 7, 11, 12, 13, 141516)
tzinfo = zoneinfo.ZoneInfo('Europe/Madrid')
dt2: datetime = dt.astimezone(timezone(tzinfo.utcoffset(dt)))
arr = pyarrow.array([dt, None, dt2], type=pyarrow.timestamp('ms', 'Europe/Madrid'))
print(arr)
print(arr.type)

It does not error out:

[
  1999-08-07 11:12:13.141Z,
  null,
  1999-08-07 09:12:13.141Z
]
timestamp[ms, tz=Europe/Madrid]

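It attaches the given tz to all values: the naive datetime is read as UTC (11:12:13Z), and the tz-aware one is converted to UTC (09:12:13Z).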

Converted back to Python:

[
    datetime.datetime(1999, 8, 7, 13, 12, 13, 141000, tzinfo=<DstTzInfo 'Europe/Madrid' CEST+2:00:00 DST>),
    None,
    datetime.datetime(1999, 8, 7, 11, 12, 13, 141000, tzinfo=<DstTzInfo 'Europe/Madrid' CEST+2:00:00 DST>)
]

Both values now carry the array's tz.

Same datetime, one naive and one with a tz; the array type has no tz:

[
    datetime.datetime(1999, 8, 7, 11, 12, 13, 141516),
    None, 
    datetime.datetime(1999, 8, 7, 9, 12, 13, 141516)
]

The naive value is kept as is, while the tz-aware value is converted to UTC (11:12 becomes 09:12) and its tz information is lost.

So I guess the expected behavior would be to apply the given tz to all values and, if the type has no tz, to just take the values as is.
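The two pyarrow cases above can be summarized by a small rule. This is our reading of the printed outputs, not pyarrow's documented algorithm, and `normalize_like_pyarrow` is a hypothetical name. A fixed +02:00 offset stands in for Europe/Madrid in August (CEST), and the ms truncation is ignored:

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def normalize_like_pyarrow(dt: datetime, type_tz: Optional[tzinfo]) -> datetime:
    """Hypothetical helper modelling the pyarrow results above (time unit ignored)."""
    # Naive input is read as UTC; tz-aware input is converted to UTC.
    if dt.tzinfo is None:
        utc = dt.replace(tzinfo=timezone.utc)
    else:
        utc = dt.astimezone(timezone.utc)
    if type_tz is None:
        # Type without tz: keep the UTC wall-clock time and drop the tzinfo.
        return utc.replace(tzinfo=None)
    # Type with tz: the same instant, expressed in the type's zone.
    return utc.astimezone(type_tz)


# +02:00 stands in for Europe/Madrid in August (CEST), as in the example.
cest = timezone(timedelta(hours=2))
dt = datetime(1999, 8, 7, 11, 12, 13, 141516)
dt2 = datetime(1999, 8, 7, 11, 12, 13, 141516, tzinfo=cest)

print(normalize_like_pyarrow(dt, cest), normalize_like_pyarrow(dt2, cest))
print(normalize_like_pyarrow(dt, None), normalize_like_pyarrow(dt2, None))
```

With zoneinfo.ZoneInfo('Europe/Madrid') as type_tz, the same instants come back as 13:12 and 11:12 local time, matching the first pyarrow case.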

Owner

kylebarron Jan 9, 2026

It does not error out:

[
  1999-08-07 11:12:13.141Z,
  null,
  1999-08-07 09:12:13.141Z
]
timestamp[ms, tz=Europe/Madrid]

That feels wrong...? You have two input datetimes that represent the same time, and when they are returned to Python they have the same time, but the Rust display shows them as different.

Edit: I missed that back in Python they are different too.

TimeUnit::Nanosecond => {
let values: Vec<_> = values
.iter()
.map(|v| v.unwrap().timestamp_nanos_opt().unwrap())
Owner

kylebarron Dec 5, 2025

We should probably handle the case where nanosecond timestamps overflow, and raise an error instead of panicking.
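Here is a minimal sketch of "error instead of panic", written in Python for illustration. The checked conversion below plays the role of turning the `None` returned by `timestamp_nanos_opt` into an error rather than calling `.unwrap()`. `to_epoch_nanos_checked` is hypothetical, and it assumes tz-aware input:

```python
from datetime import datetime, timezone

I64_MIN, I64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos_checked(dt: datetime) -> int:
    """Hypothetical helper: nanoseconds since the Unix epoch for a tz-aware datetime.

    Raises OverflowError when the result does not fit in a signed 64-bit integer,
    instead of wrapping or aborting.
    """
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    nanos = micros * 1_000  # exact: Python ints never overflow
    if not I64_MIN <= nanos <= I64_MAX:
        raise OverflowError(f"{dt.isoformat()} is out of range for a nanosecond timestamp")
    return nanos


try:
    to_epoch_nanos_checked(datetime(2300, 1, 1, tzinfo=timezone.utc))
except OverflowError as err:
    print(f"error: {err}")
```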

Contributor Author

surister Jan 11, 2026

Thinking about this again, can this actually ever overflow? With a DataType.timestamp type, the input values are extracted as datetime objects, so plain ints are rejected:

arr = Array([1_844_674_407_370_955_161, None], type=DataType.timestamp("ns"))
# E       TypeError: 'int' object cannot be cast as 'datetime'

And datetime objects have at most 6 digits of sub-second precision (microseconds), so can we actually create a value with ns precision that hits that branch? Perhaps we can delete it altogether?
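One caveat on the question above, sketched with the standard library: the nanosecond limit is about range, not precision. A signed 64-bit nanosecond count only covers roughly 1677 to 2262, while datetime goes up to year 9999. So an ordinary microsecond-precision datetime can still fall outside that range, assuming the ns branch converts each datetime as in the snippet above:

```python
from datetime import datetime, timedelta

# Largest offset from the epoch that fits in signed 64-bit nanoseconds,
# floored to whole microseconds because timedelta cannot hold nanoseconds.
MAX_NS_OFFSET = timedelta(microseconds=(2**63 - 1) // 1_000)
LATEST_NS_DATETIME = datetime(1970, 1, 1) + MAX_NS_OFFSET

print(LATEST_NS_DATETIME)  # last representable instant, as a naive UTC datetime
# A perfectly valid datetime that still cannot be stored as ns since the epoch.
print(datetime(2300, 1, 1) > LATEST_NS_DATETIME)
print(datetime.max > LATEST_NS_DATETIME)
```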

Contributor Author

surister Jan 11, 2026

It came to me as I hit Comment: we can cast from a u64, and that will have ns precision.

arr = Array([1_844_674_407_370_955_161, None], type=DataType.uint64()).cast(DataType.timestamp("ns"))
print(arr)
[
  2028-06-15T09:33:27.370955161,
  null,
]

Contributor Author

Btw, it does not panic when there is an overflow:

arr = Array([18_446_744_073_709_551_610, None], type=DataType.uint64()).cast(DataType.timestamp("ns"))
print(arr)

It just turns the overflowing value into a null:

[
  null,
  null,
]

So what should we do on nanosecond overflow?

Checking pyarrow, it seems overflow can't happen there, since casting uint64 to timestamp is not supported:

arr = pyarrow.array([1_844_674_407_370_955_161, None], type=pyarrow.uint64()).cast(pyarrow.timestamp("ns"))
print(arr)
E   pyarrow.lib.ArrowNotImplementedError: Unsupported cast from uint64 to timestamp using function cast_timestamp

pyarrow/error.pxi:92: ArrowNotImplementedError

And if we try to force an overflow:

arr = pyarrow.array([92_233_720_368_547_758_070, None], type=pyarrow.int64()).cast(pyarrow.timestamp("ns"))
print(arr)

We get an overflow error, but it comes from creating the int64 array, not from the nanosecond timestamp overflowing.

So it's up to us to decide: do we silently null it, truncate it to nanosecond precision, or just raise an error?
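The three options can be sketched as one policy switch over the integer tick count. This is a hypothetical helper for discussion, not existing arro3 or pyarrow behavior. Here "truncate" is read as saturating to the nearest representable value, which is an assumption:

```python
from typing import Literal, Optional

I64_MIN, I64_MAX = -(2**63), 2**63 - 1
Policy = Literal["null", "clamp", "raise"]


def apply_overflow_policy(value: int, policy: Policy) -> Optional[int]:
    """Hypothetical helper: fit an integer tick count into a signed 64-bit timestamp slot."""
    if I64_MIN <= value <= I64_MAX:
        return value
    if policy == "null":
        # Appears to match what the arro3 u64 -> timestamp cast above does today.
        return None
    if policy == "clamp":
        # Saturate to the nearest representable timestamp.
        return I64_MAX if value > I64_MAX else I64_MIN
    # "raise" (and any unrecognized policy): surface the problem to the caller.
    raise OverflowError(f"{value} does not fit in a 64-bit timestamp")


print(apply_overflow_policy(18_446_744_073_709_551_610, "null"))
print(apply_overflow_policy(18_446_744_073_709_551_610, "clamp"))
```

Raising surfaces bad input immediately, while nulling matches what the u64 cast above appears to do but can hide it.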

surister force-pushed the feat/support_timestamp_arrays branch from 4d40446 to d6746af, December 14, 2025 12:39
surister force-pushed the feat/support_timestamp_arrays branch from d6746af to ab8b9cc, December 14, 2025 12:43
Arc::new(StringViewArray::from(slices))
}
DataType::Timestamp(unit, tz) => {
// We normalize all datetimes to datetimes in UTC.
Owner

Can you add a more detailed comment here, explaining why this is valid?
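Here is a possible justification for that comment, hedged as our understanding rather than the PR's stated rationale. Arrow stores timestamp values as counts since the Unix epoch in UTC, and the tz on the type only changes how those values are interpreted for display. Converting a tz-aware datetime to UTC keeps the same instant, so the stored value does not depend on which offset the input used. A stdlib check of that property, with `epoch_micros` as a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_micros(dt: datetime) -> int:
    """Hypothetical helper: microseconds since the Unix epoch for a tz-aware datetime."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


# The same instant written with three different UTC offsets.
minus_six = timezone(timedelta(hours=-6))  # e.g. America/Chicago in December
plus_two = timezone(timedelta(hours=2))
instant = datetime(2025, 12, 5, 15, 47, 26, tzinfo=timezone.utc)
variants = [instant, instant.astimezone(minus_six), instant.astimezone(plus_two)]

# Normalizing to UTC changes the wall-clock fields but not the instant...
assert {v.astimezone(timezone.utc) for v in variants} == {instant}
# ...so every variant maps to the same stored value.
assert len({epoch_micros(v) for v in variants}) == 1
print(epoch_micros(instant))
```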



Successfully merging this pull request may close these issues.

Cannot create Table with Array of Timestamp type
