-
Notifications
You must be signed in to change notification settings - Fork 873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Date
data types can cast to a time zone-specific timestamp
#7141
base: main
Are you sure you want to change the base?
Date
data types can cast to a time zone-specific timestamp
#7141
Conversation
717ed28
to
02a86ac
Compare
This is ready for review. The test cases comprehensively cover the entire cc/ @xudong963 @Omega359 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick look over this looks nice
#[test] | ||
fn test_cast_date32_to_timestamp_with_timezone() { | ||
let tz = "+0545"; // UTC + 0545 is Asia/Kathmandu | ||
let a = Date32Array::from(vec![Some(18628), Some(18993), None]); // 2021-1-1, 2022-1-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the default tz of date is UTC, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the default tz of date is UTC, right?
Yes, but shouldn't we ensure Date
s cast to timestamps with a particular timezone?
I'm also a bit confused by this comment. Are you implying that a
is improperly constructed? (i.e. the third element shouldn't be None
?) Here's a similar test case that tests for time zone aware timestamps:
arrow-rs/arrow-cast/src/cast/mod.rs
Lines 5277 to 5291 in 38d6e69
let array_with_tz = | |
TimestampMillisecondArray::from(vec![Some(864000003005), Some(1545696002001), None]) | |
.with_timezone(tz.to_string()); | |
let expected = vec![ | |
Some("1997-05-19 05:45:03.005000"), | |
Some("2018-12-25 05:45:02.001000"), | |
None, | |
]; | |
assert_cast_timestamp_to_string!( | |
array_with_tz, | |
DataType::Utf8View, | |
StringViewArray, | |
cast_options, | |
expected | |
); |
Maybe I'm misunderstanding something. Would appreciate some clarification. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is okay for me. Thanks!
I'm just curious about which particular timezone we should use. Should we depend on a time zone configuration or use the UTC
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just curious about which particular timezone we should use. Should we depend on a time zone configuration or use the UTC directly?
Other test cases use random time zones like the +0545
mentioned above. I'd argue that for the sake of testing, we'd want to use a non-default time zone.
Thank you for this PR, and especially for the tests which make it clear what this is doing. I do have one question, is this the expected behaviour? In particular when casting timestamps without a timezone to timestamps with a timezone, the conversion is done preserving the "local" time. i.e.
I therefore wonder if date conversions should behave similarly, i.e.
instead of what this PR implements which interprets the date as being in the UTC timezone
With the behaviour in this PR I think you would get different behaviour going |
Interesting! Yeah, I agree that we should preserve the local time when casting. c19dbc5 resolves this. Now, when we cast a date to a time zone aware timestamp, we do a intermediary cast into a regular |
arrow-cast/src/cast/mod.rs
Outdated
(Date64, Timestamp(TimeUnit::Second, _)) => { | ||
let array = array | ||
.as_primitive::<Date64Type>() | ||
.unary::<_, TimestampSecondType>(|x| x / MILLISECONDS) | ||
.with_timezone_opt(to_tz.clone()), | ||
)), | ||
(Date64, Timestamp(TimeUnit::Millisecond, to_tz)) => Ok(Arc::new( | ||
array | ||
.unary::<_, TimestampSecondType>(|x| x / MILLISECONDS); | ||
|
||
cast_with_options(&array, to_type, cast_options) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can split into two arms if we don't like how we're eagerly recasting a Timestamp
.
(Date64, Timestamp(TimeUnit::Second, None)) => {
// single cast here
},
(Date64, Timestamp(TimeUnit::Second, Some(_))) => {
// first cast to Timestamp, then recast with time zone
}
#[test] | ||
fn test_cast_date32_to_timestamp_and_timestamp_with_timezone() { | ||
let tz = "+0545"; // UTC + 0545 is Asia/Kathmandu | ||
let a = Date32Array::from(vec![Some(18628), None, None]); // 2021-1-1, 2022-1-1 | ||
let array = Arc::new(a) as ArrayRef; | ||
|
||
let b = cast( | ||
&array, | ||
&DataType::Timestamp(TimeUnit::Second, Some(tz.into())), | ||
) | ||
.unwrap(); | ||
let c = b.as_primitive::<TimestampSecondType>(); | ||
let string_array = cast(&c, &DataType::Utf8).unwrap(); | ||
let result = string_array.as_string::<i32>(); | ||
assert_eq!("2021-01-01T00:00:00+05:45", result.value(0)); | ||
|
||
let b = cast(&array, &DataType::Timestamp(TimeUnit::Second, None)).unwrap(); | ||
let c = b.as_primitive::<TimestampSecondType>(); | ||
let string_array = cast(&c, &DataType::Utf8).unwrap(); | ||
let result = string_array.as_string::<i32>(); | ||
assert_eq!("2021-01-01T00:00:00", result.value(0)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proof that casting a Date
to a Timestamp
with/without time zones result in the same string.
Not the biggest fan of this test case, would appreciate some comments if we would want to a test case that compares the cast results (with/without time zone).
Weird, clippy seems to complain about a completely different region of the codebase? /cc @xudong963 |
FYI: #7167, maybe you can do an update. |
c37b1a4
to
7828b6d
Compare
Should be gtg |
Closes #7140. Related: apache/datafusion#14638
Currently, you can only cast
Date32
/Date64
into aTimestamp
without a time zone. This commit updates the casting rules to support date to time-zone specific timestamps and remodels the casting logic to support this feature.There are no user facing changes.