Skip to content

Commit

Permalink
Fix formatting of Etc/... time zones in Presto's datetime_format and …
Browse files Browse the repository at this point in the history
…cast TSwTZ to varchar (facebookincubator#11409)

Summary:
Pull Request resolved: facebookincubator#11409

The file we use from Joda to generate time zone links (substitutions for certain time zone IDs that
we use when formatting) maps various versions of "UTC" to "Etc/GMT" and "Etc/UTC" as these are
the official IDs (see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones ).

However, Joda also has logic at runtime to remove these prefixes and convert them to "UTC".  This
leads to Velox producing "Etc/GMT" or "Etc/UTC" when calling datetime_format with the "ZZZ" (or
more) pattern and when casting TSwTZ to varchar, while Joda produces "UTC".

To keep things simple, I've modified the script that generates our mapping of time zone IDs to simply
replace the Etc/GMT and Etc/UTC IDs with "UTC".  For cases like parsing, we already call
normalizeTimeZone on the time zone ID which does this (among other things), so this is consistent
with other places in the code, and, by doing it when generating the map, means we have to do less
normalization at runtime.

Reviewed By: xiaoxmeng

Differential Revision: D65348867

fbshipit-source-id: f39dc397b6c1be87a92aaa7dd93215df64c98b40
  • Loading branch information
Kevin Wilfong authored and facebook-github-bot committed Nov 1, 2024
1 parent ad27145 commit 8bb3bb9
Show file tree
Hide file tree
Showing 3 changed files with 55 additions and 16 deletions.
32 changes: 32 additions & 0 deletions velox/functions/prestosql/tests/DateTimeFunctionsTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3572,6 +3572,38 @@ TEST_F(DateTimeFunctionsTest, formatDateTime) {
"America/Los_Angeles",
formatDatetime(parseTimestamp("1970-01-01"), "ZZZ"));

// Test the Etc/... time zones.
auto testFormatTimeZoneID =
[&](const std::string& inputTimeZoneID,
const std::string& expectedFormattedTimeZoneID) {
setQueryTimeZone(inputTimeZoneID);
EXPECT_EQ(
expectedFormattedTimeZoneID,
formatDatetime(parseTimestamp("1970-01-01"), "ZZZ"));
};
testFormatTimeZoneID("Etc/GMT", "UTC");
testFormatTimeZoneID("Etc/GMT+0", "UTC");
testFormatTimeZoneID("Etc/GMT+1", "-01:00");
testFormatTimeZoneID("Etc/GMT+10", "-10:00");
testFormatTimeZoneID("Etc/GMT+12", "-12:00");
testFormatTimeZoneID("Etc/GMT-0", "UTC");
testFormatTimeZoneID("Etc/GMT-2", "+02:00");
testFormatTimeZoneID("Etc/GMT-11", "+11:00");
testFormatTimeZoneID("Etc/GMT-14", "+14:00");
testFormatTimeZoneID("Etc/GMT0", "UTC");
testFormatTimeZoneID("Etc/Greenwich", "UTC");
testFormatTimeZoneID("Etc/UCT", "UTC");
testFormatTimeZoneID("Etc/Universal", "UTC");
testFormatTimeZoneID("Etc/UTC", "UTC");
testFormatTimeZoneID("Etc/Zulu", "UTC");
// These do not explicitly start with "Etc/" but they link to time zone IDs
// that do.
testFormatTimeZoneID("GMT0", "UTC");
testFormatTimeZoneID("Greenwich", "UTC");
testFormatTimeZoneID("UCT", "UTC");
testFormatTimeZoneID("UTC", "UTC");
testFormatTimeZoneID("Zulu", "UTC");

setQueryTimeZone("Asia/Kolkata");
// Literal test cases.
EXPECT_EQ("hello", formatDatetime(parseTimestamp("1970-01-01"), "'hello'"));
Expand Down
30 changes: 15 additions & 15 deletions velox/type/tz/TimeZoneLinks.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -67,19 +67,19 @@ const std::unordered_map<std::string, std::string>& getTimeZoneLinks() {
{"Cuba", "America/Havana"},
{"Egypt", "Africa/Cairo"},
{"Eire", "Europe/Dublin"},
{"Etc/GMT+0", "Etc/GMT"},
{"Etc/GMT-0", "Etc/GMT"},
{"Etc/GMT0", "Etc/GMT"},
{"Etc/Greenwich", "Etc/GMT"},
{"Etc/UCT", "Etc/UTC"},
{"Etc/Universal", "Etc/UTC"},
{"Etc/Zulu", "Etc/UTC"},
{"Etc/GMT+0", "UTC"},
{"Etc/GMT-0", "UTC"},
{"Etc/GMT0", "UTC"},
{"Etc/Greenwich", "UTC"},
{"Etc/UCT", "UTC"},
{"Etc/Universal", "UTC"},
{"Etc/Zulu", "UTC"},
{"GB", "Europe/London"},
{"GB-Eire", "Europe/London"},
{"GMT+0", "Etc/GMT"},
{"GMT-0", "Etc/GMT"},
{"GMT0", "Etc/GMT"},
{"Greenwich", "Etc/GMT"},
{"GMT+0", "UTC"},
{"GMT-0", "UTC"},
{"GMT0", "UTC"},
{"Greenwich", "UTC"},
{"Hongkong", "Asia/Hong_Kong"},
{"Iceland", "Atlantic/Reykjavik"},
{"Iran", "Asia/Tehran"},
Expand All @@ -101,7 +101,7 @@ const std::unordered_map<std::string, std::string>& getTimeZoneLinks() {
{"ROK", "Asia/Seoul"},
{"Singapore", "Asia/Singapore"},
{"Turkey", "Europe/Istanbul"},
{"UCT", "Etc/UTC"},
{"UCT", "UTC"},
{"US/Alaska", "America/Anchorage"},
{"US/Aleutian", "America/Adak"},
{"US/Arizona", "America/Phoenix"},
Expand All @@ -114,10 +114,10 @@ const std::unordered_map<std::string, std::string>& getTimeZoneLinks() {
{"US/Mountain", "America/Denver"},
{"US/Pacific", "America/Los_Angeles"},
{"US/Samoa", "Pacific/Pago_Pago"},
{"UTC", "Etc/UTC"},
{"Universal", "Etc/UTC"},
{"UTC", "UTC"},
{"Universal", "UTC"},
{"W-SU", "Europe/Moscow"},
{"Zulu", "Etc/UTC"},
{"Zulu", "UTC"},
{"America/Buenos_Aires", "America/Argentina/Buenos_Aires"},
{"America/Catamarca", "America/Argentina/Catamarca"},
{"America/Cordoba", "America/Argentina/Cordoba"},
Expand Down
9 changes: 8 additions & 1 deletion velox/type/tz/gen_timezone_links.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,8 +115,15 @@ def main():
if columns[0] != "Link":
continue

# Officially the GMT/UTC time zone IDs start with Etc/ so Joda
# includes this in its link files. Joda ends up removing this
# prefix at runtime though, so we just skip ahead and do it here.
tz_value = columns[1]
if tz_value == "Etc/GMT" or tz_value == "Etc/UTC":
tz_value = "UTC"

entries.append(
entry_template.substitute(tz_key=columns[2], tz_value=columns[1])
entry_template.substitute(tz_key=columns[2], tz_value=tz_value)
)

print(cpp_template.substitute(entries="\n".join(entries)))
Expand Down

0 comments on commit 8bb3bb9

Please sign in to comment.