fix: unambiguously truncate time in date_trunc function #9068
This looks sensible to me
// the original time must have been within the ambiguous local time
// period. Therefore the offset of one of these times should match the
// offset of the original time.
if datetime1.offset().fix() == offset {
Suggested change
- if datetime1.offset().fix() == offset {
+ if datetime1.offset().fix() == value.offset().fix() {
You could defer this computation to here
Thank you @mhilton -- I also pushed a .slt test to this code for SQL-level verification (it panics on main and passes on this branch).
Thank you very much ❤️
I plan to merge this tomorrow so there is at least one day for anyone else who wants a chance to comment on this PR to do so. Please let us know if anyone would like more time to review.
Thanks for fixing it @mhilton!
arrow_cast(ts, 'Timestamp(Nanosecond, Some("Europe/Berlin"))') as ts
from timestamp_utc; -- have to convert to utc prior to converting to berlin

query PT
@alamb What does query PT mean? Same question for query PPPP.
https://github.com/apache/arrow-datafusion/tree/main/datafusion/sqllogictest#slt-file-format has the full details. It describes how the query output should be compared with the expected output.
LocalResult::None => {
    // It is impossible to truncate from a time that does exist into one that doesn't.
    panic!("date_trunc produced impossible time")
}
LocalResult::Single(datetime) => datetime,
LocalResult::Ambiguous(datetime1, datetime2) => {
    // Because we are truncating from an equally or more specific time
    // the original time must have been within the ambiguous local time
    // period. Therefore the offset of one of these times should match the
    // offset of the original time.
    if datetime1.offset().fix() == value.offset().fix() {
        datetime1
    } else {
        datetime2
    }
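The tie-breaker above can be illustrated without chrono. The following is a minimal, dependency-free sketch that assumes a hypothetical zone at UTC+2 that falls back to UTC+1 at the UTC instant TRANSITION_UTC. Times are plain i64 seconds, and LocalResult, offset_at, from_local and trunc_hour are illustrative stand-ins defined here, not chrono or DataFusion APIs. Truncating to the hour inside the repeated hour yields an ambiguous local time, and the offset of the original value picks the reading.

```rust
// Hypothetical toy zone: UTC+2 before TRANSITION_UTC, UTC+1 from then on.
// `LocalResult` only mirrors the shape of chrono's enum so the sketch has no dependencies.
const TRANSITION_UTC: i64 = 36_000; // clocks "go back" at this UTC instant
const DST_OFFSET: i64 = 7_200; // UTC+2 before the transition
const STD_OFFSET: i64 = 3_600; // UTC+1 after the transition

#[derive(Debug, PartialEq)]
enum LocalResult<T> {
    None,
    Single(T),
    Ambiguous(T, T),
}

/// Offset (in seconds) in effect at a given UTC instant.
fn offset_at(utc: i64) -> i64 {
    if utc < TRANSITION_UTC { DST_OFFSET } else { STD_OFFSET }
}

/// Map a local wall-clock time to its (utc, offset) readings.
fn from_local(local: i64) -> LocalResult<(i64, i64)> {
    let before = local - DST_OFFSET; // reading under the old offset
    let after = local - STD_OFFSET; // reading under the new offset
    match (before < TRANSITION_UTC, after >= TRANSITION_UTC) {
        (true, true) => LocalResult::Ambiguous((before, DST_OFFSET), (after, STD_OFFSET)),
        (true, false) => LocalResult::Single((before, DST_OFFSET)),
        (false, true) => LocalResult::Single((after, STD_OFFSET)),
        (false, false) => LocalResult::None,
    }
}

/// Truncate `utc` to the start of its local hour, using the original
/// offset as the tie-breaker when the truncated local time is ambiguous.
fn trunc_hour(utc: i64) -> Option<i64> {
    let offset = offset_at(utc);
    let local = utc + offset;
    let truncated = local - local.rem_euclid(3_600);
    match from_local(truncated) {
        LocalResult::None => None,
        LocalResult::Single((t, _)) => Some(t),
        LocalResult::Ambiguous((t1, o1), (t2, _)) => {
            if o1 == offset { Some(t1) } else { Some(t2) }
        }
    }
}
```

With this model, an instant 30 minutes after the transition truncates to TRANSITION_UTC, while an instant 30 minutes before it truncates to TRANSITION_UTC - 3_600, even though both fall in the same local wall-clock hour. Always taking the earlier reading would move the first case back by 90 minutes, which is not a valid truncation to the hour.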
Nice!
Add test for the historical America/Sao_Paulo timezone which changed in and out of DST at midnight.
I merged up from main to resolve conflicts after #9040 was merged.
FYI @Omega359 as you have been working in this part of the code recently.
match truncated.and_local_timezone(value.timezone()) {
    LocalResult::None => {
        // It is impossible to truncate from a time that does exist into one that doesn't.
        panic!("date_trunc produced impossible time")
panic! seems rather severe for this; exec_err! would, I think, be a better option, as it bubbles the error back up to the caller.
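The suggestion above amounts to turning the LocalResult::None arm into an error value instead of a process abort. A minimal sketch of that pattern follows. ExecError is a hypothetical stand-in for an engine error (it is not what exec_err! actually builds), LocalResult only mirrors the shape of chrono's enum, and candidates are (utc_seconds, offset_seconds) pairs.

```rust
/// Hypothetical stand-in for an engine execution error.
#[derive(Debug, PartialEq)]
struct ExecError(String);

/// Mirrors the shape of chrono's `LocalResult`.
#[derive(Debug)]
enum LocalResult<T> {
    None,
    Single(T),
    Ambiguous(T, T),
}

/// Pick the UTC instant for a truncated local time, returning an error
/// (instead of panicking) when that local time does not exist.
fn resolve(candidates: LocalResult<(i64, i64)>, original_offset: i64) -> Result<i64, ExecError> {
    match candidates {
        LocalResult::None => Err(ExecError(
            "date_trunc produced a non-existent local time".to_string(),
        )),
        LocalResult::Single((utc, _)) => Ok(utc),
        // Same tie-breaker as in the PR: keep the reading whose offset matches.
        LocalResult::Ambiguous((utc1, off1), (utc2, _)) => {
            Ok(if off1 == original_offset { utc1 } else { utc2 })
        }
    }
}

/// Resolve a batch of lookups; the first error is handed back to the caller.
fn resolve_batch(batch: Vec<(LocalResult<(i64, i64)>, i64)>) -> Result<Vec<i64>, ExecError> {
    batch
        .into_iter()
        .map(|(candidates, offset)| resolve(candidates, offset))
        .collect()
}
```

Collecting into Result<Vec<_>, _> stops at the first Err and returns it, which is the "bubble up to the caller" behavior the comment asks for; the query fails with a message rather than taking the whole process down.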
Historically Sao Paulo, and possibly other places, have had daylight savings time that started at midnight. This causes the day to start at 1am. The naive method used by date_trunc to truncate to 'day' will create a non-existent time in these circumstances. Adjust the timestamps produced by date_trunc in this case to be valid within the required timezone.
// an hour that doesn't exist due to daylight savings. One known example where
// this can happen is with historic dates in the America/Sao_Paulo time zone.
// To account for this, adjust the time by a few hours, convert to local time,
// and then adjust the time back.
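The workaround described in this comment can be modeled with plain integers. The sketch below assumes a hypothetical zone that moves from UTC-3 to UTC-2 exactly at local midnight, so local 00:00 to 00:59 never exists on the transition day. It steps back 3 hours (an assumed value for "a few hours"), resolves that local time, and then adds the 3 hours back as elapsed time. All names are illustrative, and this is not DataFusion's implementation.

```rust
// Hypothetical toy zone: UTC-3 until TRANSITION_UTC, then UTC-2. The switch
// happens at local midnight of day 1, so that midnight does not exist locally.
const DAY: i64 = 86_400;
const HOUR: i64 = 3_600;
const OLD_OFFSET: i64 = -3 * HOUR;
const NEW_OFFSET: i64 = -2 * HOUR;
const TRANSITION_UTC: i64 = DAY - OLD_OFFSET; // local midnight of day 1 under UTC-3

/// Offset (in seconds) in effect at a given UTC instant.
fn offset_at(utc: i64) -> i64 {
    if utc < TRANSITION_UTC { OLD_OFFSET } else { NEW_OFFSET }
}

/// Local wall-clock seconds -> UTC seconds, or None if the local time is in the gap.
/// This zone only springs forward, so a local time is never ambiguous here.
fn from_local(local: i64) -> Option<i64> {
    let old = local - OLD_OFFSET;
    let new = local - NEW_OFFSET;
    if old < TRANSITION_UTC {
        Some(old)
    } else if new >= TRANSITION_UTC {
        Some(new)
    } else {
        None
    }
}

/// Truncate `utc` to the start of its local day. If local midnight does not
/// exist, resolve a time a few hours earlier and add those hours back.
fn trunc_day(utc: i64) -> Option<i64> {
    const ADJUST: i64 = 3 * HOUR; // assumed to be larger than the DST gap
    let local = utc + offset_at(utc);
    let midnight = local - local.rem_euclid(DAY);
    from_local(midnight).or_else(|| from_local(midnight - ADJUST).map(|t| t + ADJUST))
}
```

Truncating an instant one hour after the transition therefore yields TRANSITION_UTC, which reads as 01:00 local time, the first instant that exists on that day. The direction of the step matters in this model: stepping forward 3 hours instead would resolve under the new offset and, after taking the 3 hours back off, land on 23:00 of the previous local day.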
Timezones continue to blow my mind
Thank you everyone involved. Amazing
Which issue does this PR close?
Closes #8899
Rationale for this change
When date_trunc truncated a timestamp in a geographic time zone, it previously failed if the local representation of the truncated time was ambiguous, which happens when the clocks "go back". This change uses the offset of the original timestamp as the tie-breaker whenever the local representation of the truncated time is ambiguous.
What changes are included in this PR?
Are these changes tested?
Yes, additional unit tests
Are there any user-facing changes?