Skip to content

fix: unambiguously truncate time in date_trunc function #9068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 31, 2024

Conversation

mhilton
Copy link
Contributor

@mhilton mhilton commented Jan 30, 2024

Which issue does this PR close?

Closes #8899

Rationale for this change

When date_trunc is truncating a timestamp with a geographic timezone it would previously get stuck if the local reprentation of the time could be ambiguously interpreted. This happens when the clocks "go back". The update here is to use the original timestamp offset as the tie-breaker when the local representation of the truncated time could be ambiguous.

What changes are included in this PR?

Are these changes tested?

Yes, additional unit tests

Are there any user-facing changes?

When date_trunc is truncating a timestamp with a geographic
timezone it would previously get stuck if the local reprentation of
the time could be ambiguously interpretted. This happens when the
clocks "go back". The update here is to use the original timestamp
offset as the tie-breaker when the local representation of the
truncated time could be ambiguous.
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jan 30, 2024
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sensible to me

// the original time must have been within the ambiguous local time
// period. Therefore the offset of one of these times should match the
// offset of the original time.
if datetime1.offset().fix() == offset {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if datetime1.offset().fix() == offset {
if datetime1.offset().fix() == value.offset().fix() {

You could defer this computation to here

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 30, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mhilton -- I also pushed a .slt test to this code for SQL level verification (it panic's on main, and passes on this branch).

Thank you very much ❤️

@alamb
Copy link
Contributor

alamb commented Jan 30, 2024

I plan to merge this tomorrow so there is at least one day for anyone one else who might want a chance to comment on this PR to do so. Please let us know if anyone would like more time to review

Copy link
Contributor

@appletreeisyellow appletreeisyellow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing it @mhilton!

arrow_cast(ts, 'Timestamp(Nanosecond, Some("Europe/Berlin"))') as ts
from timestamp_utc; -- have to convert to utc prior to converting to berlin

query PT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb What does query PT mean? Same question for query PPPP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/arrow-datafusion/tree/main/datafusion/sqllogictest#slt-file-format has the full details. It describes how the query output should be compared with the expected output.

Comment on lines 572 to 586
LocalResult::None => {
// It is impossible to truncate from a time that does exist into one that doesn't.
panic!("date_trunc produced impossible time")
}
LocalResult::Single(datetime) => datetime,
LocalResult::Ambiguous(datetime1, datetime2) => {
// Because we are truncating from an equally or more specific time
// the original time must have been within the ambiguous local time
// period. Therefore the offset of one of these times should match the
// offset of the original time.
if datetime1.offset().fix() == value.offset().fix() {
datetime1
} else {
datetime2
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

mhilton and others added 2 commits January 30, 2024 19:06
Add test for the historical America/Sao_Paulo timezone which changed
in and out of DST at midnight.
@alamb
Copy link
Contributor

alamb commented Jan 30, 2024

I merged up from main to resolve conflicts after #9040 was merged

@alamb
Copy link
Contributor

alamb commented Jan 30, 2024

FYI @Omega359 as you have been working in this part of the code recently

match truncated.and_local_timezone(value.timezone()) {
LocalResult::None => {
// It is impossible to truncate from a time that does exist into one that doesn't.
panic!("date_trunc produced impossible time")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panic! seems rather severe for this - exec_err! I think would be a better option as it will bubble up the error back to the caller.

Historically Sao Paulo, and possibly other places, have had daylight
savings time that started at midnight. This causes the day to start
at 1am. The naive method used by date_trunc to truncate to 'day' will
create a non-existent time in these circumstances. Adjust the
timestamps produced by date_trunc in this case to be valid within
the required timezone.
// an hour that doesn't exist due to daylight savings. On known example where
// this can happen is with historic dates in the America/Sao_Paulo time zone.
// To account for this adjust the time by a few hours, convert to local time,
// and then adjust the time back.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timezones continue to blow my mind

@alamb
Copy link
Contributor

alamb commented Jan 31, 2024

Thank you everyone involved. Amazing

@alamb alamb merged commit feeee04 into apache:main Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_trunc panicked with message: called Option::unwrap() on a None value
5 participants