API: resolution for date_range, to_datetime, timedelta_range, to_timedelta #49060

Open
jbrockmendel opened this issue Oct 12, 2022 · 6 comments
Labels
API Design Non-Nano datetime64/timedelta64 with non-nanosecond resolution

Comments

@jbrockmendel
Member

jbrockmendel commented Oct 12, 2022

In 2.0 we'll support non-nanosecond datetime64 and timedelta64. ATM date_range, timedelta_range, to_datetime, and to_timedelta are still nano-only. This issue is about how to support non-nano in these functions.

Two main options: inference or a keyword. A keyword would be something like pd.date_range(start, end, periods=10, reso="ms"), and the default would be "ns". This is the simplest thing to implement, but adds more API surface.

Inference for date_range would look at start and end to determine the correct resolution. This could get messy if e.g. start and end have different resos. ATM I'm thinking this isn't worth it.

Inference for to_datetime (really in array_to_datetime) is more compelling, in part because I expect to_datetime to be called by library code, e.g. for I/O.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 12, 2022
@mroeschke
Member

mroeschke commented Oct 12, 2022

For the _range methods, what if freq is a lower resolution than reso? e.g. date_range("2022", periods=3, freq="D", reso="ms")

If the to_ methods have inference, would the resolution of each argument be collected and the highest one chosen as the inferred reso? e.g. to_timedelta([timedelta(days=1), timedelta(seconds=1), timedelta(milliseconds=1)])

@mroeschke mroeschke added API Design Non-Nano datetime64/timedelta64 with non-nanosecond resolution and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 12, 2022
@jbrockmendel
Member Author

For the _range methods, what if freq is a lower resolution than reso? e.g. date_range("2022", periods=3, freq="D", reso="ms")

That wouldn't be a problem; it would be identical to date_range("2022", periods=3, freq="D").astype("M8[ms]"). What would be a problem is the reverse, where freq is a higher resolution than reso, e.g. date_range("2022", periods=3, freq="ns", reso="s"). We'd probably need to disallow that.
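A minimal sketch of the check that "disallow that" implies, in plain Python. The helper name `step_is_representable` and the table `NANOS_PER_UNIT` are hypothetical and are not part of pandas; the only assumption is that a freq step can be expressed as an integer number of nanoseconds.

```python
# Hypothetical helper, not pandas API: decide whether a freq step can be
# stored exactly at a given resolution.
NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def step_is_representable(step_nanos: int, unit: str) -> bool:
    """Return True if a step of `step_nanos` nanoseconds is a whole multiple of `unit`."""
    return step_nanos % NANOS_PER_UNIT[unit] == 0


one_day = 86_400 * 1_000_000_000  # freq="D" expressed in nanoseconds
one_nano = 1                      # freq="ns" expressed in nanoseconds

print(step_is_representable(one_day, "ms"))   # True: daily steps fit in milliseconds
print(step_is_representable(one_nano, "s"))   # False: nanosecond steps would be lost
```

A range constructor could run a check like this up front and raise when it returns False, rather than silently truncating the step.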

If the to_ methods have inference, would the resolution of each argument be collected and the highest one chosen as the inferred reso? e.g. to_timedelta([timedelta(days=1), timedelta(seconds=1), timedelta(milliseconds=1)])

In that particular case they are all pytimedelta objects which all get microsecond resolution. Suppose instead we have to_timedelta([Timedelta(days=1)._as_unit(unit) for unit in ["s", "ms", "us", "ns"]]). I think the way I would implement this would be something like

def array_to_timedelta(objs):
    # Try the highest resolution first; fall back to coarser ones on overflow.
    try:
        res = array_to_timedelta_with_reso(objs, "ns")
    except OutOfBoundsTimedelta:
        try:
            res = array_to_timedelta_with_reso(objs, "us")
        [...]
    return res

def array_to_timedelta_with_reso(objs, reso):
    for item in objs:
        td = Timedelta(item)._as_unit(reso)  # <- will raise if either overflow or casting involves rounding
        [...]

This should avoid a major perf hit or API change for currently-working cases. The downside is that it isn't inferring the best reso so much as the highest viable reso. It also wouldn't match scalar behavior.
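To make the "highest viable reso" behavior concrete, here is a self-contained sketch in plain Python. It is not the pandas implementation: `convert_with_fallback` and `to_unit` are hypothetical names, inputs are assumed to already be exact integer nanosecond counts, and the int64 bounds stand in for OutOfBoundsTimedelta (the minimum int64 value is excluded because pandas reserves it for NaT).

```python
# Hypothetical sketch of the try-ns-then-coarser strategy; not pandas code.
INT64_MIN = -(2**63)  # reserved for NaT in pandas, so treated as out of range here
INT64_MAX = 2**63 - 1

NANOS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_unit(nanos: int, unit: str) -> int:
    """Convert exact nanoseconds to `unit`, raising on rounding or overflow."""
    factor = NANOS_PER_UNIT[unit]
    if nanos % factor != 0:
        raise ValueError(f"{nanos} ns cannot be represented exactly in {unit!r}")
    value = nanos // factor
    if not (INT64_MIN < value <= INT64_MAX):
        raise OverflowError(f"{nanos} ns does not fit in int64 at unit {unit!r}")
    return value


def convert_with_fallback(values_ns):
    """Return (unit, converted values) for the finest unit that does not overflow."""
    for unit in ("ns", "us", "ms", "s"):
        try:
            return unit, [to_unit(v, unit) for v in values_ns]
        except OverflowError:
            continue  # only overflow triggers a retry at a coarser unit
    raise OverflowError("values do not fit in int64 at any supported unit")


print(convert_with_fallback([1_000]))           # ('ns', [1000])
print(convert_with_fallback([10**19, 1_000]))   # ('us', [10000000000000000, 1])
```

Note that in this sketch a rounding error is not retried: once the finest unit overflows, a value that needs sub-microsecond precision cannot be stored losslessly at any coarser unit either, so the ValueError propagates.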

@mroeschke
Member

would be identical to date_range("2022", periods=3, freq="D").astype("M8[ms]")

Okay that is reasonable. I think if constructors have arguments that allow multiple ways to specify resolutions (freq, dtype, reso), we should definitely document the "order of operations"

@wiedeflo

wiedeflo commented Mar 7, 2023

Since I could not find anything on this in the current release notes for 2.0.0, I wanted to ask whether there are any updates on this issue.

@jbrockmendel
Member Author

There is now a "unit" keyword in date_range and timedelta_range that specifies resolution. Haven't done to_datetime and to_timedelta yet.
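For reference, a short usage sketch of the keyword, assuming pandas 2.0 or later is installed. It also checks the equivalence with .astype("M8[ms]") mentioned earlier in the thread.

```python
import pandas as pd

# `unit` sets the resolution of the resulting index (pandas >= 2.0).
dti = pd.date_range("2022-01-01", periods=3, freq="D", unit="ms")
tdi = pd.timedelta_range("1 day", periods=3, freq="D", unit="s")

print(dti.dtype)  # datetime64[ms]
print(tdi.dtype)  # timedelta64[s]

# Building at the default resolution and casting gives the same dtype and values.
casted = pd.date_range("2022-01-01", periods=3, freq="D").astype("M8[ms]")
print(casted.dtype == dti.dtype and bool((casted == dti).all()))
```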

@satyrmipt

satyrmipt commented May 6, 2024

There is now a "unit" keyword in date_range and timedelta_range that specifies resolution. Haven't done to_datetime and to_timedelta yet.

The documentation is silent about what resolution is and which values it accepts. Please add a link to the possible values on this page: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html

Edit: if you pass an arbitrary string to unit, the resulting ValueError lists the accepted values: ValueError("'unit' must be one of 's', 'ms', 'us', 'ns'")
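A quick way to see this for yourself, assuming pandas 2.0 or later. The exact wording of the message may differ between versions, so the sketch only relies on a ValueError being raised.

```python
import pandas as pd

message = None
try:
    # "minutes" is not an accepted value for `unit`.
    pd.date_range("2022-01-01", periods=3, freq="D", unit="minutes")
except ValueError as err:
    message = str(err)

print(message)
```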

4 participants