Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493
Changes from all commits
f25d24c
e47c200
2cb7597
a4512b7
a5ef959
83b04fe
e8abc83
a4c8671
98dca45
56fd95e
5f92cfa
0e15536
1a015f6
5445a56
272f4b1
35195bd
bb394dc
510ae3d
49cf495
3a62633
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if you want to simplify this already a bit more, you could rename the current
__new__
to__init__
(you might need some dummy init/new on Index overriding it to make sure it works with the current inheritance works), and then we can already merge the__init__
with_simple_new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, then yes you're OK setting
copy=False
here. By definition, the conversion todatetime64[ns]
will involve a copy.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further question: it is not (yet) possible to simply remove this case? (eventually we should not call the DatetimeArray constructor with an array-like of scalars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not if we want to share the extant arithmetic tests (which we do)
I don't share this opinion, would prefer to delay this discussion until it is absolutely necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please raise this in the appropriate issue, as we have been discussing this before (I think it is #23212, although there is probably some more scattered discussion on other related PRs)
It is here that you are redesigning the constructors for the array refactor, IIUC, so if there is a time we should discuss it, it is now I think?
Can you clarify this a little bit? At what point do the arithmetic tests need to deal with array of objects?
Eg for boxing the constructed values into Series/Index/Array, there a properly dtyped array can be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pertinent word here is "extant". Many of the tests in tests/arithmetic pass a list into
tm.box_expected
orklass
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring the tests for a moment, I thought we were all on board with the goal of the
DatetimelikeArray.__init__
being no inference and no copy.Back to the tests, it looks like you can you add an entry to
box_expected
forDatetimeArray
to returnexpected = DatetimeArray._from_sequence(expected)
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment to Joris below about mothballing this conversation applies. But short answer is no: I did not get on board with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we should also get the
freq
from the values as a "cheap" inference? Or would there be cases were an inferred frequency can be different than the actual frequency?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm thinking, yah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get out the
_values
, and then treat that the same as directly passing a DatetimeIndex/DatetimeArray ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if there is a graceful way to do this in the next pass (if I ever manage to catch up with all these comments!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeArrayMixin
->cls
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you don't need to get the
tz
in this case?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. For the moment we are still using inheritance, so this would mess up for DatetimeIndex == DatetimeArray. When we change to composition this check will have to become
isinstance(values, (DatetimeArray, ABCDatetimeIndex))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to turn into an object array here? to_datetime handles all of these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right we could make do without it. I like doing this explicitly because to_datetime is already overloaded and circular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is horribly inefficient and unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't do it here, to_datetime is going to do this. It may be unnecessary, but it is not horribly inefficient. What is a code smell is the circularity involved in calling to_datetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then just call array_to_datetime and don’t force the conversion to array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the root problem (referenced in your "circularity" comment, and down below in
TimedeltaIndex.__new__
) thatto_datetime
/to_timedelta
returns an Index instead of an EA?Could we have the public
to_datetime
just be a simpleso the internal
_to_datetime
returns the array?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the fact that it's an Index so much as that it is a circular dependency. I think I can resolve this in an upcoming commit.
_convert_listlike_datetimes calls
ensure_object
.Not sure what you're referring to. As implemented _from_sequence is specifically for list, tuple, or object-dtype NDArray/Index. datetime64-dtype goes through a different path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's after an
So those both avoid conversion to object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExtensionArray._from_sequence
is for any sequence of scalar objects, including a ndarray with a specialized type likedatetime64[ns]
. It'll be used, for example, infactorize(ExtensionArray)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger thank you for clarifying; I was under the mistaken impression that it was specifically list/tuple/object-dtype.
Are there any restrictions on kwargs that can be added to it? In particular I'm thinking of
freq
andtz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you cannot return here directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't call to_timedelta, so this does require that we pass an object array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it should
let’s not reinvent the wheel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it shouldn't.
to_timedelta
will just end up callingarray_to_timedelta64
like this does, but only after doing a bunch of unecessary dtype checks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides this is what
TimedeltaIndex.__new__
currently callsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of (lack of) caching? This comment makes it seems like it's slower in general, when (if it's caching) it's just slower on repeated use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW (as mentioned elsewhere), I am not sure we should add them as public methods. If we do so, we should add them to all our EAs, or actually even to the EA interface, and not only to TimedeltaArray (or datetimelike arrays).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not necessarily opposed to this, but this isn't obvious to me.
Because the Index version defines monotonic_increasing, monotonic_decreasing, and is_unique in a single call via _engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssertionError no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When called from _simple_new this is internal so AssertionError would make sense, but it is also called from
__new__
so is in principle user-facing.Either way I need to add tests for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well this should never happen all conversions should be before this
so it should assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype
is part of the signature ofTimedeltaArray.__new__
, which is/will be user-facing. If the user passes the wrong dtype, its aValueError
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no my point is there should be and i think currently there is already conversion
if it’s wrong at this point it’s not a user error but an incorrect path taken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that this check function is called two times, one of which is the very first thing in
TimedeltaArray.__new__
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the discussion above, is it worth having a 15 line function (including docstrings :-)), for a 2-liner used in two places?
I would maybe simply leave it in place how it was, I think reading something like
assert dtype == _TD_DTYPE
inTimedeltaArray._simple_new
is clearer than calling into a helper functionThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable. But hey, its a nice docstring.