Closed
Labels
API Design, Closing Candidate (may be closeable, needs more eyeballs), Constructors (Series/DataFrame/Index/pd.array constructors), Datetime (datetime data dtype), Enhancement, Needs Discussion (requires discussion from core team before further action), Period (period data type), Refactor (internal refactoring of code), Timedelta (timedelta data type)
Description
The constructor(s) for DatetimeArray are a bit messy right now, so let's step back a bit to lay out what we want out of them.
What do we want out of our `__init__`? I'd like the following constraints:

- `data` is never copied unless explicitly requested with `copy=True`.
- The values in `data` are never coerced. This means no lists (copy), and no ndarrays of values that can be coerced to datetime64[ns] (no object-dtype strings, Timestamps, etc.). We do allow unboxing data from a Series / Index / DatetimeArray, and we do allow viewing i8 data as M8[ns].
- The signature matches across all DTA classes: `values, dtype, freq, copy`
- It's fast. There are two wrinkles here:
  a.) I didn't (and many users probably don't) appreciate the performance impact of passing `freq=` to DTI / DTA (ballpark: 5x slower for creating). Everything else is relatively cheap to check; the most expensive thing is probably timezone normalization, which I think is unavoidable.
  b.) Frequency inference. Right now it's disallowed. Should we allow it? Is this expensive?
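To make the first two constraints concrete, here is a minimal sketch of what such an `__init__` could look like. `DatetimeArraySketch` and its `_data` attribute are hypothetical names chosen for illustration, not the pandas implementation, and timezone handling and `freq` validation are deliberately left out.

```python
import numpy as np


class DatetimeArraySketch:
    """Hypothetical stand-in for DatetimeArray; not the pandas implementation."""

    def __init__(self, values, dtype=None, freq=None, copy=False):
        # Unbox containers that expose their backing ndarray
        # (the `_data` attribute name is an assumption of this sketch).
        values = getattr(values, "_data", values)

        if not isinstance(values, np.ndarray):
            # A list would have to be copied into an ndarray, so reject it.
            raise TypeError(f"expected an ndarray, got {type(values).__name__}")

        if values.dtype == np.dtype("i8"):
            # Reinterpret int64 nanoseconds as datetime64[ns]; a view shares memory.
            values = values.view("M8[ns]")
        elif values.dtype != np.dtype("M8[ns]"):
            # Object-dtype strings, Timestamps, etc. would require coercion.
            raise ValueError(f"dtype {values.dtype} would require coercion")

        if copy:
            values = values.copy()

        self._data = values
        self._dtype = values.dtype if dtype is None else dtype
        self._freq = freq  # stored as-is; no validation or inference here
```

With this shape, `DatetimeArraySketch(np.arange(3, dtype="i8"))` shares memory with its input, while a list or an object-dtype array raises instead of being silently converted.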
If possible, I'd prefer to avoid defining `DatetimeArray.__new__`, for two main reasons:

- Maintainability: defining `__new__` complicates pickle, which makes for relatively difficult debugging sessions in the future.
- Aesthetics: Python already has a way of initializing classes (`__init__`), so all else equal I'd prefer to use that instead of `__new__` + `_simple_new`.
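As a small illustration of the pickle concern, the sketch below compares three toy classes (all hypothetical, unrelated to the actual pandas classes). With the default pickle protocol, unpickling calls `cls.__new__(cls)` without arguments unless the class provides `__getnewargs__`, so a `__new__` with a required parameter needs extra support code that an `__init__`-based class does not.

```python
import pickle


class ViaNew:
    """Hypothetical class that receives its data through __new__."""

    def __new__(cls, values):
        obj = super().__new__(cls)
        obj.values = values
        return obj


class ViaNewWithArgs(ViaNew):
    """Same class, plus the hook pickle uses to re-supply the argument."""

    def __getnewargs__(self):
        return (self.values,)


class ViaInit:
    """Hypothetical class that receives its data through __init__."""

    def __init__(self, values):
        self.values = values


def round_trips(obj):
    """Return True if obj survives pickle.dumps followed by pickle.loads."""
    try:
        restored = pickle.loads(pickle.dumps(obj))
    except TypeError:
        # Raised when __new__ is called without its required argument.
        return False
    return restored.values == obj.values
```

Here `round_trips(ViaInit([1, 2]))` succeeds with no extra code, whereas `ViaNew` only round-trips once `__getnewargs__` (or a custom `__reduce__`) is added.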
Some concretish TODOs:

- Investigate validation-checking code between `DatetimeArray.__init__` and `sequence_to_dt64ns` (checking user-provided freq / dtype / tz vs. those properties on DatetimeArray `values`).
- Implement `freq` validation (blocked by "Bad freq invalidation in DatetimeIndex.where" #24555 and maybe "Refactor DatetimeArray._generate_range" #24562).
- Standardize `DatetimeArray._simple_new` and the `__init__`. Right now `_simple_new` takes `_simple_new(cls, values, freq=None, tz=None)`. Changing that `tz` to `dtype` should let us share more code between TDA/DTA/PeriodArray.
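The last TODO could look roughly like the following. This is only a sketch under assumed names (`DatetimeLikeSketch`, `DatetimeSketch`, `TimedeltaSketch`, `_default_dtype`), meant to show how a `dtype` parameter lets one `_simple_new` serve several array classes, where a `tz` parameter would only make sense for the datetime case.

```python
import numpy as np


class DatetimeLikeSketch:
    """Hypothetical shared base; not the pandas class hierarchy."""

    _default_dtype = None  # each subclass sets its own

    @classmethod
    def _simple_new(cls, values, freq=None, dtype=None):
        # Fast path: no validation and no copy; callers guarantee the inputs.
        obj = object.__new__(cls)
        obj._data = values
        obj._freq = freq
        obj._dtype = cls._default_dtype if dtype is None else dtype
        return obj


class DatetimeSketch(DatetimeLikeSketch):
    _default_dtype = np.dtype("M8[ns]")


class TimedeltaSketch(DatetimeLikeSketch):
    _default_dtype = np.dtype("m8[ns]")


# Both classes reuse the same constructor without overriding it.
dta = DatetimeSketch._simple_new(np.arange(3, dtype="i8").view("M8[ns]"))
tda = TimedeltaSketch._simple_new(np.arange(3, dtype="i8").view("m8[ns]"))
```

In this arrangement a timezone-aware array would carry its timezone inside `dtype` (pandas has `DatetimeTZDtype` for that purpose), so no separate `tz` argument is needed.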