-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-15873: Implement [date][time].fromisoformat #4699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2eb060d
to
8f7f1b1
Compare
One other thing I'll note - the pure python implementation here is at least partially optimized for speed. There is a fairly trivial implementation that looks, more or less, like this: from datetime import datetime
_base_strptimes = {
10: '%Y-%m-%d',
13: '%Y-%m-%dT%H',
16: '%Y-%m-%dT%H:%M',
19: '%Y-%m-%dT%H:%M:%S',
23: '%Y-%m-%dT%H:%M:%S.%f',
26: '%Y-%m-%dT%H:%M:%S.%f'
}
def from_strptime(dtstr):
if not isinstance(dtstr, str):
raise TypeError('isoformat takes str')
if len(dtstr) >= 19 and dtstr[-6] in '+-':
fmt = _base_strptimes[len(dtstr) - 6] + '%z'
else:
fmt = _base_strptimes[len(dtstr)]
return datetime.strptime(dtstr, fmt) This will work, but |
8f7f1b1
to
a957534
Compare
a957534
to
d7657a6
Compare
Modules/_datetimemodule.c
Outdated
/* --------------------------------------------------------------------------- | ||
* String parsing utilities and helper functions | ||
*/ | ||
static inline unsigned int to_int(char ptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "static inline" is interpreted differently from "static" by any relevant compilers. AFAIK, we don't use "static inline" elsewhere in CPython code. Please just use "static".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem. I translated these from macros in my original code so it seemed like inline
was appropriate at the time, but I don't have strong opinions in this regard.
Modules/_datetimemodule.c
Outdated
/* --------------------------------------------------------------------------- | ||
* String parsing utilities and helper functions | ||
*/ | ||
static inline unsigned int to_int(char ptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name "ptr" is usually used for pointers. Please change "ptr" to "ch" here. On the other hand, I don't think to_int(ch)
is clearer than ch - '-'
. Call it digit_to_int
or just use the expanded expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is my mistake. As I mentioned in another comment, I translated this from a macro where it was actually manipulating the pointer directly. When I translated it into a function the variable name didn't get fixed.
Modules/_datetimemodule.c
Outdated
{ | ||
for (size_t i = 0; i < num_digits; ++i) { | ||
int tmp = to_int(*(ptr++)); | ||
if (!is_digit(tmp)) { return NULL; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just use ANSI C isdigit
on the character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can actually just replace this with tmp <= 9
, I think isdigit
might be overkill. This is_digit
relies on the specific way that to_int
works - and I only used this particular combination because in early Cython
-based tests my profiling indicated that atoi
was a slow step.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Doc/library/datetime.rst
Outdated
Return a :class:`datetime` corresponding to a *date_string* in one of the | ||
ISO 8601 formats emitted by :meth:`datetime.isoformat`. Specifically, this function | ||
supports strings in the format(s) ``YYYY-MM-DD[*[HH[:MM:[SS[.mmm[mmm]]]]][+HH:MM]]``, | ||
where ``*`` can match any single character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the colon after :MM be optional?
Doc/library/datetime.rst
Outdated
@@ -1486,6 +1515,23 @@ In boolean contexts, a :class:`.time` object is always considered to be true. | |||
error-prone and has been removed in Python 3.5. See :issue:`13936` for full | |||
details. | |||
|
|||
|
|||
Other constructors: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Singular: constructor
Doc/library/datetime.rst
Outdated
|
||
Return a :class:`time` corresponding to a *time_string* in one of the ISO 8601 | ||
formats emitted by :meth:`time.isoformat`. Specifically, this function supports | ||
strings in the format(s) ``HH[:MM:[SS[.mmm[mmm[+HH:MM]]]]]]``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to parse the time zone without :MM:SS.mmmmmm, e.g. 09:12:32+11:00. See your datetime.datetime format.
|
||
dt_rt = DateSubclass.fromisoformat(dt.isoformat()) | ||
|
||
self.assertIsInstance(dt_rt, DateSubclass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is worth testing this, maybe it is worth documenting it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be done. I considered doing it but decided against it because while this behavior is both true and tested for all the constructors in datetime
, it's not documented for any of them as far as I've seen. I can document it in just this one, or I can document it in all of them, I'm not picky.
for tzi in tzinfos] | ||
|
||
for dt in dts: | ||
for sep in separators: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 × 5 × 8 × 5 is 1200 tests. Perhaps just one test for each case (6 + 5 + 8 + 5) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I translated these tests from hypothesis where the space is sampled sparsely. I think I can refactor out at least the separator dimension and probably reduce the dimensionality of some of the remaining tests. I'll look at it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored out the separator tests and reduced some of the space that is sampled. I think most of the variation in the date and time portions separately is covered by the respective date and time parser functions. I think we're down to ~160 tests.
Lib/datetime.py
Outdated
if next_char != ':': | ||
raise ValueError('Invalid time separator') | ||
if pos >= len_str: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In retrospect, I think maybe it's logically impossible to hit this branch, since we only get here if tstr[pos:pos+1] == ':'
. I'll remove this in the next iteration.
Lib/datetime.py
Outdated
|
||
time_comps[-1] = timezone(td) | ||
|
||
if pos < len_str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this check is redundant with the (len_str - pos) != 6
check above.
39a65c3
to
df3c032
Compare
@abalkin @vadmium I think I've fixed all the concerns raised so far - some good catches in there. There are three documentation-related concerns:
|
def fromisoformat(cls, time_string): | ||
"""Construct a time from the output of isoformat().""" | ||
if not isinstance(time_string, str): | ||
raise TypeError('fromisoformat: argument must be str') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure why this is not getting hit by the test suite. test_fromisoformat_fails_typeerror
is designed explicitly to hit this condition. Anyone have an idea why it's getting missed?
Lib/test/datetimetester.py
Outdated
|
||
for bad_type in bad_types: | ||
with self.assertRaises(TypeError): | ||
self.theclass(bad_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.theclass.fromisoformat(bad_type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A-ha! Thanks!
df3c032
to
2a83701
Compare
Modules/_datetimemodule.c
Outdated
* String parsing utilities and helper functions | ||
*/ | ||
|
||
static const char* parse_digits(const char* ptr, int* var, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this on the first review. PEP 7 calls for "function name in column 1". See PEP 7.
Modules/_datetimemodule.c
Outdated
static const char* parse_digits(const char* ptr, int* var, | ||
size_t num_digits) | ||
{ | ||
for (size_t i = 0; i < num_digits; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C99 - style for loops are not specifically mentioned in the "C dialect" section of PEP 7. I personally like this style, so if this compiles without warnings, we may leave it.
Modules/_datetimemodule.c
Outdated
} | ||
|
||
// Macro that short-circuits to timezone parsing | ||
#define PARSE_ISOFORMAT_ADVANCE_TIME_SEP(SEP) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please alight the .
Modules/_datetimemodule.c
Outdated
return -5; | ||
} | ||
|
||
parse_timezone:; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a ; after the label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately yes, there is. It's a syntax error without the ;
, because the following line is a variable declaration, not a statement, and the label is intended to be a labeled statement (Ref StackOverflow, citing section 6.8.1).
That said, there's no particular reason why those variables can't be declared at the top of the function (before the first call to PARSE_ISOFORMAT_ADVANCE_TIME_SEP
). Would you prefer that?
Modules/_datetimemodule.c
Outdated
*/ | ||
const char *p = dtstr; | ||
p = parse_digits(p, year, 4); | ||
if (NULL == p) { return -1; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use PEP 7 code layout.
To fully support isoformat output you also need to parse seconds on the timezone:
|
@mariocj89 Thanks for the catch. The PR is updated with tests and implementation. Didn't realize that second precision was allowed in |
Modules/_datetimemodule.c
Outdated
} | ||
|
||
PyObject *result; | ||
if (PyDate_CheckExact(cls)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is actually wrong, so this branch is never reached. The correct code is:
if ( (PyTypeObject*)cls == &PyDateTime_DateType ) {
9e37ae8
to
fb32d7f
Compare
Moreover, the sub-second precision is now allowed as well: >>> from datetime import *
>>> import random
>>> datetime.now(timezone(timedelta(hours=24*random.random()))).isoformat()
'2017-12-08T11:34:30.238049+15:47:34.297540' The good news is that you should be able to reuse the time fields parser logic to parse the timezone. cc: @pganssle |
@abalkin Unless I'm doing something wrong, it seems that the pure python implementation of import sys
sys.modules['_datetime'] = None # cause ImportError
from datetime import *
print(time(12, 30, 15, tzinfo=timezone(timedelta(microseconds=123456))).isoformat())
# 12:30:15+00:00:00
print(datetime(2017, 1, 1, 12, 30, 15,
tzinfo=timezone(timedelta(microseconds=123456))).isoformat())
# Traceback (most recent call last):
# File "<stdin>", line 2, in <module>
# File ".../cpython/Lib/datetime.py", line 1832, in isoformat
# assert not ss.microseconds
# AssertionError I suppose I'll fix that as part of this PR. |
Latest changes fix both |
Doc/library/datetime.rst
Outdated
|
||
Return a :class:`datetime` corresponding to a *date_string* in one of the | ||
ISO 8601 formats emitted by :meth:`datetime.isoformat`. Specifically, this function | ||
supports strings in the format(s) ``YYYY-MM-DD[*[HH[:MM[:SS[.mmm[mmm]]]]][+HH:MM]]``, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation needs updating to reflect second and microsecond precision.
43f7ff5
to
3da73a3
Compare
@pganssle - It looks like you are right. Please mention bpo-5288 in the commit message when you fix this. Please also find out why this was not caught by the test suit. We should be running all tests with and without C acceleration. |
3da73a3
to
5a233fb
Compare
@pganssle - great job! This PR looks good to me now and I will merge it in a few days to give others a chance to review. |
Doc/library/datetime.rst
Outdated
|
||
Return a :class:`time` corresponding to a *time_string* in one of the ISO 8601 | ||
formats emitted by :meth:`time.isoformat`. Specifically, this function supports | ||
strings in the format(s) ``HH[:MM[:SS[.mmm[mmm]]]]][+HH:MM[:SS[.ffffff]]]``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many square brackets.
Doc/library/datetime.rst
Outdated
.. classmethod:: datetime.fromisoformat(date_string) | ||
|
||
Return a :class:`datetime` corresponding to a *date_string* in one of the | ||
ISO 8601 formats emitted by :meth:`datetime.isoformat`. Specifically, this function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would drop “ISO 8601” and just say “one of the formats emitted by isoformat”. As far as I understand, ISO 8601 doesn’t have a seconds field in time zones, but it seems you want to support this.
If you intend to support dates without the time part, maybe write “emitted by datetime.isoformat and date.isoformat”. This may also need a test case; I didn’t notice anything relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vadmium The test cases from TestDate
are actually inherited by TestDateTime
, so it is indeed supported. I think it's fair to support them.
Lib/datetime.py
Outdated
try: | ||
assert len(date_string) == 10 | ||
return cls(*_parse_isoformat_date(date_string)) | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write except Exception, to avoid catching KeyboardInterrupt or similar. Or even better, be explicit and list the exceptions you are expecting (AssertionError, ValueError, IndexError?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vadmium I think catching Exception
is probably the right thing to do, I didn't think about KeyboardInterrupt
. I'm mainly trying to get the C implementation and the Python implementation to always raise the same exceptions, and the C implementation only ever raises ValueError
(I will fuzz it some time this week to verify this), hence the catch-and-re-raise.
Lib/datetime.py
Outdated
@@ -1193,19 +1320,7 @@ def __hash__(self): | |||
def _tzstr(self, sep=":"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good opportunity to remove the unsupported sep parameter
Lib/datetime.py
Outdated
@@ -1193,19 +1320,7 @@ def __hash__(self): | |||
def _tzstr(self, sep=":"): | |||
"""Return formatted timezone offset (+xx:xx) or None.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or the empty string
Doc/library/datetime.rst
Outdated
Return a :class:`datetime` corresponding to a *date_string* in one of the | ||
ISO 8601 formats emitted by :meth:`datetime.isoformat`. Specifically, this function | ||
supports strings in the format(s) ``YYYY-MM-DD[*HH[:MM[:SS[.mmm[mmm]]]]][+HH:MM[:SS[.ffffff]]]``, | ||
where ``*`` can match any single character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be less ambiguous putting the time zone inside the optional time part. Test case:
datetime(2017, 12, 18, 11, 0).isoformat(sep="+", timespec="minutes") -> "2017-12-18+11:00"
datetime.fromisoformat("2017-12-18+11:00") -> datetime(2017, 12, 18, 11, 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.. As much as I dislike the idea of timezone being part of the time component, this is fine I suppose (and in fact that is how it is currently implemented).
That said, another design I considered is one where we take sep
as a keyword argument to isoparse
, which would relieve this ambiguity for all separators other than -
and +
if we wanted to eventually allow parsing of strings of the format YYYY-MM-DD+HH:MM
. That's one decision we'd have to make in this version because changing it would not be backwards compatible.
Doc/library/datetime.rst
Outdated
|
||
Return a :class:`date` corresponding to a *date_string* in one of the ISO 8601 | ||
formats emitted by :meth:`date.isoformat`. Specifically, this function supports | ||
strings in the format(s) ``YYYY-MM-DD``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one format emitted by date.isoformat and supported by your date.fromisoformat, as far as I know.
Doc/library/datetime.rst
Outdated
|
||
Other constructor: | ||
|
||
.. classmethod:: time.fromisoformat(date_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time_string?
@vadmium Thanks for all the comments. I've added a test for the ambiguous cases you identified and updated the documentation (plus dropped the unused |
@abalkin @vadmium Any thoughts on the question of whether to add a The only reason to do so would be if in a future release we want to support ISO-8601 style strings that are just dates with offsets attached (e.g. Having given this a bit of thought, I think it's probably a good idea to leave out If we assume that some fraction of people will want to be strict and some people will want to be lenient, no matter which one we choose, the other behavior can easily be emulated. If we choose lenient by default, strict users can do the equivalent with: def strict_fromisoformat(dtstr, sep='T'):
if dtstr[10:11] != sep:
raise ValueError('Invalid isoformat string: %s' % dstr)
return datetime.fromisoformat(dtstr) If we choose strict by default, the "lenient by default" option is: def lenient_fromisoformat(dtstr):
# Assuming sep requires a single character be passed, for dates it can be anything
return datetime.fromisoformat(dtstr, sep=dtstr[10:11] or 'T') Anyway, this is a kind of long comment just to say, "I think we should keep the status quo", but I thought I'd at least outline my thinking. |
-1 or adding a parameter, but I would not mind having fromisoformat() not checking the separator at all. |
@abalkin The current behavior is to allow any single character for the separator at all, if that's what you mean. |
Yes. That's what I want. |
Per discussion on the python-dev mailing list, this is a C implementation of
fromisoformat
as alternate constructors fordatetime
,date
andtime
, resolving this issue.If it is deemed desirable, it can be extended later to cover all
ISO-8601
datetime strings, but I believe the consensus so far is to have a minimum implementation that only covers the outputs ofdatetime.isoformat()
.One thing I'd like to call attention to is that my profiling seems to indicate that in the common case (i.e. not calling this from a subclass), using the C API directly is much faster than going through
PyObject_CallFunction
.Here is a profiling script:
Result on my laptop:
If there's no particularly pressing reason why all the other alternate constructors universally go through the main constructor call, I could write a small function or macro that would take the fast path if available and use it for all the alternate constructors.
CC @abalkin @mariocj89
https://bugs.python.org/issue15873