ENH: add timedelta modulus operator support (mm) #12120

Merged on Oct 30, 2018 (1 commit)

tylerjereddy
Contributor

@tylerjereddy tylerjereddy commented Oct 9, 2018

Add support for modulus operator when both operands
are timedelta64 with seconds units, and no other cases.

Related to #12092, though doesn't fully cover the modulus
scenarios requested there because I haven't added a branch
for modulus timedelta64 with a Python integer.

I think this approach can be summarized as intercepting the
array nb_remainder slot function before it dispatches to
the ufunc machinery.

@eric-wieser
Member

Shouldn't the ufunc machinery be able to handle this itself? How is the other arithmetic handled?

@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from c472685 to b662819 Compare October 9, 2018 16:29
@tylerjereddy
Contributor Author

The ufunc machinery for timedelta arithmetic is here in umath/loops.c.src. I could try to refactor the work I've done here to fit into that machinery if that's strongly preferred.

@shoyer
Member

shoyer commented Oct 9, 2018

I think it would definitely be preferred to implement remainder as another ufunc loop. Hard coding it into array_remainder is pretty awkward, but more importantly it guarantees that x % y and np.remainder(x, y) are exactly equivalent. This is important because we support overriding % via __array_ufunc__, as described in NEP 13.

@tylerjereddy
Contributor Author

It seems that writing in a TIMEDELTA_mm_m_remainder() function in numpy/core/src/umath/loops.c.src along with the function prototype in numpy/core/src/umath/loops.h.src isn't sufficient for array_remainder to discover the ufunc loop, even though that seems to be all that was done for the other arithmetic operations on timedelta64. Adding a preprocessor def for TIMEDELTA_remainder doesn't help either.

I can disable a check in ufunc_loop_matches() in numpy/core/src/umath/ufunc_type_resolution.c to make things work better--this falls back to using remainder for built-in datetime.timedelta but seems fishy.

I'll keep trying!

@eric-wieser
Member

A nice followup would be divmod, which can probably reuse the same code.

@eric-wieser
Member

You probably need to tweak the type resolver function

@ewmoore
Contributor

ewmoore commented Oct 10, 2018

Does it need to be added to numpy/core/code_generators/generate_umath.py?

@tylerjereddy
Contributor Author

Does it need to be added to numpy/core/code_generators/generate_umath.py?

Apparently yes, and I'm likely going to have to write a TypeResolver function for remainder too. Looks like there's a fair bit of copy-pasting with PyUFunc_AdditionTypeResolver and PyUFunc_SubtractionTypeResolver for datetime handling, so likely similar to those.

As far as I can tell this isn't that different from what I've done in this PR -- the Python C API is still used for type checking and making decisions, the mess is just being hidden in the resolver machinery.

@charris
Member

charris commented Oct 10, 2018

Needs a release note.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Oct 10, 2018
@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from b662819 to 9aaeafa Compare October 10, 2018 20:12
@tylerjereddy
Contributor Author

tylerjereddy commented Oct 10, 2018

Refactored to use ufunc machinery as requested -- it appears remainder now works for all combinations of timedelta64 units, with appropriate exceptions on Years and Months of course.

My decision to typecast as: mm -> m on modulus (preserve timedelta64 type & report units) may require some discussion.

Points for:

  • @miccoli suggests that this matches with their expectations
  • some clarity to be gained by preserving the units in the remainder, and subtraction
    for timedelta64 is also mm -> m

Points against:

  • timedelta64 division operation, which is (more) closely related to remainder, casts to double with signature: mm -> d

Also good if we can decide on what type we want from remainder when dividing by, e.g., an int64, as this was also mentioned in #12092 as logically preserving the type. For conventional division, we preserve timedelta64 type when dividing by int64 and float64, so that may be an easier decision to make.

@shoyer does this need a mailing list check first maybe?

Might be nice if we could confine this PR to mm remainder and then I can expand to md and mq (double and int64) type resolution in future PRs?

@tylerjereddy tylerjereddy changed the title ENH: add timedelta seconds modulus ENH: add timedelta modulus operator support (mm) Oct 10, 2018
@eric-wieser
Member

As far as I'm concerned, mm->m is the only reasonable signature for modulus. If in doubt, look at the behavior of the builtin timedelta.

@tylerjereddy
Contributor Author

from built-in datetime.timedelta:

  • division matches what we currently have: mm->d
  • same goes for our other division operations: md->m and mq->m
  • remainder is indeed: mm->m
  • remainders with the other types mentioned above are:
    TypeError: unsupported operand type(s) for %: 'datetime.timedelta' and 'float'
    TypeError: unsupported operand type(s) for %: 'datetime.timedelta' and 'int'

So maybe supporting modulus with int and float is more controversial (was suggested as expected to work in the linked issue).
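
For reference, the built-in behaviour listed above can be reproduced with the standard library alone. This is a minimal sketch; the variable names are illustrative and do not come from the PR.

```python
from datetime import timedelta

week = timedelta(weeks=1)
day = timedelta(days=1)

# mm -> d: timedelta / timedelta cancels the units and yields a float
ratio = week / day

# md -> m and mq -> m: dividing by a float or an int keeps the timedelta type
halves = (week / 2.0, week / 2)

# mm -> m: timedelta % timedelta is again a timedelta
rem = week % timedelta(days=3)

# timedelta % number is rejected with TypeError
errors = []
for divisor in (7.0, 7):
    try:
        week % divisor
    except TypeError as exc:
        errors.append(str(exc))

print(ratio, rem, errors)
```

Only the last case differs from what #12092 asked for, which is why a numeric divisor is the open question here.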

@tylerjereddy
Contributor Author

The codecov checks are green, but it didn't actually do anything -- missing a report or something.

@miccoli
Contributor

miccoli commented Oct 10, 2018

If I may clarify #12092: from

dividend = divisor × quotient + remainder

it follows that all three quantities (dividend, divisor × quotient, remainder) must be homogeneous and have the same time units.

Since for multiplication (divisor × quotient) we have mq -> m, qm -> m, md -> m, dm -> m, for the remainder function the signature should be

  • mm -> m
  • mq -> m
  • md -> m

i.e., a numeric divisor should be accepted but the result should still be timedelta64.
(For division the signature is mm -> d because the time units cancel out and the quotient is dimensionless.)


The datetime.timedelta arithmetic is different from timedelta64: in fact

>>> datetime.timedelta(days=10) / 7
datetime.timedelta(days=1, seconds=37028, microseconds=571429)
>>> np.timedelta64(10, 'D') / 7
numpy.timedelta64(1,'D')

Therefore I find it useful to define

>>> np.timedelta64(10, 'D') % 7 == np.timedelta64(3, 'D')

while the corresponding

datetime.timedelta(days=10) % 7 == datetime.timedelta(microseconds=4)

or

datetime.timedelta(days=10) % 7 == datetime.timedelta(days=3)

make no sense to me.

In other terms: in Euclidean division the quotient should be an integer, and this makes sense for timedelta64. On the contrary, for datetime.timedelta it is hard to define a sensible integer quotient, and this makes the definition of the remainder problematic.
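
The identity dividend = divisor × quotient + remainder from the comment above can be checked with a short sketch. The helper name `euclid_check` is hypothetical, and the integer case only models the proposed timedelta64(10, 'D') % 7 by operating on the day count; it is not NumPy's implementation.

```python
from datetime import timedelta

def euclid_check(dividend, divisor):
    """Return (quotient, remainder) after verifying the division identity."""
    quotient, remainder = divmod(dividend, divisor)
    # dividend, divisor * quotient and remainder must all share one type
    assert dividend == divisor * quotient + remainder
    return quotient, remainder

# mm -> m with the built-in type: integer quotient, timedelta remainder
q, r = euclid_check(timedelta(days=10), timedelta(days=7))

# Model of a numeric divisor: work on the integer day count, keep the unit
days_q, days_r = euclid_check(10, 7)

print(q, r, days_q, days_r)
```

In both cases the remainder carries the same units as the dividend, which is the argument for an m result.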

@tylerjereddy
Contributor Author

@miccoli Ok, so timedelta stuff can be confusing, but in short -- we're in agreement for the currently proposed mm->m for timedelta64 remainder slot?

One point for possible clarification from your analysis:

must be homogeneous and have the same time units.

This PR currently proposes allowing:
np.timedelta64(1, 'us') % np.timedelta64(727, 'ns') -> np.timedelta64(273, 'ns')

That is type homogeneous, but the units are not--are you suggesting we don't want that?
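
The mixed-unit example above can be modelled in plain Python by converting both operands to the finer unit before taking the remainder. This is only a sketch under that assumption: the table `NS_PER_UNIT` and the function `mixed_unit_remainder` are hypothetical names, and the unit list is an illustrative subset.

```python
# Nanoseconds per unit for a few fixed-length units (illustrative subset).
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

def mixed_unit_remainder(value1, unit1, value2, unit2):
    """Convert both operands to the finer unit, then apply %."""
    finer = unit1 if NS_PER_UNIT[unit1] <= NS_PER_UNIT[unit2] else unit2
    scale = NS_PER_UNIT[finer]
    # Exact integer conversion: every unit here is a multiple of the finer one
    ticks1 = value1 * NS_PER_UNIT[unit1] // scale
    ticks2 = value2 * NS_PER_UNIT[unit2] // scale
    return ticks1 % ticks2, finer

result = mixed_unit_remainder(1, "us", 727, "ns")
print(result)  # (273, 'ns')
```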

@shoyer
Member

shoyer commented Oct 11, 2018

The datetime.timedelta arithmetic is different from timedelta64: in fact

>>> datetime.timedelta(days=10) / 7
datetime.timedelta(days=1, seconds=37028, microseconds=571429)
>>> np.timedelta64(10, 'D') / 7
numpy.timedelta64(1,'D')

I think this is arguably a bug, especially on Python 3 -- you should need to write np.timedelta64(10, 'D') // 7 for that. I don't know if we have a good way to automatically pick the datatype for the result, but silent truncation seems bad.

I think there's a case for sticking with mm remainder here.

This PR currently proposes allowing:
np.timedelta64(1, 'us') % np.timedelta64(727, 'ns') -> np.timedelta64(273, 'ns')

That is type homogeneous, but the units are not--are you suggesting we don't want that?

I think this is the correct behavior.

@shoyer does this need a mailing list check first maybe?

Reproducing the behavior of datetime.timedelta in np.timedelta64 seems pretty uncontroversial to me. I don't think there's any cause for pinging the mailing list.

@miccoli
Contributor

miccoli commented Oct 11, 2018

This PR currently proposes allowing:
np.timedelta64(1, 'us') % np.timedelta64(727, 'ns') -> np.timedelta64(273, 'ns')

I agree that this is correct.
(I missed the fact that np.timedelta64(1, 'us') - np.timedelta64(727, 'ns') -> numpy.timedelta64(273,'ns') and wrongly assumed that sum and subtraction only work with same time units)

@shoyer

I think this is arguably a bug, especially on Python 3 -- you should need to write np.timedelta64(10, 'D') // 7 for that. I don't know if we have a good way to automatically pick the datatype for the result, but silent truncation seems bad.

For reference:

>>> datetime.timedelta(days=10) / 7
datetime.timedelta(days=1, seconds=37028, microseconds=571429)
>>> datetime.timedelta(days=10) // 7
datetime.timedelta(days=1, seconds=37028, microseconds=571428)

thus datetime.timedelta(days=10) / 7 is rounded to the nearest µs while datetime.timedelta(days=10) // 7 is truncated. (Note however that the result is of the same type, while 10/7 and 10//7 have different types.)

Therefore I would argue that np.timedelta64(10, 'D') / 7 -> np.timedelta64(1, 'D') is correct, while
np.timedelta64(11, 'D') / 7 -> np.timedelta64(1, 'D') is a minor bug. For my usage cases it is important that the datetime64 and timedelta64 time units (or resolution) do not change, so I would not see favourably the fact that, for example
np.timedelta64(10, 'D') / 7 -> np.timedelta64(123428571428571, 'ns')
Of course this is a debatable opinion.

In conclusion: I agree that mm -> m can be implemented, while the other cases need more discussion, in order to clarify which is the desired result with a timedelta64 dividend and a numeric (integer or floating point) divisor.

@eric-wieser
Member

For my usage cases it is important that the datetime64 and timedelta64 time units (or resolution) do not change, so I would not see favourably the fact that, for example np.timedelta64(10, 'D') / 7 -> np.timedelta64(123428571428571, 'ns')

I would argue that this is exactly what // is for - if you want your variable to remain an integer rather than become a float, you use int // 2 not int / 2.

I would be in favor of deprecating timedelta64 / int


return 0;

type_reso_error: {
Member

This indent is a little jarring - I'd put the brace on its own line

Member

Better to fix as #12147, I think

PyObject_Repr((PyObject *)PyArray_DESCR(operands[1])));
PyErr_SetObject(PyExc_TypeError, errmsg);
Py_DECREF(errmsg);
return -1;
Member

This code is very exception-unsafe - you need to check for NULL from the result of PyObject_Repr, PyUString_ConcatAndDel, and PyUString_FromFormat

Member

Postponed as part of #12147

type_tup, out_dtypes);
}
if (type_num1 == NPY_TIMEDELTA) {
if (type_num2 == NPY_TIMEDELTA) {
Member

@eric-wieser eric-wieser Oct 11, 2018

Why not just write this:

if (type_num1 == NPY_TIMEDELTA && type_num2 == NPY_TIMEDELTA) {
    // your code
}
else {
    return PyUFunc_DefaultTypeResolver(...);
}

That saves you from having to produce an error message for datetime, making all my above comments moot

Contributor Author

It is all just copy-paste from other type resolvers. I was planning to leave room for implementing mq and md remainders, where there would be other switches to handle type_num2 on a case by case basis, so check type_num1 but multiple checks on type_num2.

@@ -1591,6 +1591,34 @@ TIMEDELTA_mm_d_divide(char **args, npy_intp *dimensions, npy_intp *steps, void *
}
}

NPY_NO_EXPORT void
TIMEDELTA_mm_m_remainder(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
Member

This function looks correct, thanks

@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from 9aaeafa to 17237f5 Compare October 11, 2018 19:22
@tylerjereddy
Contributor Author

Updated with a small reference doc example and to reflect the error handling code changes merged in from Eric recently.

Also added a release note

@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from 17237f5 to 0a807fe Compare October 11, 2018 20:00
const npy_timedelta in1 = *(npy_timedelta *)ip1;
const npy_timedelta in2 = *(npy_timedelta *)ip2;
if (in1 == NPY_DATETIME_NAT || in2 == NPY_DATETIME_NAT) {
    *((npy_timedelta *)op1) = NPY_NAN;
Contributor Author

maybe we should propagate NaT instead here to preserve the mm->m signature -- just noticing this as I try to round up the coverage % a little on the patch

Member

Good catch
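
A pure-Python model of one iteration of the mm -> m loop, with NaT propagated as suggested above, might look like the following. It is a sketch and not the C code from the PR: the function name is hypothetical, NaT is assumed to be stored as the minimum int64 value, and the zero-divisor branch returns 0 as the tests in this thread expect, with the accompanying warning omitted.

```python
NAT = -2**63  # assumption: NaT sentinel is the minimum int64 value

def timedelta_remainder(in1, in2):
    """Model of a single mm -> m remainder on raw integer tick counts."""
    if in1 == NAT or in2 == NAT:
        # Propagate NaT so the output stays a timedelta (mm -> m)
        return NAT
    if in2 == 0:
        # The real loop also signals a divide-by-zero warning here
        return 0
    # Python's % already returns a result with the sign of the divisor
    return in1 % in2

print(timedelta_remainder(NAT, 5) == NAT, timedelta_remainder(10, 3))
```

Writing NaT rather than a floating-point NaN keeps the output dtype consistent with the declared signature.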

@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from 0a807fe to e07f7dc Compare October 11, 2018 22:49
@eric-wieser
Member

eric-wieser commented Oct 12, 2018

Seems strange to me that np.timedelta64(1, 'us') // np.timedelta64(1, 'us') is an error right now - floor division seems to have an obvious interpretation in my mind.

Something for a later PR.

@@ -119,6 +119,9 @@ simple datetime calculations.
>>> np.timedelta64(1,'W') / np.timedelta64(1,'D')
7.0

>>> np.timedelta64(1, 'us') % np.timedelta64(727, 'ns')
Member

I would have thought something like np.timedelta64(1,'W') % np.timedelta64(10,'D') would be a slightly clearer example, but not really important

TD(intflt),
[TypeDescription('m', FullTypeDescr, 'mm', 'm'),
],
Member

Line wrapping here is a little weird, and doesn't match the other cases with only one TypeDescription

Member

@eric-wieser eric-wieser left a comment

Minor nits, looks otherwise good.

# similar behavior enforced by CPython timedelta
with assert_raises_regex(RuntimeWarning,
                         "divide by zero encountered in remainder"):
    np.timedelta64(10, 's') % np.timedelta64(0, 's')
Member

Should check the result (0) too

@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from e07f7dc to 6461602 Compare October 12, 2018 18:01

@pytest.mark.parametrize("val1, val2", [
# years and months can't be unambiguously
# divided for modulus operation except for Y % M
Member

I mean, strictly M % Y, M % M, Y % Y are all fine too. There is nothing special about how Y and M behave - there are just rules prohibiting mixing units larger than W with units smaller than or equal to W. In isolation, all the units behave the same.
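
The rule described above (units larger than W cannot be mixed with units up to and including W) can be written as a small predicate. This is a hypothetical sketch: the set names, the function name, and the abbreviated list of fixed-length units are assumptions for illustration, not NumPy internals.

```python
# Units larger than a week have no fixed length in smaller units.
CALENDAR_UNITS = {"Y", "M"}
# Units up to and including a week convert exactly among themselves.
FIXED_UNITS = {"W", "D", "h", "m", "s", "ms", "us", "ns"}

def can_take_remainder(unit1, unit2):
    """True when both units fall on the same side of the W boundary."""
    both_calendar = unit1 in CALENDAR_UNITS and unit2 in CALENDAR_UNITS
    both_fixed = unit1 in FIXED_UNITS and unit2 in FIXED_UNITS
    return both_calendar or both_fixed

for pair in [("Y", "M"), ("M", "Y"), ("Y", "Y"), ("Y", "D"), ("W", "ns")]:
    print(pair, can_take_remainder(*pair))
```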

@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from 6461602 to abca780 Compare October 14, 2018 22:31
@tylerjereddy
Contributor Author

Cleaned up the test comment a bit & rebased / force pushed so we get a Windows test on appveyor for the time being.

def test_timedelta_modulus_div_by_zero(self):
    # similar behavior enforced by CPython timedelta
    with assert_raises_regex(RuntimeWarning,
                             "divide by zero encountered in remainder"):
Member

Wait, why does this raise a warning? Shouldn't it warn a warning?

Contributor Author

In CPython, ZeroDivisionError: integer division or modulo by zero is raised by timedelta(seconds=10) % timedelta(seconds=0)

Member

For development all warnings except a few are raised as errors in pytest.ini, but in the absence of that file it should just be a warning. Is it actually raised by the code?

Contributor Author

@tylerjereddy tylerjereddy Oct 15, 2018

Eric is right -- in this feature branch it is just a warning when executed as plain code outside the test suite, so I should likely just check for a warning.

I assume we deviate from CPython timedelta because NumPy can gracefully handle division by 0 in scenarios where Python raises an exception.

Contributor Author

Hopefully assert_warns works then -- I'd normally use the pytest equivalent, but there's no precedent for that in NumPy IIRC.

with assert_raises_regex(RuntimeWarning,
                         "divide by zero encountered in remainder"):
    actual = np.timedelta64(10, 's') % np.timedelta64(0, 's')
    assert_equal(actual, 0)
Member

@eric-wieser eric-wieser Oct 15, 2018

Coverage says this line is never hit, meaning the result in the array is never actually checked.

Contributor Author

Yeah, I wasn't sure why you asked me to check that the result is zero in a previous review comment, but now I'm seeing that you thought / think we should deviate from standard Python on this one and have a warning instead of an exception that breaks control flow?

Member

@eric-wieser eric-wieser Oct 15, 2018

I think something weird is going on within pytest that's promoting the warning to an error. I can divide by zero just fine:

In [13]: np.float64(1) % np.float64(0)
C:\Program Files\Python 3.5\Scripts\ipython:1: RuntimeWarning: invalid value encountered in double_scalars
Out[13]: nan
In [16]: np.int64(1) % np.int64(0)
C:\Program Files\Python 3.5\Scripts\ipython:1: RuntimeWarning: divide by zero encountered in longlong_scalars
Out[16]: 0

I think you need to use assert_warns or something here, and then the warning will not escalate, and you can check the result too.
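
The difference between a warning that is promoted to an error and one that is merely recorded can be shown with the standard warnings module. This sketch uses a hypothetical stand-in function, `remainder_with_warning`, rather than NumPy itself, and standard-library filters rather than NumPy's testing helpers.

```python
import warnings

def remainder_with_warning(a, b):
    """Stand-in for an operation that warns and returns 0 when b == 0."""
    if b == 0:
        warnings.warn("divide by zero encountered in remainder", RuntimeWarning)
        return 0
    return a % b

# Warnings promoted to errors: the call raises, so no result is produced
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        remainder_with_warning(10, 0)
        raised = False
    except RuntimeWarning:
        raised = True

# Warnings recorded instead: control flow continues and the result is available
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    actual = remainder_with_warning(10, 0)

print(raised, actual, len(caught))
```

This mirrors why a line placed after the warning call inside an exception-expecting context is never reached, while a warning-capturing context lets the returned value be asserted as well.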

* added support for modulus operator
with timedelta operands; type signature
is mm->m
@tylerjereddy tylerjereddy force-pushed the remainder_timedelta64 branch from abca780 to c9a6b02 Compare October 15, 2018 17:30
@tylerjereddy
Contributor Author

Revised to switch a test from checking for exception to warning, as requested.

@tylerjereddy tylerjereddy removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Oct 18, 2018
@stefanv stefanv merged commit 7cb9edf into numpy:master Oct 30, 2018