PERF: Implement DataFrame-with-scalar ops block-wise #28583


Closed
wants to merge 36 commits

Conversation

jbrockmendel
Member

One of four cases we'll need to implement (the others being Series-align-index, Series-align-columns, and DataFrame).

~670x speedup on the fastest ops, ~8x on the slower end.

In [3]: arr = np.arange(10**5).reshape(100, 1000)
In [4]: df = pd.DataFrame(arr)
In [5]: %timeit df + 1
198 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)      # <-- master
294 µs ± 3.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- PR

In [8]: ts = pd.Timestamp.now("UTC")
In [9]: df2 = pd.DataFrame(arr.view("timedelta64[ns]"))
In [10]: %timeit ts - df2
319 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)     # <-- master
40.2 ms ± 622 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   # <-- PR

The 2D DTA and TDAs that get created exist only briefly, and the changes to these classes are about as minimal as I can make them.


@@ -76,6 +76,17 @@
"""


def compat_2d(meth):
Contributor

Presumably this is going to be applied to other methods (else you'd just do the reshaping in _add_offset)?

Member Author

I implemented this at a stage when it was needed in several places (for e.g. getting __repr__ to work in debugging), then trimmed the usages back to the bare minimum. So we could get by without it now.

Contributor

Hmm, OK good to know. Let's hear what others prefer (my slight preference is to inline it in the one place it's used).

return result.reshape(self.shape)
return meth(self, *args, **kwargs)

new_meth.__name__ = meth.__name__
Contributor

Use functools.wraps instead?

Member Author

sure
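For reference, a minimal sketch of what the decorator could look like with `functools.wraps` (toy class and names here are hypothetical, not the actual pandas internals):

```python
import functools

import numpy as np


def compat_2d(meth):
    # functools.wraps preserves __name__, __doc__, etc., instead of
    # copying __name__ by hand as in the original diff
    @functools.wraps(meth)
    def new_meth(self, *args, **kwargs):
        if self.ndim > 1:
            # operate on a flattened 1D view, then restore the 2D shape
            result = meth(self.ravel(), *args, **kwargs)
            return result.reshape(self.shape)
        return meth(self, *args, **kwargs)

    return new_meth


class Toy2D(np.ndarray):
    """Hypothetical 1D-only array made 2D-tolerant via the decorator."""

    @compat_2d
    def doubled(self):
        # the wrapped method only ever sees 1D input
        return self * 2


arr = np.arange(4).view(Toy2D).reshape(2, 2)
out = arr.doubled()
```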

# 2-dim
DatetimeArray(arr.reshape(2, 2))
# 2-dim allowed for ops compat
DatetimeArray(arr.reshape(2, 2))
Contributor

Do we care to make any assertion about the ._data of this result?

Member Author

sure, will update

assert new_vals.shape == (blk.shape[-1],)
nb = make_block(new_vals, placement=blk.mgr_locs, ndim=2)
new_blocks.append(nb)
elif blk.values.ndim == 1:
Contributor

I'm curious: what hits this case? An op on an EA that returns an ndarray? Say DataFrame[Categorical] == 0?

Member Author

Exactly. Block[Categorical].values == 0 needs to become a 2D ndarray[bool] block before we're done.
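As a toy illustration of that branch (stand-in values, not the real internals code): the comparison on an EA-backed block's 1D values yields a 1D boolean ndarray, which has to be reshaped to the Block's 2D layout:

```python
import numpy as np

# Stand-in for Block[Categorical].values == 0: EA-backed blocks store 1D
# values, so the comparison result is a plain 1D boolean ndarray.
ea_result = np.array([True, False, True, False])

# Consolidated blocks are 2D with shape (n_block_columns, n_rows), so the
# result is reshaped to a single row before make_block(..., ndim=2).
block_vals = ea_result.reshape(1, -1)
```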

new_vals = array_op(blk_vals, right, func, str_rep, eval_kwargs)

# Reshape for EA Block
if is_extension_array_dtype(new_vals.dtype):
Contributor

For my own understanding: the if and the elif would be unnecessary if all EAs were allowed to be 2D?

Member Author

Yep. Everything from 489-524 would boil down to something like:

new_vals = array_op(blk.values.T, right, func, str_rep, eval_kwargs)
nb = blk.make_block(new_vals.T)
new_blocks.append(nb)
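A self-contained toy version of that simplified loop, with hypothetical stand-ins for Block/BlockManager (not the real internals API):

```python
import numpy as np


class ToyBlock:
    """Hypothetical stand-in for an internals Block holding 2D values."""

    def __init__(self, values):
        self.values = values  # shape (n_block_columns, n_rows)

    def make_block(self, new_values):
        return ToyBlock(new_values)


def operate_blockwise(blocks, right, array_op):
    # apply the op column-major (hence the transposes), one block at a time
    new_blocks = []
    for blk in blocks:
        new_vals = array_op(blk.values.T, right)
        new_blocks.append(blk.make_block(new_vals.T))
    return new_blocks


blocks = [ToyBlock(np.arange(6).reshape(2, 3))]
result = operate_blockwise(blocks, 1, lambda left, right: left + right)
```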

@jbrockmendel
Member Author

There are three more cases after this: Series[axis="rows"], Series[axis="columns"], and DataFrame. The axis="rows" case can share most of the implementation here. The other two cases, not so much.

Question: do we want to do this in 4ish steps, or iterate in this PR until we get them all implemented?

@@ -361,7 +372,7 @@ def __init__(self, values, dtype=_NS_DTYPE, freq=None, copy=False):
"ndarray, or Series or Index containing one of those."
)
raise ValueError(msg.format(type(values).__name__))
if values.ndim != 1:
if values.ndim not in [1, 2]:
Member

Why was this changed? Seems to conflict with the error message directly below

Member

Applicable in a few places

Member Author

Yah, this is kludge-adjacent. We don't really support 2D, so we don't want to tell users it's OK.

@@ -5290,9 +5290,11 @@ def _combine_match_columns(self, other: Series, func, level=None):
new_data = ops.dispatch_to_series(left, right, func, axis="columns")
return left._construct_result(right, new_data, func)

def _combine_const(self, other, func):
def _combine_const(self, other, func, str_rep=None, eval_kwargs=None):
Member

Can you annotate new parameters?

Member Author

good idea

mgr = left._data
for blk in mgr.blocks:
# Reshape for EA Block
blk_vals = blk.values
Member

Should this be using to_numpy instead of .values? Not super well versed on what types actually get preserved this way

Member Author

we specifically want to get the .values attribute, which can be either an ndarray or an EA (also, I don't think Block has to_numpy)

@jreback
Contributor

jreback commented Sep 24, 2019

will have a look soon.

@@ -511,7 +562,7 @@ def wrapper(self, other):
lvalues = extract_array(self, extract_numpy=True)
rvalues = extract_array(other, extract_numpy=True)

res_values = comparison_op(lvalues, rvalues, op)
res_values = comparison_op(lvalues, rvalues, op, None, {})
Contributor

@topper-123 Oct 2, 2019

For readability, I'd prefer the last two arguments to be keyword arguments (it's relatively self-evident what the first three arguments are for, but IMO that's not the case for the last two). Same for the calls to comparison_op below.

Member Author

Will do.

@@ -217,7 +227,7 @@ def arithmetic_op(


def comparison_op(
left: Union[np.ndarray, ABCExtensionArray], right: Any, op
left: Union[np.ndarray, ABCExtensionArray], right: Any, op, str_rep, eval_kwargs
Contributor

str_rep and eval_kwargs don't seem to be used here, is that right? Could they be removed, or have default arguments of None?

Contributor

Also, if they need to stay, could you add type hints in order to ease understanding? I assume they have types Optional[str] and dict, respectively?

Member Author

Will add types, double-check whether these can be removed.

if lib.is_scalar(right) or np.ndim(right) == 0:

new_blocks = []
Contributor

I would rather do this in the block manager, no? (maybe put it in internals/ops.py) We should actually move all block-type ops there (e.g. from groupby as well). Or isolate this code in ops somewhere (maybe internals.py), so things are separated.

Member Author

Generally agree on getting Block/BlockManager-touching things isolated in internals. Will give some thought to how/when to move this. ATM this is in Proof of Concept phase while I figure out how to handle the remaining cases.

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Oct 2, 2019
@jbrockmendel
Member Author

Closing to clear the queue; I'll be opening something similar (but implemented largely in internals) before long.
