EA ops alignment with DataFrame #24301

Closed
jbrockmendel opened this issue Dec 16, 2018 · 5 comments
Labels
DataFrame DataFrame data structure ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@jbrockmendel
Member

One more reason why 2-D EAs should be supported: df + ea won't treat ea as column-like, since ea can't be reshaped to (nrows, 1). And operating with df.T has its own set of problems because transposing drops EA dtypes.

@jorisvandenbossche
Member

Can you give a code example of what you mean? (Isn't it expected that in df + ea, ea is not treated as column-like?)

@jbrockmendel
Member Author

This came up in #24282 trying to make use of the decorator for IntNA arithmetic ops. The relevant portion of the test that would be affected is: (see #24326, turning the tests into something copy/paste-able is much harder than it should be)

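import numpy as np
import pandas as pd
import pytest
from pandas.core.arrays import integer_array

# integer data with two missing values
d = (list(range(8)) +
     [np.nan] +
     list(range(10, 98)) +
     [np.nan] +
     [99, 100])

data = integer_array(d)
op = '__add__'

s = pd.Series(data)
opa = getattr(data, op)

df = pd.DataFrame({'A': s})

# current behavior: the IntegerArray op raises when given a 2-D operand
with pytest.raises(NotImplementedError):
    opa(df)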

Instead of raising NotImplementedError, I want data.__add__(df) to return NotImplemented and then have df.__radd__(data) do something useful. Given the way 1-dim objects broadcast, df.__radd__(data) correctly raises ValueError, so a nonzero amount of gymnastics is needed if we want to treat data as a column. Options that come to mind:

  1. make the user operate on data + df['A'], then re-wrap in a DataFrame manually. Not too bad with one column, but annoying if we were working with many columns.
  2. make the user do (data + df.T).T, which would be nice and clean, but transpose doesn't preserve EA dtypes.
  3. make the user do df.add(data, axis=0), but that turns out to raise NotImplementedError with a message that looks spurious (haven't dug into this).
  4. allow the user to reshape data = data.reshape(-1, 1) like they could with a numpy array, at which point data + df would work fine.

I advocate option 4, as it matches numpy behavior that people are used to and would also let us avoid losing EA dtypes when transposing.
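For readers following along, option 1 can be sketched roughly as below. This is only an illustration: it uses pd.array(..., dtype="Int64") as a stand-in for the integer_array data above (an assumption about the installed pandas version), and the two-column frame is made up for the example.

```python
import pandas as pd

# Stand-in for the IntegerArray in the example above (assumption: pd.array
# with dtype="Int64" is available in the installed pandas version).
data = pd.array([1, None, 3], dtype="Int64")

# Hypothetical frame with more than one column, to show the manual re-wrap.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Option 1: operate column by column, then re-wrap in a DataFrame by hand.
result = pd.DataFrame(
    {col: data + df[col] for col in df.columns},
    index=df.index,
)
```

Each column comes back with the nullable integer dtype, but the dict comprehension is exactly the per-column boilerplate that option 4 would make unnecessary.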

@TomAugspurger
Contributor

I don't see how that example relates to the original post, which was about DataFrame + EA. Is the issue about DataFrame + Series[ea], in which case alignment matters? Or is it about DataFrame + EA, when there isn't any alignment (but maybe broadcasting?)

To me that should behave the same as DataFrame + ndarray

In [32]: df = pd.DataFrame({"A": [1, 2, 3]})

In [33]: df + pd.core.arrays.integer_array([1, 2, 3])
# raises ValueError, just like for ndarray

In [38]: df + np.array([1])
Out[38]:
   A
0  2
1  3
2  4

In [39]: df + pd.core.arrays.integer_array([1])
Out[39]:
   A
0  2
1  3
2  4

I don't think that last one is dispatching to EA implementation though.

@jbrockmendel
Member Author

@TomAugspurger It’s about DataFrame + EA. The example with a length-1 array is a special case. Try the same thing with an array with length matching len(df). Both the EA and ndarray cases should raise ValueError. But in the ndarray case it can be solved by reshaping, while the EA case cannot.
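A minimal sketch of the ndarray side of that comparison, assuming a pandas version that broadcasts an (nrows, 1) array across columns; the frame and values are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
arr = np.array([10, 20, 30])  # length matches len(df), not the number of columns

# A 1-D array is matched against the columns, so this raises ValueError.
try:
    df + arr
    raised = False
except ValueError:
    raised = True

# Reshaping to (nrows, 1) makes the array column-like and the op succeeds.
column = arr.reshape(-1, 1)
result = df + column
```

The reshape step is the part that has no equivalent for a 1-D-only EA.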

@TomAugspurger
Contributor

> But in the ndarray case it can be solved by reshaping, while the EA case cannot.

I don't think that's the recommended way of doing it though. We'd direct users towards DataFrame.add in that case.

In [17]: df.add(pd.core.arrays.integer_array(arr), axis=0)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-17-84452ee6a044> in <module>
----> 1 df.add(pd.core.arrays.integer_array(arr), axis=0)

~/sandbox/pandas/pandas/core/ops.py in f(self, other, axis, level, fill_value)
   2016             return _combine_series_frame(self, other, pass_op,
   2017                                          fill_value=fill_value, axis=axis,
-> 2018                                          level=level)
   2019         else:
   2020             if fill_value is not None:

~/sandbox/pandas/pandas/core/ops.py in _combine_series_frame(self, other, func, fill_value, axis, level)
   1903         axis = self._get_axis_number(axis)
   1904         if axis == 0:
-> 1905             return self._combine_match_index(other, func, level=level)
   1906         else:
   1907             return self._combine_match_columns(other, func, level=level)

~/sandbox/pandas/pandas/core/frame.py in _combine_match_index(self, other, func, level)
   4925             # fastpath --> operate directly on values
   4926             with np.errstate(all="ignore"):
-> 4927                 new_data = func(left.values.T, right.values).T
   4928             return self._constructor(new_data,
   4929                                      index=left.index, columns=self.columns,

~/sandbox/pandas/pandas/core/arrays/integer.py in integer_arithmetic_method(self, other)
    585             if getattr(other, 'ndim', 0) > 1:
    586                 raise NotImplementedError(
--> 587                     "can only perform ops with 1-d structures")
    588
    589             if isinstance(other, IntegerArray):

NotImplementedError: can only perform ops with 1-d structures

So I would focus on getting that working I think.

In this case, I think the issue is that integer_arithmetic_method doesn't implement broadcasting.
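For comparison, here is a sketch of the DataFrame.add route with a plain integer Series, where row-wise alignment already works. The frame and values are illustrative only, and whether the same call succeeds with an IntegerArray depends on the pandas version, as the traceback above shows.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A Series sharing df's index; axis=0 matches it against the rows,
# so every column gets the same per-row addend.
other = pd.Series([10, 20, 30], index=df.index)
result = df.add(other, axis=0)
```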

@mroeschke mroeschke added ExtensionArray Extending pandas with custom dtypes or arrays. DataFrame DataFrame data structure labels Jan 13, 2019

4 participants