Fix bool float no coercion #18607
Hello @aschade! Thanks for updating the PR.
Comment last updated on February 03, 2018 at 23:04 Hours UTC
@aschade: Thanks for submitting this! Quick question: any reason why you submitted a new PR?
pandas/tests/frame/test_dtypes.py
Outdated
@@ -669,6 +669,19 @@ def test_arg_for_errors_in_astype(self):

df.astype(np.int8, errors='ignore')

from operator import (add, mul, floordiv, sub)
I see no reason why we shouldn't move this import to the top. You might even consider just doing "import operator as ops" so that you can namespace with (ops.[operation-name]).
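A minimal sketch of the namespaced import suggested in this comment, using only the standard library. The list name `ARITHMETIC_OPS` and the sample operands are illustrative assumptions, not code from the PR.

```python
import operator as ops

# Operators under test, referenced through the module namespace
# instead of importing each function by name.
ARITHMETIC_OPS = [ops.add, ops.mul, ops.floordiv, ops.sub]

# Apply every operator to the same pair of sample operands.
results = {op.__name__: op(6, 3) for op in ARITHMETIC_OPS}
print(results)
```

A list like this could then be passed to `pytest.mark.parametrize`, which keeps the operator names out of the test module's top-level namespace.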
@gfyoung the other branch got out of sync so I just made a clean one
@aschade you can simply merge master or rebase and force push to keep up to date
Codecov Report
@@ Coverage Diff @@
## master #18607 +/- ##
==========================================
- Coverage 91.46% 91.45% -0.02%
==========================================
Files 157 157
Lines 51439 51441 +2
==========================================
- Hits 47051 47044 -7
- Misses 4388 4397 +9
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #18607 +/- ##
==========================================
+ Coverage 91.62% 91.62% +<.01%
==========================================
Files 150 150
Lines 48681 48685 +4
==========================================
+ Hits 44604 44608 +4
Misses 4077 4077
Continue to review full report at Codecov.
@jreback Ok will do next time
this also needs some tests for series
pandas/core/internals.py
Outdated
@@ -1026,7 +1026,7 @@ def f(m, v, i):

return [self.make_block(new_values, fastpath=True)]

def coerce_to_target_dtype(self, other):
def coerce_to_target_dtype(self, other, force_coericion=False):
huh? I don't think you should be touching this at all, this is much more related to some logic in ops.py
Looks like the problem is that before the operation is done, coerce_to_target_dtype coerces the np.array to object in this block:
if self.is_bool or is_object_dtype(dtype) or is_bool_dtype(dtype):
    # we don't upcast to bool
    return self.astype(object)
Example here:
>>> import numpy as np
>>> (np.array([True]) * 1.0).dtype
dtype('float64')
>>> (np.array([True], dtype='object') * 1.0).dtype
dtype('O')
you are missing the point; the fix is much higher up the stack
pandas/tests/frame/test_dtypes.py
Outdated
@@ -669,6 +670,17 @@ def test_arg_for_errors_in_astype(self):

df.astype(np.int8, errors='ignore')

@pytest.mark.parametrize("num", [1.0, 1])
should be in pandas/tests/frame/test_operators
pandas/tests/frame/test_dtypes.py
Outdated
def test_assert_list_and_bool_coerce(self, num, struct, op):
    # issue 18549
    target_type = np.array([op(num, num)]).dtype
    res = op(struct([True]), num).dtypes
actually compare the result and directly construct the expected value; use assert_frame_equal
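One way the requested test shape could look, as a sketch rather than the test that was eventually written. It assumes a pandas version in which bool-by-float arithmetic on a DataFrame yields float64 (the behavior this PR is after); the helper name `check_bool_frame_op` is hypothetical.

```python
import operator

import pandas as pd
from pandas.testing import assert_frame_equal


def check_bool_frame_op(op, num, expected_values):
    """Hypothetical helper: compare op(bool frame, num) to an explicit frame."""
    df = pd.DataFrame([True])
    result = op(df, num)
    # The expected frame is written out directly instead of being derived
    # from op(num, num), so a wrong dtype or value fails the comparison.
    expected = pd.DataFrame(expected_values)
    assert_frame_equal(result, expected)


check_bool_frame_op(operator.mul, 1.0, [1.0])
check_bool_frame_op(operator.add, 1.0, [2.0])
```

Because `assert_frame_equal` checks dtypes as well as values, a result coerced to `object` would fail here even though its values compare equal.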
pandas/core/ops.py
Outdated
@@ -1264,6 +1272,9 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
if fill_value is not None:
    self = self.fillna(fill_value)

if is_number(other):
I think this is better handled in combine_const
what do you think would be the best approach in that method? cast bool columns to float/int before eval is used?
yes
looks good. small change.
pandas/core/frame.py
Outdated
return self._constructor(new_data)
result = self._constructor(new_data)
if is_number(other):
    coerce_bool_cols = {col: type(other) for col in self.select_dtypes('bool')}
maybe something more along the lines of
In [18]: df = pd.DataFrame({'A': [True], 'B':[1], 'C':[False]})
In [19]: df
Out[19]:
A B C
0 True 1 False
# this is the internal routine, but appropriate here
In [20]: bt = df.dtypes.apply(pandas.core.dtypes.common.is_bool_dtype)
In [22]: i = bt[bt].index
In [23]: i
Out[23]: Index(['A', 'C'], dtype='object')
then
result[i] = result[i].astype('bool')
as select_dtypes copies things.
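The session above can be sketched as a standalone script. This version uses the public `pandas.api.types.is_bool_dtype` instead of the internal import path, and casts to `float64` purely for illustration (the comment itself casts back to `'bool'`); the names `bool_mask` and `bool_cols` are assumptions.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

df = pd.DataFrame({'A': [True], 'B': [1], 'C': [False]})

# Boolean mask over the columns: True where the column dtype is bool.
bool_mask = df.dtypes.apply(is_bool_dtype)

# Labels of the bool columns, found without calling select_dtypes.
bool_cols = bool_mask[bool_mask].index

# Cast only those columns; 'B' keeps its integer dtype.
result = df.copy()
result[bool_cols] = result[bool_cols].astype('float64')
print(result.dtypes)
```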
The Series tests pass for me locally but on AppVeyor they're failing since the operations are casting the Series'
# issue 18549
ser1 = pd.Series([op(num, num)])
ser2 = pd.Series([True])
assert_series_equal(ser1, op(ser2, num))
use
result = Series([op(num, num)])
and
assert_series_equal(result, expected)
and write the expected out (and same in DataFrame tests).
actually it's hard to tell which is result and expected here.
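A sketch of the result/expected naming the review asks for, applied to the Series case. It assumes a pandas version where multiplying a bool Series by a float yields float64; the sample values are illustrative.

```python
import operator

import pandas as pd
from pandas.testing import assert_series_equal

ser = pd.Series([True, False])

# `result` is what the operation under test produces ...
result = operator.mul(ser, 2.5)

# ... and `expected` is written out by hand, dtype included.
expected = pd.Series([2.5, 0.0])

assert_series_equal(result, expected)
```

Keeping the computed value in `result` and the literal in `expected` makes a failure message unambiguous about which side came from the code under test.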
pandas/core/frame.py
Outdated
if is_number(other):
    bool_cols = self.dtypes.apply(is_bool_dtype)
    index = bool_cols[bool_cols].index
    result[index] = result[index].astype(type(other))
so on windows type(other) if other is an int would yield np.int32, while on all other systems this would be np.int64.
is there a reason we don't just cast back to bool for these columns?
Hmmm shouldn't these cols be coerced to the same dtype as other? Ie copy the behavior from python:
>>> type(1.0 * True)
<class 'float'>
>>> 1.0 * True
1.0
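The two concerns in this thread (follow Python's scalar coercion, but avoid a platform-dependent integer width) could be reconciled along the following lines. This is only a sketch: `fixed_width_dtype` is a hypothetical helper, not a pandas or NumPy function, and choosing `int64`/`float64` is an assumption.

```python
import numpy as np


def fixed_width_dtype(other):
    """Hypothetical helper: map a Python scalar to a fixed-width NumPy dtype."""
    # bool is a subclass of int, so it has to be checked first.
    if isinstance(other, bool):
        return np.dtype(np.bool_)
    if isinstance(other, int):
        return np.dtype(np.int64)
    if isinstance(other, float):
        return np.dtype(np.float64)
    raise TypeError(f"unsupported scalar type: {type(other).__name__}")


flags = np.array([True, False])

# Naming the width explicitly gives the same dtype on every platform,
# unlike astype(int), which follows the platform's default integer.
as_int = flags.astype(fixed_width_dtype(1))
as_float = flags.astype(fixed_width_dtype(1.0))
print(as_int.dtype, as_float.dtype)
```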
can you rebase and i will take a look
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -262,6 +262,7 @@ Conversion
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
- Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
- Fixed a bug where ``FY5253`` date offsets could incorrectly raise an ``AssertionError`` in arithmetic operatons (:issue:`14774`)
- Bug in :class:`frame` where math operations on a `DataFrame` containing `bool` are coerced to `object` and not the target `dtype` (:issue: `18607`)
can you move to 0.23.0
7b32dc0 to 59cabff
59cabff to 0f72afd
closing as stale
git diff upstream/master -u -- "*.py" | flake8 --diff