Skip to content

Fix bool float no coercion #18607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

aschade
Copy link
Contributor

@aschade aschade commented Dec 3, 2017

@pep8speaks
Copy link

pep8speaks commented Dec 3, 2017

Hello @aschade! Thanks for updating the PR.

Line 1070:80: E501 line too long (84 > 79 characters)

Comment last updated on February 03, 2018 at 23:04 Hours UTC

@gfyoung gfyoung added Bug Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 3, 2017
@gfyoung
Copy link
Member

gfyoung commented Dec 3, 2017

@aschade : Thanks for submitting this! Quick question: any reason why you submitted a new PR?

@@ -669,6 +669,19 @@ def test_arg_for_errors_in_astype(self):

df.astype(np.int8, errors='ignore')

from operator import (add, mul, floordiv, sub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason why we shouldn't move this import to the top. You might even consider just doing "import operator as ops" so that you can namespace with (ops.[operation-name]).

@aschade
Copy link
Contributor Author

aschade commented Dec 3, 2017

@gfyoung the other branch got out of sync so I just made a clean one

@jreback
Copy link
Contributor

jreback commented Dec 3, 2017

@aschade you can simply merge master or rebase and force push to keep up to date
this branch is ok for now

@codecov
Copy link

codecov bot commented Dec 3, 2017

Codecov Report

Merging #18607 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18607      +/-   ##
==========================================
- Coverage   91.46%   91.45%   -0.02%     
==========================================
  Files         157      157              
  Lines       51439    51441       +2     
==========================================
- Hits        47051    47044       -7     
- Misses       4388     4397       +9
Flag Coverage Δ
#multiple 89.32% <100%> (ø) ⬆️
#single 40.6% <20%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals.py 94.47% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e16818...903ee6a. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 3, 2017

Codecov Report

Merging #18607 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18607      +/-   ##
==========================================
+ Coverage   91.62%   91.62%   +<.01%     
==========================================
  Files         150      150              
  Lines       48681    48685       +4     
==========================================
+ Hits        44604    44608       +4     
  Misses       4077     4077
Flag Coverage Δ
#multiple 89.99% <100%> (ø) ⬆️
#single 41.81% <50%> (+0.07%) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 95.74% <ø> (ø) ⬆️
pandas/core/frame.py 97.43% <100%> (ø) ⬆️
pandas/core/reshape/merge.py 94.21% <0%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f3b4e0...0f72afd. Read the comment docs.

@aschade
Copy link
Contributor Author

aschade commented Dec 3, 2017

@jreback Ok will do next time

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also needs some tests for series

@@ -1026,7 +1026,7 @@ def f(m, v, i):

return [self.make_block(new_values, fastpath=True)]

def coerce_to_target_dtype(self, other):
def coerce_to_target_dtype(self, other, force_coericion=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? I don't think you should be touching this at all, this is much more related to some logic in ops.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the problem is that before the operation is done, coerce_to_target_dtype coerces the np.array to object in this block:

if self.is_bool or is_object_dtype(dtype) or is_bool_dtype(dtype):
    # we don't upcast to bool
     return self.astype(object)

Example here:

>>> import numpy as np
>>> (np.array([True]) * 1.0).dtype
dtype('float64')
>>> (np.array([True], dtype='object') * 1.0).dtype
dtype('O')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are missing the point; the fix is much higher up the stack

@@ -669,6 +670,17 @@ def test_arg_for_errors_in_astype(self):

df.astype(np.int8, errors='ignore')

@pytest.mark.parametrize("num", [1.0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be in pandas/tests/frame/test_operators

def test_assert_list_and_bool_coerce(self, num, struct, op):
# issue 18549
target_type = np.array([op(num, num)]).dtype
res = op(struct([True]), num).dtypes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually compare the result and directly construct the expected value; use assert_frame_equal

@@ -1264,6 +1272,9 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
if fill_value is not None:
self = self.fillna(fill_value)

if is_number(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better handled in combine_const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think would be the best approach in that method? cast bool columns to float/int before eval is used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. small change.

return self._constructor(new_data)
result = self._constructor(new_data)
if is_number(other):
coerce_bool_cols = {col: type(other) for col in self.select_dtypes('bool')}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something more along the lines of

In [18]: df = pd.DataFrame({'A': [True], 'B':[1], 'C':[False]})

In [19]: df
Out[19]: 
      A  B      C
0  True  1  False

# this is the internal routine, but appropriate here
In [20]: bt = df.dtypes.apply(pandas.core.dtypes.common.is_bool_dtype)

In [22]: i = bt[bt].index

In [23]: i
Out[23]: Index(['A', 'C'], dtype='object')

then
result[i] = result[i].astype('bool')

as select_dtypes copies things.

@aschade
Copy link
Contributor Author

aschade commented Dec 15, 2017

The Series tests pass for me locally but on AppVeyer they're failing since the operations are casting the Series' dtype to int32 and on my local machine they're int64. Not really sure how to debug this since I can't reproduce locally.

@aschade
Copy link
Contributor Author

aschade commented Dec 16, 2017

@jreback

# issue 18549
ser1 = pd.Series([op(num, num)])
ser2 = pd.Series([True])
assert_series_equal(ser1, op(ser2, num))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use
result = Series([op(num, num)])
and
assert_series_equal(result, expected)
and write the expected out (and same in DataFrame tests).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually its hard to tell which is result and expected here.

if is_number(other):
bool_cols = self.dtypes.apply(is_bool_dtype)
index = bool_cols[bool_cols].index
result[index] = result[index].astype(type(other))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so on windows type(other) if otheris an int would yield np.int32, while on all other systems this would be np.int64.

is there a reason we don't just cast back to bool for these columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm shouldn't these cols be coerced to the same dtype as other? Ie copy the behavior from python:

>>> type(1.0 * True)
<class 'float'>
>>> 1.0 * True
1.0

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rebase and i will take a look

@@ -262,6 +262,7 @@ Conversion
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
- Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
- Fixed a bug where ``FY5253`` date offsets could incorrectly raise an ``AssertionError`` in arithmetic operatons (:issue:`14774`)
- Bug in :class:`frame` where math operations on a `DataFrame` containing `bool` are coerced to `object` and not the target `dtype` (:issue: `18607`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 0.23.0

@aschade aschade force-pushed the fix-bool-float-no-coercion branch 9 times, most recently from 7b32dc0 to 59cabff Compare February 3, 2018 04:26
@aschade aschade force-pushed the fix-bool-float-no-coercion branch from 59cabff to 0f72afd Compare February 3, 2018 23:04
@jreback
Copy link
Contributor

jreback commented Mar 16, 2018

closing as stale

@jreback jreback closed this Mar 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: boolean frames multiplied by floats have dtypes=object
4 participants