rolling casts int to float unnecessarily #15599

Closed
mwiebusch78 opened this issue Mar 7, 2017 · 4 comments
Labels
Dtype Conversions (unexpected or buggy dtype conversions); Enhancement; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (arithmetic, comparison, and logical operations); Reshaping (concat, merge/join, stack/unstack, explode)

Comments

@mwiebusch78

Take the following example:

>>> import pandas as pd
>>> df = pd.DataFrame(list(range(5)))
>>> df
   0
0  0
1  1
2  2
3  3
4  4
>>> df.rolling(3, min_periods=1).sum()
     0
0  0.0
1  1.0
2  3.0
3  6.0
4  9.0

There is no need to cast the ints to floats in this case since no NaNs have to be substituted for missing values when min_periods=1. I think it would make sense to treat the case min_periods<=1 separately to prevent unnecessary type conversions like this one.

@jreback
Contributor

jreback commented Mar 7, 2017

The window routines are only defined for double (in Cython), so this would be a special case. Why is it necessary to add this complexity (IOW, this is only possible for this special case)?

@mwiebusch78
Author

Well, if you're using ints to avoid rounding errors (e.g. because you later want to select rows based on their value in the integer column) it would be nice to have a function for computing rolling sums which doesn't mess with the type.
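For illustration, an integer-preserving rolling sum of the kind asked for here can be sketched outside pandas with NumPy prefix sums. The helper name `int_rolling_sum` is hypothetical and not part of pandas; the sketch assumes a fixed window and `min_periods=1` semantics.

```python
import numpy as np


def int_rolling_sum(values, window):
    """Rolling sum with min_periods=1 semantics that stays in int64.

    Hypothetical helper for illustration; not a pandas API.
    """
    arr = np.asarray(values, dtype=np.int64)
    # Prefix sums with a leading zero, so csum[i] == arr[:i].sum().
    csum = np.concatenate(([0], np.cumsum(arr)))
    ends = np.arange(1, len(arr) + 1)
    # Windows at the start are shorter than `window`, never negative.
    starts = np.maximum(ends - window, 0)
    return csum[ends] - csum[starts]


result = int_rolling_sum(range(5), 3)
print(result, result.dtype)
```

Because every step is integer arithmetic, the values match the float output shown in the original report while the dtype stays integral.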

However, I didn't realise that all the windowing functions are only implemented for floats. I assumed the reason for the conversion was that integers can't be NaN. A common pattern I use is to call rolling(n) without the min_periods option and then discard the first n-1 rows (which have NaNs because the window wasn't full). So, I was hoping I could prevent the conversion to floats by avoiding NaNs.
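The pattern described above can be sketched as follows, using the same five-row frame as the original report. The variable names are illustrative only. Dropping the incomplete windows removes the NaNs but does not bring back the integer dtype.

```python
import pandas as pd

df = pd.DataFrame(list(range(5)))
n = 3

# Without min_periods, a fixed window needs n observations,
# so the first n - 1 rows come back as NaN.
rolled = df.rolling(n).sum()

# Discard the rows whose window was not full.
full = rolled.iloc[n - 1:]

print(full.dtypes)  # the column is still float64
```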

@jreback
Contributor

jreback commented Mar 7, 2017

@mwiebusch78

There are no rounding errors for ints, so I'm not sure what you mean. Sure, you can do

df.rolling(...).sum().dropna().astype(int) if you like. It will be performant and allow you to cast the type.

Yes, we currently always have float output dtypes because virtually all rolling operations produce a float (the exceptions are count and sum when integers are input, but we coerce because of the NaN). Note this is not specific to .rolling, but is true for any integer operation that can produce NaN output.
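A concrete version of the suggested workaround, as a minimal sketch: the window size of 3 and the explicit `"int64"` target are assumptions filled in for the elided arguments. The last lines show one other integer operation, `shift`, that is coerced to float for the same reason.

```python
import pandas as pd

df = pd.DataFrame(list(range(5)))

# Roll, drop the NaN rows from incomplete windows, then cast back.
# "int64" is used instead of int to get the same width on every platform.
ints = df.rolling(3).sum().dropna().astype("int64")
print(ints.dtypes)

# The coercion is not specific to rolling: shifting an integer Series
# introduces a NaN, so the result becomes float64 as well.
shifted = pd.Series([1, 2, 3]).shift(1)
print(shifted.dtype)
```

The cast is exact here because the float sums are whole numbers well within the range that float64 represents exactly.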

Closing as won't fix, but if you'd like to submit a PR we would take it (it's a bit of work: you'd have to either generate the code from templates, as we do with tempita in many places, or use a fused numeric type).

@jreback closed this as completed Mar 7, 2017
@jreback added the Dtype Conversions, Enhancement, Missing-data, Numeric Operations, and Reshaping labels Mar 7, 2017
@jreback added this to the won't fix milestone Mar 7, 2017
@TomAugspurger modified the milestones: won't fix, No action Jul 6, 2018
@marcusinthesky

I am trying to solve a natural language problem which requires me to do a rolling join or sum of strings. While I understand the performance issue, having a generic rolling operation regardless of type would be incredibly useful for users. But also, thanks to the makers of this feature generally; it is super useful in other contexts.
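Since the window routines are described earlier in the thread as defined only for doubles, one way to get a rolling join of strings is to slice the windows by hand in plain Python. This is a minimal sketch, assuming a fixed window and `min_periods=1` semantics; the sample words and variable names are illustrative.

```python
import pandas as pd

words = pd.Series(["the", "quick", "brown", "fox"])
window = 2

# For each position, join the current value with up to window - 1
# preceding values. This bypasses rolling() entirely.
joined = pd.Series(
    [
        " ".join(words.iloc[max(0, i - window + 1): i + 1])
        for i in range(len(words))
    ],
    index=words.index,
)

print(joined.tolist())
```

This runs in pure Python, so it trades the speed of the Cython window routines for type generality.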
