-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: return Series as DataFrame.dtypes/ftypes for empty dataframes #5740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It is also faster. OLD: In [10]: timeit df.dtypes
100 loops, best of 3: 7.3 ms per loop
In [11]: timeit df.ftypes
100 loops, best of 3: 7.42 ms per loop
In [12]: pd.__version__
Out[12]: '0.13.0rc1-92-gf6fd509' NEW: In [1]: pd.__version__
Out[1]: '0.13.0rc1-93-g55d081e'
In [2]: df = pd.DataFrame(columns=range(100), index=[1])
In [3]: timeit df.dtypes
1000 loops, best of 3: 277 µs per loop
In [4]: timeit df.ftypes
1000 loops, best of 3: 259 µs per loop |
assert_series_equal(norows_int_df.ftypes, pd.Series('int32:dense', index=list("abc"))) | ||
|
||
odict = OrderedDict | ||
df = pd.DataFrame(odict([('a', 1), ('b', True), ('c', 1.0)]), index=[1, 2, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this test case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a case below that tests an empty slice of dataframe with columns having different dtypes, I couldn't resist adding two more asserts just to be sure that the result matches that of non-empty slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revisiting this comment after an hour I think I might have misattributed the question to the beginning of df
case, so just in case: tests concerning nocols_df
, norows_df
and norows_int_df
are about empty dataframes (empty in a sense that they contain no actual data cells) and all returned garbage before patching dtype
/ftype
funcs.
Oh, it breaks duplicate column test case (reindex is no good when columns are not unique) |
Fixed duplicate column test case. Now Also, the code is looking for trouble if |
this is essentially an apply_by_block reduction (which could be a separate method, maybe) ok....you can put they should both be a property, something like
this will break Series (as it cannot be reduced except to a scalar), so you may need to override Series.dtypes to just return the same as Series.dtype A Panel should breturn a DataFrame you can use the code that you have currently in core/frame.py/dtypes and put it in core/internals.py/get_dtypes ((in BlockManager) get_dtype_counts/get_ftype_counts could be similarly extended to Panel/Series (though these should return Series for Series/DataFrame and a DataFrame for Panel) |
FYI changing |
see for get_dtypes()/get_ftypes() you need to call |
assert_series_equal(norows_df.ftypes, pd.Series('object:dense', index=list("abc"))) | ||
|
||
norows_int_df = pd.DataFrame(columns=list("abc")).astype(np.int32) | ||
assert_series_equal(norows_int_df.dtypes, pd.Series(np.dtype('int32'), index=list("abc"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback does this make sense? dtype's just going to change once you assign something anyways, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine; yes dtype will change once you assign something
@immerrr can you update this as discussed above? |
@jreback I'm not sure I get you right.
|
that's fine the internals method can just return that (or a Series directly for that matter)
These are sub-class of I am not sure how you would support even more fine-grained dtyping under the current implementation, nor why it would be necessary
This function needs to be done in |
fyi #5968 fixed the dtype slowness but would like to have these tests..so pls rebase when u have a chance |
Whoops, forgot about this PR... Will do the rebase.
The dimensionality is different, that's true. But wouldn't Panel.dtypes returning a DataFrame give a false impression that dtypes can vary along some axis other than "items"? I definitely would've fallen for that. |
yep that is fixed up |
if you rebase this you wil see that dtypes are now fixed (and you can take out the changes in core/frame) |
Indeed. I've enforced the returned Series dtype for consistency though, otherwise than that rebased tests pass. |
Rebased against master. Can this be merged? |
I think put a release note in as this is technically an API change (eg. that an empty Series/Frame would before be float64 I think), and now correctly object...can you update with that' thxs |
Whoops, now that I'm re-reading the last message I doubt that you wanted me to put this to bugfixes. Do you want to move it to API changes? |
API change |
Ok |
BUG: return Series as DataFrame.dtypes/ftypes for empty dataframes
thanks! |
for future reference.....you have to compare |
Yup, those should be |
DataFrame.dtypes
andDataFrame.ftypes
values were inconsistent for empty dataframes: