
BUG: return Series as DataFrame.dtypes/ftypes for empty dataframes #5740


Merged (1 commit) on Feb 17, 2014
doc/source/release.rst (1 change: 1 addition & 0 deletions)
@@ -65,6 +65,7 @@ API Changes
- ``select_as_multiple`` will always raise a ``KeyError``, when a key or the selector is not found (:issue:`6177`)
- ``df['col'] = value`` and ``df.loc[:,'col'] = value`` are now completely equivalent;
previously the ``.loc`` would not necessarily coerce the dtype of the resultant series (:issue:`6149`)
- ``dtypes`` and ``ftypes`` now return a series with ``dtype=object`` on empty containers (:issue:`5740`)


Experimental Features
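The release note can be checked with a short sketch. It assumes a recent pandas, where `ftypes` no longer exists (it was removed in pandas 1.0), so only `dtypes` is exercised:

```python
import pandas as pd

# A frame with neither rows nor columns, and one with an index but no columns.
empty_df = pd.DataFrame()
nocols_df = pd.DataFrame(index=[1, 2, 3])

# dtypes has one entry per column, so both results are empty; the Series
# itself is still of object dtype, as it is for a frame that has columns.
for frame in (empty_df, nocols_df):
    result = frame.dtypes
    print(len(result), result.dtype)  # 0 object
```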
pandas/core/generic.py (6 changes: 4 additions & 2 deletions)
@@ -1947,7 +1947,8 @@ def get_ftype_counts(self):
     def dtypes(self):
         """ Return the dtypes in this object """
         from pandas import Series
-        return Series(self._data.get_dtypes(),index=self._info_axis)
+        return Series(self._data.get_dtypes(), index=self._info_axis,
+                      dtype=np.object_)

     @property
     def ftypes(self):
@@ -1956,7 +1957,8 @@ def ftypes(self):
         in this object.
         """
         from pandas import Series
-        return Series(self._data.get_ftypes(),index=self._info_axis)
+        return Series(self._data.get_ftypes(), index=self._info_axis,
+                      dtype=np.object_)

     def as_blocks(self, columns=None):
         """
pandas/tests/test_frame.py (41 changes: 41 additions & 0 deletions)
@@ -12164,6 +12164,47 @@ def test_concat_empty_dataframe_dtypes(self):
        self.assertEqual(result['b'].dtype, np.float64)
        self.assertEqual(result['c'].dtype, np.float64)

    def test_empty_frame_dtypes_ftypes(self):
        empty_df = pd.DataFrame()
        assert_series_equal(empty_df.dtypes, pd.Series(dtype=np.object))
        assert_series_equal(empty_df.ftypes, pd.Series(dtype=np.object))

        nocols_df = pd.DataFrame(index=[1,2,3])
        assert_series_equal(nocols_df.dtypes, pd.Series(dtype=np.object))
        assert_series_equal(nocols_df.ftypes, pd.Series(dtype=np.object))

        norows_df = pd.DataFrame(columns=list("abc"))
        assert_series_equal(norows_df.dtypes, pd.Series(np.object, index=list("abc")))
        assert_series_equal(norows_df.ftypes, pd.Series('object:dense', index=list("abc")))

        norows_int_df = pd.DataFrame(columns=list("abc")).astype(np.int32)
        assert_series_equal(norows_int_df.dtypes, pd.Series(np.dtype('int32'), index=list("abc")))
Contributor: @jreback does this make sense? The dtype is just going to change once you assign something anyway, no?

Contributor: This is fine; yes, the dtype will change once you assign something.
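As the exchange notes, the dtype declared on a row-less column is provisional. A minimal sketch, assuming a recent pandas where `df[col] = scalar` replaces the whole column and infers its dtype from the scalar:

```python
import numpy as np
import pandas as pd

# Columns but no rows, cast to int32 as in the test.
norows_int_df = pd.DataFrame(columns=list("abc")).astype(np.int32)

# Assigning a float scalar replaces column "a"; its dtype is inferred from
# the new value, so the int32 declared on the empty frame does not survive.
norows_int_df["a"] = 1.5
print(norows_int_df.dtypes["a"])  # float64
```

The untouched columns "b" and "c" keep int32; only the reassigned column changes.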

        assert_series_equal(norows_int_df.ftypes, pd.Series('int32:dense', index=list("abc")))

        odict = OrderedDict
        df = pd.DataFrame(odict([('a', 1), ('b', True), ('c', 1.0)]), index=[1, 2, 3])
Contributor: Why is this test case here?

Contributor (author): There's a case below that tests an empty slice of a DataFrame whose columns have different dtypes; I couldn't resist adding two more asserts just to be sure that the result matches that of the non-empty slice.

Contributor (author): Revisiting this comment after an hour, I think I may have mistaken the question as referring to the start of the df case, so just in case: the tests for nocols_df, norows_df and norows_int_df are about empty DataFrames (empty in the sense that they contain no data cells), and all of them returned garbage before the dtypes/ftypes properties were patched.
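The "no data cells" case with columns but no rows can be sketched as follows, assuming a recent pandas, where columns created without data default to object dtype:

```python
import pandas as pd

# "Empty" in the sense of no data cells: columns exist, rows do not.
norows_df = pd.DataFrame(columns=list("abc"))
result = norows_df.dtypes

# One entry per column, and the Series holding them is itself object dtype.
print(result.index.tolist())  # ['a', 'b', 'c']
print(result.dtype)           # object
```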

        assert_series_equal(df.dtypes, pd.Series(odict([('a', np.int64),
                                                        ('b', np.bool),
                                                        ('c', np.float64)])))
        assert_series_equal(df.ftypes, pd.Series(odict([('a', 'int64:dense'),
                                                        ('b', 'bool:dense'),
                                                        ('c', 'float64:dense')])))

        # same but for empty slice of df
        assert_series_equal(df[:0].dtypes, pd.Series(odict([('a', np.int),
                                                            ('b', np.bool),
                                                            ('c', np.float)])))
        assert_series_equal(df[:0].ftypes, pd.Series(odict([('a', 'int64:dense'),
                                                            ('b', 'bool:dense'),
                                                            ('c', 'float64:dense')])))
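The empty-slice assertions check that dropping every row keeps each column's dtype. A sketch under a recent pandas: a plain dict (insertion-ordered on Python 3.7+) stands in for `OrderedDict`, and `iloc[:0]` makes the positional slice that the test writes as `df[:0]` explicit:

```python
import pandas as pd

# Same mixed-dtype frame as in the test: int, bool and float columns.
df = pd.DataFrame({"a": 1, "b": True, "c": 1.0}, index=[1, 2, 3])

# An empty positional slice has no rows but keeps every column's dtype.
empty_slice = df.iloc[:0]
print(len(empty_slice))                      # 0
print(empty_slice.dtypes.equals(df.dtypes))  # True
```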

def skip_if_no_ne(engine='numexpr'):
    if engine == 'numexpr':
        try:
            import numexpr as ne
        except ImportError:
            raise nose.SkipTest("cannot query engine numexpr when numexpr not "
                                "installed")


def skip_if_no_pandas_parser(parser):