Rendering Series[Categorical] raises UnicodeDecodeError #21002
Labels: Categorical, Output-Formatting, Strings
Calling `repr()` on a Series with categorical dtype can raise `UnicodeDecodeError` under certain conditions. These conditions appear to include:
- `pd.get_option('max_rows') == 60`

Reproduce with:
It tentatively looks like the issue is in `_libs.hashing.hash_object_array`.

When we get here, `val` is already a `str` in both py2 and py3, so we go down the `if PyString_Check(val):` branch. But when it tries to `encode` a `str` in py2, it will first try to decode with `sys.getdefaultencoding()`, which raises.

So my best guess is that the `PyString_Check` branch just doesn't belong.

I'll take a look for related issues.
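
For readers without a py2 interpreter at hand, the implicit decode described above can be imitated in Python 3. This is only a rough sketch of the suspected mechanism, not the Cython code in `hash_object_array`; the helper name `py2_style_encode` is hypothetical, and `"ascii"` is assumed as the default encoding.

```python
def py2_style_encode(raw: bytes, default_encoding: str = "ascii") -> bytes:
    """Imitate py2's str.encode('utf8') on a byte string (hypothetical helper).

    In py2, calling .encode() on a byte str first decodes it implicitly
    with sys.getdefaultencoding(), then encodes the resulting unicode.
    """
    text = raw.decode(default_encoding)  # the implicit step that can raise
    return text.encode("utf8")


# Pure-ASCII data survives the round trip unchanged.
ascii_result = py2_style_encode(b"abc")

# Non-ASCII bytes fail in the implicit decode, before any encoding happens.
try:
    py2_style_encode("é".encode("utf8"))
    raised = False
except UnicodeDecodeError:
    raised = True
```

If this is indeed what happens, any category value containing non-ASCII bytes would be enough to trigger the error once the hashing path is reached.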