Missing labels when applying style gives a KeyError #32125

Closed
simonvh opened this issue Feb 20, 2020 · 9 comments
Labels
Error Reporting: Incorrect or improved errors from pandas
Styler: conditional formatting using DataFrame.style

Comments

@simonvh

simonvh commented Feb 20, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.style.bar(["a", "b", "c"])

Problem description

This issue occurs when passing a list of columns to bar() where one (or more) of the columns is not present in the dataframe. It will give a KeyError:

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
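
The message comes from label-based indexing rather than from the styling code itself. A minimal sketch, assuming pandas 1.0 or later, that reproduces the same kind of failure with .loc directly and shows reindex, the alternative described in the linked indexing guide (the exact KeyError text varies by pandas version):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
wanted = ["a", "b", "c"]  # "c" is not a column of df

# Label-based selection with a missing label raises KeyError.
try:
    df.loc[:, wanted]
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex accepts missing labels and fills the new column with NaN.
reindexed = df.reindex(columns=wanted)
print(loc_raised, list(reindexed.columns))
```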

Expected Output

This error does not describe the problem. I would expect one of two possible outcomes:

  1. An error message that tells the user not to pass column names to bar() that are not present in the dataframe.
  2. Fixing _apply() in io/formats/style.py to deal with missing labels.

I would prefer option 2, but I can imagine that would not be the preferred behavior of pandas :-).

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-17134-Microsoft
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@MarcoGorelli MarcoGorelli added the "Error Reporting" label Feb 20, 2020
@MarcoGorelli
Member

Thanks @simonvh

I tried this on master and didn't get any error:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b":[4, 5, 6]})
>>> df.style.bar(["a", "b", "c"])
<pandas.io.formats.style.Styler at 0x7f8ebc074610>

@simonvh
Author

simonvh commented Feb 20, 2020

Ah, that's something that I should have checked. Thanks for doing so!

@simonvh simonvh closed this as completed Feb 20, 2020
@MarcoGorelli
Member

Sorry, wrote that in a bit of a rush - mind if I keep this open till we verify when it was fixed and if there's a test? Will check this tomorrow

@MarcoGorelli MarcoGorelli reopened this Feb 20, 2020
@MarcoGorelli
Member

@simonvh are you sure

df = pd.DataFrame({"a": [1, 2, 3], "b":[4, 5, 6]})
df.style.bar(["a", "b", "c"])

is what you ran to get the error? I just tried it on a clean environment with just pandas v1.0.1 installed and it passed

@simonvh
Author

simonvh commented Feb 21, 2020

Yes, only this code. This is with pandas installed from conda-forge:

pandas                    1.0.1            py38hb3f55d8_0    conda-forge

@MarcoGorelli
Member

Thanks - TBH I don't know what's going on then, I just tried installing from conda-forge and got no error:

$ pip uninstall pandas
Uninstalling pandas-1.0.1:
  Would remove:
    /home/SERILOCAL/m.gorelli/miniconda3/lib/python3.7/site-packages/pandas-1.0.1.dist-info/*
    /home/SERILOCAL/m.gorelli/miniconda3/lib/python3.7/site-packages/pandas/*
Proceed (y/n)? y
  Successfully uninstalled pandas-1.0.1

$ conda install -c conda-forge pandas 
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.8.1
  latest version: 4.8.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/SERILOCAL/m.gorelli/miniconda3

  added / updated specs:
    - pandas


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py37_0         148 KB  conda-forge
    conda-4.8.2                |           py37_0         3.0 MB  conda-forge
    libgfortran-ng-7.3.0       |       hdf63c60_5         1.7 MB  conda-forge
    numpy-1.18.1               |   py37h95a1406_0         5.2 MB  conda-forge
    pandas-1.0.1               |   py37hb3f55d8_0        11.1 MB  conda-forge
    python-dateutil-2.8.1      |             py_0         220 KB  conda-forge
    pytz-2019.3                |             py_0         237 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        21.6 MB

The following NEW packages will be INSTALLED:

  libblas            conda-forge/linux-64::libblas-3.8.0-14_openblas
  libcblas           conda-forge/linux-64::libcblas-3.8.0-14_openblas
  libgfortran-ng     conda-forge/linux-64::libgfortran-ng-7.3.0-hdf63c60_5
  liblapack          conda-forge/linux-64::liblapack-3.8.0-14_openblas
  libopenblas        conda-forge/linux-64::libopenblas-0.3.7-h5ec1e0e_6
  numpy              conda-forge/linux-64::numpy-1.18.1-py37h95a1406_0
  pandas             conda-forge/linux-64::pandas-1.0.1-py37hb3f55d8_0
  python-dateutil    conda-forge/noarch::python-dateutil-2.8.1-py_0
  pytz               conda-forge/noarch::pytz-2019.3-py_0

The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2019.11.27~ --> conda-forge::ca-certificates-2019.11.28-hecc5488_0
  conda                       pkgs/main::conda-4.8.1-py37_0 --> conda-forge::conda-4.8.2-py37_0

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi                                         pkgs/main --> conda-forge
  openssl              pkgs/main::openssl-1.1.1d-h7b6447c_3 --> conda-forge::openssl-1.1.1d-h516909a_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
conda-4.8.2          | 3.0 MB    | ##################################### | 100% 
pytz-2019.3          | 237 KB    | ##################################### | 100% 
pandas-1.0.1         | 11.1 MB   | ##################################### | 100% 
numpy-1.18.1         | 5.2 MB    | ################################################################################## | 100% 
libgfortran-ng-7.3.0 | 1.7 MB    | ############################################################################################################# | 100% 
certifi-2019.11.28   | 148 KB    | ############################################################################################################# | 100% 
python-dateutil-2.8. | 220 KB    | ############################################################################################################# | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

$ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3], "b":[4, 5, 6]})
>>> df.style.bar(["a", "b", "c"])
<pandas.io.formats.style.Styler object at 0x7f724093bba8>
>>> 

@simonvh
Author

simonvh commented Feb 21, 2020

Ah. I see. It's only when running the snippet in a Jupyter notebook. Then you get this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/test3/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/io/formats/style.py in _repr_html_(self)
    180         Hooks into Jupyter notebook rich display system.
    181         """
--> 182         return self.render()
    183 
    184     @Appender(

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/io/formats/style.py in render(self, **kwargs)
    535         * table_attributes
    536         """
--> 537         self._compute()
    538         # TODO: namespace all the pandas keys
    539         d = self._translate()

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/io/formats/style.py in _compute(self)
    610         r = self
    611         for func, args, kwargs in self._todo:
--> 612             r = func(self)(*args, **kwargs)
    613         return r
    614 

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/io/formats/style.py in _apply(self, func, axis, subset, **kwargs)
    616         subset = slice(None) if subset is None else subset
    617         subset = _non_reducing_slice(subset)
--> 618         data = self.data.loc[subset]
    619         if axis is not None:
    620             result = data.apply(func, axis=axis, result_type="expand", **kwargs)

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1759                 except (KeyError, IndexError, AttributeError):
   1760                     pass
-> 1761             return self._getitem_tuple(key)
   1762         else:
   1763             # we by definition only have the 0th axis

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1286                 continue
   1287 
-> 1288             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
   1289 
   1290         return retval

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1951                     raise ValueError("Cannot index with multidimensional key")
   1952 
-> 1953                 return self._getitem_iterable(key, axis=axis)
   1954 
   1955             # nested tuple slicing

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1592         else:
   1593             # A collection of keys
-> 1594             keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1595             return self.obj._reindex_with_indexers(
   1596                 {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1549             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1550 
-> 1551         self._validate_read_indexer(
   1552             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1553         )

~/anaconda3/envs/test3/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1651             # just raising
   1652             if not (ax.is_categorical() or ax.is_interval()):
-> 1653                 raise KeyError(
   1654                     "Passing list-likes to .loc or [] with any missing labels "
   1655                     "is no longer supported, see "

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

<pandas.io.formats.style.Styler at 0x7fee9e4cbc40>

@MarcoGorelli
Member

Ah. I see. It's only when running the snippet in a Jupyter notebook.

Got it, thanks @simonvh !

Back to the issue, regarding your two suggestions:

An error message, which informs not to pass column names to bar() that are not present.

A more precise error message would be helpful, and I imagine the core devs would accept a PR for this. Would you be interested in submitting one?
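
One possible shape for such a message, sketched as a standalone check and not as the change pandas actually made; check_subset_columns is a hypothetical name:

```python
import pandas as pd

def check_subset_columns(df, columns):
    """Raise a KeyError naming every requested column that df lacks."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return list(columns)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

try:
    check_subset_columns(df, ["a", "b", "c"])
    message = ""
except KeyError as exc:
    message = exc.args[0]

print(message)
```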

Fixing _apply() in io/formats/style.py to deal with missing labels.

The doc says:

subset : IndexSlice, optional
    A valid slice for data to limit the style application to.

so I think it's reasonable to expect users to pass a valid slice. The decision to not support slicing with missing labels seems very deliberate

@jbrockmendel jbrockmendel added the "Styler" label Sep 21, 2020
@attack68
Contributor

A recent PR has improved the error message. The missing label 'c' is now reported:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['c'], dtype='object')

4 participants