"cannot reindex from duplicate axis" when adding cell of missing row to non-unique index #16018

Closed
toobaz opened this issue Apr 16, 2017 · 2 comments · Fixed by #41607
Labels: good first issue, Needs Tests (unit test(s) needed to prevent regressions)

@toobaz
Member

toobaz commented Apr 16, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1,2,5,6], [3,4,7,8]], index=['a', 'a'], columns=pd.MultiIndex.from_product([[1,2], ['A', 'B']]))

In [3]: df.loc['c'] = -1

In [4]: df.loc['c', (1, 'A')] = 3

In [5]: df.loc['d', (1, 'A')] = 3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-34c3deb1645d> in <module>()
----> 1 df.loc['d', (1, 'A')] = 3

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in __setitem__(self, key, value)
    176             key = com._apply_if_callable(key, self.obj)
    177         indexer = self._get_setitem_indexer(key)
--> 178         self._setitem_with_indexer(indexer, value)
    179 
    180     def _has_valid_type(self, k, axis):

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    348                     index = self.obj._get_axis(i)
    349                     labels = index.insert(len(index), key)
--> 350                     self.obj._data = self.obj.reindex_axis(labels, i)._data
    351                     self.obj._maybe_update_cacher(clear=True)
    352                     self.obj.is_copy = None

/home/pietro/nobackup/repo/pandas/pandas/core/frame.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2844                      self).reindex_axis(labels=labels, axis=axis,
   2845                                         method=method, level=level, copy=copy,
-> 2846                                         limit=limit, fill_value=fill_value)
   2847 
   2848     @Appender(_shared_docs['rename'] % _shared_doc_kwargs)

/home/pietro/nobackup/repo/pandas/pandas/core/generic.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2492                                                  limit=limit)
   2493         return self._reindex_with_indexers({axis: [new_index, indexer]},
-> 2494                                            fill_value=fill_value, copy=copy)
   2495 
   2496     def _reindex_with_indexers(self, reindexers, fill_value=np.nan, copy=False,

/home/pietro/nobackup/repo/pandas/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   2515                                                 fill_value=fill_value,
   2516                                                 allow_dups=allow_dups,
-> 2517                                                 copy=copy)
   2518 
   2519         if copy and new_data is self._data:

/home/pietro/nobackup/repo/pandas/pandas/core/internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   3879         # some axes don't allow reindexing with dups
   3880         if not allow_dups:
-> 3881             self.axes[axis]._can_reindex(indexer)
   3882 
   3883         if axis >= self.ndim:

/home/pietro/nobackup/repo/pandas/pandas/indexes/base.py in _can_reindex(self, indexer)
   2733         # trying to reindex on an axis with duplicates
   2734         if not self.is_unique and len(indexer):
-> 2735             raise ValueError("cannot reindex from a duplicate axis")
   2736 
   2737     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis

Problem description

I don't think the reindexing code path should be taken at all in this case (I'm not referring to the general design choice). In any event, the operation should work, consistent with the case where the index is unique and with the case where the row already exists (In [4]).
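The three cases compared in the description can be sketched as a small script. This is only an illustration: the helper `try_set_cell` and the case names are hypothetical and not part of pandas, and the outcome of the last case depends on the pandas version (it raises on the version reported here and is reported to work on later versions).

```python
import pandas as pd

COLUMNS = pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
ROWS = [[1, 2, 5, 6], [3, 4, 7, 8], [-1, -1, -1, -1]]


def try_set_cell(index_labels, row_label):
    """Assign 3 to the cell (row_label, (1, 'A')) and report what happened."""
    # Use as many data rows as there are index labels.
    df = pd.DataFrame(ROWS[: len(index_labels)], index=index_labels, columns=COLUMNS)
    try:
        df.loc[row_label, (1, "A")] = 3
    except ValueError as err:
        return f"ValueError: {err}"
    return "ok"


outcomes = {
    # Unique index, row 'd' does not exist yet: the frame is enlarged.
    "unique index, missing row": try_set_cell(["a", "b"], "d"),
    # Non-unique index, row 'c' already exists (the In [4] case).
    "non-unique index, existing row": try_set_cell(["a", "a", "c"], "c"),
    # Non-unique index, row 'd' does not exist yet (the In [5] case).
    "non-unique index, missing row": try_set_cell(["a", "a"], "d"),
}

for case, outcome in outcomes.items():
    print(f"{case}: {outcome}")
```

The first two cases should print `ok`; the third is the one this issue is about.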

Expected Output

Same as In [4]: the assignment succeeds without raising.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.19.0+783.gcd35d22a0
pytest: 3.0.6
pip: 9.0.1
setuptools: 33.1.1
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.18.1
xarray: 0.9.1
IPython: 5.1.0.dev
sphinx: 1.4.9
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jreback
Contributor

jreback commented Apr 17, 2017

yep looks like a bug.

jreback added the Bug, Difficulty Intermediate, Indexing (related to indexing on series/frames, not to indexes themselves), and MultiIndex labels on Apr 17, 2017
jreback added this to the Next Major Release milestone on Apr 17, 2017
@mroeschke
Member

This looks to work on master now. It could use a test.

In [15]: df = pd.DataFrame([[1,2,5,6], [3,4,7,8]], index=['a', 'a'], columns=pd.MultiIndex.from_product([[1,2], ['A', 'B']]))
    ...: df.loc['c'] = -1
    ...: df.loc['c', (1, 'A')] = 3
    ...: df.loc['d', (1, 'A')] = 3

In [16]: df
Out[16]:
     1         2
     A    B    A    B
a  1.0  2.0  5.0  6.0
a  3.0  4.0  7.0  8.0
c  3.0 -1.0 -1.0 -1.0
d  3.0  NaN  NaN  NaN
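A regression test for this behaviour might look roughly like the sketch below. It is not the test that was actually added to pandas: the test name is hypothetical, and it assumes a pandas version in which the assignment works, as in the output above. Dtypes are deliberately not compared, since the session above only shows the values.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def test_loc_setitem_missing_row_nonunique_index():
    # Hypothetical test name; the steps mirror the session from this issue.
    columns = pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
    df = pd.DataFrame([[1, 2, 5, 6], [3, 4, 7, 8]], index=["a", "a"], columns=columns)

    df.loc["c"] = -1  # add a whole new row
    df.loc["c", (1, "A")] = 3  # set a cell of an existing row
    df.loc["d", (1, "A")] = 3  # set a cell of a missing row (used to raise)

    expected = pd.DataFrame(
        [
            [1, 2, 5, 6],
            [3, 4, 7, 8],
            [3, -1, -1, -1],
            [3, np.nan, np.nan, np.nan],
        ],
        index=["a", "a", "c", "d"],
        columns=columns,
    )
    # Compare values and labels only; dtype details are left unchecked.
    assert_frame_equal(df, expected, check_dtype=False)
    return df


result = test_loc_setitem_missing_row_nonunique_index()
```

Under pytest the function would be collected by name; the direct call at the end only makes the sketch runnable as a plain script.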

mroeschke added the good first issue and Needs Tests labels and removed the Bug, Indexing, and MultiIndex labels on May 8, 2021
mroeschke mentioned this issue on May 21, 2021
jreback changed the milestone from Contributions Welcome to 1.3 on May 21, 2021