-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
De-duplicate code for indexing with list-likes of keys #21503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(there is still room for improvement, by the way, but I will need to look in the |
Codecov Report
@@ Coverage Diff @@
## master #21503 +/- ##
==========================================
- Coverage 91.92% 91.92% -0.01%
==========================================
Files 153 153
Lines 49594 49570 -24
==========================================
- Hits 45590 45566 -24
Misses 4004 4004
Continue to review full report at Codecov.
|
ASV run (of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assumed this to make some other of your PRs simpler if this is a precursor?
Not sure I understand. This transforms three copies of the same code into one (and in doing so removes some superfluous calls, explaining the performance improvements). So yes, incidentally it will also make it easier to further rationalize the indexing code, and yes, I plan to work on it. |
@toobaz no that's what I was asking. thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good on the cleanup, a few comments.
@@ -3627,7 +3627,7 @@ def _reindex_non_unique(self, target): | |||
else: | |||
|
|||
# need to retake to have the same size as the indexer | |||
indexer[~check] = 0 | |||
indexer[~check] = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what effect did this have? e.g. was this a bug before or just not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the best of my understanding, it was not used: not in the sense that this line was not hit, but because the locations for missing keys were then taken from new_indexer
(see three lines below), where they were correctly marked with -1.
pandas/core/indexing.py
Outdated
o._get_axis_number(axis)) | ||
d[axis] = (keyarr, indexer) | ||
d = {axis: self._get_listlike_indexer(key, axis) | ||
for (key, axis) in zip(tup, o._AXIS_ORDERS)} | ||
return o._reindex_with_indexers(d, copy=True, allow_dups=True) | ||
except (KeyError, IndexingError) as detail: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can only raise KeyError now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and in fact, maybe can just use self._exception()
in _get_listlike_indexer
? then can simply remove the try/except (here)? (and I think you do this below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I think the entire block was just superfluous: if we get here, it means we have collections of keys, and if a key is not found, it will raise later.
pandas/core/indexing.py
Outdated
values : array-like | ||
An indexer for the return object; -1 denotes keys not found | ||
""" | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move ops that won't fail outside the try
pandas/core/indexing.py
Outdated
if new_indexer is not None: | ||
result = self.obj._take(indexer[indexer != -1], axis=axis) | ||
def _getitem_iterable(self, key, axis=None): | ||
if axis is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string here
pandas/core/indexing.py
Outdated
|
||
return result | ||
if com.is_bool_indexer(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment on these cases (e.g. what is being checked)
pandas/core/indexing.py
Outdated
@@ -1214,6 +1220,10 @@ def _validate_read_indexer(self, key, indexer, axis): | |||
u"None of [{key}] are in the [{axis}]".format( | |||
key=key, axis=self.obj._get_axis_name(axis))) | |||
|
|||
if not(self.name == 'loc' and not raise_missing): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
@jreback ping |
very nice @toobaz more like this very welcome! |
git diff upstream/master -u -- "*.py" | flake8 --diff
Just refactoring, removing duplicated code (I don't think the bug in
Index._reindex_non_unique
was actually appearing anywhere)