-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
DOC: .loc doesn't document that it can take sets or dicts as arguments #42825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
+1 raising and documenting. also restricting .loc to lists only and not sequence inputs: see #42329 (comment) |
I'm assuming for speed the implementation of |
cc @jbrockmendel Should we decide for 1.4 here? |
it'd be nice to get in, yah |
Will look into this the coming days |
Can I ask what was the reasoning behind deprecating the ability to use a set as an indexer? I've followed the discussion and links above, but cannot discern why the decision was made to deprecate the use of sets as indexers. In my experience, it's incredibly helpful to be able to use sets as indexers, and I'd be happy to contribute code towards un-deprecating this option if possible. Looking through the code a bit, my understanding is that the index will be converted into a list in any case here: https://github.com/phofl/pandas/blob/f04ec539eaf4ab1d03c1654e6f90b32c6b56f078/pandas/core/frame.py#L3505-L3507. |
-1 on undeprecating, sets only work in some cases, in others they naturally raise an error. iloc is one of those cases. |
Thanks for the response. I meant to ask about why it was deprecated in Is the issue that if it is deprecated for one, it must be deprecated for both? |
Loc is already to flexible, so we restricted this since it was not documented anyway |
I see, thanks. Would you be open to a pull request that adds documentation rather than alters behavior? I have a lot of code that relies on this behavior. |
Again, I am -1 on undeprecating |
Can I ask why? With such an established and widely used library, I would think removing functionality that people rely on should only happen if there's a substantive reason it shouldn't be maintained. I'm trying to understand whether that's the case here. So far I can't see any reason for this functionality to have been removed. |
Set has random order, e.g. the order of your DataFrame is random as well, which leads to all sorts of weird behavior, you will probably end up implicitly aligning your DataFrame in follow up operations, which costs performance and ist just counterintuitive. A set as an indexer is in general not a good idea |
I see, thanks for explaining your reasoning. I disagree with your assessment for the following reasons. There are many cases in which dataframe order does not matter, and there are also many cases in which dataframe order should be determined by elements of the dataframe, such that it must be sorted after selection in any case. If the indexer is already a set, as it often is, than in either of these cases, converting it to a list just to satisfy an apparently arbitrary requirement of pandas is inefficient in terms of performance, and also adds unnecessary code complexity. If the reasoning for the deprectaion is just what's contained in the above comment, I'll go ahead and submit a pull request to un-deprecate the use of sets as indexers for |
I have been a supporter of formalising and reducing the number of potential input combinations to loc. It is a complex part of the code base especially when considering hierarchical indexes. I believe there are various scattered historical discussions around this. You are welcome to submit to a PR but my instinct tells me it will be difficult for you to garner the support to get it approved. Are there any alternatives you can use to set, i.e. is there a sensible workaround to your problems? |
Loc is complicated enough, inconsistency with iloc is another reason. Behavior is just counterintuitive. It is literally not covered at all in our testing suite. No clue which corner cases work and which don’t. It was never documented, so people shouldn’t be relying on it anyway. We deprecated this a year ago and nobody complained till now. |
Thanks to you both for explaining more of your reasoning. Set operations are very well suited to creating indexers in a wide variety of settings, and so I've used sets for that purpose extensively over the course of many years, both for hierarchical and non-hierarchical indices. I could go through quite a lot of code and wrap the indexer in all my calls to My understanding is that this conversion to a list is exactly what is currently happening anyway in the following lines: https://github.com/phofl/pandas/blob/f04ec539eaf4ab1d03c1654e6f90b32c6b56f078/pandas/core/frame.py#L3505-L3507. Is this understanding incorrect? After exploring the code a bit more, my understanding is that the issue with sets with hierarchical indices arise not when the indexer is a set, but when the indexer contains sets. I.e. if the collection of indices is a set, there is no problem -- it just gets turned into a list in the lines linked above. If the collection contains sets, then there's a problem because the level of the index that each element in the indexer refers to is ambiguous. Does this match your understanding of the issues you are referencing above? My suggestion is that we just de-deprecate using sets as indexers, not using indexers that contain sets. I'd be happy to add the appropriate documentation and tests. Given that set indexers are just converted to list indexers, unless I'm mistaken, I'm curious to know what about the behavior you find counter-intuitive. Please let me know your thoughts. |
I understand that this functionality will not be undeprecated. Thank you all for your time and consideration, and your work on pandas.
…------- Original Message -------
On Thursday, December 1st, 2022 at 7:58 PM, jbrockmendel ***@***.***> wrote:
If updating your code in too many places is the issue, then consider patching Loc.__getitem__/__setitem__ to convert lists in just one place.
I agree with ***@***.***(https://github.com/phofl) and ***@***.***(https://github.com/attack68); maintaining Loc is a beast and simplifying the behavior where feasible is a positive.
—
Reply to this email directly, [view it on GitHub](#42825 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/AAWIYOT7XGV254F2LWUV4SDWLFCRRANCNFSM5BJA4UEA).
You are receiving this because you commented.Message ID: ***@***.***>
|
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html?highlight=loc#pandas.DataFrame.loc
Documentation problem
Surprisingly, the following code works:
In the documentation, we don't say that you can pass a
set
or adict
as one of the arguments to.loc
.So either we update the documentation, or maybe we should be raising an Exception?
The text was updated successfully, but these errors were encountered: