Skip to content

Sorting tuples containing None raises TypeError in 3.11 #95173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jacobtylerwalls opened this issue Jul 23, 2022 · 4 comments · Fixed by #95176
Closed

Sorting tuples containing None raises TypeError in 3.11 #95173

jacobtylerwalls opened this issue Jul 23, 2022 · 4 comments · Fixed by #95176
Labels
3.11 only security fixes 3.12 only security fixes release-blocker triaged The issue has been accepted as valid by a triager. type-bug An unexpected behavior, bug, or error

Comments

@jacobtylerwalls
Copy link
Contributor

jacobtylerwalls commented Jul 23, 2022

Bug report

3.10

>>> sorted([(None, 2), (None, 1)])
[(None, 1), (None, 2)]

3.11.0b4

>>> sorted([(None, 2), (None, 1)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'

Sorting lists works fine:

>>> sorted([[None, 2], [None, 1]])
[[None, 1], [None, 2]]

From the release notes, I expect this is due to #29076, however the release notes merely say that the "order of the result may differ", not that comparisons might newly raise TypeError:

Cases of sorting using tuples as keys may be significantly faster
in some cases. This is worth mentioning because, if the tuple
elements don't define a total ordering, the order of the result
may differ from earlier releases. It's generally true that the
result of sorting simply isn't well-defined in the absence of a
total ordering on list elements.

And the issue discussion says this about sorted():

Since very little is defined about the result of sorted() in the absence of a total ordering (just that it's some permutation of the original), that's not surprising.

So while I agree with the sentiment that it's not great to be sorting tuples containing elements without a total ordering, I wonder if the current release note might merit some editing.


Real-life example involved a pattern like:

sorted(iterable, key=attrgetter("optional_member", "required_member"))
@JelleZijlstra
Copy link
Member

Confirmed. Adding the release-blocker tag since this regression could affect reasonable real-world code.

cc @pablogsal as release manager and @rhettinger and @tim-one who where involved with #29076

@JelleZijlstra JelleZijlstra added triaged The issue has been accepted as valid by a triager. 3.11 only security fixes 3.12 only security fixes labels Jul 23, 2022
@rhettinger
Copy link
Contributor

rhettinger commented Jul 23, 2022

I suggest reverting the PR. It was a minor optimization that was tenuous because it was premised on sorting not having made explicit promises about what it would do with objects that didn't define a total ordering.

In retrospect, that was a mistake because that premise was extended to tuples where we do have explicit promises about when __lt__ and __eq__ are called.

We've documented that sorting only calls __lt__ and people do rely on that fact. And lexicographic sorting ordering for sequences guarantees that the order is determined by the first unequal pair (plus rules for unequal length and nesting). That amounts to a guarantee that in a tuple comparison, elements will only test __lt__ if __eq__ has returned False. This PR violates that documented promise.

Additionally, this PR made sorted() and list.sort() inconsistent with the rest of the language (special cases aren't special enough to break the rules):

# If min() and max() work, sorting should too.
>> min([(None, 2), (None, 1)])
(None, 1)
>>> max([(None, 2), (None, 1)])
(None, 2)

# Tuple ordering is well-defined.
>>> (None, 2) < (None, 1)
False
>>> (None, 1) < (None, 2)
True

# Sequence types, list and tuple, should be consistent.
# Sorting lists with None still works correctly.
>>> sorted([[None, 2], [None, 1]])
[[None, 1], [None, 2]

# Sort should be consistent with heaps which still work.
# Also, heapify() should always be replaceable by list.sort().
>>> from heapq import heapify, heappop
>>> data = [(None, 2), (None, 1)]
>>> heapify(data)
>>> heappop(data)
(None, 1)
>>> heappop(data)
(None, 2)

Possibly, the PR could restored someday if fallback logic were to restore consistent sequence comparison guarantees. Also, it might be worth discussing whether None < None should return False instead of raising an exception.

@pablogsal
Copy link
Member

I am producing to revert #29076 to unblock the 3.11b5 release of next week

@sweeneyde
Copy link
Member

I agree that it's safest to just revert for now, to unblock 3.11.

But I don't think the optimization is necessarily doomed: we could decide to only apply it to situations where it's actually known to be safe, something like this:

diff --git a/Objects/listobject.c b/Objects/listobject.c
index 83dfb7da01..38a99849b7 100644
--- a/Objects/listobject.c
+++ b/Objects/listobject.c
@@ -2455,16 +2455,10 @@ list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse)
             ms.key_compare = safe_object_compare;
         }

-        if (keys_are_in_tuples) {
-            /* Make sure we're not dealing with tuples of tuples
-             * (remember: here, key_type refers list [key[0] for key in keys]) */
-            if (key_type == &PyTuple_Type) {
-                ms.tuple_elem_compare = safe_object_compare;
-            }
-            else {
-                ms.tuple_elem_compare = ms.key_compare;
-            }
-
+        if (keys_are_in_tuples && ms.key_compare != unsafe_object_compare
+                               && ms.key_compare != safe_object_compare) {
+            /* For use when each key[0] has the same safe type */
+            ms.tuple_elem_compare = ms.key_compare;
             ms.key_compare = unsafe_tuple_compare;
             ms.first_tuple_items_resolved_it = 1; /* be optimistic */
         }

If the ms.tuple_elem_compare=unsafe_object_compare case is important, we could instead revert the change to unsafe_tuple_compare and retain it for that case, but then add an additional function very_unsafe_tuple_compare that uses the a[0] < b[0] optimization from #29076, but only when key_compare is specialized to a known type with a total order.

Repository owner moved this from Todo to Done in Release and Deferred blockers 🚫 Jul 24, 2022
pablogsal added a commit that referenced this issue Jul 24, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 24, 2022
(cherry picked from commit 9007dec)

Co-authored-by: Pablo Galindo Salgado <[email protected]>
miss-islington added a commit that referenced this issue Jul 24, 2022
(cherry picked from commit 9007dec)

Co-authored-by: Pablo Galindo Salgado <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 1, 2022
miss-islington added a commit that referenced this issue Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 only security fixes release-blocker triaged The issue has been accepted as valid by a triager. type-bug An unexpected behavior, bug, or error
Projects
Development

Successfully merging a pull request may close this issue.

5 participants