Add defaultdict.__or__; improve ChainMap.__or__ and UserDict.__or__ #10427
Thanks! Looks pretty good, but a few nitpicks:
stdlib/collections/__init__.pyi
Outdated
def __or__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
def __ror__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
If you have a subclass of defaultdict, then __or__ will actually return an instance of the subclass, even if __or__ has not been overridden on the subclass (note this differs from how dict.__or__ behaves):
>>> from collections import defaultdict
>>> class Foo(defaultdict): ...
...
>>> type(Foo(int) | defaultdict(int))
<class '__main__.Foo'>
We can't express this fully without higher-kinded typevars (python/typing#548), but we can give a more specific return type in the specific case where a mapping that has the same type parameters is being __or__'d.
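The contrast with plain dict can be checked at runtime. What follows is a minimal sketch, assuming Python 3.9+ (where dict gained the | operator); PlainSub and DefaultSub are hypothetical names used only for illustration.

```python
from collections import defaultdict


class PlainSub(dict):
    """Hypothetical dict subclass that does not override __or__."""


class DefaultSub(defaultdict):
    """Hypothetical defaultdict subclass that does not override __or__."""


# dict.__or__ builds a plain dict, so the subclass is lost.
plain_result = PlainSub(a=1) | {"b": 2}

# defaultdict.__or__ builds an instance of the defaultdict operand's class
# and carries over its default_factory.
default_result = DefaultSub(int, a=1) | {"b": 2}

print(type(plain_result).__name__)
print(type(default_result).__name__, default_result.default_factory)
```

This is the behaviour the Self-returning overload tries to capture for the case where both operands share the same type parameters.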
Also: the second parameter should be named value, not other:
>>> from collections import defaultdict
>>> help(defaultdict.__or__)
Help on wrapper_descriptor:

__or__(self, value, /)
    Return self|value.
And it should be positional-only:
>>> from collections import defaultdict
>>> x = defaultdict(int)
>>> x.__or__(value={})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: wrapper __or__() takes no keyword arguments
Putting all this together, I think the signature should be:
- def __or__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
- def __ror__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
+ @overload
+ def __or__(self, __value: Mapping[_KT, _VT]) -> Self: ...
+ @overload
+ def __or__(self, __value: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
+ @overload
+ def __ror__(self, __value: Mapping[_KT, _VT]) -> Self: ...
+ @overload
+ def __ror__(self, __value: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
Probably the type of the other parameter should be _typeshed.SupportsKeysAndGetItem, not Mapping, right?
> Probably the type of the other parameter should be _typeshed.SupportsKeysAndGetItem, not Mapping, right?
Doesn't look like it?
>>> from collections import defaultdict
>>> class Foo:
...     def __init__(self):
...         self.data = {}
...     def keys(self):
...         return self.data.keys()
...     def __getitem__(self, key):
...         return self.data[key]
...
>>> defaultdict(int) | Foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'collections.defaultdict' and 'Foo'
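For comparison, the in-place operator is more permissive than the binary one, which may be why SupportsKeysAndGetItem looked plausible. A small sketch, assuming Python 3.9+; KeysAndGetItem is a hypothetical stand-in for the Foo class above, and the |= behaviour shown is the one defaultdict inherits from dict.

```python
from collections import defaultdict


class KeysAndGetItem:
    """Hypothetical object with only keys() and __getitem__ (not a dict)."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


target = defaultdict(int, a=1)
source = KeysAndGetItem({"b": 2})

# The binary operator requires a real dict on the other side.
try:
    target | source
    binary_or_accepted = True
except TypeError:
    binary_or_accepted = False

# The in-place form follows dict.update(), which accepts this protocol.
target |= source

print(binary_or_accepted)
print(dict(target))
```

So a wider parameter type can be reasonable for __ior__ while __or__ and __ror__ stay restricted to dict.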
@JelleZijlstra @AlexWaygood defaultdict actually only supports subclasses of dict: https://github.com/python/cpython/blob/388b5daa523b828dc0f7e2a1a6886bebc20833ba/Modules/_collectionsmodule.c#L2136
However, we cannot declare it here as dict, because dict.__or__ is declared to accept Mapping in typeshed, so this would violate Liskov substitution. The dict declaration also seems wrong, since the code only accepts dicts: https://github.com/python/cpython/blob/388b5daa523b828dc0f7e2a1a6886bebc20833ba/Objects/dictobject.c#L3606
Quick demo:
>>> from collections.abc import Mapping
>>> class D(Mapping):
...     def __getitem__(self, key):
...         raise KeyError
...     def __len__(self):
...         return 0
...     def __iter__(self):
...         return iter(())
...
>>> {"a": 1} | D()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'dict' and 'D'
> @JelleZijlstra @AlexWaygood defaultdict actually only supports subclasses of dict:
Yes we had just come to the same conclusion :)
> However, we cannot declare it here as dict, because dict is declared to accept Mapping in typeshed, so this causes a violation of Liskov substitution. This also seems wrong. The code only accept dicts: https://github.com/python/cpython/blob/388b5daa523b828dc0f7e2a1a6886bebc20833ba/Objects/dictobject.c#L3606
Good catch! Want to update dict.__(r)or__ in this PR as well?
I don't know why this works, but it works. Mypy (with this PR) and Python somehow decide that defaultdict(int, {"foo": 1}) | {"bar": 2} and {"bar": 2} | defaultdict(int, {"foo": 1}) are defaultdicts, even though dict and defaultdict both define __or__ and __ror__.
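@Akuli defaultdict is a subclass of dict, so Python prefers its method over dict's.
When evaluating the expression x | y, Python tries y.__ror__(x) first if type(y) is a proper subclass of type(x) that provides a different __ror__. In all other cases, Python will use x.__or__(y) first, falling back to y.__ror__(x) only if that returns NotImplemented.
Here's a full description of the nightmarish complexity Python goes through to evaluate |
Thanks. I suspected it might have something to do with subclassing, but was somehow too lazy to look it up :)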
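The subclass-first rule from this exchange can be observed with toy classes. This is an illustrative sketch only; Base, Child and Unrelated are hypothetical names, and the last expression assumes Python 3.9+.

```python
from collections import defaultdict


class Base:
    def __or__(self, value):
        return "Base.__or__"


class Child(Base):
    # Subclass of the left operand's type that provides its own __ror__.
    def __ror__(self, value):
        return "Child.__ror__"


class Unrelated:
    def __ror__(self, value):
        return "Unrelated.__ror__"


# Right operand is a subclass with its own __ror__: it is tried first.
subclass_case = Base() | Child()

# Right operand is unrelated: the left operand's __or__ is tried first.
unrelated_case = Base() | Unrelated()

# The same rule explains why a dict on the left still yields a defaultdict.
merged = {"bar": 2} | defaultdict(int, {"foo": 1})

print(subclass_case, unrelated_case, type(merged).__name__)
```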
Force-pushed: 383ccfc to 0750b52
Force-pushed: 0750b52 to 0938fbd
@AlexWaygood It looks like the correct type for
def fn1(lhs: dict[str, str], rhs: Mapping[str, str]) -> None:
    lhs | rhs  # does not type check now
def fn2(lhs: dict[str, str], rhs: dict[str, str]) -> None:
    lhs | rhs  # type checks, but cannot be called with rhs=os.environ
Also, I got an error from mypy:
which did not make sense to me, so I added an ignore. I'm thinking of reverting the change to dict. WDYT?
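For a helper in the style of fn1, one possible way to accept any Mapping on the right-hand side, independent of how dict.__or__ is annotated, is to avoid | altogether. This is a sketch and not something proposed in the thread; merge_env and ReadOnlyEnv are hypothetical names, and Python 3.9+ is assumed.

```python
from collections.abc import Iterator, Mapping


def merge_env(lhs: dict[str, str], rhs: Mapping[str, str]) -> dict[str, str]:
    """Return a new dict; keys from rhs win on conflict, like lhs | rhs."""
    # Dict unpacking works for any mapping, not only dict instances.
    return {**lhs, **rhs}


class ReadOnlyEnv(Mapping):
    """Hypothetical Mapping that is not a dict and defines no __ror__."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = dict(data)

    def __getitem__(self, key: str) -> str:
        return self._data[key]

    def __len__(self) -> int:
        return len(self._data)

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)


merged = merge_env({"A": "1", "B": "2"}, ReadOnlyEnv({"B": "3"}))
print(merged)
```

The trade-off is that the result is always a plain dict, so this does not help when the left operand's subclass or default_factory must be preserved.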
Maybe a better change would be to propose a change in CPython to use PyMapping_Check instead of PyDict_Check in dict and defaultdict. This is what almost all Python implemented
That thought crossed my mind too, but it's not clear if it's right, and in any case we'd only make this change in 3.13+, so it doesn't really solve the problem of what to do in typeshed right now.
I'll try something by pushing to your PR branch and see if it fixes some of the primer errors
As for I'd be OK with just accepting
Well, my idea didn't help anything, and it's not surprising that it didn't. The issue isn't anything to do with our stubs for Arguably this is a true positive, as
Also, add overloads with Self type to other __[r]or__ methods.
@AlexWaygood it is a true positive, but it is annoying to properly type a function like run.
Yeah, just go with
Force-pushed: 9e60e1f to e1484a2
Done.
Force-pushed: e1484a2 to 443a9b4
LGTM. Thanks!
By the way, a workflow nit for future typeshed PRs: we generally prefer to avoid force pushes at typeshed. They don't work so well with GitHub's UI, making it hard to see what changed in between commits and thereby making it harder to do incremental reviews. We squash everything into a single commit before merging, anyway, so there's no need to worry about a messy commit history in a PR :)
Will note for future PRs.
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉