Skip to content

Add defaultdict.__or__; improve ChainMap.__or__ and UserDict.__or__ #10427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 11, 2023

Conversation

eltoder
Copy link
Contributor

@eltoder eltoder commented Jul 9, 2023

No description provided.

@github-actions

This comment has been minimized.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks pretty good, but a few nitpicks:

Comment on lines 395 to 396
def __or__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
def __ror__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
Copy link
Member

@AlexWaygood AlexWaygood Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a subclass of defaultdict, then __or__ will actually return an instance of the subclass, even if __or__ has not been overridden on the subclass (note this differs from how dict.__or__ behaves):

>>> from collections import defaultdict
>>> class Foo(defaultdict): ...
...
>>> type(Foo(int) | defaultdict(int))
<class '__main__.Foo'>

We can't express this fully without higher-kinded typevars (python/typing#548), but we can give a more specific return type in the specific case where a mapping that has the same type parameters is being __or__'d.

Also: the second parameter should be named value, not other:

>>> from collections import defaultdict
>>> help(defaultdict.__or__)
Help on wrapper_descriptor:

__or__(self, value, /)
    Return self|value.

And it should be positional-only:

>>> from collections import defaultdict
>>> x = defaultdict(int)
>>> x.__or__(value={})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: wrapper __or__() takes no keyword arguments

Putting all this together, I think the signature should be:

Suggested change
def __or__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
def __ror__(self, other: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
@overload
def __or__(self, __value: Mapping[_KT, _VT]) -> Self: ...
@overload
def __or__(self, __value: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...
@overload
def __or__(self, __value: Mapping[_KT, _VT]) -> Self: ..
@overload
def __ror__(self, __value: Mapping[_T1, _T2]) -> defaultdict[_KT | _T1, _VT | _T2]: ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the type of the other parameter should be _typeshed.SupportsKeysAndGetItem, not Mapping, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the type of the other parameter should be _typeshed.SupportsKeysAndGetItem, not Mapping, right?

Doesn't look like it?

>>> from collections import defaultdict
>>> class Foo:
...     def __init__(self):
...         self.data = {}
...     def keys(self):
...         return self.data.keys()
...     def __getitem__(self, key):
...         return self.data[key]
...
>>> defaultdict(int) | Foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'collections.defaultdict' and 'Foo'

Copy link
Contributor Author

@eltoder eltoder Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JelleZijlstra @AlexWaygood defaultdict actually only supports subclasses of dict: https://github.com/python/cpython/blob/388b5daa523b828dc0f7e2a1a6886bebc20833ba/Modules/_collectionsmodule.c#L2136
However, we cannot declare it here as dict, because dict is declared to accept Mapping in typeshed, so this causes a violation of Liskov substitution. This also seems wrong. The code only accept dicts: https://github.com/python/cpython/blob/388b5daa523b828dc0f7e2a1a6886bebc20833ba/Objects/dictobject.c#L3606
Quick demo:

>>> from collections.abc import Mapping
>>> class D(Mapping):
    def __getitem__(self, key):
        raise KeyError
    def __len__(self):
        return 0
    def __iter__(self):
        return iter(())

>>> {"a": 1} | D()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'dict' and 'D'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JelleZijlstra @AlexWaygood defaultdict actually only supports subclasses of dict:

Yes we had just come to the same conclusion :)

However, we cannot declare it here as dict, because dict is declared to accept Mapping in typeshed, so this causes a violation of Liskov substitution. This also seems wrong. The code only accept dicts: https://github.com/python/cpython/blob/388b5daa523b828dc0f7e2a1a6886bebc20833ba/Objects/dictobject.c#L3606

Good catch! Want to update dict.__(r)or__ in this PR as well?

Copy link
Collaborator

@Akuli Akuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this works, but it works. Mypy (with this PR) and Python somehow decide that defaultdict(int, {"foo": 1}) | {"bar": 2} and {"bar": 2} | defaultdict(int, {"foo": 1}) are defaultdicts, even though dict and defaultdict both define __or__ and __ror__.

@eltoder
Copy link
Contributor Author

eltoder commented Jul 11, 2023

@Akuli defaultdict is a subclass of dict, so python prefers its method over dict's.

@AlexWaygood
Copy link
Member

AlexWaygood commented Jul 11, 2023

I don't know why this works, but it works. Mypy (with this PR) and Python somehow decide that defaultdict(int, {"foo": 1}) | {"bar": 2} and {"bar": 2} | defaultdict(int, {"foo": 1}) are defaultdicts, even though dict and defaultdict both define __or__ and __ror__.

When evaluating the expression x | y, Python will use type(y).__ror__ instead of type(x).__or__ iff both type(x).__or__ and type(y).__ror__ exist and type(y) is a subclass of type(x). Mypy's presumably aware of this and figures out the type accordingly :)

In all other cases, Python will use type(x).__or__ to evaluate x | y if type(x).__or__ and type(y).__ror__ are both defined.

@AlexWaygood
Copy link
Member

Here's a full description of the nightmarish complexity Python goes through to evaluate x | y: https://snarky.ca/unravelling-binary-arithmetic-operations-in-python/

@Akuli
Copy link
Collaborator

Akuli commented Jul 11, 2023

Thanks. I suspected it might have something to do with subclassing, but was somehow too lazy to look it up :)

@eltoder eltoder force-pushed the feature/defaultdict-or branch from 383ccfc to 0750b52 Compare July 11, 2023 15:31
@github-actions

This comment has been minimized.

@eltoder eltoder force-pushed the feature/defaultdict-or branch from 0750b52 to 0938fbd Compare July 11, 2023 15:56
@github-actions

This comment has been minimized.

@eltoder
Copy link
Contributor Author

eltoder commented Jul 11, 2023

@AlexWaygood It looks like the correct type for dict.__or__ has an unfortunate side-effect: now there's no nice way to type a function that does dict | os.environ. This works because _Environ defines __ror__, but the type is not public and one wouldn't want to put it in the function signature. IOW:

def fn1(lhs: dict[str, str], rhs: Mapping[str, str]) -> None:
    lhs | rhs  # does not type check now

def fn2(lhs: dict[str, str], rhs: dict[str, str]) -> None:
    lhs | rhs  # type checks, but cannot be called with rhs=os.environ

Also, I got an error from mypy:

stdlib/collections/__init__.pyi:408: error: Signatures of "__ror__" of "defaultdict" and "__or__" of "dict[_T1, _T2]" are unsafely overlapping  [misc]

which did not make sense to me, so I added ignore.

I'm thinking to revert the change to dict. WDYT?

@eltoder
Copy link
Contributor Author

eltoder commented Jul 11, 2023

Maybe a better change would be to propose a change in cpython to use PyMapping_Check instead of PyDict_Check in dict and defaultdict. This is what almost all Python implemented __or__ methods do. This is inconsistent with set, though.

@JelleZijlstra
Copy link
Member

Maybe a better change would be to propose a change in cpython to use PyMapping_Check instead of PyDict_Check in dict and defaultdict. This is what almost all Python implemented __or__ methods do. This is inconsistent with set, though.

That thought crossed my mind too, but it's not clear if it's right, and in any case we'd only make this change in 3.13+, so it doesn't really solve the problem of what to do in typeshed right now.

@AlexWaygood
Copy link
Member

I'll try something by pushing to your PR branch and see if it fixes some of the primer errors

@JelleZijlstra
Copy link
Member

As for environ, we do have stubs for the os._Environ type, and they define an __or__ that accepts Mapping. I haven't checked whether that's exactly correct.

I'd be OK with just accepting Mapping for now in defaultdict.__or__ and fixing the type in a separate PR.

@github-actions

This comment has been minimized.

@AlexWaygood
Copy link
Member

AlexWaygood commented Jul 11, 2023

Well, my idea didn't help anything, and it's not surprising that it didn't.

The issue isn't anything to do with our stubs for os._Environ.__(r)or__. The issue is that mypy has no way of knowing that the object being |'d is an instance of os._Environ, as they're just using Mapping as the type hint: https://github.com/systemd/mkosi/blob/84ba040dc048971f675c47f45b9b154e087e342e/mkosi/run.py#L227-L246C11

Arguably this is a true positive, as __or__ and __ror__ aren't included in the Mapping interface, so you can't guarantee that an arbitrary mapping can be |'d with a dictionary or an instance of os._Environ. But perhaps it makes it more annoying for users, for little gain. So maybe we should just stick with Mapping for dict.__(r)or__ and defaultdict.__(r)or__ for now.

@github-actions

This comment has been minimized.

Also, add overloads with Self type to other __[r]or__ methods.
@eltoder
Copy link
Contributor Author

eltoder commented Jul 11, 2023

@AlexWaygood it is a true positive, but it is annoying to properly type a function like run. env: Mapping[str, PathString] seems logical, but actually needs to be "a type that can be or'ed to a dict", which is "anything that accepts dict in __ror__". I think we can do this with a protocol and stick it into typing_extensions or something, but it's a lot of work. A more conservative type of dict will disallow os.environ. A quick fix for that is dict | os._Environ, but that seems ugly.

@AlexWaygood
Copy link
Member

Yeah, just go with Mapping for dict.__or__ and defaultdict.__or__ for now :)

@eltoder eltoder force-pushed the feature/defaultdict-or branch from 9e60e1f to e1484a2 Compare July 11, 2023 18:09
@github-actions

This comment has been minimized.

@eltoder
Copy link
Contributor Author

eltoder commented Jul 11, 2023

Done.

@eltoder eltoder force-pushed the feature/defaultdict-or branch from e1484a2 to 443a9b4 Compare July 11, 2023 18:28
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@AlexWaygood AlexWaygood changed the title Add __or__ to defaultdict Add defaultdict.__or__; improve ChainMap.__or__ and UserDict.__or__ Jul 11, 2023
@AlexWaygood
Copy link
Member

AlexWaygood commented Jul 11, 2023

By the way, a workflow nit for future typeshed PRs: we generally prefer to avoid force pushes at typeshed. They don't work so well with GitHub's UI, making it hard to see what changed in between commits and thereby making it harder to do incremental reviews. We squash everything into a single commit before merging, anyway, so there's no need to worry about a messy commit history in a PR :)

@eltoder
Copy link
Contributor Author

eltoder commented Jul 11, 2023

Will note for future PRs.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@AlexWaygood AlexWaygood merged commit cfc5425 into python:main Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants