Description
With apologies for opening an issue related to #2063 (which was about the arguments of a function), I have a related issue with the output of said function (difflib.get_close_matches).
Issue
The issue is with sequences and strings. Currently, a definition like this:
_T = TypeVar('_T')
def get_close_matches(word: Sequence[_T], possibilities: Iterable[Sequence[_T]],
                      n: int = ..., cutoff: float = ...) -> List[Sequence[_T]]: ...
is still problematic, because mypy (and presumably pytype, PyCharm and pyflakes) assumes that the output is a List[Sequence[_T]]. If the inputs are strings, however, the output is a list of strings. Since Sequence[str] is not compatible with str, passing an element of the result to code that expects a str leads to spurious errors such as incompatible type "Sequence[str]"; expected "str".
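As a quick runtime sanity check of the claim above, here is a minimal sketch. The string inputs are the example from the difflib documentation; the tuple inputs are an assumed stand-in for a non-str sequence and are not part of the original report.

```python
import difflib

# String inputs: the matches are plain str objects taken from `possibilities`.
matches = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(matches)
print(all(isinstance(m, str) for m in matches))

# Non-str sequences of hashable items are accepted too (assumed example input).
tuple_matches = difflib.get_close_matches((1, 2, 3), [(1, 2, 4), (7, 8, 9)])
print(tuple_matches)
```

So the runtime return value is a list of whatever was passed in as possibilities, which is what the annotation should ideally express.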
Here is a trivial example:
from typing import TypeVar, Sequence, Iterable
_T = TypeVar('_T')
def process_sequence(word: Sequence[_T]) -> Sequence[_T]:
    return word
def print_str(word: str):
    print(word)
def print_sequence_of_ints(ints: Iterable[int]):
    for i in ints:
        print(i)
a = process_sequence("spam")
print_str(a)
b = process_sequence([1,2,3,42])
print_sequence_of_ints(b)
This works fine for b, a List of ints, but gives an error for a, a str:
error: Argument 1 to "print_str" has incompatible type "Sequence[str]"; expected "str"
Proposed solution
I propose to define:
_S = TypeVar('_S', str, Sequence)
So in difflib:
_S = TypeVar('_S', str, Sequence)
def get_close_matches(word: _S, possibilities: Iterable[_S],
                      n: int = ..., cutoff: float = ...) -> List[_S]: ...
Tests
Because PEP 484 apparently can't distinguish between str and Sequence[str], I also tested whether this solution works for Sequences of strings. It does:
from typing import TypeVar, Sequence, Iterable
_S = TypeVar('_S', str, Sequence)
def process_sequence(s: _S) -> _S:
    return s
def print_str(word: str):
    print(word)
def print_sequence_of_ints(ints: Iterable[int]):
    for i in ints:
        print(i)
def print_sequence_of_str(words: Iterable[str]):
    for word in words:
        print(word)
word = "spam"
a = process_sequence(word)
print_str(a)
b = process_sequence([1,2,3,42])
print_sequence_of_ints(b)
c = process_sequence(["ham", "bacon", "eggs", "spam"])
print_sequence_of_str(c)
The above works fine for both mypy and pyflakes. I have not tested PyCharm or pytype.
Also, I wanted to get confirmation that this is the right approach before submitting a pull request. The reason is that the order of the types in _S = TypeVar('_S', str, Sequence) is relevant. The other way around does not work: _S = TypeVar('_S', Sequence, str).
So before this is rolled out: is the ordering of a TypeVar well defined? I did find a (somewhat related) discussion on the ordering of @overload functions, where indeed the most specific type must go first, so I suspect that's the case here too.
Let me know your thoughts, and I'll make a PR if this is deemed useful.