Functions with Sequences or strings as input and output #2067

Closed
@macfreek

Description

With apologies for opening an issue related to #2063 (which was about the arguments of a function), I have a related issue with the output of said function (difflib.get_close_matches).

Issue

The issue is with sequences and strings. Currently, a definition like this:

_T = TypeVar('_T')
def get_close_matches(word: Sequence[_T], possibilities: Iterable[Sequence[_T]],
                      n: int = ..., cutoff: float = ...) -> List[Sequence[_T]]: ...

is still problematic, because mypy (and presumably pytype, PyCharm and pyflakes) infers the output as List[Sequence[_T]]. If the input is a string, however, the output is a list of strings. Because the declared return type is widened to Sequence[str], which a checker does not accept where a str is expected, this leads to spurious errors such as incompatible type "Sequence[str]"; expected "str".

Here is a trivial example:

from typing import TypeVar, Sequence, Iterable

_T = TypeVar('_T')
def process_sequence(word: Sequence[_T]) -> Sequence[_T]:
    return word

def print_str(word: str):
    print(word)

def print_sequence_of_ints(ints: Iterable[int]):
    for i in ints:
        print(i)

a = process_sequence("spam")
print_str(a)

b = process_sequence([1,2,3,42])
print_sequence_of_ints(b)

This works fine for b, a List of ints, but gives an error for a, a str:

error: Argument 1 to "print_str" has incompatible type "Sequence[str]"; expected "str"
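
For reference, a minimal runtime sketch of why the error is spurious: difflib.get_close_matches hands back the same objects it was given, so string input yields strings and tuple input yields tuples. The sample words and tuples are illustrative only, and the names str_matches and tuple_matches are not from the stub.

```python
from difflib import get_close_matches

# Strings in, strings out: every match is one of the original str objects.
str_matches = get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print([type(m).__name__ for m in str_matches])

# Tuples of ints in, tuples out (the elements only need to be hashable).
tuple_matches = get_close_matches((1, 2, 3), [(1, 2, 3), (4, 5, 6)])
print([type(m).__name__ for m in tuple_matches])
```

At runtime nothing is widened to a generic sequence; only the annotation loses the information that the elements are str.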

Proposed solution

I propose to define:

_S = TypeVar('_S', str, Sequence)

So in difflib:

_S = TypeVar('_S', str, Sequence)
def get_close_matches(word: _S, possibilities: Iterable[_S],
                      n: int = ..., cutoff: float = ...) -> List[_S]: ...
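
To try the proposed signature outside the stub, here is a rough sketch that puts the same annotations on a thin wrapper. The wrapper name close_matches is hypothetical, and the defaults n=3 and cutoff=0.6 are the ones documented for difflib.get_close_matches. A checker running against the existing stub may still flag the return statement inside the wrapper; the point is only how callers of the wrapper are typed.

```python
from difflib import get_close_matches
from typing import Iterable, List, Sequence, TypeVar

# Constrained TypeVar from the proposal: _S is solved as str or as Sequence.
_S = TypeVar("_S", str, Sequence)


def close_matches(word: _S, possibilities: Iterable[_S],
                  n: int = 3, cutoff: float = 0.6) -> List[_S]:
    """Hypothetical wrapper that delegates to difflib.get_close_matches."""
    # With the existing stub, a checker may report a return-type mismatch here.
    return get_close_matches(word, possibilities, n, cutoff)


# With str arguments, callers see List[str] rather than List[Sequence[str]].
words = close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(words)
```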

Tests

Because PEP 484 apparently can't distinguish between str and Sequence[str], I tested whether this solution also works for sequences of strings. It does:

from typing import TypeVar, Sequence, Iterable

_S = TypeVar('_S', str, Sequence)

def process_sequence(s: _S) -> _S:
    return s

def print_str(word: str):
    print(word)

def print_sequence_of_ints(ints: Iterable[int]):
    for i in ints:
        print(i)

def print_sequence_of_str(words: Iterable[str]):
    for word in words:
        print(word)

word = "spam"
a = process_sequence(word)
print_str(a)

b = process_sequence([1,2,3,42])
print_sequence_of_ints(b)

c = process_sequence(["ham", "bacon", "eggs", "spam"])
print_sequence_of_str(c)

The above works fine for both mypy and pyflakes. I have not tested PyCharm or pytype.

Also, I wanted to get confirmation that this is the right approach before submitting a pull request. The reason is that the order of the types in _S = TypeVar('_S', str, Sequence) is relevant. The other way around does not work: _S = TypeVar('_S', Sequence, str).

So before this is rolled out: is the ordering of a TypeVar's constraints well defined? I did find a (somewhat related) discussion on the ordering of @overload functions, where indeed the most specific type must go first, so I suspect that's the case here too.

Let me know your thoughts, and I'll make a PR if this is deemed useful.
