Skip to content

regex group class should return AnyStr | None instead of AnyStr | Any #10526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cn-ml opened this issue Aug 1, 2023 · 5 comments
Closed

regex group class should return AnyStr | None instead of AnyStr | Any #10526

cn-ml opened this issue Aug 1, 2023 · 5 comments

Comments

@cn-ml
Copy link

cn-ml commented Aug 1, 2023

Every time when using the regex library I'm stumbling over this when extracting the group text from a regex match. From what I read the function returns the match as an AnyStr which corresponds to the type variable of the Match[AnyStr]. But it also returns Any. I don't possibly know how to return anything else from this method.

If a match was successful the group method will do one of the following:

  1. The last text matched by the capturing group (either by name or number) as an AnyStr type (?P<groupname>.+)
  2. None if the capturing group was part of a regex branch that did not match (\d+)|(\w+)
  3. The Error IndexError("no such group") if a group was referred that is not part of the pattern.

This is the method and here is the comment that argues similar.

The same problem can be observed in the following places:

Note: I'm not a professional and I don't even know if this is the right repository for this issue, but I didn't find any better guidance on where to post this.

The advantages of the AnyStr | None annotation would be:

  • null coalescion: text: str = match.group(1) or "default"
  • branching: if match.group(1): print("Match group 1 has matched something")
@AlexWaygood
Copy link
Member

I don't even know if this is the right repository for this issue

This is the right repository for this issue :)

Just to confirm -- your issue title mentions "regex", and there's a third-party library called regex which you can install from PyPI, but all of your links point to the stdlib re module. Does your issue concern the third-party regex library or the stdlib re module? We have stubs for both in typeshed :)

@AlexWaygood
Copy link
Member

Assuming you are discussing the re stdlib module rather than the regex third-party package, I'm afraid this issue is probably a duplicate of #9465. (We tried fixing it in #9482, but as you can see from the mypy_primer output on that PR, the fix would have been hugely disruptive, so we could never have merged it.)

@hauntsaninja hauntsaninja closed this as not planned Won't fix, can't repro, duplicate, stale Aug 1, 2023
@cn-ml
Copy link
Author

cn-ml commented Aug 3, 2023

Just to confirm -- your issue title mentions "regex", and there's a third-party library called regex which you can install from PyPI, but all of your links point to the stdlib re module. Does your issue concern the third-party regex library or the stdlib re module? We have stubs for both in typeshed :)

Sorry, I didn't know about this ambiguity. I was referring to the stdlib re module.

@cn-ml
Copy link
Author

cn-ml commented Aug 3, 2023

Assuming you are discussing the re stdlib module rather than the regex third-party package, I'm afraid this issue is probably a duplicate of #9465. (We tried fixing it in #9482, but as you can see from the mypy_primer output on that PR, the fix would have been hugely disruptive, so we could never have merged it.)

I didn't find any issues like mine when I searched the tracker, but it seems i missed some.

Can anybody explain to me what the lower issue here is? In the diff from the mypy_primer I'm seeing a whole lot of

Item "None" of "Optional[str]" has no attribute "split".

error messages. In code that seems like someone has forgot to check if the regex group actually produced a matching capturing group or None. To me this very much seems like the issue is not in the re module, but the code that uses it in that way. I think types are used to clearly and precisely define what a method behaves like, which Any definitely prevents.

On the worst case scenario the re module returning any will likely introduce a lot more bugs into user code than a precise type will produce in false positives. I'd much rather cast or check None in all false positives than keep an Any type in such a frequently used piece of stdlib code.

@KotlinIsland
Copy link
Contributor

Basedmypy works correctly with these cases:

if m := re.match("a(b)?(c)", s):
    reveal_type(m.groups())  #  (str | None, str)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants