Question about adjacent empty matches in regular expressions #122055
Comments
The quoted line is currently at lines 1082-3 in the main branch. #76489 (PR #4846) changed 'previous match' to 'previous empty match' because the result of @serhiy-storchaka This was your PR. The current doc implies that there can be adjacent empty matches, but part of MRABarnett's comment, "the general consensus has been to match once", implies otherwise.
Assuming the rules for empty matches are now consistent across the various functions (sub, finditer, etc.), maybe it would be clearer to say “Empty matches do not occur adjacent to a previous empty match”? Same for the re.split documentation.
Aren't these two adjacent empty matches?
>>> pat = re.compile('^(.?)(.?)')
>>> m = pat.match('')
>>> m.groups()
('', '')
>>> pat = re.compile('^(.?)(.?)(a+)')
>>> pat.match('a').groups()
('', '', 'a')
Thank you @terryjreedy for the context. Indeed, it was a change to the existing text, which can now look confusing. In fact, adjacent empty matches are not possible because they are considered the same match. @vadmium, yes, that wording would perhaps be better. @ronaldoussoren, no, in this context we are talking about matches of the whole regular expression in functions that search for matches repeatedly.
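To make the distinction between groups and whole-pattern matches concrete, here is a minimal sketch. It reuses the pattern from the comment above; the variable names are illustrative only. Two groups inside one match can both be empty at the same position, but a repeated search such as `finditer` still reports a single match there.

```python
import re

pat = re.compile(r'^(.?)(.?)')

# One match of the whole pattern; both groups are empty at position 0.
m = pat.match('')
group_spans = [m.span(1), m.span(2)]

# Repeated search over the same input yields exactly one (empty) match.
whole_spans = [mm.span() for mm in pat.finditer('')]

print(group_spans)  # [(0, 0), (0, 0)]
print(whole_spans)  # [(0, 0)]
```

The groups are sub-parts of one match, so they do not count as separate "adjacent empty matches" in the sense used by the `re.sub()` documentation.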
Thank you @terryjreedy for providing the historical context and the before-and-after comparison from the PR, which significantly contributes to understanding the evolution of this behavior. While I appreciate @vadmium's suggestion for clarity, I find that it still doesn't fully elucidate the cause-and-effect relationship in this context. To address this, I have attempted to make the explanation more intuitive and directly address the core behavior; therefore, I propose the following modification: Adjacent empty matches are not possible, but an empty match can occur immediately after a non-empty match. As a result, This:
The inclusion of both the current and previous behavior in the example directly addresses the change introduced by the PR, making the documentation more informative for both new and experienced Python developers. Could you please review this suggestion and provide your thoughts?
I think I ran into this today on Python 3.13.2:
Here, I'd expect This doesn't happen for an empty string, where a single replacement is made.
By using a Callable for
So, I think this is an instance, and perhaps a good example, of adjacent empty matches.
I like your suggestion, @krave1986. There is also a similar phrase for
@shtrom, adjacent matches are possible, but they cannot both be empty. There must be progress; otherwise we would have an infinite number of matches at the same position.
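The following sketch illustrates that rule on Python 3.7 or later. The `x*` case comes from the `re.sub()` documentation. The `.*` case is a hypothetical reproduction chosen for illustration, not the snippet from the earlier comment, which did not survive in this thread. `match_spans` is a helper name introduced here.

```python
import re

def match_spans(pattern, text):
    """Return the (start, end) span of every match found by finditer."""
    return [m.span() for m in re.finditer(pattern, text)]

# An empty match at 3 directly follows the non-empty match (2, 3).
spans = match_spans(r'x*', 'abxd')
result = re.sub(r'x*', '-', 'abxd')

# Hypothetical case: '.*' first consumes 'abc', then matches empty at the end.
greedy = re.sub(r'.*', 'x', 'abc')
empty_input = re.sub(r'.*', 'x', '')

# A callable replacement records which matches sub() actually sees.
seen = []
re.sub(r'.*', lambda m: seen.append(m.span()) or '', 'abc')

print(spans)        # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
print(result)       # -a-b--d-
print(greedy)       # xx
print(empty_input)  # x
print(seen)         # [(0, 3), (3, 3)]
```

In every case a non-empty match may be followed immediately by an empty one, but no position ever produces two empty matches in a row.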
Documentation
The Python documentation for re.sub() states:
However, after some testing, I have been unable to construct a regular expression pattern that produces adjacent empty matches. This leads to the following questions:
Is it actually possible to create a regular expression pattern that results in adjacent empty matches in Python's re module?
If not, should we consider updating the documentation to avoid potential confusion among developers?
My Investigation
I've tried various patterns that might theoretically produce adjacent empty matches, such as:
None of these patterns produce adjacent empty matches. The regex engine seems to always move forward after finding a match, even an empty one.
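A check along these lines can be sketched as follows. The candidate patterns are hypothetical stand-ins, since the original list did not survive in this thread, and `adjacent_empty_pairs` is a helper name introduced here. It looks for two consecutive matches that are both empty and sit at the same position.

```python
import re

def adjacent_empty_pairs(pattern, text):
    """Return pairs of consecutive matches that are both empty at one position."""
    matches = list(re.finditer(pattern, text))
    return [
        (a.span(), b.span())
        for a, b in zip(matches, matches[1:])
        if a.start() == a.end() == b.start() == b.end()
    ]

# Hypothetical candidates that can all match the empty string.
candidates = [
    (r'x*', 'abxd'),
    (r'', 'ab'),
    (r'\b', 'ab cd'),
    (r'(?=a)', 'aa'),
    (r'a*?', 'aa'),
    (r'^|$', ''),
]

results = {pattern: adjacent_empty_pairs(pattern, text)
           for pattern, text in candidates}

for pattern, pairs in results.items():
    print(repr(pattern), pairs)  # every list is empty
```

An empty result for a handful of patterns is not a proof, but it is consistent with the observation that the engine always moves forward after an empty match.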
Request for Clarification
I would appreciate clarification on the following points:
If adjacent empty matches are indeed possible as the documentation suggests, could you provide some examples that demonstrate this behavior?
Are there specific scenarios or edge cases where adjacent empty matches can occur?
If possible, could you share a minimal working example that shows how re.sub() handles adjacent empty matches differently from non-adjacent ones?
These examples would greatly help in understanding the documentation and the behavior of the re module in such cases.
Thank you for your time and attention to this matter. Any insights or examples you can provide would be greatly appreciated.