Skip to content

Python: Regexp: Handle repetions {n} (with no ,) #3500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 25, 2020

Conversation

yoff
Copy link
Contributor

@yoff yoff commented May 18, 2020

To solve FP report #2403 with internal issue https://github.com/github/codeql-python-team/issues/85.
The code currently only handles the syntax {n,m}. This adds support for {n}.

@yoff yoff requested a review from a team as a code owner May 18, 2020 12:50
@yoff
Copy link
Contributor Author

yoff commented May 19, 2020

As Taus mentioned, we might consider checking if there is a dual problem with caret and if solving that should then be part of this PR.

As ar as I can tell, all these are improvements
@@ -207,9 +207,9 @@
| ax{3,} | sequence | 0 | 6 |
| ax{3} | char | 0 | 1 |
| ax{3} | char | 1 | 2 |
| ax{3} | char | 2 | 3 |
| ax{3} | char | 3 | 4 |
| ax{3} | char | 4 | 5 |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why 4-5 is in there, but both that and removing 2-3 is consistent with how {n,m} is handled..

@tausbn
Copy link
Contributor

tausbn commented May 20, 2020

I was perusing the Python re module documentation just now (as you do), and I noticed the \A and \Z escapes, which match the start and end of the string respectively. Thus, something like

>>> re.match("(\Aab$|\Aba$)$\Z", "ab")
<_sre.SRE_Match object; span=(0, 2), match='ab'>

works as expected. I don't know that we're explicitly handling these cases here, so perhaps a few more tests would be in order? 🙂

@yoff
Copy link
Contributor Author

yoff commented May 20, 2020

Looks like lastPart has an explicit case for \$ and would need one for \Z.

@yoff
Copy link
Contributor Author

yoff commented May 26, 2020

Made the test @tausbn sugested pass, the code feels a bit ad-hock, though.

@yoff yoff requested a review from RasmusWL June 10, 2020 17:50
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides autoformatting and a maybe adding a test-case, it all looks good to me 👍

@yoff
Copy link
Contributor Author

yoff commented Jun 24, 2020

I added https://github.com/github/codeql-python-team/issues/114 to track the fact that, as @tausbn observed, we probably have the dual problem with carets.

@yoff yoff requested a review from RasmusWL June 24, 2020 10:17
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM 🚀

@RasmusWL RasmusWL merged commit b36c23e into github:master Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants