-
Notifications
You must be signed in to change notification settings - Fork 179
Check for non-ast SyntaxErrors #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This includes return and yield outside of a function and break and continue outside of a loop. Fixes lp 1293654. The problem is that these SyntaxErrors are not encoded in the ast grammar, so they are not detected when just compiling to ast. You must compile down to bytecode to catch them. The advantage here is that we can still check for other kinds of errors in this case, because the ast is still valid.
I wasn't able to figure out how to add a test case for checking both a SyntaxError and a warning, like break
import sys
notdefined I tried def test_nonastSyntaxErrorPlusOther(self):
"""
When we have a non-ast SyntaxError (like break outside of a loop),
we can also detect other errors, since we still get a generated ast.
"""
source = """\
break
import sys
notdefined
"""
sourcePath = self.makeTempFile(source)
self.assertHasErrors(
sourcePath,
["""\
%s:1:0: 'break' outside loop
break
^
""" % sourcePath,
"%s:2: 'sys' imported but unused" % sourcePath,
"%s:3: undefined name 'notdefined'" % sourcePath,
]) But I got (from py.test)
|
Aw man, flake8? Really? |
I hope you at least appreciate the irony of requiring the code to pass pep8. If you're going to have this requirement, could you make is so that The other test failures are because older versions of Python apparently do not add the column number for these syntax errors. I will fix that. |
Except they're not really errors, which is kind of the whole point here...
The "'return' outside of function" SyntaxError apparently doesn't give column information in Python 2.
The other thing I did not figure out here was how to inject the SyntaxError in the correct order in the messages. |
I have a couple concerns: PerformanceI would think that compiling the code twice would be a big performance hit. One of the distinguishing features of pyflakes vs. other similar tools is its speed, so this is pretty important. I did some quick tests on whatever .py files were lying about in /usr/lib/python. I did a handful of runs and took the fastest time:
That's not as bad as I would have guessed, but still, it's 24% slower. That's a pretty big hit for not a lot of benefit: syntax errors are among the most obvious. StructureThis patch, in effect, adds more things that pyflakes checks. Thus conceptually, the additions should belong in Of course, However, that may be moot, depending on how the performance problem is addressed. It seems there are just a few things ( |
I was also worried about performance but I also didn't notice a slowdown. I think it's not really running twice, just an extra step (the first step generates the ast and the second step compiles it).
These are the ones I know about. I'd have to look at the Python compiler to see what fully is caught at this stage. |
Also it should be noted that existing SyntaxErrors are caught in api.py. I don't disagree that the code should be reorganized, but I'm not knowledgable enough to do it myself. Regarding the usefulness of warning about this, it has come up in my own work at least twice, which is why I finally decided to implement it. I personally use flycheck in emacs to catch all pyflakes warnings on the fly as I write my code, and it's pretty surprising and annoying when pyflakes shows no errors but the code fails with a syntax error (especially if you're lazy and rely on pyflakes to check your code before committing rather than actually running it 😉) |
I agree, it can be annoying. However, more annoying is a 24% slowdown. You may not notice the slowdown on a single file, but there are a number of projects using pyflakes on quite large repositories. I can't justify catching a couple of SyntaxErrors that will surely be noticed the first time the code is run at the expense of that performance regression. Unless you can find some other solution to the performance problem, probably the way to move forward involves adding some checks in Granted, it will take some finesse to get all the edge cases just right, and it isn't guaranteed to have the same behavior as compiling to bytecode, but given the performance tradeoff I think this is the best solution. |
There are several other errors. Search https://hg.python.org/cpython/file/6dcc96fa3970/Python/compile.c for "compiler_error". |
It looks like in addition to
|
At any rate, that shows us the logic we would need to implement (we would need to check each Python version). The logic is not too bad, although not trivial (certainly not as easy as just calling compile). It's also quite clear from reading that that there is no simpler way built into Python to just get these errors without doing the full bytecode compilation. It is written in C, though, so the compile function is actually quite fast, as it goes (it only takes an extra five seconds 4258 files in my test, from 21 seconds to 26 seconds). |
So this means there is something that will catch these errors without needing them to be checked by PyFlakes, right? Assuming there are tests for a piece of software, running them would catch all of these, not by virtue of tests being written for them, but by virtue of the tests ostensibly causing Python to compile the files. I'm not sure why we should be enhancing PyFlakes so as to discourage users from running their test suites. |
@sigmavirus24 I'm not sure what you use pyflakes for, but I thought that the whole point of it is to catch errors as early as possible. My editor runs pyflakes every time I enter text into it. It does not run the full test suite every time. For some throwaway code that I write, pyflakes is the entirety of my "test suite". |
It's a linter. That means it will catch some errors (e.g., variables undefined before use) and catch some other non-errors (e.g., unused imports, unused variable bindings, etc.). All of those are easily checked without needing to compile the code. Your assertion seems to be that compilation is necessary for these checks and frankly, it adds what I consider to be an unacceptable cost in performance. As the maintainer of Flake8, we've added a few features lately to allow users to improve the performance of checks against large numbers of files and I'd rather have those features slowly negated by adding solutions like this one. Finally, pyflakes certainly is not going to catch every possible problem in your code. If you do if False:
x = 'hello world'
print(x) There's no way for it to catch that type of error (at present) without doing much more complicated checking or running the code (because even in this case, compilation won't catch it). The only way you could catch an error like that would be by running it (or perhaps using something far more advanced like pylint). tl;dr PyFlakes can't reasonably catch every error and I'm not sure it should try. I don't think the benefit of the performance costs for large code bases is worth it. |
@sigmavirus24 maybe you are on the wrong issue. You're looking for https://bugs.launchpad.net/pyflakes/+bug/1369671. I'm not arguing that pyflakes should try to be smarter and smarter and do type inference and all this crazy stuff. Please don't set up the straw man that I am. All I want is for pyflakes to be at least as smart as Python itself in catching the simplest possible error you can have, a syntax error. @bitglue may be right that this can be done more efficiently. It will take more work, as we'd have to translate those errors from the compile source to the pyflakes checker. If someone else wants to do it please jump on it. I probably won't have time to do it in a while. Meanwhile, this works (minus the issue of the errors being out of order. I don't know how to fix that or how important it is). It is a little slower, but you can consider that a motivation to make it faster. Maybe we can add an option to disable it (or enable it; I don't care what the default is). |
What Python versions does pyflakes support? The README says "It is available on PyPI and it supports all active versions of Python from 2.5 to 3.4.", but to me, that means 2.7, 3.2 (?), 3.3, and 3.4, as the official docs state that all other versions are no longer supported at all (see http://legacy.python.org/dev/peps/pep-0361/). I doubt you actually want to drop at least 2.6 support yet, but it would be better if this were more explicit. I'm asking because we'll have to check the compile rules for each supported version, to see if there are any differences (the starred stuff obviously is Python 3 only, but I didn't check yet if there is some Python 2 only stuff). |
Currently it supports the versions which are regularly tested on Travis CI plus version 2.5 explicitly, for historical reason. We don't feel any urge to drop Python 2.5 support ... but there's actually no automated test on this version. |
I wasn't setting up any straw man. I was trying to illustrate that by compiling we only get a couple of extra benefits that are otherwise better caught through any sort of testing (even just running the script you're writing). Compiling gets us very little extra on top of that. It can't be used as a jumping off point for other items on our wishlist (which might justify the performance cost). |
It doesn't sound like we are making progress on this issue. I understand your use case @asmeurer, and yes it would be nice if pyflakes could catch every error. However, it must also be fast, and sometimes those goals are mutually exclusive. In this particular instance I'm judging speed to be more important than correctness. I appreciate your contribution, but I'm making a BDFL pronouncement: this is too slow. If you can approach this in a way that is faster please open a new PR. There may be other ways to address your use case. I don't use flycheck, but I bet it can be configured to invoke python to compile your code in addition to invoking pyflakes. For your use case of checking a single file at a time the performance impact should be acceptable. I'd also encourage you to consider unit testing. Even if your test coverage is very poor, it will catch these errors since it need only import the module to find them. There are tools which will watch the filesystem for changes or integrate with an editor to run unit tests automatically. With a setup like this I've found TDD to be much more fun and valuable. |
Of course I use unit testing, but I don't run my tests constantly as I write my code. |
This includes return and yield outside of a function and break and continue
outside of a loop. Fixes lp 1293654.
The problem is that these SyntaxErrors are not encoded in the ast grammar, so
they are not detected when just compiling to ast. You must compile down to
bytecode to catch them. The advantage here is that we can still check for
other kinds of errors in this case, because the ast is still valid.