Skip to content

Conversation

asmeurer
Copy link
Contributor

This includes return and yield outside of a function and break and continue
outside of a loop. Fixes lp 1293654.

The problem is that these SyntaxErrors are not encoded in the ast grammar, so
they are not detected when just compiling to ast. You must compile down to
bytecode to catch them. The advantage here is that we can still check for
other kinds of errors in this case, because the ast is still valid.

This includes return and yield outside of a function and break and continue
outside of a loop. Fixes lp 1293654.

The problem is that these SyntaxErrors are not encoded in the ast grammar, so
they are not detected when just compiling to ast. You must compile down to
bytecode to catch them.  The advantage here is that we can still check for
other kinds of errors in this case, because the ast is still valid.
@asmeurer
Copy link
Contributor Author

I wasn't able to figure out how to add a test case for checking both a SyntaxError and a warning, like

break
import sys
notdefined

I tried

    def test_nonastSyntaxErrorPlusOther(self):
        """
        When we have a non-ast SyntaxError (like break outside of a loop),
        we can also detect other errors, since we still get a generated ast.
        """
        source = """\
break
import sys
notdefined
"""
        sourcePath = self.makeTempFile(source)
        self.assertHasErrors(
            sourcePath,
            ["""\
%s:1:0: 'break' outside loop
break
    ^
""" % sourcePath,
"%s:2: 'sys' imported but unused" % sourcePath,
"%s:3: undefined name 'notdefined'" % sourcePath,
            ])

But I got (from py.test)

    def assertHasErrors(self, path, errorList):
        """
            Assert that C{path} causes errors.

            @param path: A path to a file to check.
            @param errorList: A list of errors expected to be printed to stderr.
            """
        err = StringIO()
        count = withStderrTo(err, checkPath, path)
        self.assertEqual(
>           (count, err.getvalue()), (len(errorList), ''.join(errorList)))
E       AssertionError: Tuples differ: (3, "[97 chars]  ^\n") != (3, "[97 chars]  ^\n/var/folders/yc/8wpl9rlx47qgzxqpcf003k280[136 chars]ed'")
E
E       First differing element 1:
E       /var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:1:0: 'break' outside loop
E       break
E           ^
E
E       /var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:1:0: 'break' outside loop
E       break
E           ^
E       /var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:2: 'sys' imported but unused/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:3: undefined name 'notdefined'
E
E         (3,
E          "/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:1:0: 'break' "
E          'outside loop\n'
E          'break\n'
E       -  '    ^\n')
E       ?           -
E
E       +  '    ^\n'
E       +  "/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:2: 'sys' "
E       +  'imported but '
E       +  'unused/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpq_cr5e5n:3: '
E       +  "undefined name 'notdefined'")

pyflakes/test/test_api.py:251: AssertionError

@asmeurer
Copy link
Contributor Author

Aw man, flake8? Really?

@asmeurer
Copy link
Contributor Author

I hope you at least appreciate the irony of requiring the code to pass pep8.

If you're going to have this requirement, could you make is so that py.test runs flake8. That way, I would have known that the code was going to fail the tests before I submitted it.

The other test failures are because older versions of Python apparently do not add the column number for these syntax errors. I will fix that.

asmeurer added 2 commits July 26, 2014 12:29
Except they're not really errors, which is kind of the whole point here...
The "'return' outside of function" SyntaxError apparently doesn't give column
information in Python 2.
@asmeurer
Copy link
Contributor Author

The other thing I did not figure out here was how to inject the SyntaxError in the correct order in the messages.

@bitglue
Copy link
Member

bitglue commented Sep 23, 2014

I have a couple concerns:

Performance

I would think that compiling the code twice would be a big performance hit. One of the distinguishing features of pyflakes vs. other similar tools is its speed, so this is pretty important.

I did some quick tests on whatever .py files were lying about in /usr/lib/python. I did a handful of runs and took the fastest time:

  • without patch: 2.41s
  • with patch: 2.98s

That's not as bad as I would have guessed, but still, it's 24% slower. That's a pretty big hit for not a lot of benefit: syntax errors are among the most obvious.

Structure

This patch, in effect, adds more things that pyflakes checks. Thus conceptually, the additions should belong in Checker. api.py should just be some minimal glue code. It probably does too much already.

Of course, Checker has a reference only to the AST, and so far as I was able to find on a brief reading, there is no way to compile to bytecode from AST. So, implementing these checks would require somehow changing the interface to Checker.

However, that may be moot, depending on how the performance problem is addressed. It seems there are just a few things (return/yield outside a function) that compiling to bytecode catches that AST does not: as such it shouldn't be too hard to check for these things in the AST, without compiling to bytecode. That would address both this concern and likely perform better.

@asmeurer
Copy link
Contributor Author

I was also worried about performance but I also didn't notice a slowdown. I think it's not really running twice, just an extra step (the first step generates the ast and the second step compiles it).

However, that may be moot, depending on how the performance problem is addressed. It seems there are just a few things (return/yield outside a function) that compiling to bytecode catches that AST does not: as such it shouldn't be too hard to check for these things in the AST, without compiling to bytecode. That would address both this concern and likely perform better.

These are the ones I know about. I'd have to look at the Python compiler to see what fully is caught at this stage.

@asmeurer
Copy link
Contributor Author

Also it should be noted that existing SyntaxErrors are caught in api.py. I don't disagree that the code should be reorganized, but I'm not knowledgable enough to do it myself.

Regarding the usefulness of warning about this, it has come up in my own work at least twice, which is why I finally decided to implement it. I personally use flycheck in emacs to catch all pyflakes warnings on the fly as I write my code, and it's pretty surprising and annoying when pyflakes shows no errors but the code fails with a syntax error (especially if you're lazy and rely on pyflakes to check your code before committing rather than actually running it 😉)

@bitglue
Copy link
Member

bitglue commented Sep 23, 2014

I agree, it can be annoying. However, more annoying is a 24% slowdown. You may not notice the slowdown on a single file, but there are a number of projects using pyflakes on quite large repositories. I can't justify catching a couple of SyntaxErrors that will surely be noticed the first time the code is run at the expense of that performance regression.

Unless you can find some other solution to the performance problem, probably the way to move forward involves adding some checks in def RETURN (and def YIELD) of Checker. It should be possible to examine self.scopeStack and emit a warning if there isn't an enclosing function.

Granted, it will take some finesse to get all the edge cases just right, and it isn't guaranteed to have the same behavior as compiling to bytecode, but given the performance tradeoff I think this is the best solution.

@asmeurer
Copy link
Contributor Author

There are several other errors. Search https://hg.python.org/cpython/file/6dcc96fa3970/Python/compile.c for "compiler_error".

@asmeurer
Copy link
Contributor Author

It looks like in addition to return and yield outside of a function and break and continue outside of a loop we have:

  • __future__ imports must be at the beginning of the file (this one gets me all the time)
  • default except: must be last
  • 'continue' not supported inside 'finally' clause
  • Lots of errors relating to new Python 3 starred unpacking syntax (a, *b = stuff). For example, a, *b, *c is valid in the ast, even though it has too many starred items and is only allowed in an assignment.

@asmeurer
Copy link
Contributor Author

At any rate, that shows us the logic we would need to implement (we would need to check each Python version). The logic is not too bad, although not trivial (certainly not as easy as just calling compile). It's also quite clear from reading that that there is no simpler way built into Python to just get these errors without doing the full bytecode compilation. It is written in C, though, so the compile function is actually quite fast, as it goes (it only takes an extra five seconds 4258 files in my test, from 21 seconds to 26 seconds).

@sigmavirus24
Copy link
Member

So this means there is something that will catch these errors without needing them to be checked by PyFlakes, right? Assuming there are tests for a piece of software, running them would catch all of these, not by virtue of tests being written for them, but by virtue of the tests ostensibly causing Python to compile the files. I'm not sure why we should be enhancing PyFlakes so as to discourage users from running their test suites.

@asmeurer
Copy link
Contributor Author

@sigmavirus24 I'm not sure what you use pyflakes for, but I thought that the whole point of it is to catch errors as early as possible. My editor runs pyflakes every time I enter text into it. It does not run the full test suite every time. For some throwaway code that I write, pyflakes is the entirety of my "test suite".

@sigmavirus24
Copy link
Member

but I thought that the whole point of it is to catch errors as early as possible.

It's a linter. That means it will catch some errors (e.g., variables undefined before use) and catch some other non-errors (e.g., unused imports, unused variable bindings, etc.). All of those are easily checked without needing to compile the code. Your assertion seems to be that compilation is necessary for these checks and frankly, it adds what I consider to be an unacceptable cost in performance. As the maintainer of Flake8, we've added a few features lately to allow users to improve the performance of checks against large numbers of files and I'd rather have those features slowly negated by adding solutions like this one.

Finally, pyflakes certainly is not going to catch every possible problem in your code. If you do

if False:
   x = 'hello world'

print(x)

There's no way for it to catch that type of error (at present) without doing much more complicated checking or running the code (because even in this case, compilation won't catch it). The only way you could catch an error like that would be by running it (or perhaps using something far more advanced like pylint).

tl;dr PyFlakes can't reasonably catch every error and I'm not sure it should try. I don't think the benefit of the performance costs for large code bases is worth it.

@asmeurer
Copy link
Contributor Author

@sigmavirus24 maybe you are on the wrong issue. You're looking for https://bugs.launchpad.net/pyflakes/+bug/1369671. I'm not arguing that pyflakes should try to be smarter and smarter and do type inference and all this crazy stuff. Please don't set up the straw man that I am. All I want is for pyflakes to be at least as smart as Python itself in catching the simplest possible error you can have, a syntax error.

@bitglue may be right that this can be done more efficiently. It will take more work, as we'd have to translate those errors from the compile source to the pyflakes checker. If someone else wants to do it please jump on it. I probably won't have time to do it in a while.

Meanwhile, this works (minus the issue of the errors being out of order. I don't know how to fix that or how important it is). It is a little slower, but you can consider that a motivation to make it faster. Maybe we can add an option to disable it (or enable it; I don't care what the default is).

@asmeurer
Copy link
Contributor Author

What Python versions does pyflakes support? The README says "It is available on PyPI and it supports all active versions of Python from 2.5 to 3.4.", but to me, that means 2.7, 3.2 (?), 3.3, and 3.4, as the official docs state that all other versions are no longer supported at all (see http://legacy.python.org/dev/peps/pep-0361/). I doubt you actually want to drop at least 2.6 support yet, but it would be better if this were more explicit.

I'm asking because we'll have to check the compile rules for each supported version, to see if there are any differences (the starred stuff obviously is Python 3 only, but I didn't check yet if there is some Python 2 only stuff).

@florentx
Copy link
Contributor

Currently it supports the versions which are regularly tested on Travis CI plus version 2.5 explicitly, for historical reason.
https://github.com/pyflakes/pyflakes/blob/master/.travis.yml

We don't feel any urge to drop Python 2.5 support ... but there's actually no automated test on this version.

@sigmavirus24
Copy link
Member

Please don't set up the straw man that I am

I wasn't setting up any straw man. I was trying to illustrate that by compiling we only get a couple of extra benefits that are otherwise better caught through any sort of testing (even just running the script you're writing). Compiling gets us very little extra on top of that. It can't be used as a jumping off point for other items on our wishlist (which might justify the performance cost).

@bitglue
Copy link
Member

bitglue commented Sep 24, 2014

It doesn't sound like we are making progress on this issue. I understand your use case @asmeurer, and yes it would be nice if pyflakes could catch every error. However, it must also be fast, and sometimes those goals are mutually exclusive. In this particular instance I'm judging speed to be more important than correctness. I appreciate your contribution, but I'm making a BDFL pronouncement: this is too slow. If you can approach this in a way that is faster please open a new PR.

There may be other ways to address your use case. I don't use flycheck, but I bet it can be configured to invoke python to compile your code in addition to invoking pyflakes. For your use case of checking a single file at a time the performance impact should be acceptable.

I'd also encourage you to consider unit testing. Even if your test coverage is very poor, it will catch these errors since it need only import the module to find them. There are tools which will watch the filesystem for changes or integrate with an editor to run unit tests automatically. With a setup like this I've found TDD to be much more fun and valuable.

@bitglue bitglue closed this Sep 24, 2014
@asmeurer
Copy link
Contributor Author

Of course I use unit testing, but I don't run my tests constantly as I write my code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants