Run mypy parser against a lot of Python code and file bugs for errors #487

Closed
JukkaL opened this issue Nov 2, 2014 · 24 comments


@JukkaL
Collaborator

JukkaL commented Nov 2, 2014

The mypy parser still doesn't support all of Python 3 syntax. Run the parser against a corpus of Python code and file tasks for any remaining parser issues. Note that there are 2 open pull requests that implement some new syntax (#367 and #483).
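
A minimal sketch of such a corpus run, under stated assumptions: the standard library's `ast.parse` stands in for the mypy parser (whose entry point is not shown in this thread), and `find_parse_failures` is a hypothetical helper name. Swapping the `ast.parse` call for the parser under test would give a list of files to turn into issues.

```python
import ast
from pathlib import Path
from typing import List, Tuple


def find_parse_failures(root: str) -> List[Tuple[str, int, str]]:
    """Parse every .py file under root; return (path, line, message) per failure.

    ast.parse is only a stand-in for the parser being tested.
    """
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # Reading bytes lets the parser honor the file's own encoding comment.
            ast.parse(path.read_bytes(), filename=str(path))
        except (SyntaxError, ValueError) as exc:
            line = getattr(exc, "lineno", None) or 0
            failures.append((str(path), line, str(exc)))
    return failures
```

Pointing it at a checkout of the standard library (for example the `Lib` directory of a Python source tree) would be one way to get a large corpus.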

@JukkaL
Collaborator Author

JukkaL commented Nov 2, 2014

Some issues I've found:

  • namedtuple(...) as base class EDIT: fixed
  • nonlocal EDIT: fixed
  • *foo lvalues (already fixed in Checking of starred expressions #483)
  • some nested tuple lvalues in for loops (profile.py in Python 3.2 stdlib, line 546) EDIT: false alarm?

TODO: File the above as issues, or point to existing issues.
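
For reference, here is a small illustrative set of snippets exercising the constructs listed above. The snippets are made up for this sketch (the nested tuple one is not the actual line from profile.py), and `ast.parse` is again only a stand-in for the parser under test.

```python
import ast
from typing import Dict

# One tiny program per construct mentioned in the list above.
SNIPPETS = {
    "namedtuple base class": (
        "from collections import namedtuple\n"
        "class Point(namedtuple('Point', 'x y')):\n"
        "    pass\n"
    ),
    "nonlocal": (
        "def outer():\n"
        "    count = 0\n"
        "    def inner():\n"
        "        nonlocal count\n"
        "        count += 1\n"
        "    return inner\n"
    ),
    "starred lvalue": "first, *rest = [1, 2, 3]\n",
    "nested tuple lvalue in for": "for (a, b), c in [((1, 2), 3)]:\n    pass\n",
}


def check_snippets(snippets: Dict[str, str]) -> Dict[str, bool]:
    """Return, per construct name, whether the snippet parses."""
    results = {}
    for name, source in snippets.items():
        try:
            ast.parse(source)
            results[name] = True
        except SyntaxError:
            results[name] = False
    return results
```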

@spkersten
Contributor

I'll investigate the nested tuple lvalues issue. Apparently I overlooked something.

@JukkaL
Collaborator Author

JukkaL commented Nov 9, 2014

Added open issue #494 for nonlocal support.

EDIT: implemented (thanks @spkersten!)

@JukkaL
Collaborator Author

JukkaL commented Nov 9, 2014

There already is the issue #460 for namedtuple.

EDIT: implemented

@JukkaL
Collaborator Author

JukkaL commented Nov 9, 2014

Also #60, relative imports.

EDIT: implemented (thanks @mason-bially!)

@spkersten
Contributor

I'm having trouble reproducing the issue with nested tuple lvalues. When I try running mypy on profile.py, it complains that it cannot find all kinds of modules.

@JukkaL
Collaborator Author

JukkaL commented Dec 1, 2014

profile.py now works for me as well. Odd. I'll try rerunning the whole thing and see if I misreported the issue.

@JukkaL
Collaborator Author

JukkaL commented Dec 8, 2014

Complex literals were another missing feature (just pushed an implementation by @spkersten, thanks!).
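
As a reminder of the syntax in question, a short illustrative example of complex (imaginary) literals in plain Python; the variable names are arbitrary.

```python
# A numeric literal with a j (or J) suffix is an imaginary number.
z = 3 + 4j
unit = 1j
scaled = 2.5J * 2
```

Here `z.real` is `3.0`, `z.imag` is `4.0`, and `unit * unit` equals `-1`.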

@JukkaL
Collaborator Author

JukkaL commented Dec 8, 2014

Another missing feature is yield expressions: #498

@JukkaL
Collaborator Author

JukkaL commented Dec 8, 2014

Mypy doesn't support the ... expression: #524

EDIT: implemented by @spkersten
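
For context, a short illustration of where the `...` expression typically shows up (stub-style bodies and extended slicing). The names are made up for this example.

```python
def stub() -> None:
    # A bare ... is an ordinary expression statement, common in stub files.
    ...


# ... evaluates to the Ellipsis singleton and can appear inside tuples and subscripts.
index = (..., 0)
```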

@JukkaL
Collaborator Author

JukkaL commented Dec 18, 2014

Calls such as f(x=y, *z) where we have *args and a keyword arg aren't accepted (#153).

Example in the wild: Python-3.2.3, Lib/pickle.py, line 344.

EDIT: fixed
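
A minimal illustration of this call form in plain Python (the function and values are made up): the keyword argument appears before the `*` unpacking, yet the unpacked values are still passed positionally.

```python
def f(*args, **kwargs):
    """Return the arguments exactly as received."""
    return args, kwargs


y = "value"
z = [1, 2]

# Keyword argument first, then *args unpacking: valid Python 3 syntax.
result = f(x=y, *z)
```

`result` is `((1, 2), {'x': 'value'})`.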

@JukkaL
Collaborator Author

JukkaL commented Dec 18, 2014

Escapes within raw string literals are broken: #532

EDIT: fixed
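
The exact failure is described in #532; as general background, these are the raw string rules a lexer has to get right. The examples are illustrative and not taken from that issue.

```python
# In a raw string, backslashes are kept as literal characters.
raw_newline = r"\n"         # two characters: backslash, 'n'
cooked_newline = "\n"       # one character: a newline

# A backslash still prevents the following quote from ending the literal,
# but the backslash itself remains in the resulting string.
raw_quote = r"\""           # two characters: backslash, double quote

raw_regex = r"\d+\.\d*"     # typical use: regular expressions
```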

@JukkaL
Collaborator Author

JukkaL commented Dec 18, 2014

Another small parser problem: #533

EDIT: fixed

@JukkaL
Collaborator Author

JukkaL commented Dec 18, 2014

0b int literals are not supported: #534

EDIT: fixed
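
For reference, a quick illustration of binary literals next to the other prefixed integer forms; the names are arbitrary.

```python
flags = 0b1010   # binary literal, lowercase prefix
mask = 0B0110    # the uppercase prefix is accepted too
octal = 0o17     # octal literal
hexa = 0xFF      # hexadecimal literal

common_bits = flags & mask
```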

@JukkaL
Collaborator Author

JukkaL commented Dec 21, 2014

Misc parser issues:

@JukkaL
Collaborator Author

JukkaL commented Feb 8, 2015

Also: #573

@JukkaL
Collaborator Author

JukkaL commented Feb 22, 2015

Another parser bug (... in relative import): #585

EDIT: fixed

@JukkaL
Collaborator Author

JukkaL commented Feb 22, 2015

Source file encodings aren't supported: #522 EDIT: fixed
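
How mypy implemented this is not shown in the thread. As a sketch of the rules involved, the standard library's `tokenize.detect_encoding` applies the BOM and coding-comment logic to the first two lines of a file; `source_encoding` below is a hypothetical wrapper around it.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the declared encoding of Python source given as bytes.

    Uses tokenize.detect_encoding, which checks for a UTF-8 BOM and a
    coding comment on the first two lines, defaulting to UTF-8.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding
```

For example, `source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")` returns the normalized name `"iso-8859-1"`, and source with no declaration yields `"utf-8"`.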

@JukkaL
Collaborator Author

JukkaL commented Feb 22, 2015

Unicode identifiers: #586

from __future__ import * crash: #587 EDIT: fixed
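
As an illustration of what Unicode identifier support means, the sketch below compiles and runs a tiny made-up program whose names contain non-ASCII letters. It uses the built-in `compile` and `exec`, not mypy.

```python
# Python 3 allows non-ASCII letters in identifiers.
SOURCE = "π = 3.14159\nrésumé = π * 2\n"

namespace = {}
exec(compile(SOURCE, "<unicode-demo>", "exec"), namespace)

# str.isidentifier reports whether a string is a valid identifier.
valid = "π".isidentifier()
invalid = "2π".isidentifier()
```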

@JukkaL
Collaborator Author

JukkaL commented Feb 22, 2015

Another parser crash: #588 EDIT: fixed

@JukkaL removed the priority label Mar 29, 2015

@o11c
Contributor

o11c commented Aug 8, 2015

I'm currently working on a full reimplementation of the parser. Among other things, it will provide full support for 3.5 syntax (though not semanal). Matmul was easy, but async/await was a huge pain (blame @gvanrossum for accepting the PEP without requiring a __future__. Where is "explicit is better than implicit" now?).
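
For readers unfamiliar with the 3.5 additions mentioned here, a small illustrative example of both follows. The class and function names are made up, and the sketch uses `asyncio.run`, so it assumes Python 3.7 or later, where async and await are ordinary keywords.

```python
import asyncio


class Scale:
    """Toy object supporting the @ (matmul) operator via __matmul__."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __matmul__(self, other: "Scale") -> "Scale":
        return Scale(self.factor * other.factor)


async def double(value: int) -> int:
    # await is only valid inside an async def function.
    await asyncio.sleep(0)
    return value * 2


def run_demo():
    combined = Scale(2.0) @ Scale(3.0)
    doubled = asyncio.run(double(21))
    return combined.factor, doubled
```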

Goals:

  • Efficiently keep track of column information and start/end of tokens.
  • Support exact dialects for each Python version (TODO figure out how to preserve annotations with the mypy codec - possibly inject a bogus __future__-like thing into the dialect object?).
  • Support comments as distinct objects, not as whitespace.
  • Produce a stupid-consumer-friendly parse tree, that can trivially reproduce the input file by catting together all the whitespace, comments, and "blackspace".
  • Produce high quality errors, even if there were errors earlier in the file. Usually this means "resume at the end of the current expression, statement, or block"
  • Throw no exceptions during parsing, always produce a parse tree even if the file contains random nonsense.
  • Likely there will always be unsupported features when lowering to mypy.nodes.
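
The round-trip goal above can be sketched roughly as follows. This is not the code from the work-in-progress branch: it is a deliberately naive splitter (it would misclassify a `#` inside a string literal as a comment), and `split_pieces` is a hypothetical name. It only shows the invariant that concatenating whitespace, comments, and "blackspace" reproduces the input.

```python
import re
from typing import List, Tuple

# Every character falls into exactly one of the three alternatives,
# so the matches tile the whole input with no gaps.
_PIECE = re.compile(
    r"(?P<whitespace>\s+)|(?P<comment>#[^\n]*)|(?P<blackspace>[^\s#]+)"
)


def split_pieces(source: str) -> List[Tuple[str, str]]:
    """Split source into (kind, text) pieces that can be joined losslessly."""
    return [(m.lastgroup, m.group()) for m in _PIECE.finditer(source)]


def reproduce(pieces: List[Tuple[str, str]]) -> str:
    """Rebuild the original text by concatenating every piece in order."""
    return "".join(text for _kind, text in pieces)
```

Because comments are kept as their own pieces rather than folded into whitespace, a consumer can drop or rewrite them without touching anything else.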

Work in progress: https://github.com/o11c/mypy/tree/syntax

Current status:

  • Encoding comment detection: todo (Python's rules for this are nasty)
  • Span infrastructure: done
  • XML error reports: todo (trivial, but will likely wait until my XML reports PR is merged)
  • Warning level control: deferred (needs thinking - could be implemented in the "front-end" (CodeMap) or "back-end" (ErrorStream))
  • Support for combining characters: todo (visual bug only)
  • Tokens: done
  • Lexer: done
  • Token Trees: done (this is technically a form of parsing, but it is tightly coupled to the lexer's ability to produce meaningful errors and keep going)
  • Parse Trees: done (but might still change as I implement Parsing)
  • Parsing: todo
  • Lowering: todo (only these two steps need to be done)
  • Pretty print tool: todo
  • Syntax-highlighted HTML: todo (probably via XML like my other PR)

@JukkaL
Collaborator Author

JukkaL commented Aug 9, 2015

@o11c That is interesting! I'm excited to see what the finished parser looks like.

One important thing to keep in mind is that I don't want the lexer and parser to get significantly slower than they are currently (hopefully we can make them faster).

Also, I tried to run the lexer from the command line but I couldn't get it to work:

$ python3 -m mypy.syntax.lexer foo.py
Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jukka/project/o11c/mypy/mypy/syntax/lexer.py", line 633, in <module>
    main()
  File "/home/jukka/project/o11c/mypy/mypy/syntax/lexer.py", line 626, in main
    rv += _do_input(cm, dialect, a, f.read())
  File "/home/jukka/project/o11c/mypy/mypy/syntax/lexer.py", line 611, in _do_input
    token_file = lexer.pull_file()
  File "/home/jukka/project/o11c/mypy/mypy/syntax/lexer.py", line 451, in pull_file
    assert isinstance(terminator, TokenLineOrTokenBlock)
AssertionError

@o11c
Contributor

o11c commented Aug 10, 2015

That exception is odd; the code should not allow it unless something has gone horribly wrong. I have tested the lexer (in tree mode, with the 3.4 dialect selected) against all the .py and .pyi files included in the mypy repository, and I don't get any errors. This part of the discussion should continue in a bug filed on my fork with an offending file (and add , repr(terminator) to that line to get a fuller error message).
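
The suggested change amounts to giving the assert a message. A minimal sketch of the idea, with a hypothetical `TokenLine` class and `check_terminator` function standing in for the real lexer code:

```python
class TokenLine:
    """Hypothetical stand-in for the lexer's terminator type."""


def check_terminator(terminator: object) -> None:
    # The expression after the comma becomes the AssertionError message,
    # so a failure shows what the unexpected value actually was.
    assert isinstance(terminator, TokenLine), repr(terminator)
```

With this form, a failure reports the offending value instead of a bare `AssertionError`.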


I haven't measured, but the lexer (at least in non-tree mode) should almost certainly be faster, because it offloads the bulk of the work to a single call to the regular expression engine, which is implemented in C, instead of doing a lot of branches and function calls in Python. There's still some room for improvement, especially for the super-evil async/await cases until they become full keywords in 3.7.
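
The general technique can be sketched like this. It is not the lexer from the fork: the token kinds and the `lex` function are assumptions made for illustration. One compiled pattern with named alternatives does the character-level work inside the regex engine, each token keeps its start and end offsets, and a catch-all `ERROR` group means lexing never raises.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


# Alternatives are tried in order; ERROR matches any single leftover character.
_MASTER = re.compile(
    r"""
    (?P<NAME>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<OP>[-+*/=()@,:])
  | (?P<NEWLINE>\n)
  | (?P<SPACE>[ \t]+)
  | (?P<ERROR>.)
    """,
    re.VERBOSE,
)


def lex(source: str) -> List[Token]:
    """Tokenize source with one master pattern; never raises on bad input."""
    return [
        Token(m.lastgroup, m.group(), m.start(), m.end())
        for m in _MASTER.finditer(source)
    ]
```

The Python-level loop here only wraps matches into tuples; the branching over token kinds happens in C.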

The parser does have a disadvantage: the parse-tree stage will generate a lot of temporary objects (including for tokens like comma that were previously just dropped) that will be thrown away during the new "lowering" stage. But I expect the cost of semanal to dwarf the cost of parsing, so micro-optimizations for the intermediate parse objects aren't worth it.

@gvanrossum
Member

I think this issue has run its course.
