error reporting includes columns #2163

Merged 17 commits on Sep 25, 2016

bavardage
Contributor

fixes #1216

@bavardage
Contributor Author

(the last three commits are clean-ups from the rebase; I can interactive-rebase those away if desired)

@bavardage
Contributor Author

also, not sure if there's any CI verification, but the tests pass locally:

bduffield05-mac:mypy bduffield$ ./runtests.py
PARALLEL 2
SUMMARY  212 tasks selected
SUMMARY  all 212 tasks and 3525 tests passed
*** OK ***

@gracew left a comment

(as someone totally new to this codebase) this generally makes sense to me, aside from a couple questions.

         i = 0
         while i < len(errors):
             dup = False
             j = i - 1
             while (j >= 0 and errors[j][0] == errors[i][0] and
                     errors[j][1] == errors[i][1]):
-                if errors[j] == errors[i]:
+                if (errors[j][3] == errors[i][3] and
+                        errors[j][4] == errors[i][4]):  # ignore column

don't the 0th and first elements need to be compared too?

also, is ignoring the column necessary b/c of the TODO in TypeConverter#generic_visit in fastparse.py?

Contributor Author

they're taken care of because we're within the while loop.

I'm ignoring the column here so that we only report the first instance of a particular error within a line. This logic could be tweaked, for sure. Maybe we actually do want to expose all instances of a given error, e.g.:

  • invalid type at column 1
  • invalid type at column 5
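
A minimal sketch of the de-duplication being discussed, for readers following along. It assumes each error is a tuple of `(file, line, column, severity, message)`; that layout is inferred from the indices in the diff above and is an assumption, not something confirmed by this thread. The outer `while` condition already restricts the scan to earlier errors on the same file and line, which is why indices 0 and 1 are not compared again inside the loop.

```python
from typing import List, Tuple

# Assumed layout (hypothetical): (file, line, column, severity, message)
ErrorTuple = Tuple[str, int, int, str, str]


def remove_duplicates(errors: List[ErrorTuple]) -> List[ErrorTuple]:
    """Drop repeated errors from a sorted list, ignoring the column."""
    res = []  # type: List[ErrorTuple]
    i = 0
    while i < len(errors):
        dup = False
        j = i - 1
        # Walk backwards over earlier errors with the same file and line.
        while (j >= 0 and errors[j][0] == errors[i][0] and
                errors[j][1] == errors[i][1]):
            # Same severity and message counts as a duplicate,
            # even if the column (index 2) differs.
            if (errors[j][3] == errors[i][3] and
                    errors[j][4] == errors[i][4]):
                dup = True
                break
            j -= 1
        if not dup:
            res.append(errors[i])
        i += 1
    return res


sample = [
    ("a.py", 1, 1, "error", "invalid type"),
    ("a.py", 1, 5, "error", "invalid type"),
    ("a.py", 2, 1, "error", "invalid type"),
]
deduped = remove_duplicates(sample)
```

With this input only the column-1 error on line 1 and the error on line 2 survive. Comparing index 2 as well would keep both line-1 errors, which is the alternative behaviour raised in the comment.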

@@ -740,15 +747,18 @@ def lex_break(self) -> None:
last_tok.string += self.pre_whitespace + s
self.i += len(s)
self.line += 1
self.column = 0

why the assignment of 0? the value of i, or 1 character beyond the end of the string seems to make sense for describing a line break

Contributor Author

so we're incrementing the line here, so have to reset column back to 0.
in this class, self.column is keeping track of the column we're at (i.e. within the line) whereas self.i keeps track of the position within the overall string.
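
To illustrate the distinction between the two counters, here is a small stand-alone sketch. `PositionTracker` is a hypothetical class written for this example, not mypy's lexer; it only shows how an absolute index and a per-line column can be tracked side by side.

```python
class PositionTracker:
    """Hypothetical tracker: `i` is absolute, `column` is per-line."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.i = 0        # position within the whole string
        self.line = 1     # current line number (1-based)
        self.column = 0   # position within the current line (0-based)

    def advance(self, n: int) -> None:
        """Consume n characters, updating all three counters."""
        for ch in self.text[self.i:self.i + n]:
            if ch == '\n':
                # A line break starts a new line, so the column resets.
                self.line += 1
                self.column = 0
            else:
                self.column += 1
        self.i += n


tracker = PositionTracker("ab\ncd")
tracker.advance(4)  # consumes 'a', 'b', '\n', 'c'
```

After the call, `tracker.i` keeps growing across the line break while `tracker.column` has restarted from zero on the second line.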

@@ -390,6 +405,10 @@ def set_line(self, target: Union[Token, Node, int]) -> Node:
         self.initialization_statement.set_line(self.line)
         self.initialization_statement.lvalues[0].set_line(self.line)

+    def set_column(self, target: Union[Token, Node, int]) -> Node:
+        super().set_column(target)
+        return self
@gracew commented Sep 21, 2016

inconsistent to return self? do we need to do the same stuff with self.initializer, self.variable, and self.initialization_statement as above in set_line?

Contributor Author

perhaps? I just matched the behaviour of set_line. Not sure what the philosophy is generally regarding chaining.

Contributor Author

just matching behaviour of set_line - not sure of philosophy around chaining generally

there are some places in the code where already we do
some_thing = SomeNode(blah).set_line(1)
so with this they become
some_thing = SomeNode(blah).set_line(1).set_column(2)

Contributor Author

perhaps this should be removed for now, until I figure out (in FLUP?) how the delegation should work

Member

I find return self an anti-pattern that encourages unneeded cleverness and confuses side effects with functions.

Maybe set_line() should get an optional column argument? That would also beautifully allow you to call it with a Token or Node and copy both line and column from there.
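
A rough sketch of what that suggestion could look like, under stated assumptions: `Context` here is a hypothetical stand-in class, and the signature is illustrative rather than the one that was eventually committed. The method returns `None`, so it cannot be chained.

```python
from typing import Optional, Union


class Context:
    """Hypothetical base class carrying a source position."""

    def __init__(self) -> None:
        self.line = -1
        self.column = -1

    def set_line(self, target: Union['Context', int],
                 column: Optional[int] = None) -> None:
        """Set the line, and the column when one is available.

        Passing another Context copies both its line and its column;
        passing an int sets the line only, unless `column` is given.
        """
        if isinstance(target, int):
            self.line = target
        else:
            self.line = target.line
            self.column = target.column
        if column is not None:
            self.column = column


first = Context()
first.set_line(3, 7)      # explicit line and column

second = Context()
second.set_line(first)    # copies line and column from `first`
```

Because nothing is returned, a call site such as `SomeNode(blah).set_line(1)` would have to become two statements: construct the node, then call `set_line` on it.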

Member

(Also, what's FLUP? Googling was inconclusive. :-)

Member

(And, if you can remove the return self from set_line() and fix the fallout, if any, that would be great.)

Contributor Author

FLUP: oops.. :P It means follow-up, as in a future PR

Contributor Author

and yep, will combine them/see how bad the fallout is if we take out the return self (think it shouldn't be that bad at all, from the places I've seen set_line so far)

@@ -457,6 +476,10 @@ def set_line(self, target: Union[Token, Node, int]) -> Node:
             arg.set_line(self.line)
         return self

+    def set_column(self, target: Union[Token, Node, int]) -> Node:

do we need to handle self.arguments?

Contributor Author

yep, probably - was going to investigate that in future

Contributor Author

off the top of my head I can't remember how tokens -> argument node happens, it's possible they already have the column (and actually from editing the tests, my gut says they do)

Contributor Author

perhaps this should be removed for now, until I figure out how (in FLUP?) to set columns for args

@gvanrossum
Member

Thanks! This PR clearly represents a significant contribution.

Normally the tests run automatically on Travis-CI. I don't know why they didn't run this time but I suspect the sheer size of the diff might explain that.

I find such a huge diff (66 files!) hard to review myself. Could you perhaps come up with a way to reduce most of the "trivial" changes, e.g. the endless "E:" -> "E:0:" in the tests and the addition of ", 0" in the errors.report() calls? (Maybe something with argument default values?)

@bavardage
Contributor Author

Not sure if there have been contributions from a fork repo before - not familiar with travis, but for circle (https://circleci.com/) there's some explicit setting to run builds for PRs from forks.

Yep, the structure of the tests (that the tests test error messages and this changes the error messages, as a feature :P) makes this a very scary PR.

Some thoughts of approaches here:

  • I could split off some of this into independent PRs
    • the first four commits of this PR are (I believe) valuable standalone
    • I can additionally chop out the non-error-reporting changes from the 5th commit (7f38445)
    • at this point we have the column information in the Nodes, but no changes to error messages. I could write unit tests (i.e. not checking output) to verify that this is good.
    • we'd at some point have some fairly scary-large change when the error reporting change actually happens, but that would be all it was
  • I could modify the way the test-checking works, have the 'Expected' not be compared absolutely but instead as a pattern.
    • The existing E: blah could produce the pattern file:line:(\d+:)? error: blah
    • Then a sufficient subset of the test assertions could be modified (but a lot less than 1800 of them or whatever it was) such that we were comfortable about the error messages.
    • There would still be the fairly large change from updating all of the tests that assert on the output directly (in an output section) rather than using the # E: .. syntax
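
The pattern-based comparison proposed in the second option could be sketched roughly as follows. `expected_pattern` is a hypothetical helper invented for this illustration and is not part of mypy's test framework; the regular expression is the one quoted in the list, with the file name and message escaped.

```python
import re


def expected_pattern(file: str, line: int, message: str) -> "re.Pattern[str]":
    """Build a pattern matching an error line with or without a column."""
    return re.compile(
        r'{}:{}:(\d+:)? error: {}'.format(
            re.escape(file), line, re.escape(message)))


pattern = expected_pattern("main", 4, "Incompatible return value type")

# Old-style output (no column) and new-style output (with column) both match.
old_style = pattern.fullmatch("main:4: error: Incompatible return value type")
new_style = pattern.fullmatch("main:4:12: error: Incompatible return value type")
# A different line number does not match.
wrong_line = pattern.fullmatch("main:5: error: Incompatible return value type")
```

The optional group `(\d+:)?` is what lets existing `# E:` assertions stay untouched while the reported output gains a column.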

@refi64
Contributor

refi64 commented Sep 22, 2016

Travis is supposed to always run on PRs. @gvanrossum What if you try closing-and-reopening this PR? That sometimes works for me.

@gvanrossum
Member

OK, trying that...

@gvanrossum gvanrossum closed this Sep 23, 2016
@gvanrossum gvanrossum reopened this Sep 23, 2016
@refi64
Contributor

refi64 commented Sep 23, 2016

Looks like they're running now.

@gvanrossum
Member

And it's failed. Somehow GitHub then removed the Travis CI results, so here's the link:
https://travis-ci.org/python/mypy/jobs/162288057

And indeed the breakage looks like it's been caused by some new tests that were added recently.

Maybe we could add a command-line flag that must be set to enable column numbers? That way only a few tests would need to be updated (say, tests just for this feature).

Finally, I'm getting crashes when running mypy --fast-parser mypy; a quick pdb session suggests the problem is caused by a Return node not having a col_offset attribute, but that may well be the tip of the iceberg.

I would really like to merge this, but I want to see a smaller diff. Please?
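
As a sketch of the opt-in idea raised in this comment: the formatter below only emits the column when a switch is turned on, so existing expected output stays byte-for-byte identical by default. `format_error` and its `show_column` parameter are hypothetical names chosen for this example; the thread does not name the flag.

```python
def format_error(file: str, line: int, column: int, severity: str,
                 message: str, show_column: bool = False) -> str:
    """Format one error line; include the column only when requested."""
    if show_column and column >= 0:
        return '{}:{}:{}: {}: {}'.format(file, line, column, severity, message)
    # Default: the pre-existing format, without a column.
    return '{}:{}: {}: {}'.format(file, line, severity, message)


default_output = format_error("main", 4, 7, "error", "blah")
column_output = format_error("main", 4, 7, "error", "blah", show_column=True)
```

With the default off, only tests written specifically for column reporting need to pass the switch, which is what keeps the diff small.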

@gvanrossum
Member

(Oh wait, now the test results are back. I wonder if it just takes a really long time for some reason?)

@bavardage
Contributor Author

Command line flag sounds reasonable - will try to get to it this weekend.

Will also look into the error you're seeing with --fast-parser :(

@bavardage
Contributor Author

tests passing! diff now ~ 1/10th of the old size
currently working on some tests explicitly for the column reporting...

     else:
         assert isinstance(typ, ast35.Expression)
         return TypeConverter(line=line).visit(typ.body)


-def with_line(f: Callable[['ASTConverter', T], U]) -> Callable[['ASTConverter', T], U]:
+def with_line_and_column(f: Callable[['ASTConverter', T], U]) -> Callable[['ASTConverter', T], U]:
Member

Could you rename this back? That would reduce the diff size a bit more. I know the name would not be completely covering the semantics but then again the name isn't all that intuitive either way. I don't think the churn caused by the rename is worth it.

Contributor Author

done - could totally change the name at some point in the future in a single-purpose PR if it becomes confusing

@gvanrossum
Member

(But thanks for reducing the diff size!)

main:4: error: Incompatible types in assignment (expression has type Callable[[], str], variable has type Callable[[], int])
main:4: error: Incompatible return value type (got "str", expected "int")
Contributor Author

the churn here (and in some other *.test files) is because now errors on earlier columns come first
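
The reordering can be illustrated with a small sketch. The tuple layout `(file, line, column, message)` and the column values are made up for the example; only the two messages come from the test output above.

```python
from typing import List, Tuple

# Hypothetical layout: (file, line, column, message)
ErrorInfo = Tuple[str, int, int, str]


def sort_errors(errors: List[ErrorInfo]) -> List[ErrorInfo]:
    """Order errors by file, then line, then column."""
    return sorted(errors, key=lambda e: (e[0], e[1], e[2]))


# Both errors are on line 4; the column values here are invented.
reported = [
    ("main", 4, 11, 'Incompatible return value type (got "str", expected "int")'),
    ("main", 4, 0, "Incompatible types in assignment"),
]
ordered = sort_errors(reported)
```

Once the column takes part in the sort key, two errors on the same line can swap places relative to the old line-only ordering, which is the churn seen in the `*.test` files.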

Member

That's fine!

Ben Duffield added 2 commits September 24, 2016 22:16
additionally, set_line no longer returns self (and thus do associated cleanup)
@@ -95,7 +95,8 @@ def with_line(f: Callable[['ASTConverter', T], U]) -> Callable[['ASTConverter',
@wraps(f)
def wrapper(self: 'ASTConverter', ast: T) -> U:
node = f(self, ast)
node.set_line(ast.lineno)
# some ast nodes (e.g. Return) do not come with col_offset
Member

Heh, I figured out why (by putting a pdb trap here). It's because visit_lambda() synthesizes a Return node and sets only the lineno. Once you fix that I think the getattr() call is no longer necessary (and it shouldn't be!).
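
A tiny illustration of the failure mode and of the `getattr()` fallback, using `types.SimpleNamespace` objects as stand-ins for AST nodes. These are not typed_ast nodes, and `position_of` is a hypothetical helper; the point is only that a synthesized node which sets `lineno` alone has no `col_offset` to read.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def position_of(node: Any) -> Tuple[int, int]:
    """Return (line, column), using -1 when the node has no col_offset."""
    return node.lineno, getattr(node, 'col_offset', -1)


# Stand-in for a node produced by the parser: both attributes present.
parsed = SimpleNamespace(lineno=4, col_offset=8)

# Stand-in for a synthesized node where only lineno was set.
synthesized = SimpleNamespace(lineno=4)
before_fix = position_of(synthesized)

# The fix described above: also copy the column when synthesizing the node.
synthesized.col_offset = parsed.col_offset
after_fix = position_of(synthesized)
```

Once every synthesized node carries a column, a plain attribute access would work for those nodes and the fallback is only needed for node kinds that never have position information.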

Contributor Author

sweet, yep - works

@@ -804,7 +805,8 @@ def visit_raw_str(self, s: str) -> Type:
         return parse_type_comment(s.strip(), line=self.line)

     def generic_visit(self, node: ast35.AST) -> None:
-        raise TypeCommentParseError(TYPE_COMMENT_AST_ERROR, self.line)
+        raise TypeCommentParseError(TYPE_COMMENT_AST_ERROR, self.line,
Contributor Author

unfortunately I think we still need the getattr here... not all AST nodes come with column info, only the ones deriving from stmt, expr, etc...

(https://github.com/python/typeshed/blob/master/third_party/3/typed_ast/ast35.pyi)

Member

That's fine!

Member

@gvanrossum left a comment

I'm going to merge this now, unless some quick tests reveal more issues. Thank you so much for your work on this PR, and for being flexible in response to my review comments!

Successfully merging this pull request may close these issues.

mypy error reporting should include column position
4 participants