Schneems/infinite loops are bad #2

Merged
merged 4 commits into from
Nov 5, 2020
54 changes: 45 additions & 9 deletions README.md
@@ -34,7 +34,7 @@ Or install it yourself as:

## What does it do?

When your code triggers a SyntaxError due to an "expecting end-of-input" in a file, this library fires to narrow down your search to the most likely offending locations.
When your code triggers a SyntaxError due to an "unexpected `end'" in a file, this library fires to narrow down your search to the most likely offending locations.

## Sounds cool, but why isn't this baked into Ruby directly?

@@ -45,20 +45,56 @@ I would love to get something like this directly in Ruby, but I first need to pr

## How does it detect syntax error locations?

Source code with a syntax error in it can be thought of as valid code with one or more invalid chunks in it. With this in mind we can "search" for both invalid and valid chunks of code. This library uses a parser to tell if a given chunk of code is valid, in which case it's certainly not the cause of our problem. If it's invalid, then we can test to see if removing that chunk from our file would make the whole thing valid. When that happens, we've narrowed down our search. But...things aren't always so easy.
We know that source code that does not contain a syntax error can be parsed. We also know that code with a syntax error contains both valid code and invalid code. If removing a piece of code makes the rest of the source parse, then we can programmatically determine that the code we removed contained a syntax error. We can do this detection by generating small code blocks and searching for which blocks need to be removed to generate valid source code.

Since there can be multiple syntax errors in a document it's not good enough to check individual code blocks, we've got to check multiple at the same time. We will keep creating and adding new blocks to our search until we detect that our "frontier" (which contains all of our blocks) contains the syntax error. After this, we can stop our search and instead focus on filtering to find the smallest subset of blocks that contain the syntax error.
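
The search described above can be sketched roughly as follows. This is a simplified illustration and not this library's implementation: `valid_ruby?`, `source_without`, and `smallest_invalid_subset` are hypothetical names, blocks are assumed to be plain arrays of line indexes, and Ruby's bundled `Ripper` stands in for "a parser".

```ruby
require "ripper"

# Hypothetical helper: Ripper.sexp returns nil when the source has a parse error.
def valid_ruby?(source)
  !Ripper.sexp(source).nil?
end

# Rebuild the document with the given blocks (arrays of line indexes) removed.
def source_without(lines, blocks)
  removed = blocks.flatten
  lines.each_with_index.reject { |_, index| removed.include?(index) }.map(&:first).join
end

# Try ever larger combinations of blocks until removing one combination
# leaves source that parses. Checking combinations covers multiple errors.
def smallest_invalid_subset(lines, blocks)
  (1..blocks.length).each do |size|
    blocks.combination(size).each do |subset|
      return subset if valid_ruby?(source_without(lines, subset))
    end
  end
  nil
end

lines = <<~RUBY.lines
  Foo.call
    puts "lol"
  end
RUBY

blocks = [[0], [1], [2]] # one block per line, purely for illustration
p smallest_invalid_subset(lines, blocks) # => [[2]]
```

Trying every combination is exponential in the number of blocks, so a sketch like this only makes sense once the candidate set has already been narrowed down.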

## How is source code broken up into smaller blocks?

By definition source code with a syntax error in it cannot be parsed, so we have to guess how to chunk up the file into smaller pieces. Once we've split up the file we can safely rule out or zoom into a specific piece of code to determine the location of the syntax error. This library uses indentation and empty lines to make guesses about what might be a "block" of code. Once we've got a chunk of code, we can test it.

- If the code parses, it cannot be the cause of our syntax error. We can remove it from our search
- If the code does not parse, it may be the cause of the error, but we also might have made a bad guess in splitting up the source
  - If we remove that chunk of code from the document and that allows the whole thing to parse, it means the syntax error was for sure in that location.
  - Otherwise, it could mean that either there are multiple syntax errors or that we have a bad guess and need to expand our search.
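
The checks in the list above could look something like this. It is a minimal sketch under stated assumptions: chunks are guessed from empty lines only (indentation is ignored to keep it short), and `parses?`, `chunk_by_blank_lines`, and `classify` are hypothetical names rather than this library's API.

```ruby
require "ripper"

# Hypothetical helper: Ripper.sexp returns nil when the source has a parse error.
def parses?(source)
  !Ripper.sexp(source).nil?
end

# Guess at blocks: each run of consecutive non-blank lines becomes one chunk.
def chunk_by_blank_lines(source)
  source.lines
        .slice_when { |a, b| a.strip.empty? != b.strip.empty? }
        .reject { |chunk| chunk.first.strip.empty? }
        .map(&:join)
end

# Apply the rules from the list to a single chunk of the document.
def classify(document, chunk)
  return :valid if parses?(chunk)                           # cannot be the cause
  return :contains_error if parses?(document.sub(chunk, "")) # removing it fixes the file
  :unknown                                                   # multiple errors or a bad guess
end

document = <<~RUBY
  def hello
    puts "hi"
  end

  Foo.call
    puts "lol"
  end
RUBY

chunk_by_blank_lines(document).each do |chunk|
  p classify(document, chunk)
end
# prints :valid, then :contains_error
```
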
At the end of the day we can't say where the syntax error is FOR SURE, but we can get pretty close. It sounds simple when spelled out like this, but it's a very complicated problem. Even when code is not correctly indented/formatted we can still likely tell you where to start searching even if we can't point at the exact problem line or location.

## Complicating concerns

The biggest issue with searching for syntax errors stemming from "unexpected end" is that while the `end` in the code triggered the error, the problem actually came from somewhere else. Effectively these syntax errors always involve 2 or more lines of code, but one of those lines (without the end) may be syntactically valid on its own. For example:

```
1 Foo.call
2
3   puts "lol"
4 end
```

Here there's a missing `do` after `Foo.call`; however, `Foo.call` by itself is perfectly valid Ruby syntax. We don't find the error until we remove the `end`, even though the problem is caused on the first line. This means that if our code blocks aren't sliced totally correctly the error output might just point at:

```
4 end
```

Instead of:

```
1 Foo.call
4 end
```

Here's a similar issue, but with more `end` lines in the code to demonstrate. The same line of code causes the issue:

```
1 it "foo" do
2   Foo.call
3
4     puts "lol"
5   end
6 end
```

At the end of the day we can't say where the syntax error is FOR SURE, but we can get pretty close. It sounds simple when spelled out like this, but it's a very complicated problem.
In this example we could make this code valid by removing the `end` on either line 5 or line 6. As far as the program is concerned it's effectively got one too many ends and it won't care which you remove. The "correct" line to remove would be for the inner block, but it's hard to know this programmatically. Whitespace can help guide us, but it's still a guess.

This one person on twitter told me it's "not possible".
One of the biggest challenges then is not just finding code that can be removed to make the program syntactically correct (just remove an `end` and it works) but also providing a reasonable guess as to the "pair" line that would have otherwise required an end (such as a `do` or a `def`).
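
One way to picture such a guess is a whitespace heuristic: walk upward from the extra `end` to the nearest non-blank line with the same indentation. This is only an illustration of how whitespace can guide the guess, not a description of what this library does, and `guess_pair_line` is a hypothetical name.

```ruby
# Count leading spaces and tabs on a line.
def indentation(line)
  line[/\A[ \t]*/].length
end

# Hypothetical heuristic: given the index of an `end` line, return the index of
# the closest preceding non-blank line with the same indentation, or nil.
def guess_pair_line(lines, end_index)
  target = indentation(lines[end_index])
  (end_index - 1).downto(0).find do |index|
    line = lines[index]
    !line.strip.empty? && indentation(line) == target
  end
end

lines = <<~RUBY.lines
  it "foo" do
    Foo.call

      puts "lol"
    end
  end
RUBY

p guess_pair_line(lines, 4) # => 1, the `Foo.call` line that is missing a `do`
p guess_pair_line(lines, 5) # => 0, the `it "foo" do` line
```

The heuristic only works when the code is consistently indented, which is why the result remains a guess.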

## How does this gem know when a syntax error occurred?
## How does this gem know when a syntax error occurred in my code?

While I wish you hadn't asked: If you must know, we're monkey-patching require. It sounds scary, but bootsnap does essentially the same thing and we're way less invasive.
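
For readers unfamiliar with the idea, wrapping `require` can be sketched like this. It is an assumption-laden illustration and not this gem's actual patch: `SyntaxErrorRequireHook` and its `on_syntax_error` callback are hypothetical names, and the sketch simply intercepts `SyntaxError`, hands the path to a callback, and re-raises.

```ruby
require "tempfile"

# Hypothetical hook, not the gem's real code.
module SyntaxErrorRequireHook
  class << self
    # Callback invoked with the required path and the SyntaxError.
    attr_accessor :on_syntax_error
  end
  self.on_syntax_error = ->(path, _error) { warn "Syntax error while requiring #{path}" }

  private

  # Call the original require; if it raises SyntaxError, run the callback
  # (where a search could be started) and let the error continue upward.
  def require(path)
    super
  rescue SyntaxError => error
    SyntaxErrorRequireHook.on_syntax_error.call(path, error)
    raise
  end
end

# Prepending to Object places the hook ahead of Kernel#require in method lookup.
Object.prepend(SyntaxErrorRequireHook)

# Usage: require a file that has an unexpected `end`.
broken = Tempfile.new(["broken", ".rb"])
broken.write("Foo.call\n  puts 1\nend\n")
broken.close

begin
  require broken.path
rescue SyntaxError
  # The callback has already run by the time the error reaches this point.
end
```

Re-raising keeps the normal failure behavior intact, so the hook only adds information and never hides the error.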

4 changes: 1 addition & 3 deletions lib/syntax_error_search/code_frontier.rb
@@ -35,9 +35,7 @@ def holds_all_syntax_errors?(block_array = @frontier)
     def pop
       return nil if empty?
 
-      if generate_new_block?
-        self << next_block
-      end
+      self << next_block unless @indent_hash.empty?
 
       return @frontier.pop
     end
1 change: 1 addition & 0 deletions lib/syntax_error_search/code_search.rb
@@ -24,6 +24,7 @@ def call
       end
 
       @invalid_blocks.concat(frontier.detect_invalid_blocks )
+      @invalid_blocks.sort_by! {|block| block.starts_at }
       self
     end
   end
90 changes: 57 additions & 33 deletions spec/unit/code_search_spec.rb
@@ -1,40 +1,64 @@

require_relative "../spec_helper.rb"

module SyntaxErrorSearch
RSpec.describe CodeSearch do
it "does not go into an infinite loop" do
skip("infinite loop")
search = CodeSearch.new(<<~EOM)
Foo.call
def foo
puts "lol"
puts "lol"
end
end
EOM
search.call

expect(search.invalid_blocks.join).to eq(<<~EOM)
end
EOM
end

it "handles mis-matched-indentation-but-maybe-not-so-well" do
skip("wip")
search = CodeSearch.new(<<~EOM)
Foo.call
def foo
puts "lol"
puts "lol"
end
end
EOM
search.call

expect(search.invalid_blocks.join).to eq(<<~EOM)
end
EOM
# For code that's not perfectly formatted, we ideally want to do our best
# These examples represent the results that exist today, but I would like to improve upon them
describe "needs improvement" do
describe "mis-matched-indentation" do
it "stacked ends " do
search = CodeSearch.new(<<~EOM)
Foo.call
def foo
puts "lol"
puts "lol"
end
end
EOM
search.call

# Does not include the line with the error Foo.call
expect(search.invalid_blocks.join).to eq(<<~EOM)
def foo
end
end
EOM
end

it "extra space before end" do
search = CodeSearch.new(<<~EOM)
Foo.call
def foo
puts "lol"
puts "lol"
end
end
EOM
search.call

# Does not include the line with the error Foo.call
expect(search.invalid_blocks.join).to eq(<<~EOM.indent(3))
end
EOM
end

it "missing space before end" do
search = CodeSearch.new(<<~EOM)
Foo.call
def foo
puts "lol"
puts "lol"
end
end
EOM
search.call

# Does not include the line with the error Foo.call
expect(search.invalid_blocks.join).to eq(<<~EOM)
end
EOM
end
end
end

it "returns syntax error in outer block without inner block" do