~1.15x faster perf w/ CodeFrontier insertion sort #104

Merged 1 commit on Nov 4, 2021

@schneems schneems commented Nov 4, 2021

Before sha: 948ee5c
After sha: b9ff7bd

## Perf difference

Before:  0.230749   0.005489   0.236238 (  0.237043)
After:  0.197075   0.005009   0.202084 (  0.202950)

## Profile code

To generate profiler output, run:

```
$ DEBUG_PERF=1 bundle exec rspec spec/integration/dead_end_spec.rb
```

See the readme for more details. Run it against the commit before this one and against this one to see the difference.


## How I found the issue


Using `qcachegrind` on Mac, with output generated by `RubyProf::CallTreePrinter`, I saw this:

![](https://www.dropbox.com/s/xian4mbsgvi8xr7/Screen%20Shot%202021-11-03%20at%203.00.05%20PM.png?raw=1)

A lot of time was being spent in `CodeFrontier#<<`, specifically in the `sort!` call and `CodeBlock#<=>`.

## The fix

Because we control insertion into the array, we know that it is always sorted. We can leverage this to insert each new element at the right location, instead of appending it and then re-sorting the whole array just to place one element.
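
To illustrate the idea, here is a minimal sketch of inserting into an already-sorted array. `insert_sorted` is a hypothetical helper written for this example, not the code from this PR, and it assumes elements respond to `<=>` and are kept in ascending order:

```ruby
# Hypothetical helper for illustration only, not the code from this PR.
# Inserts `value` into `sorted` (already in ascending order) and keeps it sorted.
def insert_sorted(sorted, value)
  # Find the first existing element that is greater than the new value.
  index = sorted.index { |existing| (existing <=> value) > 0 }
  # If nothing is greater, the new value belongs at the end.
  sorted.insert(index || sorted.length, value)
end

frontier = []
[5, 1, 4, 2, 3].each { |n| insert_sorted(frontier, n) }
p frontier # => [1, 2, 3, 4, 5]
```

Each insert does one linear scan of the array, where `sort!` would re-compare elements that are already in order.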

I experimented with iterating from front to back, and in reverse. In my test case, iterating in reverse took 7,373 comparisons while iterating forward took 3,130. Both represent large savings over re-sorting.
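
A rough way to reproduce this kind of measurement is to count `<=>` calls for each scan direction. The sketch below uses hypothetical helpers (`insert_forward`, `insert_reverse`) and a tiny made-up input. It does not reproduce the numbers above, which depend on the order in which blocks arrive:

```ruby
# Hypothetical sketch: count <=> calls for a forward scan versus a reverse scan.
# Both helpers insert `value` into the ascending array `sorted` and return
# the number of comparisons they made.
def insert_forward(sorted, value)
  comparisons = 0
  index = sorted.length # default: append at the end
  sorted.each_with_index do |existing, i|
    comparisons += 1
    if (value <=> existing) < 0
      index = i # first element larger than value
      break
    end
  end
  sorted.insert(index, value)
  comparisons
end

def insert_reverse(sorted, value)
  comparisons = 0
  index = 0 # default: insert at the front
  (sorted.length - 1).downto(0) do |i|
    comparisons += 1
    if (value <=> sorted[i]) >= 0
      index = i + 1 # insert just after the last element <= value
      break
    end
  end
  sorted.insert(index, value)
  comparisons
end

values = [5, 1, 4, 2, 3] # made-up input, for illustration only
forward = []
reverse = []
forward_count = values.sum { |v| insert_forward(forward, v) }
reverse_count = values.sum { |v| insert_reverse(reverse, v) }
puts "forward: #{forward_count} comparisons, reverse: #{reverse_count} comparisons"
```

Which direction wins depends on where new elements tend to land: a forward scan is cheap when most inserts go near the front, a reverse scan when most go near the end.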

After changing this logic:

![](https://www.dropbox.com/s/oba4mmrmlndvkac/Screen%20Shot%202021-11-03%20at%202.58.40%20PM.png?raw=1)

You can see `sort!` no longer shows up. Instead, we're seeing `reject!` as a hotspot (though it takes up only 8.4% of the time, versus the 54% previously spent in `sort!`).

Strangely, such a large change in the profile yielded only a ~1.15x overall speedup. It's still worth it, but not in line with what I expected from the tools.
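
To put numbers on that gap, here is a back-of-the-envelope calculation. It assumes the parenthesized value in the timings above is elapsed wall-clock seconds (as in Ruby's `Benchmark` output), and it treats the profiler's 54% figure as if it were the true share of runtime, which is an assumption rather than something measured here:

```ruby
# Timings copied from the "Perf difference" section (assumed to be wall-clock seconds).
before_real = 0.237043
after_real  = 0.202950

# Observed speedup and the fraction of runtime actually saved.
speedup = before_real / after_real
saved_fraction = 1 - (after_real / before_real)

# Upper bound if sort! really were 54% of the runtime and its cost dropped to zero
# (Amdahl's law with the optimized part made free).
profiled_share = 0.54
ideal_speedup = 1 / (1 - profiled_share)

puts format("observed speedup: %.3fx", speedup)
puts format("runtime saved:    %.1f%%", saved_fraction * 100)
puts format("ideal speedup:    %.2fx", ideal_speedup)
```

The observed saving is roughly 14% of the runtime, well under the 54% the profile attributed to `sort!`. One possible reading is that the profiler inflates the cost of small, frequently called methods such as `<=>`, and the replacement insertion is not free either.
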
@schneems schneems merged commit 050a6be into main Nov 4, 2021
@schneems schneems deleted the schneems/insertion-sort branch November 4, 2021 02:17