Skip to content

[vec] growth-strategy optimization #45434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Oct 21, 2017

This commits introduces a growth-strategy optimization for RawVec (and indirectly Vec and VecDeque). It introduces a method grow_by(capacity_increase) that tells the RawVec by how much the user would like to increase its capacity (e.g. 1 on vec.push(val)). It then uses following growth strategy:

  • If the RawVec is empty: it allocates at least 64 bytes.

  • If the RawVec is not empty:

    • it uses a growth-factor of 2 for small (<4096 bytes) and large (>4096*32 bytes) vectors, and 1.5 otherwise
    • it uses this growth factor to compute a suitable capacity
    • it takes the max between this capacity and the desired capacity increase (e.g. by using the desired capacity increase of self.cap one can force this method to double the capacity)
    • it passes the result to the usable_size function of the allocator to obtain the max usable size

The commit also refactors the logic of Vec's growth test into a is_full function, and uses the core::intrinsic::unlikely on the result of both Vec's and VecDeque's test to indicate that growth is an unlikely event. The grow_by function is not #[inline(never)] but #[inline] + #[cold]. That is, the function can be inlined, but the function author expects it to not be called often. Combined with the unlikely annotation on the call site, the compiler should have enough information to decide when eliding the call is worth it.

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 21, 2017
@leonardo-m
Copy link

Do you have benchmark that show before/after?

@bluss
Copy link
Member

bluss commented Oct 21, 2017

I think we're going to request benchmarks from bors on this PR. Other benchmarks are good as well.

// `from_size_align_unchecked`).
let new_cap = Self::suitable_capacity(self.cap, capacity_increase);
let new_size = new_cap * elem_size;
let new_layout = unsafe { Layout::from_size_align_unchecked(new_size, cur.align()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are too long.

[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:262: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:264: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:354: trailing whitespace
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:378: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:380: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:457: line longer than 100 chars

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kennytm I've fixed these except the two links, AFAIK rustdoc does not support splitting links over multiple lines: how should I handle these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg You could use the syntax:

blah blah blah [link][x] blah blah blah

[x]: https://www.example.com/extremely/long/link?can=use&syntax=like#this

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenny but what happens when the link is longer than 100 chars ? How do I break it so that the docs still work ? (We also have this problem in stdsimd, no solution AFAIK).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnzlbg tidy will ignore long lines in that form. Not sure about rustfmt.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 21, 2017

@leonardo-m I have some synthetic benchmarks for the medium sized-vectors but I am more interested into seeing if this is an overall win. So I'd like to request some benchmarks on rustc with the proposed strategy. And then do that again but without special casing large vectors (that is, using 1.5x growth factor for large vectors as well). I only have a laptop and its hard for me to run all of servo's benchmarks but before merging this I'd like to benchmark servo as well.

@Mark-Simulacrum
Copy link
Member

I'm happy to run perf.rlo against this PR once Travis passes, just ping me. That won't include servo benchmarks, but if we see a net win in rustc then we can look into running against servo as well.

@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
fn grow_if_necessary(&mut self) {
if self.is_full() {
let old_cap = self.cap();
self.buf.double();
self.buf.grow_by(1);
Copy link
Member

@bluss bluss Oct 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah like travis points out, VecDeque relies on a power of two capacity. So it should continue using double.

@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
fn grow_if_necessary(&mut self) {
if self.is_full() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the right place to use unlikely, if anywhere (and not inside the is_full method)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bluss indeed, see #45440.

@kennytm
Copy link
Member

kennytm commented Oct 22, 2017

Failed to build stage1-std on CI:

[00:19:10] error: internal compiler error: unexpected panic
[00:19:10]
[00:19:10] note: the compiler unexpectedly panicked. this is a bug.
[00:19:10]
[00:19:10] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:19:10]
[00:19:10] note: rustc 1.22.0-dev running on x86_64-unknown-linux-gnu
[00:19:10]
[00:19:10] thread 'rustc' panicked at 'assertion failed: self.cap().count_ones() == 1', /checkout/src/liballoc/vec_deque.rs:364:8
[00:19:10] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:19:10]
[00:19:10] error: Could not compile `compiler_builtins`.

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 22, 2017
@kennytm
Copy link
Member

kennytm commented Oct 22, 2017

@bors try

Prepare for perf.

@bors
Copy link
Collaborator

bors commented Oct 22, 2017

⌛ Trying commit 0963f11 with merge 4e26722...

bors added a commit that referenced this pull request Oct 22, 2017
[vec] growth-strategy optimization

This commits introduces a growth-strategy optimization for `RawVec` (and indirectly `Vec` and `VecDeque`). It introduces a method `grow_by(capacity_increase)` that tells the `RawVec` by how much the user would like to increase its capacity (e.g. `1` on `vec.push(val)`). It then uses following growth strategy:

- If the `RawVec` is empty: it allocates at least 64 bytes.

- If the `RawVec` is not empty:
  - it uses a growth-factor of 2 for small (<4096 bytes) and large (>4096*32 bytes) vectors, and 1.5 otherwise
  - it uses this growth factor to compute a suitable capacity
  - it takes the max between this capacity and the desired capacity increase (e.g. by using the desired capacity increase of `self.cap` one can force this method to double the capacity)
  - it passes the result to the `usable_size` function of the allocator to obtain the max usable size

The commit also refactors the logic of `Vec`'s growth test into a `is_full` function, and uses the `core::intrinsic::unlikely` on the result of both `Vec`'s and `VecDeque`'s test to indicate that growth is an `unlikely` event. The `grow_by` function is not `#[inline(never)]` but `#[inline] + #[cold]`. That is, the function can be inlined, but the function author expects it to not be called often. Combined with the `unlikely` annotation on the call site, the compiler should have enough information to decide when eliding the call is worth it.
@bors
Copy link
Collaborator

bors commented Oct 22, 2017

☀️ Test successful - status-travis
State: approved= try=True

@kennytm
Copy link
Member

kennytm commented Oct 22, 2017

Ping @Mark-Simulacrum, build is ready for perf.rlo.

@Mark-Simulacrum
Copy link
Member

Looks like a rather large increase in memory usage, which probably explains the otherwise somewhat minor regressions in wall time and instructions: http://perf.rust-lang.org/compare.html?start=548109827454f759e9c88b4a1f724c4cdbff8bfa&end=4e26722f7780a52602644592cbc2bb8e30ad0ff0&stat=instructions%3Au.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 22, 2017

@Mark-Simulacrum yes, this was a spectacular fail. There is a bug where I set the capacity to usable_size instead of usable_size / elem_size... I am actually surprised the memory usage wasn't bigger :D

I've fixed it but am adding some tests for suitable_capacity and grow_by so that this doesn't happen again.

@Mark-Simulacrum
Copy link
Member

Let me know when you're ready for another perf run.

}
};
self.ptr = uniq;
self.cap = new_cap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Whole function) Tightly cuddled unsafe { } doesn't make the code just outside of that "safe": it all needs to add up — every calculation and comparison in this function is critical for memory safety.

I'd prefer to keep the original style of a large unsafe block around it all. In particular, inline unsafe { ... } expressions are not really idiomatic.

This function needs to check for overflow in the capacity number. It's a harder job now that it no longer doubles and we can't rely on the isize::MAX trick documented in the double function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the capacity number overflows in suitable_capacity then it will panic. If the elem_size * capacity operation overflows then that will panic. Or what am I missing?

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the new capacity can be bigger than isize::MAX on 32-bit archs, and the usable size of the allocator might make it even bigger, but if for some reason none of these operations panic before an actual allocation is attempted there is an alloc_guard check that will make it panic anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where the check or explicit panic is for that? I mean, plain a + b panics on overflow only when debug assertions are enabled. We need overflow checks that are active all the time.

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought unsigned integer wraparound always panic'ed... does it only panic with debug_assert enabled? I'll need to add the checks then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are not there in release mode. Old version of this uses quite elaborate arguments about isize::MAX and maximum allocation size, and usize::MAX to avoid most of those checks.

Maybe the API can be changed so that it doesn't need to take into account exactly arbitrary growth numbers? Otherwise we're just reimplementing reserve.

Look at reserve for what it does for overflow checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is grow_by a new function, and how is it different from reserve? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bluss indeed, I think I should leave double and double_in_place as they were, add this optimizations to reserve and then make Vec use reserve instead of double on insertion/push. This is basically the reason why I wanted a mentor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow arguments apply to suitable_capacity as well. if cap < isize::MAX, then cap * 1.5 and cap * 2 cannot overflow usize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the additional capacity is a parameter to a safe function, that's an arbitrary value and it needs to be checked, not trusted.

@@ -235,7 +235,66 @@ impl<T, A: Alloc> RawVec<T, A> {
}
}

/// Doubles the size of the type's backing allocation. This is common enough
/// Grows the vector capacity by `capacity_increase`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc should probably say it's growing for at least this capacity_increase, adjusted by the growth policy.

A postcondition is that if the function returns, at least capacity + capacity_increase memory is allocated and that sum does not wrap around.

// maintained by `alloc_guard`; the alignment will never be too
// large as to "not be specifiable" (so we can use
// `from_size_align_unchecked`).
let new_cap = Self::suitable_capacity(self.cap, capacity_increase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A wraparound/overflow check is needed here, or in that method, or somewhere close. Or an argument why it's not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The old code used the argument that the new capacity was <= 2 * old_capacity)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new capacity was always 2 * old_capacity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this. I am adding some tests close/at the i32::MAX boundary for RawVec<u8>. If there are any problems all the 32bit architectures should fail these tests.

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 22, 2017
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 22, 2017

Everytime I change something in raw_vec I need to compile rustc stage0 to run the tests, is there a faster way than that? I am running ./x.py test src/liballoc --stage 1.

@kennytm
Copy link
Member

kennytm commented Oct 22, 2017

@gnzlbg Try to add --keep-stage 1 as well.

@shepmaster shepmaster added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 11, 2017
@shepmaster
Copy link
Member

Reassigning from aturon; looks like bluss has some context.

r? @bluss

@rust-highfive rust-highfive assigned bluss and unassigned aturon Nov 11, 2017
@bors
Copy link
Collaborator

bors commented Nov 12, 2017

☔ The latest upstream changes (presumably #45902) made this pull request unmergeable. Please resolve the merge conflicts.

@kennytm
Copy link
Member

kennytm commented Nov 22, 2017

Hi @gnzlbg, as that #45514 has been merged (which contains #45512), can this PR be resumed?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 22, 2017

@kennytm Not yet. The approach @arthurprs suggested was to use the _excess API of the Alloc trait, but it is at least missing alloc_zeroed_excess and grow_in_place_excess which are needed here to implement reserve on top of it. I'll try to get to it.

@kennytm
Copy link
Member

kennytm commented Nov 22, 2017

Okay thanks for the update :)

@arthurprs
Copy link
Contributor

We can probably shoehorn usable_size after each of those for now and try to get more mem usage metrics around the growth factor.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 22, 2017

@arthurprs we can do that in parallel.

I (personally) prefer to:

  • improve the Alloc API and jemallocs implementation of it:

    • alloc_zeroed_excess is missing
    • grow_in_place_excess/shrink_in_place_excess are missing
    • grow_in_place/shrink_in_place currently gets the excess for free (it is returned by xallocx) and throws it away
    • grow_in_place currently returns false (that is, growing failed) if xallocx returns more memory than requested, which does not make sense to me (am I missing something here)?
    • shrink_in_place returns false if the new layout does not exactly fit the old layout, that is, if the allocation can be shrink, but not to the exact value the user requested (but e.g., to a little bit more, because xallocx returns the real size), then it fails.
  • try again to implement Vec and String on top of the Alloc API properly handling the excess with the current growth policy -> measure memory consumption in e.g. servo

  • try to improve the current growth policy -> measure memory consumption in e.g. servo

@arthurprs
Copy link
Contributor

It's sad, but I don't think we'll get all methods returning excess anytime soon. 😞

Rest looks reasonable to me. Let me know if you need help or just wanna discuss things.

@aidanhs
Copy link
Member

aidanhs commented Nov 30, 2017

Hi @gnzlbg, just checking in to see how things are going! Are there PRs/issues to follow the progress of the additional excess methods you want?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 30, 2017

@aidanhs a lot has happened in the jemallocator crate, I have a PR more or less ready that adds the methods to the Alloc trait and implements specialized versions of them for liballoc_jemalloc, but I am finishing with using those in Vec and wanted to do so for String as well. Once that is done and merged then this PR can proceed with changing the allocation strategy. I can ping you in those PRs if you want when I send them.

@aidanhs
Copy link
Member

aidanhs commented Nov 30, 2017

If you just keep this PR updated with the latest that'd be great :)

@shepmaster
Copy link
Member

Hey @gnzlbg! It's been over a week since we last heard from you on this issue. You have a number of merge conflicts which need to be addressed. If this PR is blocked on other PRs please let us know.

@alexcrichton
Copy link
Member

Ok it's been a few extra weeks now, I'm going to close this due to inactivity but we can of course resubmit with a rebase and such!

@kennytm kennytm added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.