[vec] growth-strategy optimization #45434

gnzlbg · 2017-10-21T16:35:37Z

This commits introduces a growth-strategy optimization for RawVec (and indirectly Vec and VecDeque). It introduces a method grow_by(capacity_increase) that tells the RawVec by how much the user would like to increase its capacity (e.g. 1 on vec.push(val)). It then uses following growth strategy:

If the RawVec is empty: it allocates at least 64 bytes.
If the RawVec is not empty:
- it uses a growth-factor of 2 for small (<4096 bytes) and large (>4096*32 bytes) vectors, and 1.5 otherwise
- it uses this growth factor to compute a suitable capacity
- it takes the max between this capacity and the desired capacity increase (e.g. by using the desired capacity increase of self.cap one can force this method to double the capacity)
- it passes the result to the usable_size function of the allocator to obtain the max usable size

The commit also refactors the logic of Vec's growth test into a is_full function, and uses the core::intrinsic::unlikely on the result of both Vec's and VecDeque's test to indicate that growth is an unlikely event. The grow_by function is not #[inline(never)] but #[inline] + #[cold]. That is, the function can be inlined, but the function author expects it to not be called often. Combined with the unlikely annotation on the call site, the compiler should have enough information to decide when eliding the call is worth it.

rust-highfive · 2017-10-21T16:35:42Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

leonardo-m · 2017-10-21T19:03:52Z

Do you have benchmark that show before/after?

bluss · 2017-10-21T19:20:33Z

I think we're going to request benchmarks from bors on this PR. Other benchmarks are good as well.

kennytm · 2017-10-21T19:25:16Z

src/liballoc/raw_vec.rs

+                // `from_size_align_unchecked`).
+                let new_cap = Self::suitable_capacity(self.cap, capacity_increase);
+                let new_size = new_cap * elem_size;
+                let new_layout = unsafe { Layout::from_size_align_unchecked(new_size, cur.align()) };


These lines are too long.

[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:262: line longer than 100 chars [00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:264: line longer than 100 chars [00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:354: trailing whitespace [00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:378: line longer than 100 chars [00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:380: line longer than 100 chars [00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:457: line longer than 100 chars

@kennytm I've fixed these except the two links, AFAIK rustdoc does not support splitting links over multiple lines: how should I handle these?

@gnzlbg You could use the syntax:

blah blah blah [link][x] blah blah blah [x]: https://www.example.com/extremely/long/link?can=use&syntax=like#this

@kenny but what happens when the link is longer than 100 chars ? How do I break it so that the docs still work ? (We also have this problem in stdsimd, no solution AFAIK).

@gnzlbg tidy will ignore long lines in that form. Not sure about rustfmt.

gnzlbg · 2017-10-21T21:25:05Z

@leonardo-m I have some synthetic benchmarks for the medium sized-vectors but I am more interested into seeing if this is an overall win. So I'd like to request some benchmarks on rustc with the proposed strategy. And then do that again but without special casing large vectors (that is, using 1.5x growth factor for large vectors as well). I only have a laptop and its hard for me to run all of servo's benchmarks but before merging this I'd like to benchmark servo as well.

Mark-Simulacrum · 2017-10-21T22:13:12Z

I'm happy to run perf.rlo against this PR once Travis passes, just ping me. That won't include servo benchmarks, but if we see a net win in rustc then we can look into running against servo as well.

bluss · 2017-10-21T22:24:58Z

src/liballoc/vec_deque.rs

@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
    fn grow_if_necessary(&mut self) {
        if self.is_full() {
            let old_cap = self.cap();
-            self.buf.double();
+            self.buf.grow_by(1);


Ah like travis points out, VecDeque relies on a power of two capacity. So it should continue using double.

bluss · 2017-10-21T22:25:46Z

src/liballoc/vec_deque.rs

@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
    fn grow_if_necessary(&mut self) {
        if self.is_full() {


This seems like the right place to use unlikely, if anywhere (and not inside the is_full method)

@bluss indeed, see #45440.

kennytm · 2017-10-22T06:14:23Z

Failed to build stage1-std on CI:

[00:19:10] error: internal compiler error: unexpected panic
[00:19:10]
[00:19:10] note: the compiler unexpectedly panicked. this is a bug.
[00:19:10]
[00:19:10] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:19:10]
[00:19:10] note: rustc 1.22.0-dev running on x86_64-unknown-linux-gnu
[00:19:10]
[00:19:10] thread 'rustc' panicked at 'assertion failed: self.cap().count_ones() == 1', /checkout/src/liballoc/vec_deque.rs:364:8
[00:19:10] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:19:10]
[00:19:10] error: Could not compile `compiler_builtins`.

kennytm · 2017-10-22T08:41:02Z

@bors try

Prepare for perf.

bors · 2017-10-22T08:41:12Z

⌛ Trying commit 0963f11 with merge 4e26722...

[vec] growth-strategy optimization This commits introduces a growth-strategy optimization for `RawVec` (and indirectly `Vec` and `VecDeque`). It introduces a method `grow_by(capacity_increase)` that tells the `RawVec` by how much the user would like to increase its capacity (e.g. `1` on `vec.push(val)`). It then uses following growth strategy: - If the `RawVec` is empty: it allocates at least 64 bytes. - If the `RawVec` is not empty: - it uses a growth-factor of 2 for small (<4096 bytes) and large (>4096*32 bytes) vectors, and 1.5 otherwise - it uses this growth factor to compute a suitable capacity - it takes the max between this capacity and the desired capacity increase (e.g. by using the desired capacity increase of `self.cap` one can force this method to double the capacity) - it passes the result to the `usable_size` function of the allocator to obtain the max usable size The commit also refactors the logic of `Vec`'s growth test into a `is_full` function, and uses the `core::intrinsic::unlikely` on the result of both `Vec`'s and `VecDeque`'s test to indicate that growth is an `unlikely` event. The `grow_by` function is not `#[inline(never)]` but `#[inline] + #[cold]`. That is, the function can be inlined, but the function author expects it to not be called often. Combined with the `unlikely` annotation on the call site, the compiler should have enough information to decide when eliding the call is worth it.

bors · 2017-10-22T10:08:21Z

☀️ Test successful - status-travis
State: approved= try=True

kennytm · 2017-10-22T11:04:25Z

Ping @Mark-Simulacrum, build is ready for perf.rlo.

Mark-Simulacrum · 2017-10-22T14:45:54Z

Looks like a rather large increase in memory usage, which probably explains the otherwise somewhat minor regressions in wall time and instructions: http://perf.rust-lang.org/compare.html?start=548109827454f759e9c88b4a1f724c4cdbff8bfa&end=4e26722f7780a52602644592cbc2bb8e30ad0ff0&stat=instructions%3Au.

gnzlbg · 2017-10-22T15:39:40Z

@Mark-Simulacrum yes, this was a spectacular fail. There is a bug where I set the capacity to usable_size instead of usable_size / elem_size... I am actually surprised the memory usage wasn't bigger :D

I've fixed it but am adding some tests for suitable_capacity and grow_by so that this doesn't happen again.

Mark-Simulacrum · 2017-10-22T15:43:29Z

Let me know when you're ready for another perf run.

bluss · 2017-10-22T15:44:08Z

src/liballoc/raw_vec.rs

+            }
+        };
+        self.ptr = uniq;
+        self.cap = new_cap;


(Whole function) Tightly cuddled unsafe { } doesn't make the code just outside of that "safe": it all needs to add up — every calculation and comparison in this function is critical for memory safety.

I'd prefer to keep the original style of a large unsafe block around it all. In particular, inline unsafe { ... } expressions are not really idiomatic.

This function needs to check for overflow in the capacity number. It's a harder job now that it no longer doubles and we can't rely on the isize::MAX trick documented in the double function.

If the capacity number overflows in suitable_capacity then it will panic. If the elem_size * capacity operation overflows then that will panic. Or what am I missing?

I mean, the new capacity can be bigger than isize::MAX on 32-bit archs, and the usable size of the allocator might make it even bigger, but if for some reason none of these operations panic before an actual allocation is attempted there is an alloc_guard check that will make it panic anyways.

I don't see where the check or explicit panic is for that? I mean, plain a + b panics on overflow only when debug assertions are enabled. We need overflow checks that are active all the time.

I thought unsigned integer wraparound always panic'ed... does it only panic with debug_assert enabled? I'll need to add the checks then.

Yes, they are not there in release mode. Old version of this uses quite elaborate arguments about isize::MAX and maximum allocation size, and usize::MAX to avoid most of those checks.

Maybe the API can be changed so that it doesn't need to take into account exactly arbitrary growth numbers? Otherwise we're just reimplementing reserve.

Look at reserve for what it does for overflow checks.

Why is grow_by a new function, and how is it different from reserve? 😄

@bluss indeed, I think I should leave double and double_in_place as they were, add this optimizations to reserve and then make Vec use reserve instead of double on insertion/push. This is basically the reason why I wanted a mentor.

The overflow arguments apply to suitable_capacity as well. if cap < isize::MAX, then cap * 1.5 and cap * 2 cannot overflow usize.

As long as the additional capacity is a parameter to a safe function, that's an arbitrary value and it needs to be checked, not trusted.

bluss · 2017-10-22T15:47:26Z

src/liballoc/raw_vec.rs

@@ -235,7 +235,66 @@ impl<T, A: Alloc> RawVec<T, A> {
        }
    }

-    /// Doubles the size of the type's backing allocation. This is common enough
+    /// Grows the vector capacity by `capacity_increase`.


Doc should probably say it's growing for at least this capacity_increase, adjusted by the growth policy.

A postcondition is that if the function returns, at least capacity + capacity_increase memory is allocated and that sum does not wrap around.

bluss · 2017-10-22T15:49:04Z

src/liballoc/raw_vec.rs

+                // maintained by `alloc_guard`; the alignment will never be too
+                // large as to "not be specifiable" (so we can use
+                // `from_size_align_unchecked`).
+                let new_cap = Self::suitable_capacity(self.cap, capacity_increase);


A wraparound/overflow check is needed here, or in that method, or somewhere close. Or an argument why it's not needed.

(The old code used the argument that the new capacity was <= 2 * old_capacity)

The new capacity was always 2 * old_capacity.

I'll check this. I am adding some tests close/at the i32::MAX boundary for RawVec<u8>. If there are any problems all the 32bit architectures should fail these tests.

gnzlbg · 2017-10-22T15:57:29Z

Everytime I change something in raw_vec I need to compile rustc stage0 to run the tests, is there a faster way than that? I am running ./x.py test src/liballoc --stage 1.

kennytm · 2017-10-22T16:04:18Z

@gnzlbg Try to add --keep-stage 1 as well.

shepmaster · 2017-11-11T03:57:00Z

Reassigning from aturon; looks like bluss has some context.

r? @bluss

bors · 2017-11-12T14:46:27Z

☔ The latest upstream changes (presumably #45902) made this pull request unmergeable. Please resolve the merge conflicts.

kennytm · 2017-11-22T10:45:27Z

Hi @gnzlbg, as that #45514 has been merged (which contains #45512), can this PR be resumed?

gnzlbg · 2017-11-22T11:57:41Z

@kennytm Not yet. The approach @arthurprs suggested was to use the _excess API of the Alloc trait, but it is at least missing alloc_zeroed_excess and grow_in_place_excess which are needed here to implement reserve on top of it. I'll try to get to it.

kennytm · 2017-11-22T12:00:43Z

Okay thanks for the update :)

arthurprs · 2017-11-22T12:19:26Z

We can probably shoehorn usable_size after each of those for now and try to get more mem usage metrics around the growth factor.

gnzlbg · 2017-11-22T12:37:31Z

@arthurprs we can do that in parallel.

I (personally) prefer to:

improve the Alloc API and jemallocs implementation of it:
- alloc_zeroed_excess is missing
- grow_in_place_excess/shrink_in_place_excess are missing
- grow_in_place/shrink_in_place currently gets the excess for free (it is returned by xallocx) and throws it away
- grow_in_place currently returns false (that is, growing failed) if xallocx returns more memory than requested, which does not make sense to me (am I missing something here)?
- shrink_in_place returns false if the new layout does not exactly fit the old layout, that is, if the allocation can be shrink, but not to the exact value the user requested (but e.g., to a little bit more, because xallocx returns the real size), then it fails.
try again to implement Vec and String on top of the Alloc API properly handling the excess with the current growth policy -> measure memory consumption in e.g. servo
try to improve the current growth policy -> measure memory consumption in e.g. servo

arthurprs · 2017-11-22T12:58:02Z

It's sad, but I don't think we'll get all methods returning excess anytime soon. 😞

Rest looks reasonable to me. Let me know if you need help or just wanna discuss things.

aidanhs · 2017-11-30T16:39:05Z

Hi @gnzlbg, just checking in to see how things are going! Are there PRs/issues to follow the progress of the additional excess methods you want?

gnzlbg · 2017-11-30T17:02:44Z

@aidanhs a lot has happened in the jemallocator crate, I have a PR more or less ready that adds the methods to the Alloc trait and implements specialized versions of them for liballoc_jemalloc, but I am finishing with using those in Vec and wanted to do so for String as well. Once that is done and merged then this PR can proceed with changing the allocation strategy. I can ping you in those PRs if you want when I send them.

aidanhs · 2017-11-30T17:33:25Z

If you just keep this PR updated with the latest that'd be great :)

shepmaster · 2017-12-09T15:20:01Z

Hey @gnzlbg! It's been over a week since we last heard from you on this issue. You have a number of merge conflicts which need to be addressed. If this PR is blocked on other PRs please let us know.

alexcrichton · 2018-01-04T16:01:17Z

Ok it's been a few extra weeks now, I'm going to close this due to inactivity but we can of course resubmit with a rebase and such!

rust-highfive assigned aturon Oct 21, 2017

kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 21, 2017

kennytm reviewed Oct 21, 2017

View reviewed changes

bluss reviewed Oct 21, 2017

View reviewed changes

bluss reviewed Oct 22, 2017

View reviewed changes

kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 22, 2017

gnzlbg added 8 commits November 10, 2017 11:00

vecdeque grow_by(old_cap)

a52f06e

move unlikely intrinsics out of is_full

ebab7b3

improve reserve instead

30aae7b

all tests pass

9da3665

old heuristic for empty vectors

088127a

tiny_alloc_4_e_4_elems_or_at_least_32_bytes

f9e2dce

original_tiny_original_2x

42e751a

original_tiny_1.5x_growth_otherwise

3639c16

gnzlbg force-pushed the vecopt branch from f47efbf to 43aa00c Compare November 10, 2017 10:00

use alloc_excess and realloc_excess

18a82aa

gnzlbg force-pushed the vecopt branch from 43aa00c to 18a82aa Compare November 10, 2017 10:03

shepmaster added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 11, 2017

rust-highfive assigned bluss and unassigned aturon Nov 11, 2017

alexcrichton closed this Jan 4, 2018

kennytm added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 4, 2018

		@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
		fn grow_if_necessary(&mut self) {
		if self.is_full() {

[vec] growth-strategy optimization #45434

[vec] growth-strategy optimization #45434

Uh oh!

Conversation

gnzlbg commented Oct 21, 2017

Uh oh!

rust-highfive commented Oct 21, 2017

Uh oh!

leonardo-m commented Oct 21, 2017

Uh oh!

bluss commented Oct 21, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnzlbg Oct 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnzlbg Oct 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnzlbg commented Oct 21, 2017

Uh oh!

Mark-Simulacrum commented Oct 21, 2017

Uh oh!

bluss Oct 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kennytm commented Oct 22, 2017

Uh oh!

kennytm commented Oct 22, 2017

Uh oh!

bors commented Oct 22, 2017

Uh oh!

bors commented Oct 22, 2017

Uh oh!

kennytm commented Oct 22, 2017

Uh oh!

Mark-Simulacrum commented Oct 22, 2017

Uh oh!

gnzlbg commented Oct 22, 2017

Uh oh!

Mark-Simulacrum commented Oct 22, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnzlbg Oct 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnzlbg Oct 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gnzlbg Oct 21, 2017 •

edited

Loading

gnzlbg Oct 22, 2017 •

edited

Loading

bluss Oct 21, 2017 •

edited

Loading

gnzlbg Oct 22, 2017 •

edited

Loading

gnzlbg Oct 22, 2017 •

edited

Loading

gnzlbg commented Nov 22, 2017 •

edited

Loading

gnzlbg commented Nov 22, 2017 •

edited

Loading