[vec] growth-strategy optimization #45434
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
Do you have benchmarks that show before/after?
I think we're going to request benchmarks from bors on this PR. Other benchmarks are good as well. |
src/liballoc/raw_vec.rs
Outdated
// `from_size_align_unchecked`).
let new_cap = Self::suitable_capacity(self.cap, capacity_increase);
let new_size = new_cap * elem_size;
let new_layout = unsafe { Layout::from_size_align_unchecked(new_size, cur.align()) };
These lines are too long.
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:262: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:264: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:354: trailing whitespace
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:378: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:380: line longer than 100 chars
[00:04:18] tidy error: /checkout/src/liballoc/raw_vec.rs:457: line longer than 100 chars
@kennytm I've fixed these except the two links, AFAIK rustdoc does not support splitting links over multiple lines: how should I handle these?
@gnzlbg You could use the syntax:
blah blah blah [link][x] blah blah blah
[x]: https://www.example.com/extremely/long/link?can=use&syntax=like#this
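Applied to a doc comment, the reference-style link suggested above might look like the following. This is a minimal sketch: the function `documented`, the label `alloc-notes`, and the URL are placeholders, not items from this PR.

```rust
/// Grows the buffer; see the [allocator notes][alloc-notes] for background.
///
/// The link target is defined on its own line at the end of the comment,
/// so the prose above can be wrapped at any column.
///
/// [alloc-notes]: https://www.example.com/extremely/long/link?can=use&syntax=like#this
pub fn documented() -> &'static str {
    // Placeholder body; only the doc comment above matters for this example.
    "see rustdoc output"
}
```

Only the definition line can end up long; the sentence that uses the link stays short.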
@kenny but what happens when the link is longer than 100 chars? How do I break it so that the docs still work? (We also have this problem in stdsimd, no solution AFAIK.)
@gnzlbg `tidy` will ignore long lines in that form. Not sure about rustfmt.
@leonardo-m I have some synthetic benchmarks for medium-sized vectors, but I am more interested in seeing whether this is an overall win. So I'd like to request some benchmarks on rustc with the proposed strategy, and then again without special-casing large vectors (that is, using a 1.5x growth factor for large vectors as well). I only have a laptop and it's hard for me to run all of servo's benchmarks, but before merging this I'd like to benchmark servo as well.
I'm happy to run perf.rlo against this PR once Travis passes, just ping me. That won't include servo benchmarks, but if we see a net win in rustc then we can look into running against servo as well. |
src/liballoc/vec_deque.rs
Outdated
@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
fn grow_if_necessary(&mut self) {
if self.is_full() {
let old_cap = self.cap();
self.buf.double();
self.buf.grow_by(1);
Ah like travis points out, VecDeque relies on a power of two capacity. So it should continue using double.
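One common reason a ring buffer wants a power-of-two capacity is that index wrap-around can then be done with a bit mask. The sketch below illustrates that idea only; `wrap_index` and `wrap_index_modulo` are hypothetical helpers written for this example, not a copy of the `VecDeque` internals.

```rust
/// Wraps `index` into `0..cap` using a bit mask.
/// Only correct when `cap` is a power of two.
fn wrap_index(index: usize, cap: usize) -> usize {
    index & (cap - 1)
}

/// Wraps `index` into `0..cap` using the remainder.
/// Correct for any non-zero `cap`, but needs a division.
fn wrap_index_modulo(index: usize, cap: usize) -> usize {
    index % cap
}
```

With `cap = 8` both helpers agree, but with a capacity such as 12 (which a 1.5x growth step could produce) the mask no longer yields the remainder, which is why a structure relying on the mask has to keep doubling.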
src/liballoc/vec_deque.rs
Outdated
@@ -1754,7 +1755,7 @@ impl<T> VecDeque<T> {
fn grow_if_necessary(&mut self) {
if self.is_full() {
This seems like the right place to use `unlikely`, if anywhere (and not inside the `is_full` method).
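The idea of marking the growth branch as rare at the call site can be sketched on stable Rust as below. `core::intrinsics::unlikely` is an unstable intrinsic, so this example substitutes a `#[cold]` growth function to convey a similar hint; `TinyStack`, `grow`, and `simulate_pushes` are hypothetical names, and the struct only counts capacity instead of allocating.

```rust
/// Toy container that tracks length and capacity without allocating.
pub struct TinyStack {
    len: usize,
    cap: usize,
    grows: usize,
}

impl TinyStack {
    pub fn new() -> Self {
        TinyStack { len: 0, cap: 4, grows: 0 }
    }

    /// The fullness test stays a plain, inlinable predicate.
    #[inline]
    fn is_full(&self) -> bool {
        self.len == self.cap
    }

    /// Marked cold: the author expects this to be called rarely.
    #[cold]
    fn grow(&mut self) {
        self.cap *= 2;
        self.grows += 1;
    }

    /// The rare branch is taken at the call site, not inside `is_full`.
    #[inline]
    pub fn push(&mut self) {
        if self.is_full() {
            self.grow();
        }
        self.len += 1;
    }
}

/// Pushes `n` items and returns `(capacity, number_of_grow_calls)`.
pub fn simulate_pushes(n: usize) -> (usize, usize) {
    let mut stack = TinyStack::new();
    for _ in 0..n {
        stack.push();
    }
    (stack.cap, stack.grows)
}
```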
Failed to build
|
@bors try Prepare for perf. |
[vec] growth-strategy optimization
This commit introduces a growth-strategy optimization for `RawVec` (and indirectly `Vec` and `VecDeque`). It introduces a method `grow_by(capacity_increase)` that tells the `RawVec` by how much the user would like to increase its capacity (e.g. `1` on `vec.push(val)`). It then uses the following growth strategy:
- If the `RawVec` is empty: it allocates at least 64 bytes.
- If the `RawVec` is not empty:
  - it uses a growth factor of 2 for small (<4096 bytes) and large (>4096*32 bytes) vectors, and 1.5 otherwise
  - it uses this growth factor to compute a suitable capacity
  - it takes the max between this capacity and the desired capacity increase (e.g. by using the desired capacity increase of `self.cap` one can force this method to double the capacity)
  - it passes the result to the `usable_size` function of the allocator to obtain the max usable size
The commit also refactors the logic of `Vec`'s growth test into an `is_full` function, and uses `core::intrinsics::unlikely` on the result of both `Vec`'s and `VecDeque`'s test to indicate that growth is an `unlikely` event. The `grow_by` function is not `#[inline(never)]` but `#[inline] + #[cold]`. That is, the function can be inlined, but the function author expects it to not be called often. Combined with the `unlikely` annotation on the call site, the compiler should have enough information to decide when eliding the call is worth it.
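The growth strategy described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the signature of `suitable_capacity` (in particular the extra `elem_size` parameter and the `Option` return) is invented here, "max with the desired increase" is read as `max(grown, cap + capacity_increase)`, zero-sized types are not handled, and the final `usable_size` query to the allocator is left out.

```rust
/// Hypothetical sketch of the capacity policy; returns `None` on overflow.
fn suitable_capacity(cap: usize, capacity_increase: usize, elem_size: usize) -> Option<usize> {
    assert!(elem_size > 0, "zero-sized types are out of scope for this sketch");

    // The caller must end up with at least this many elements.
    let required = cap.checked_add(capacity_increase)?;

    if cap == 0 {
        // Empty vector: cover at least 64 bytes (rounded up to whole elements).
        let min_elems = 64 / elem_size + usize::from(64 % elem_size != 0);
        return Some(required.max(min_elems));
    }

    let bytes = cap.checked_mul(elem_size)?;
    let grown = if bytes < 4096 || bytes > 4096 * 32 {
        // Small and large vectors: growth factor 2.
        cap.checked_mul(2)?
    } else {
        // Medium vectors: growth factor 1.5.
        cap.checked_add(cap / 2)?
    };

    // Never return less than what was asked for.
    Some(grown.max(required))
}
```

For example, under these assumptions a `u8` buffer of 8192 elements grows to 12288, while passing `capacity_increase == cap` forces a doubling to 16384.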
☀️ Test successful - status-travis |
Ping @Mark-Simulacrum, build is ready for perf.rlo. |
Looks like a rather large increase in memory usage, which probably explains the otherwise somewhat minor regressions in wall time and instructions: http://perf.rust-lang.org/compare.html?start=548109827454f759e9c88b4a1f724c4cdbff8bfa&end=4e26722f7780a52602644592cbc2bb8e30ad0ff0&stat=instructions%3Au. |
@Mark-Simulacrum yes, this was a spectacular fail. There is a bug where I set the capacity to I've fixed it but am adding some tests for |
Let me know when you're ready for another perf run. |
src/liballoc/raw_vec.rs
Outdated
}
};
self.ptr = uniq;
self.cap = new_cap;
(Whole function) Tightly cuddled `unsafe { }` doesn't make the code just outside of it "safe". It all needs to add up: every calculation and comparison in this function is critical for memory safety.
I'd prefer to keep the original style of a large unsafe block around it all. In particular, inline `unsafe { ... }` expressions are not really idiomatic.
This function needs to check for overflow in the capacity number. It's a harder job now that it no longer doubles and we can't rely on the `isize::MAX` trick documented in the `double` function.
If the capacity number overflows in `suitable_capacity` then it will panic. If the `elem_size * capacity` operation overflows then that will panic. Or what am I missing?
I mean, the new capacity can be bigger than `isize::MAX` on 32-bit archs, and the usable size of the allocator might make it even bigger, but if for some reason none of these operations panic before an actual allocation is attempted, there is an `alloc_guard` check that will make it panic anyway.
I don't see where the check or explicit panic is for that? I mean, plain a + b panics on overflow only when debug assertions are enabled. We need overflow checks that are active all the time.
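To make the point about release builds concrete: when overflow checks are disabled (the default for release builds), a plain `a + b` on `usize` wraps around silently. The sketch below models that with `wrapping_add` and contrasts it with `checked_add`, which reports overflow in every build profile. Both function names are made up for this example.

```rust
/// Models what a plain `cap + extra` does when overflow checks are off:
/// the result silently wraps around.
fn new_cap_wrapping(cap: usize, extra: usize) -> usize {
    cap.wrapping_add(extra)
}

/// Always-on overflow detection, independent of the build profile.
fn new_cap_checked(cap: usize, extra: usize) -> Option<usize> {
    cap.checked_add(extra)
}
```

A wrapped capacity is smaller than the old one, so code that trusts it would believe it owns more memory than was actually allocated.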
I thought unsigned integer wraparound always panicked... does it only panic with `debug_assert` enabled? I'll need to add the checks then.
Yes, they are not there in release mode. Old version of this uses quite elaborate arguments about isize::MAX and maximum allocation size, and usize::MAX to avoid most of those checks.
Maybe the API can be changed so that it doesn't need to take into account exactly arbitrary growth numbers? Otherwise we're just reimplementing reserve.
Look at reserve for what it does for overflow checks.
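A `reserve`-style capacity computation with always-on checks might look roughly like this. It is a simplified sketch under assumptions, not the actual `RawVec::reserve`: the names `amortized_capacity` and `CapacityOverflow` are invented here, errors are returned rather than panicking, and the final byte-size test against `isize::MAX` stands in for the role the thread attributes to `alloc_guard`.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct CapacityOverflow;

/// Computes a new capacity that holds `used + extra` elements,
/// growing at least by doubling, with every step overflow-checked.
pub fn amortized_capacity(
    cap: usize,
    used: usize,
    extra: usize,
    elem_size: usize,
) -> Result<usize, CapacityOverflow> {
    // `extra` is an arbitrary caller-supplied value: check, do not trust.
    let required = used.checked_add(extra).ok_or(CapacityOverflow)?;
    let doubled = cap.checked_mul(2).ok_or(CapacityOverflow)?;
    let new_cap = doubled.max(required);

    // The allocation size in bytes must fit as well.
    let bytes = new_cap.checked_mul(elem_size).ok_or(CapacityOverflow)?;
    if bytes > isize::MAX as usize {
        return Err(CapacityOverflow);
    }
    Ok(new_cap)
}
```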
Why is grow_by a new function, and how is it different from reserve? 😄
@bluss indeed, I think I should leave `double` and `double_in_place` as they were, add these optimizations to `reserve`, and then make `Vec` use `reserve` instead of `double` on insertion/push. This is basically the reason why I wanted a mentor.
The overflow arguments apply to `suitable_capacity` as well. If `cap < isize::MAX`, then `cap * 1.5` and `cap * 2` cannot overflow `usize`.
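The bound argument above can be checked with a little arithmetic: `isize::MAX` is `2^(n-1) - 1` on an `n`-bit target, so doubling it gives `2^n - 2 = usize::MAX - 1`, which still fits. The sketch below expresses this with checked operations; `MAX_CAP`, `doubled`, and `one_and_a_half` are names chosen for this example.

```rust
/// Largest capacity assumed by the argument: `isize::MAX`, as a `usize`.
pub const MAX_CAP: usize = isize::MAX as usize;

/// Growth factor 2, reporting overflow instead of wrapping.
pub fn doubled(cap: usize) -> Option<usize> {
    cap.checked_mul(2)
}

/// Growth factor 1.5 in integer arithmetic, reporting overflow.
pub fn one_and_a_half(cap: usize) -> Option<usize> {
    cap.checked_add(cap / 2)
}
```

This only covers the growth-factor part; as noted in the reply that follows, an arbitrary `capacity_increase` added on top still needs its own check.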
As long as the additional capacity is a parameter to a safe function, that's an arbitrary value and it needs to be checked, not trusted.
src/liballoc/raw_vec.rs
Outdated
@@ -235,7 +235,66 @@ impl<T, A: Alloc> RawVec<T, A> {
}
}

/// Doubles the size of the type's backing allocation. This is common enough
/// Grows the vector capacity by `capacity_increase`.
Doc should probably say it's growing by at least this `capacity_increase`, adjusted by the growth policy.
A postcondition is that if the function returns, at least `capacity + capacity_increase` memory is allocated and that sum does not wrap around.
src/liballoc/raw_vec.rs
Outdated
// maintained by `alloc_guard`; the alignment will never be too
// large as to "not be specifiable" (so we can use
// `from_size_align_unchecked`).
let new_cap = Self::suitable_capacity(self.cap, capacity_increase);
A wraparound/overflow check is needed here, or in that method, or somewhere close. Or an argument why it's not needed.
(The old code used the argument that the new capacity was <= 2 * old_capacity)
The new capacity was always `2 * old_capacity`.
I'll check this. I am adding some tests close to/at the `i32::MAX` boundary for `RawVec<u8>`. If there are any problems, all the 32-bit architectures should fail these tests.
Everytime I change something in |
@gnzlbg Try to add |
Reassigning from aturon; looks like bluss has some context. r? @bluss |
☔ The latest upstream changes (presumably #45902) made this pull request unmergeable. Please resolve the merge conflicts. |
@kennytm Not yet. The approach @arthurprs suggested was to use the |
Okay thanks for the update :) |
We can probably shoehorn |
@arthurprs we can do that in parallel. I (personally) prefer to:
|
It's sad, but I don't think we'll get all methods returning excess anytime soon. 😞 Rest looks reasonable to me. Let me know if you need help or just wanna discuss things. |
Hi @gnzlbg, just checking in to see how things are going! Are there PRs/issues to follow the progress of the additional excess methods you want? |
@aidanhs a lot has happened in the jemallocator crate, I have a PR more or less ready that adds the methods to the |
If you just keep this PR updated with the latest that'd be great :) |
Hey @gnzlbg! It's been over a week since we last heard from you on this issue. You have a number of merge conflicts which need to be addressed. If this PR is blocked on other PRs please let us know. |
Ok it's been a few extra weeks now, I'm going to close this due to inactivity but we can of course resubmit with a rebase and such! |
This commit introduces a growth-strategy optimization for `RawVec` (and indirectly `Vec` and `VecDeque`). It introduces a method `grow_by(capacity_increase)` that tells the `RawVec` by how much the user would like to increase its capacity (e.g. `1` on `vec.push(val)`). It then uses the following growth strategy:
- If the `RawVec` is empty: it allocates at least 64 bytes.
- If the `RawVec` is not empty:
  - it uses a growth factor of 2 for small (<4096 bytes) and large (>4096*32 bytes) vectors, and 1.5 otherwise
  - it uses this growth factor to compute a suitable capacity
  - it takes the max between this capacity and the desired capacity increase (e.g. by using the desired capacity increase of `self.cap` one can force this method to double the capacity)
  - it passes the result to the `usable_size` function of the allocator to obtain the max usable size
The commit also refactors the logic of `Vec`'s growth test into an `is_full` function, and uses `core::intrinsics::unlikely` on the result of both `Vec`'s and `VecDeque`'s test to indicate that growth is an `unlikely` event. The `grow_by` function is not `#[inline(never)]` but `#[inline] + #[cold]`. That is, the function can be inlined, but the function author expects it to not be called often. Combined with the `unlikely` annotation on the call site, the compiler should have enough information to decide when eliding the call is worth it.