-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Make Rc<T>::deref
zero-cost
#141348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Make Rc<T>::deref
zero-cost
#141348
Conversation
This comment has been minimized.
This comment has been minimized.
df34f84
to
d3a7429
Compare
This comment has been minimized.
This comment has been minimized.
bc84ec6
to
19fb34b
Compare
The Miri subtree was changed cc @rust-lang/miri |
19fb34b
to
f5245ba
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Make `Rc<T>::deref` zero-cost This PR makes `Rc::deref` zero-cost by changing the internal pointer so that it points to the value directly instead of the allocation. This is split out from #132553, which will also make `Arc::deref` zero-cost.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (8ef4a25): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -0.3%, secondary -0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -0.5%, secondary -1.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 0.2%, secondary 1.6%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 775.728s -> 776.531s (0.10%) |
Reminder, once the PR becomes ready for a review, use |
f5245ba
to
e345c2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first-pass comments about the basic structure before looking in depth at the rest.
As a note, I'm having the same problem Jonas and Scott were: this is >4k total changed lines of mission-critical code in a single diff. I'll get through it one way or another, but this is the kind of situation where having >1 commit in the PR would make a huge difference in reviewability. That also lets you build up the design with a commit description for each part of the change, explaining the "why"/"how" each step is done (that context is missing here).
E.g. this could be split into orthogonal patches like:
- Add basic traits
- Introduce
RawRc
and its most basic methods - Introduce
RawWeak
and its most basic methods - Introduce
RcLayout
and its most basic methods - Introduce
RawUniqueRc
and its most basic methods
Somewhere around here, the basic brand new part should "work", at least conceptually (possibly with a bunch of todo!()
s).
- More complicated
RawRc
andRawWeak
methods and traits, split further for anything that makes sense to get its own description - Replace the
Rc
implementation - Test changes and updates
- Debuginfo changes & test updates
(If you're open to this but need workflow advice, happy to help)
/// Defines the `RefCounts` struct to store reference counts. The reference counters have suitable | ||
/// alignment for atomic operations. | ||
macro_rules! define_ref_counts { | ||
($($target_pointer_width:literal => $align:literal,)*) => { | ||
$( | ||
/// Stores reference counts. | ||
#[cfg(target_pointer_width = $target_pointer_width)] | ||
#[repr(C, align($align))] | ||
pub(crate) struct RefCounts { | ||
/// Weak reference count (plus one if there are non-zero strong reference counts). | ||
pub(crate) weak: UnsafeCell<usize>, | ||
/// Strong reference count. | ||
pub(crate) strong: UnsafeCell<usize>, | ||
} | ||
)* | ||
}; | ||
} | ||
|
||
// This ensures reference counters have correct alignment so that they can be treated as atomic | ||
// reference counters for `Arc`. | ||
define_ref_counts! { | ||
"16" => 2, | ||
"32" => 4, | ||
"64" => 8, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really need a macro:
#[repr(C)]
#[cfg_attr(target_pointer_width = "16", align(16))]
#[cfg_attr(target_pointer_width = "32", align(32))]
#[cfg_attr(target_pointer_width = "64", align(64))]
pub(crate) struct RefCounts {
// ...
}
(we need align(usize)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean this?
#[cfg_attr(target_pointer_width = "16", repr(C, align(2)))]
#[cfg_attr(target_pointer_width = "32", repr(C, align(4)))]
#[cfg_attr(target_pointer_width = "64", repr(C, align(8)))]
pub(crate) struct RefCounts {
// ...
}
There doesn’t seem to be an align
attribute。
pub(crate) unsafe trait RcOps: Sized { | ||
/// Increments a reference counter managed by `RawRc` and `RawWeak`. Currently, both strong and | ||
/// weak reference counters are incremented by this method. | ||
/// | ||
/// # Safety | ||
/// | ||
/// - `count` should only be handled by the same `RcOps` implementation. | ||
/// - The value of `count` should be non-zero. | ||
unsafe fn increment_ref_count(count: &UnsafeCell<usize>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this work?
unsafe trait RcOps: Sized {
type Count;
fn count_from_unsafe_cell(&UnsafeCell<usize>) -> &Self::Count;
fn increment_ref_count(count: &Self::Count);
}
Arc
defines Count
as Atomic<usize>
and Rc
as Cell<usize>
. Then the unsafety of working with UnsafeCell
is contained to that single function, and the rest of the functions here can take &Self:::Count
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, is UnsafeCell
needed at all? If this trait were implemented on the counter (rather than a dummy type) then I think that can be eliminated:
trait RcCounter {
fn from_usize(usize) -> Self;
fn increment_count(&self);
fn decrement_count(&self);
fn is_unique(counts: &RefCounts<Self>);
// ...
}
struct RefCounts<Count: RcCounter> {
weak: Count,
strong: Count,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that would be a problem for the const
constructor. For that specific case, It is probably reasonable to make RcCounter
an unsafe trait, with the safety requirement that it must be transmutable from UnsafeCell<usize>
, then do a transmute in RefCounts::new
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, is
UnsafeCell
needed at all? If this trait were implemented on the counter (rather than a dummy type) then I think that can be eliminated:
Usage of UnsafeCell
is to avoid different counter types for Rc
and Arc
, so that they can share some codes that operates on the counter types. If I use different counter types for Rc
and Arc
, some functions needs to be changed to generic functions which involves monomorphization overheads, I have to reason about the overhead is negligible or can be optimized away, so I decided that using a single UnsafeCell
type is the safer choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this done in response to perf results? I'm thinking that at least the first option I mentioned (type Count
) wouldn't be any different since the relevant parts are type-specific anyway.
Even with the second method (RefCounts
generic over Count
) I wouldn't expect there to be much of a difference: the functions are either generic anyway, or #[inline]
, or trivially inlineable so it's probably happening anyway. Most of the parts that are large enough to share code are the allocation methods, which don't use RefCounts
anyway. We could handle specific cases where it's helpful by turning both versions into a RefCounts<UnsafeCell>
. (And I wouldn't expect any of this to be worse than the current impl; though of course perf could show differently)
My concern here is just that it's more difficult to follow the soundness invariants through the code compared to the current version.
/// - `count` should only be handled by the same `RcOps` implementation. | ||
/// - The value of `count` should be non-zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safety docs should probably be using "must" rather than "should"; as written, it sounds optional.
@@ -0,0 +1,23 @@ | |||
//@ compile-flags: -Z merge-functions=disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's reasonably obvious from the name, but could you add a comment saying what the test does? We're trying to get a bit better with that. There's also the codegen-llvm/lib-optimizations
directory, which might be a good fit for these tests.
(Same for the other test)
@tgross35: I can further split my commit, but I can only do this in my free time, so it will take some time for me to finish it. Also, I think it’s better for me to investigate the perf result on #132553 first, and do some optimizations, then I can do the splitting base on a known-to-be-good commit. Also, can I have your opinion on the overall design of introducing |
I love the approach and have wanted to do something similar for a while! The |
/// Allocates uninitialized memory for a reference-counted allocation with allocator `alloc` and | ||
/// layout `RcLayout`. Returns a pointer to the value location. | ||
#[inline] | ||
fn allocate_uninit_raw_bytes<A>(alloc: &A, rc_layout: RcLayout) -> Result<NonNull<()>, AllocError> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of "Returns a pointer to the value location" in the docs, and functions that expect a pointer to the value. What about a new struct RcDataPointer(NonNull<()>)
to provide some type safety about what's being pointed to?
The -> NonNull<RefCounts>
method would just be a method on this
This PR makes
Rc::deref
zero-cost by changing the internal pointer to point directly to the value instead of to the allocation.This PR is split from #132553, which will also make
Arc::deref
zero-cost.