Skip to content

[jemalloc] set correct excess in realloc_excess #45514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 4, 2017

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Oct 25, 2017

No description provided.

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@gnzlbg gnzlbg force-pushed the jemalloc_realloc2 branch from 3d1bf45 to d16c140 Compare October 25, 2017 12:18

let flags = align_to_flags(new_align);
let ptr = rallocx(ptr as *mut c_void, new_size, flags) as *mut u8;
let alloc_size = sallocx(ptr as *mut c_void, flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sallocx and *excess = need to go into a !null branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!! indeed, should be fixed now

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 25, 2017
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 25, 2017

@kennytm can you try this?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 25, 2017

@Mark-Simulacrum would it be possible to run a perf of this?

@arthurprs
Copy link
Contributor

Is this function used anywhere right now?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 25, 2017

@arthurprs no idea


FYI jemalloc devs say that nallocx should give the correct result here.

@kennytm
Copy link
Member

kennytm commented Oct 25, 2017

@bors try

@bors
Copy link
Collaborator

bors commented Oct 25, 2017

⌛ Trying commit f39594d with merge 4b59c82...

bors added a commit that referenced this pull request Oct 25, 2017
[jemalloc] set correct excess in realloc_excess
@arthurprs
Copy link
Contributor

arthurprs commented Oct 25, 2017

nallocx seems to be faster. I didn't dig into the code to find why but I guess guess it has to find the "type" of allocation ptr points to. Probably the same "fast path" that makes sdalloc faster than dalloc.

Edit: there's also precedent in folly https://github.com/facebook/folly/blob/e8252dd7b5590427839eae4e4b2d76d2c4d9cff5/folly/memory/Malloc.h#L211

running 3 tests
test roundtrip           ... bench:          13 ns/iter (+/- 0)
test roundtrip_w_nallocx ... bench:          15 ns/iter (+/- 0)
test roundtrip_w_sallocx ... bench:          19 ns/iter (+/- 1)



#[bench]
fn roundtrip_w_nallocx(b: &mut Bencher) {
    b.iter(|| unsafe {
        let ptr = jemalloc_sys::mallocx(100, 0);
        let rsz = jemalloc_sys::nallocx(100, 0);
        jemalloc_sys::sdallocx(ptr, rsz, 0);
        // rsz
    });
}

#[bench]
fn roundtrip_w_sallocx(b: &mut Bencher) {
    b.iter(|| unsafe {
        let ptr = jemalloc_sys::mallocx(100, 0);
        let rsz = sallocx(ptr as usize, 0);
        jemalloc_sys::sdallocx(ptr, rsz, 0);
        // rsz
    });
}

#[bench]
fn roundtrip(b: &mut Bencher) {
    b.iter(|| unsafe {
        let ptr = jemalloc_sys::mallocx(100, 0);
        // let rsz = sallocx(ptr as usize, 0);
        jemalloc_sys::sdallocx(ptr, 100, 0);
    });
}

@bors
Copy link
Collaborator

bors commented Oct 25, 2017

☀️ Test successful - status-travis
State: approved= try=True

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 25, 2017

nallocx seems to be faster.

This is what I expected, IIUC nallocx just needs to compute a size, while sallocx might need to traverse some internal data-structures in some cases.

As soon as @Mark-Simulacrum does a perf run, I will commit the version with nallocx, so that we can compare them, and probably right after just merge the nallocx version. I just don't want to be walking over "blind" assumptions when working on RawVec.

@Mark-Simulacrum
Copy link
Member

You can push and run try while you wait for perf; I just need the artifacts from try builds which stick around for 30 days.

@gnzlbg gnzlbg force-pushed the jemalloc_realloc2 branch from 4f2d8de to 45ef012 Compare October 25, 2017 18:26
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 26, 2017

@kennytm could you try the last commit ?

@kennytm
Copy link
Member

kennytm commented Oct 26, 2017

@bors try

@bors
Copy link
Collaborator

bors commented Oct 26, 2017

⌛ Trying commit 45ef012 with merge 9b02a2d...

bors added a commit that referenced this pull request Oct 26, 2017
[jemalloc] set correct excess in realloc_excess
@bors
Copy link
Collaborator

bors commented Oct 26, 2017

☀️ Test successful - status-travis
State: approved= try=True

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 30, 2017

@bluss a similar effect to the one in #45512 can be observed here.

The micro-benchmarks of @arthurprs show that sallocx is slower, but perf shows that it is a bit faster. The benches with sallocx use less memory and are faster. The only explanation I can think of here is that sallocx is maybe more accurate than nallocx. I am going to ping jemalloc devs on these issues.

@arthurprs
Copy link
Contributor

I'm a bit confused. What are we measuring? Is this function even used in the current codebase?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 30, 2017

@arthurprs it is not used anywhere in std or core but I don't know if those crates on the perf run use it.

If they don't, this change should have no effect on the benchmarks but e.g. memory consumption in some cases varies up to 10-15%, that would be a lot of noise. The same applies to the other PR.

@arthurprs
Copy link
Contributor

I don't think it's used, can you double check?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 30, 2017

@arthurprs one of the regex benchmarks shows a 15% difference. I've followed all dependencies of the regex crate and couldn't find any that uses alloc_excess or realloc_excess. That would mean that at least up to 19% differences in memory consumption (shown by the other PR) are just noise.

@Mark-Simulacrum thoughts? I think perf runs should come with an error estimate for each test.

@arthurprs
Copy link
Contributor

I think some of these tests are showing a high variance even in unrelated changes, example graph http://perf.rust-lang.org/index.html?start=2017-10-25&end=2017-10-29&stat=max-rss&absolute=false

@shepmaster
Copy link
Member

Moving to another libs team member...

r? @sfackler

@rust-highfive rust-highfive assigned sfackler and unassigned aturon Nov 3, 2017
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 3, 2017

I think this is good to go, ideally, I (or @arthurprs) will commit @arthurprs synthetic benchmarks to rustc (currently liballoc_jemalloc has no benchmarks), but we can do that later.

@shepmaster shepmaster added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 3, 2017
@sfackler
Copy link
Member

sfackler commented Nov 3, 2017

I may be missing some background here, but why are we optimizing a function that isn't called anywhere?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 3, 2017

Because we want to be able to call it from RawVec. Right now RawVec just reallocs dropping the excess storage returned by the allocator. We can use usable_size to hack around this there, but... this is actually why realloc_excess exists in the first place.

The fact that realloc_excess drops the excess storage is a bug.

@sfackler
Copy link
Member

sfackler commented Nov 3, 2017

But were the perf comparisons above performed with that RawVec change?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 3, 2017

@sfackler No. The perf comparisons above are completely irrelevant because the compiler doesn't use realloc_excess anywhere but they only test for the performance of the compiler. All the differences one is seeing is... pure noise (up to 20% in memory consumption, up to 15% in cpu-cycles...).

@arthurprs did some synthetic benchmarks: #45514 (comment) where the cost of calling nallocx is of about ~2 ns on its machine.

Once we enable this in RawVec we'll need to add synthetic benchmarks for it because the perf results have too much noise.

@sfackler
Copy link
Member

sfackler commented Nov 3, 2017

Ok cool, that makes sense. Is there any reason not to land both this and the alloc_excess change or do you want to leave one open for now?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 3, 2017

They can be merged.

The reason for the split was that I wanted to run perf on both separately because... 1) I thought perf benchmarked something more than the compiler (which it doesn't; external crates could have been using ralloc/alloc_excess), and 2) I wanted to check how these impacted memory consumption in real programs (which perf doesn't show because it only test the compiler and the variance in memory consumption is way too high).

Had I knew all of this this would have been merged in @arthurprs together one week ago.

@sfackler
Copy link
Member

sfackler commented Nov 3, 2017

Ok, let's merge both into this PR and land it to avoid having to wait on bors twice.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 3, 2017

@sfackler done, I am closing #45512 .

@sfackler
Copy link
Member

sfackler commented Nov 3, 2017

@bors r+

@bors
Copy link
Collaborator

bors commented Nov 3, 2017

📌 Commit 549ab77 has been approved by sfackler

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 3, 2017
@bors
Copy link
Collaborator

bors commented Nov 4, 2017

⌛ Testing commit 549ab77 with merge a454152...

bors added a commit that referenced this pull request Nov 4, 2017
[jemalloc] set correct excess in realloc_excess
@bors
Copy link
Collaborator

bors commented Nov 4, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing a454152 to master...

@bors bors merged commit 549ab77 into rust-lang:master Nov 4, 2017
@gnzlbg gnzlbg mentioned this pull request Nov 21, 2017
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants