Dramatic slowdown in rust performance from the serialization benchmarks #19281

Closed
erickt opened this issue Nov 24, 2014 · 2 comments · Fixed by #19574
Labels
I-slow: Problems and improvements with respect to performance of generated code.
regression-from-stable-to-nightly: Performance or correctness regression from stable to nightly.

Comments

erickt (Contributor) commented Nov 24, 2014

I've been doing a lot of benchmarking recently (1, 2, 3), and I've seen a pretty dramatic drop in performance over the past couple of weeks. Some of it might be explained by upgrading from OS X Mavericks to Yosemite, but I still saw a 40% drop in performance between 2014-11-13 and 2014-11-23. I haven't been able to dig into what's going on yet, but I did notice that our current implementation of Writer for Vec<u8>:

impl Writer for Vec<u8> {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        self.push_all(buf);
        Ok(())
    }
}

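That write path boils down to push_all, which has since been removed from Vec. Roughly speaking (this is a sketch from memory, not the 2014 source), it reserved room once and then appended each element, much like today's extend_from_slice:

// Rough sketch of what `push_all` did for a byte vector: grow the capacity
// once, then append the new bytes. (An approximation, not the 2014 source;
// `Vec::extend_from_slice` is the closest equivalent in today's std.)
fn push_all_sketch(dst: &mut Vec<u8>, src: &[u8]) {
    dst.reserve(src.len());
    for &byte in src {
        dst.push(byte);
    }
}

fn main() {
    let mut v = vec![0u8];
    push_all_sketch(&mut v, &[1, 2, 3]);
    assert_eq!(v, [0, 1, 2, 3]);
}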

#[bench]
fn bench_std_vec_writer(b: &mut test::Bencher) {
    let mut dst = Vec::with_capacity(BATCHES * SRC_LEN);
    let src = &[1, .. SRC_LEN];

    b.iter(|| {
        dst.clear();

        do_std_writes(&mut dst, src, BATCHES);
    })
}

The Vec<u8> write impl does not appear to be inlining well for some reason; the benchmark above gives:

test writer::bench_std_vec_writer                           ... bench: 1000 | [----*****#*****--------]             | 2000:      1248 ns/iter (+/- 588)
test writer::bench_std_vec_writer_inline_always             ... bench: 900 |   [----*#***--]                        | 2000:      1125 ns/iter (+/- 282)
test writer::bench_std_vec_writer_inline_never              ... bench: 1000 |  [----***#*****--------]              | 2000:      1227 ns/iter (+/- 516)
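
The do_std_writes driver isn't shown here; as a rough sketch of what that kind of loop looks like in current Rust (using std::io::Write rather than the 2014 Writer trait, with BATCHES and SRC_LEN values chosen arbitrarily), it is just the same small slice written into the destination over and over:

use std::io::Write;

const BATCHES: usize = 128; // assumed value, not taken from the issue
const SRC_LEN: usize = 8;   // assumed value, not taken from the issue

// Hypothetical reconstruction of the benchmark driver: push the same small
// slice into the writer BATCHES times, so the per-write overhead (and whether
// `write` gets inlined) dominates the measurement.
fn do_std_writes<W: Write>(dst: &mut W, src: &[u8], batches: usize) {
    for _ in 0..batches {
        dst.write_all(src).unwrap();
    }
}

fn main() {
    let mut dst: Vec<u8> = Vec::with_capacity(BATCHES * SRC_LEN);
    let src = [1u8; SRC_LEN];
    do_std_writes(&mut dst, &src, BATCHES);
    assert_eq!(dst.len(), BATCHES * SRC_LEN);
}

Presumably the _inline_always and _inline_never variants differ only in the #[inline] attribute applied to write; the loop itself stays the same.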

Rewriting to this makes it 10 times faster (and yes, I realize I'm not updating the length of the Vec<u8>. Could that be a problem?):

struct VecWriter1<'a> {
    dst: &'a mut Vec<u8>,
}

impl<'a> MyWriter for VecWriter1<'a> {
    #[inline]
    fn my_write(&mut self, src: &[u8]) -> IoResult<()> {
        let src_len = src.len();

        self.dst.reserve(src_len);

        let dst = self.dst.as_mut_slice();

        unsafe {
            // we reserved enough room in `dst` to store `src`.
            ptr::copy_nonoverlapping_memory(
                dst.as_mut_ptr(),
                src.as_ptr(),
                src_len);
        }

        Ok(())
    }
}

with this performance:

test writer::bench_vec_writer_1                             ... bench: 100 |         [------*********#*****--------] | 200:       160 ns/iter (+/- 68)
test writer::bench_vec_writer_1_inline_always               ... bench: 100 |     [--------****#**--]                 | 300:       182 ns/iter (+/- 79)
test writer::bench_vec_writer_1_inline_never                ... bench: 600 |   [---****#**--]                       | 2000:       952 ns/iter (+/- 399)

Furthermore, commenting out the self.dst.reserve(src_len) call made it just as fast as BufWriter and as using the unsafe ptr::copy_nonoverlapping_memory directly.
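
For reference, ptr::copy_nonoverlapping_memory is the pre-1.0 name of today's std::ptr::copy_nonoverlapping. A version of the same fast path in current Rust, filling in the two details the snippet above deliberately skips (copying to the end of the existing contents and updating the length), might look roughly like this:

use std::ptr;

// Sketch of the reserve-then-raw-copy fast path in current Rust. Unlike the
// snippet above, it copies to the end of the existing contents and then
// updates the length.
fn my_write(dst: &mut Vec<u8>, src: &[u8]) {
    let src_len = src.len();
    dst.reserve(src_len);
    let old_len = dst.len();
    unsafe {
        // Sound because `reserve` guarantees room for `src_len` more bytes
        // and `src` cannot overlap the freshly reserved spare capacity.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(old_len), src_len);
        dst.set_len(old_len + src_len);
    }
}

fn main() {
    let mut v = Vec::new();
    my_write(&mut v, b"hello world");
    assert_eq!(v, b"hello world".to_vec());
}

In today's standard library, Vec::extend_from_slice does exactly this, reserve included.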

@emberian added the I-slow and regression-from-stable-to-nightly labels on Nov 24, 2014
erickt (Author) commented Dec 2, 2014

It looks like this is coming from cf3b2e4, where Vec::reserve* was rewritten. It appears that the old Vec::reserve_additional behavior was about 2-10 times faster according to this micro-benchmark.
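
The linked micro-benchmark isn't reproduced in the comment, but the hot loop it exercises is easy to sketch: call reserve before every small append so that the capacity check dominates. A rough stand-in in current Rust (using std::time::Instant instead of the old unstable test::Bencher, with an arbitrary iteration count) would be:

use std::time::Instant;

fn main() {
    // Arbitrary sizes, not taken from the linked benchmark.
    const ITERS: usize = 1_000_000;
    let src = [1u8; 8];

    let start = Instant::now();
    let mut dst: Vec<u8> = Vec::new();
    for _ in 0..ITERS {
        // The call under suspicion: `reserve` runs before every small append,
        // just as the Vec<u8> Writer impl does through push_all.
        dst.reserve(src.len());
        dst.extend_from_slice(&src);
    }
    println!("wrote {} bytes in {:?}", dst.len(), start.elapsed());
}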

erickt (Author) commented Dec 2, 2014

This is definitely getting out of my area of expertise, but apparently switching Vec::reserve from this:

    pub fn reserve(&mut self, additional: uint) {
        if self.cap - self.len < additional {
            match self.len.checked_add(additional) {
                None => panic!("Vec::reserve: `uint` overflow"),
                Some(new_cap) => {
                    let amort_cap = new_cap.next_power_of_two();
                    // next_power_of_two will overflow to exactly 0 for really big capacities
                    if amort_cap == 0 {
                        self.grow_capacity(new_cap);
                    } else {
                        self.grow_capacity(amort_cap);
                    }
                }
            }
        }
    }

To this:

    pub fn reserve(&mut self, additional: uint) {
        if self.cap - self.len < additional {
            match self.len.checked_add(additional) {
                None => panic!("Vec::reserve: `uint` overflow"),
                Some(new_cap) => {
                    let amort_cap = new_cap.next_power_of_two();
                    // next_power_of_two will overflow to exactly 0 for really big capacities
                    let capacity = if amort_cap == 0 { new_cap } else { amort_cap };
                    self.grow_capacity(capacity)
                }
            }
        }
    }

is sufficient to get my micro-benchmark's Vec::write to perform like it did a week ago. I don't know enough about low-level optimization to know why this seemingly inconsequential change would have such an impact on performance. @pcwalton or @thestinger, you're pretty familiar with low-level stuff; do you know what could be going on here?
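
For what it's worth, the only structural difference between the two is that the slower version calls the growth routine from two separate branches, while the faster one computes the capacity first and has a single call site. A side-by-side sketch of the two shapes in current Rust, with the private grow_capacity replaced by a hypothetical grow function and uint written as usize:

// Hypothetical stand-in for Vec's private `grow_capacity`.
fn grow(_new_cap: usize) {
    // reallocate the buffer so it can hold at least `_new_cap` elements
}

// Shape of the slower version: the growth routine is called from two
// separate branches.
fn reserve_two_calls(len: usize, cap: usize, additional: usize) {
    if cap - len < additional {
        let new_cap = len.checked_add(additional).expect("`usize` overflow");
        // The 2014 code relied on next_power_of_two wrapping to 0 on overflow
        // (release-mode behavior in current Rust).
        let amort_cap = new_cap.next_power_of_two();
        if amort_cap == 0 {
            grow(new_cap);
        } else {
            grow(amort_cap);
        }
    }
}

// Shape of the faster version: pick the capacity first, then a single call.
fn reserve_one_call(len: usize, cap: usize, additional: usize) {
    if cap - len < additional {
        let new_cap = len.checked_add(additional).expect("`usize` overflow");
        let amort_cap = new_cap.next_power_of_two();
        let capacity = if amort_cap == 0 { new_cap } else { amort_cap };
        grow(capacity);
    }
}

fn main() {
    reserve_two_calls(0, 0, 16);
    reserve_one_call(0, 0, 16);
}

Whether that single call site is actually what LLVM keys on is left open in this thread.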

erickt added a commit to erickt/rust that referenced this issue Dec 5, 2014
Somehow LLVM is able to optimize this version of Vec::reserve
to be dramatically faster than the old version. In micro-benchmarks
this was 2-10 times faster. It also shaved 14 minutes off of
Rust's compile times.

Closes rust-lang#19281.
bors added a commit that referenced this issue Dec 8, 2014
(I don't understand why this works, and so I don't quite trust it yet. I'm pushing it up to see if anyone else can replicate this performance increase.)

Somehow LLVM is able to optimize this version of Vec::reserve to be dramatically faster than the old version. In micro-benchmarks this was 2-10 times faster. It also reduced my Rust compile time from 41 minutes to 27 minutes.

Closes #19281.