`black_box(f())` in scope 'b gets optimized away, unless `f`is `||:'a -> T` and `'a > 'b` #14367

japaric · 2014-05-22T22:58:15Z

Script to reproduce:

extern crate test;
extern crate time;

use test::black_box;
use time::precise_time_ns;

fn fib(n: uint) -> uint {
    match n {
        0 | 1 => 1,
        n => fib(n - 1) + fib(n - 2),
    }
}

fn bench(fun: || -> uint) -> f64 {
    // KILL_SWITCH
    //let fun = || fib(20);

    let iters = 1_000;
    let start = precise_time_ns();
    for _ in range(0, iters) {
        black_box(fun());
    }
    let end = precise_time_ns();

    (end - start) as f64 / iters as f64
}

fn main() {
    println!("{}", bench(|| fib(20)))
}

$ rustc -O bench.rs && ./bench  # any `--opt-level` can be used instead of `-O`
31190.276
$ rustc -O bench.rs && ./bench  # change `iters` to `10_000`, output doesn't change too much as expected
30723.2899
$ rustc -O bench.rs && ./bench  # toggle KILL_SWITCH
4.0624
$ rustc -O bench.rs && ./bench  # change `iters` back to `1_000`
45.602

$ rustc --version
rustc 0.11.0-pre (e7e5e9c 2014-05-22 02:51:21 -0700)

I found this issue because I tried to benchmark directly in main without the auxiliar bench function, by calling black_box(fib(20)) in a loop, but black_box got optimized away and I got results like shown above.

My hypothesis is that black_box only seems to only work with closures whose captured env outlive the scope where black_box is used. The above example supports this hypothesis.

This issue doesn't (seem to) affect the Bencher framework right now. ButIi/when we get unboxed closures, the example above would pass an unboxed closure that captures nothing (a unit-like struct?), would black_box still work in that case?

cc @huonw

The text was updated successfully, but these errors were encountered:

huonw · 2014-05-22T23:30:20Z

I don't think this is a bug, at least, the only "bug" here is the non-KILL_SWITCH version not being optimised.

In particular, calling black_box(fib(20)) is the same as writing let n = fib(20); black_box(n), the fib call can be moved around freely, as long as the n value passed into black_box is the correct one. Looking at the IR, you can see this, the "correct" (i.e. without KILL_SWITCH) IR looks like (the match_else.i block is the body of the loop):

  %1 = call i64 @_ZN15precise_time_ns20h53df8191405a33f50pa11v0.11.0.preE()
  %2 = bitcast i64* %dummy.i.i to i8*
  br label %match_else.i

match_else.i:                                     ; preds = %match_else.i, %entry-block
  %3 = phi i64 [ 0, %entry-block ], [ %4, %match_else.i ]
  %4 = add i64 %3, 1
  %5 = call fastcc i64 @_ZN3fib20hebff6027c68c44a7iaa4v0.0E(i64 20)
  call void @llvm.lifetime.start(i64 8, i8* %2) #3
  store i64 %5, i64* %dummy.i.i, align 8
  call void asm "", "r,~{dirflag},~{fpsr},~{flags}"(i64* %dummy.i.i) #3
  call void @llvm.lifetime.end(i64 8, i8* %2) #3
  %exitcond.i = icmp eq i64 %4, 1000
  br i1 %exitcond.i, label %_ZN5bench20h269af04030408b87Jaa4v0.0E.exit, label %match_else.i

_ZN5bench20h269af04030408b87Jaa4v0.0E.exit:       ; preds = %match_else.i
  %6 = call i64 @_ZN15precise_time_ns20h53df8191405a33f50pa11v0.11.0.preE()

but the "bad" KILL_SWITCH version looks like

  %1 = call i64 @_ZN15precise_time_ns20h53df8191405a33f50pa11v0.11.0.preE()
  %2 = call fastcc i64 @_ZN3fib20hc787ae08bc88b138iaa4v0.0E(i64 20) #3
  %3 = bitcast i64* %dummy.i.i to i8*
  br label %match_else.i

match_else.i:                                     ; preds = %match_else.i, %entry-block
  %4 = phi i64 [ 0, %entry-block ], [ %5, %match_else.i ]
  %5 = add i64 %4, 1
  call void @llvm.lifetime.start(i64 8, i8* %3) #3
  store i64 %2, i64* %dummy.i.i, align 8
  call void asm "", "r,~{dirflag},~{fpsr},~{flags}"(i64* %dummy.i.i) #3
  call void @llvm.lifetime.end(i64 8, i8* %3) #3
  %exitcond.i = icmp eq i64 %5, 1000
  br i1 %exitcond.i, label %_ZN5bench20hd83cb69b9731384fJaa4v0.0E.exit, label %match_else.i

_ZN5bench20hd83cb69b9731384fJaa4v0.0E.exit:       ; preds = %match_else.i
  %6 = call i64 @_ZN15precise_time_ns20h53df8191405a33f50pa11v0.11.0.preE()

In particular, the black_box call is always there (the call void asm line), and the only real difference is the position of the fib call (call fastcc i64 @_ZN3fib20hc787ae08bc88b138iaa4v0.0E(i64 20) #3), since LLVM recognised that the fib function is "pure" (i.e. just arithmetic) and so doesn't need to be recalled each time. However, this doesn't explain why it's not optimising the "correct" case.

To get this to reliably do what you want, without us disabling optimisations (which is a nonstarter), you can pass the argument to fib through a black box, e.g.

|| {
    let mut n = 20;
    black_box(&mut n); // doesn't change it, but LLVM can't tell.
    fib(n)
}

This makes the IR slightly less nice, but it preserves the fib call in the loop.

japaric · 2014-05-23T00:20:12Z

@huonw Thanks a lot for the detailed explanation! This cleared my doubts about how black_box works, and also explains why f = precise_time_ns also "works" (precise_time_ns is an 'impure" ffi call).

I'm closing this because my hypothesis have been invalidated.

…rent-types, r=HKalbasi Fixup path fragments upon MBE transcription Fixes rust-lang#14367 There are roughly two types of paths: paths in expression context, where a separator `::` between an identifier and its following generic argument list is mandatory, and paths in type context, where `::` can be omitted. Unlike rustc, we need to transform the parsed fragments back into tokens during transcription. When the matched path fragment is a type-context path and is transcribed as an expression-context path, verbatim transcription would cause a syntax error. This PR fixes up path fragments by inserting `::` to make sure they are syntactically correct in all contexts. Note that this works because expression-context paths are a strict superset of type-context paths.

…ust-lang#14367) There were two bugs here. Let's assume `T` is a singleton type implementing `Default` and that `f()` takes a `T`: - `f(<T>::default())` cannot be replaced by `f(<T)` as it was (incorrect spans – this is tricky because the type relative path uses a base span covering only `T`, not `<T>`) (third commit) - The argument of `f(<_>::default())` is inferred correctly, but cannot be replaced by `<_>` or `_`, as this cannot be used to infer an instance of a singleton type (first commit). The second commit offers better error messages by pointing at the whole expression. Fix rust-lang#12654 changelog: [`default_constructed_unit_struct`]: do not suggest incorrect fix when using a type surrounded by brackets

japaric closed this as completed May 23, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`black_box(f())` in scope 'b gets optimized away, unless `f`is `||:'a -> T` and `'a > 'b` #14367

`black_box(f())` in scope 'b gets optimized away, unless `f`is `||:'a -> T` and `'a > 'b` #14367

japaric commented May 22, 2014

huonw commented May 22, 2014

japaric commented May 23, 2014

black_box(f()) in scope 'b gets optimized away, unless fis ||:'a -> T and 'a > 'b #14367

black_box(f()) in scope 'b gets optimized away, unless fis ||:'a -> T and 'a > 'b #14367

Comments

japaric commented May 22, 2014

huonw commented May 22, 2014

japaric commented May 23, 2014

`black_box(f())` in scope 'b gets optimized away, unless `f`is `||:'a -> T` and `'a > 'b` #14367

`black_box(f())` in scope 'b gets optimized away, unless `f`is `||:'a -> T` and `'a > 'b` #14367