Skip to content

Rust doesn't unroll a loop with constant data #38694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RReverser opened this issue Dec 29, 2016 · 10 comments
Closed

Rust doesn't unroll a loop with constant data #38694

RReverser opened this issue Dec 29, 2016 · 10 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@RReverser
Copy link
Contributor

RReverser commented Dec 29, 2016

I was playing with http://rust.godbolt.org/ and noticed one weird thing where Rust (all of stable, beta and nightly versions) seems to prevent loop unroll and/or constant propagation optimizations.

Here is original minimal code using which I can reproduce an issue:

pub fn g0() -> bool {
  vec![1,2,3].contains(&2)
}

So it basically creates a vector out of constant values with constant length & capacity, and then searches for a constant value within it, which is a perfect case for constant propagation. And yet the generated assembly looks like:

example::g0:
        push    rbp
        mov     rbp, rsp
        push    rbx
        push    rax
        mov     edi, 12
        mov     esi, 4
        call    __rust_allocate@PLT
        test    rax, rax
        je      .LBB0_6
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        xor     ecx, ecx
.LBB0_2:
        cmp     rcx, 12
        je      .LBB0_3
        mov     bl, 1
        cmp     dword ptr [rax + rcx], 2
        lea     rcx, [rcx + 4]
        jne     .LBB0_2
        jmp     .LBB0_5
.LBB0_3:
        xor     ebx, ebx
.LBB0_5:
        mov     esi, 12
        mov     edx, 4
        mov     rdi, rax
        call    __rust_deallocate@PLT
        mov     eax, ebx
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
.LBB0_6:
        call    alloc::oom::oom@PLT

I tried writing down a plain "dumb" loop instead, with an assert + static loop bounds:

pub fn g1() -> bool {
  let v = vec![1,2,3];
  assert!(v.len() == 3);
  for i in 0..3 {
    if v[i] == 2 {
      return true
    }
  }
  return false
}

But output assembly is almost 100% the same (even though assert was successfully removed by DCE, so apparently constant propagation works as expected).

However, unrolling this same loop by hand at the next step seems to suddenly enable optimization:

pub fn g2() -> bool {
  let v = vec![1,2,3];
  assert!(v.len() == 3);
  if v[0] == 2 {
    return true
  }
  if v[1] == 2 {
    return true
  }
  if v[2] == 2 {
    return true
  }
  return false
}

compiles to

example::g2:
        push    rbp
        mov     rbp, rsp
        mov     al, 1
        pop     rbp
        ret

as originally expected.

Initially I though this is a missing attribute on Rust allocation functions that doesn't allow LLVM to reason about pointer contents, but after playing with replacing allocator with system one, replacing vectors with pure static-length slices ([i32; 3]) and finally unrolling loop by hand as above, figured it's not an issue, otherwise 1) slices would be still optimized or 2) unrolled loop would still be not.

I would be happy to help to fix whatever is preventing those optimizations, but so far I don't really understand what's happening here or where to look for the potential bug.

Online playground URL with all these examples (plus slices): https://godbolt.org/g/TRQgg8

@steveklabnik steveklabnik added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 30, 2016
@RReverser
Copy link
Contributor Author

Oh, and FWIW, analogous C code with Clang does constant-evaluate in all cases.

@jrmuizel
Copy link
Contributor

Interestingly if I take g2 and build without -O and then pass the result to llvm-opt -O2 I get

define zeroext i1 @_ZN8unrolled2g217h3aa4d16eeface44bE() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
entry-block:
  %0 = tail call i8* @__rust_allocate(i64 12, i64 4) #3
  %1 = icmp eq i8* %0, null
  br i1 %1, label %bb5.i, label %_ZN4drop17h3bfa678d62ddc26bE.exit

bb5.i:                                            ; preds = %entry-block
  tail call void @_ZN5alloc3oom3oom17h087c259ea35c365cE()
  unreachable

_ZN4drop17h3bfa678d62ddc26bE.exit:                ; preds = %entry-block
  %2 = bitcast i8* %0 to i32*
  store i32 1, i32* %2, align 4
  %3 = getelementptr inbounds i8, i8* %0, i64 4
  %4 = bitcast i8* %3 to i32*
  store i32 2, i32* %4, align 4
  %5 = getelementptr inbounds i8, i8* %0, i64 8
  %6 = bitcast i8* %5 to i32*
  store i32 3, i32* %6, align 4
  tail call void @__rust_deallocate(i8* nonnull %0, i64 12, i64 4) #3
  ret i1 true
}

Instead of

define zeroext i1 @_ZN8unrolled2g217h3aa4d16eeface44bE() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
_ZN4drop17h3bfa678d62ddc26bE.exit:
  ret i1 true
}

I don't know what causes this difference. I'm actually surprised the calls to __rust_allocate are being eliminated as it seems like OOM could still happen.

@jrmuizel
Copy link
Contributor

What was the C code that you were using?

@RReverser
Copy link
Contributor Author

RReverser commented Dec 30, 2016

@jrmuizel Something like this seems to be the closest to what Rust is supposed to do in g1: https://godbolt.org/g/FamXce

@RReverser
Copy link
Contributor Author

Hmm this is getting even weirder. I played a bit more with constant slices (not vector) versions of functions here: https://godbolt.org/g/FzynPE

As you can see, I've added f3 and f4 as below

pub fn f3() -> bool {
  let v = [1,2,3];
  v.iter().any(|&x| x == 2)
}

pub fn f4() -> bool {
  [1,2,3].iter().any(|&x| x == 2)
}

And first one is not optimized to a constant, but second one is... Not sure if it's a different issue or sheds a light on what's happening in vector one too.

@bluss
Copy link
Member

bluss commented Dec 30, 2016

Is this simply the following? “LLVM cannot unroll loops with multiple exits“ https://llvm.org/bugs/show_bug.cgi?id=27360

I've looked at in context of #37972 which aimed to unroll some searching loops manually for this reason.

@RReverser
Copy link
Contributor Author

RReverser commented Dec 30, 2016

@bluss Might be related, but how come Clang constant-evaluates a very similar implementation of g1 with early return inside of the loop (I've posted a link above) just fine?

@RReverser
Copy link
Contributor Author

Also, doesn't explain that last f3 vs f4 example (unless it's unrelated issue with liveness analysis).

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@Emerentius
Copy link
Contributor

Re-testing with Rust 1.26 shows the original example to have been resolved.

pub fn g0() -> bool {
  vec![1,2,3].contains(&2)
}

results in

example::g0:
  push rbp
  mov rbp, rsp
  mov al, 1
  pop rbp
  ret

@RReverser
Copy link
Contributor Author

@Emerentius Indeed, I suppose I can close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

6 participants