RangeToInclusive code-generation much worse than equivalent RangeTo #63646

Closed
gamozolabs opened this issue Aug 16, 2019 · 8 comments · Fixed by #134871
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@gamozolabs

I have two copies of the exact same code, one using for _ in 0..65 and the other using for _ in 0..=64. The 0..65 version optimizes down to very simple code, whereas the 0..=64 version emits a large amount of unnecessary code.


Exclusive range example (0..65)

pub fn and_stuff(a: i32, mut b: i32) -> i32 {
    for _ in 0..65 {
        b &= a;
    }

    b
}

Emits (https://rust.godbolt.org/z/pcscMB):

example::and_stuff:
        mov     eax, edi
        and     eax, esi
        ret

Inclusive range example (0..=64)

pub fn and_stuff(a: i32, mut b: i32) -> i32 {
    for _ in 0..=64 {
        b &= a;
    }

    b
}

Emits (https://rust.godbolt.org/z/22JaO2):

.LCPI0_0:
        .zero   4
        .long   4294967295
        .long   4294967295
        .long   4294967295
example::and_stuff:
        movd    xmm0, esi
        movaps  xmm1, xmmword ptr [rip + .LCPI0_0]
        movss   xmm1, xmm0
        movd    xmm0, edi
        pshufd  xmm0, xmm0, 0
        pand    xmm0, xmm1
        pshufd  xmm1, xmm0, 78
        pand    xmm1, xmm0
        pshufd  xmm0, xmm1, 229
        pand    xmm0, xmm1
        movd    eax, xmm0
        and     eax, edi
        ret
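
As a sanity check on the expected result, here is a minimal sketch showing that both loops reduce to a single a & b, which is what the short assembly for the exclusive version computes. The names and_stuff_exclusive and and_stuff_inclusive are hypothetical and used only to keep both variants in one file; the sample inputs are arbitrary.

```rust
/// Exclusive-range version from the report: 65 iterations (0 through 64).
pub fn and_stuff_exclusive(a: i32, mut b: i32) -> i32 {
    for _ in 0..65 {
        b &= a;
    }
    b
}

/// Inclusive-range version from the report: also 65 iterations (0 through 64).
pub fn and_stuff_inclusive(a: i32, mut b: i32) -> i32 {
    for _ in 0..=64 {
        b &= a;
    }
    b
}

fn main() {
    // AND-ing with the same value repeatedly is idempotent, so after the
    // first iteration nothing changes: both loops are equivalent to `a & b`.
    let samples = [(0, 0), (-1, 7), (0x0f0f, 0x00ff), (i32::MIN, i32::MAX)];
    for &(a, b) in &samples {
        assert_eq!(and_stuff_exclusive(a, b), a & b);
        assert_eq!(and_stuff_inclusive(a, b), a & b);
    }
    println!("both versions agree with a & b");
}
```

Since the two functions are observably identical, any difference in the emitted assembly is purely a missed optimization and not a difference in behavior.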

@tesuji
Contributor

tesuji commented Aug 17, 2019

If you use -C opt-level=3, the two snippets produce the same assembly: https://rust.godbolt.org/z/tfvYvi

@jonas-schievink jonas-schievink added A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 17, 2019

@falk-hueffner

I'm having the same issue. My code takes about twice as long with "..=", even with -C opt-level=3: https://rust.godbolt.org/z/yrPRd3
Here's a complete benchmark program: https://rust.godbolt.org/z/6hjAAZ
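
The linked program is not reproduced here. As a rough stand-in, a comparison of this kind might look like the sketch below. It is not the benchmark from the link: the function names, the loop body, and the iteration count are all assumptions chosen for illustration, and single-shot timings like these are noisy.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload using an exclusive range covering 0 through n.
fn sum_exclusive(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..n + 1 {
        // black_box keeps the optimizer from folding the loop into a formula.
        acc = acc.wrapping_add(black_box(i));
    }
    acc
}

/// The same hypothetical workload using an inclusive range.
fn sum_inclusive(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..=n {
        acc = acc.wrapping_add(black_box(i));
    }
    acc
}

/// Runs `f` once and returns its result together with the elapsed time.
fn time_it(f: impl Fn(u64) -> u64, n: u64) -> (u64, Duration) {
    let start = Instant::now();
    let result = f(black_box(n));
    (result, start.elapsed())
}

fn main() {
    let n = 10_000_000; // assumed iteration count, not taken from the issue
    let (r1, t1) = time_it(sum_exclusive, n);
    let (r2, t2) = time_it(sum_inclusive, n);
    assert_eq!(r1, r2); // both loops must compute the same sum
    println!("0..n+1: {:?}", t1);
    println!("0..=n : {:?}", t2);
}
```

Building it with the same flags for both variants (for example -C opt-level=2 and then -C opt-level=3) and repeating the run several times gives a fairer picture than a single measurement.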

@miggazElquez

The problem is fixed in 1.42, but it is back in the 1.43 beta and on nightly:
https://rust.godbolt.org/z/vJHfta

@alex
Member

alex commented Dec 29, 2024

This no longer reproduces on a recent rustc. Should this be closed, or is it preferable to add a codegen test?

@clubby789
Contributor

Should have a test
@rustbot label +E-needs-test

@rustbot rustbot added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Dec 29, 2024
alex added a commit to alex/rust that referenced this issue Dec 29, 2024
@bors bors closed this as completed in 6c12546 Dec 30, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 30, 2024
Rollup merge of rust-lang#134871 - clubby789:test-63646, r=compiler-errors

Add codegen test for issue 63646

Closes rust-lang#63646

@falk-hueffner

I still see a slowdown with the inclusive range with the test I posted above, although not as bad as before:

/tmp% rustup run nightly rustc --version
rustc 1.86.0-nightly (9a1d156f3 2025-01-19)
/tmp% rustup run nightly rustc -C opt-level=2 -C target-cpu=native tmp.rs && time ./tmp
./tmp  7.96s user 0.00s system 99% cpu 7.964 total
/tmp% rustup run nightly rustc -C opt-level=2 -C target-cpu=native tmp.rs && time ./tmp
./tmp  10.64s user 0.00s system 99% cpu 10.645 total

@ChrisDenton
Member

Did you try opt-level=3?

@falk-hueffner

I'm getting very strange (but reproducible) results:

       O2    O3
..    7.7  12.0
..=  10.3  10.3

That is, O3 is slower for the exclusive range. However, when I look at the disassembly, the code appears to be identical, so this may be some strange effect triggered by alignment or something similar. I'm not really sure what to make of it.

9 participants