Skip to content

Missed optimization on array comparison #62531

@tesuji

Description

@tesuji
Contributor

The godbolt link: https://godbolt.org/z/dc9o3x

I think this snippet should just return true:

pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    bytes == if cfg!(target_endian = "big") {
        [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    }
}

The generated asm with opt-level=3:

example::compare:
        push    rax
        mov     al, 1
        pop     rcx
        ret

Activity

mati865

mati865 commented on Jul 9, 2019

@mati865
Member

I think there are 2 issues:

  • to_ne_bytes is not const fn
  • if cfg! is not const fn

See this modified example: https://godbolt.org/z/86WRHF

tesuji

tesuji commented on Jul 9, 2019

@tesuji
ContributorAuthor

I may misunderstand but in the C version, does clang require similar conditions to optimize code?
@rustbot modify labels: A-codegen A-const-eval

mati865

mati865 commented on Jul 9, 2019

@mati865
Member

I may misunderstand but in the C version, does clang require similar condition to optimize code?

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)big;
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return *(uint32_t *)gf_u.bytes == *(uint32_t *)lit;
#endif

Those are preprocessor directives and are handled at compile time so their Rust equivalent would be:
https://godbolt.org/z/Ou1bTH

#![feature(float_to_from_bytes)]
#![feature(stmt_expr_attributes)]
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}
tesuji

tesuji commented on Jul 9, 2019

@tesuji
ContributorAuthor

https://doc.rust-lang.org/nightly/std/macro.cfg.html states that cfg! macro:

Evaluates boolean combinations of configuration flags at compile-time.

mati865

mati865 commented on Jul 9, 2019

@mati865
Member

Good point, those two are equal:
https://godbolt.org/z/V7Iuoo

#![feature(float_to_from_bytes)]
pub fn compare() -> bool {
    let gf_u = 12.5f32;
    #[cfg(target_endian = "big")]
    return gf_u.to_ne_bytes() == [0x41, 0x48, 0x00, 0x00];
    #[cfg(not(target_endian = "big"))]
    return gf_u.to_ne_bytes() == [0x00, 0x00, 0x48, 0x41];
}

pub fn compare2() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    if cfg!(target_endian = "big") {
        bytes == [0x41, 0x48, 0x00, 0x00]
    } else {
        bytes == [0x00, 0x00, 0x48, 0x41]
    }
}

I don't know enough about const and I could be totally wrong but I think in the first case if cfg! expands to something like:

#![feature(float_to_from_bytes)]
pub fn compare() -> bool {
    let bytes = 12.5f32.to_ne_bytes();
    let arr = if cfg!(target_endian = "big") {
         [0x41, 0x48, 0x00, 0x00]
    } else {
        [0x00, 0x00, 0x48, 0x41]
    };

    bytes == arr
}

It cannot make arr constant because #49146 is not yet complete.

added
A-codegenArea: Code generation
A-const-evalArea: Constant evaluation, covers all const contexts (static, const fn, ...)
on Jul 10, 2019
tesuji

tesuji commented on Jul 10, 2019

@tesuji
ContributorAuthor

Interesting, If I look at MIR output, I think rustc already have enough information:

MIR
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn  compare() -> bool {
    let mut _0: bool;                    // return place in scope 0 at src/lib.rs:2:21: 2:25
    let _1: [u8; 4];                     // "bytes" in scope 0 at src/lib.rs:3:9: 3:14
    let mut _2: &[u8; 4];                // in scope 0 at src/lib.rs:4:5: 4:10
    let mut _3: &[u8; 4];                // in scope 0 at src/lib.rs:4:14: 8:6
    let _4: [u8; 4];                     // in scope 0 at src/lib.rs:4:14: 8:6
    let mut _5: bool;                    // in scope 0 at src/lib.rs:4:17: 4:44
    scope 1 {
    }

    bb0: {
        StorageLive(_1);                 // bb0[0]: scope 0 at src/lib.rs:3:9: 3:14
        _1 = const core::f32::<impl f32>::to_ne_bytes(const 12.5f32) -> bb1; // bb0[1]: scope 0 at src/lib.rs:3:17: 3:38
                                         // ty::Const
                                         // + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
                                         // + val: Scalar(<ZST>)
                                         // mir::Constant
                                         // + span: src/lib.rs:3:25: 3:36
                                         // + ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}
                                         // + literal: Const { ty: fn(f32) -> [u8; _] {core::f32::<impl f32>::to_ne_bytes}, val: Scalar(<ZST>) }
                                         // ty::Const
                                         // + ty: f32
                                         // + val: Scalar(0x41480000)
                                         // mir::Constant
                                         // + span: src/lib.rs:3:17: 3:24
                                         // + ty: f32
                                         // + literal: Const { ty: f32, val: Scalar(0x41480000) }
    }

    bb1: {
        StorageLive(_2);                 // bb1[0]: scope 1 at src/lib.rs:4:5: 4:10
        _2 = &_1;                        // bb1[1]: scope 1 at src/lib.rs:4:5: 4:10
        StorageLive(_3);                 // bb1[2]: scope 1 at src/lib.rs:4:14: 8:6
        StorageLive(_4);                 // bb1[3]: scope 1 at src/lib.rs:4:14: 8:6
        StorageLive(_5);                 // bb1[4]: scope 1 at src/lib.rs:4:17: 4:44
        _5 = const false;                // bb1[5]: scope 1 at src/lib.rs:4:17: 4:44
                                         // ty::Const
                                         // + ty: bool
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:4:17: 4:44
                                         // + ty: bool
                                         // + literal: Const { ty: bool, val: Scalar(0x00) }
        switchInt(_5) -> [false: bb2, otherwise: bb3]; // bb1[6]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb2: {
        _4 = [const 0u8, const 0u8, const 72u8, const 65u8]; // bb2[0]: scope 1 at src/lib.rs:7:9: 7:33
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:10: 7:14
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:16: 7:20
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x48)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:22: 7:26
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x48) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x41)
                                         // mir::Constant
                                         // + span: src/lib.rs:7:28: 7:32
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x41) }
        goto -> bb4;                     // bb2[1]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb3: {
        _4 = [const 65u8, const 72u8, const 0u8, const 0u8]; // bb3[0]: scope 1 at src/lib.rs:5:9: 5:33
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x41)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:10: 5:14
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x41) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x48)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:16: 5:20
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x48) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:22: 5:26
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
                                         // ty::Const
                                         // + ty: u8
                                         // + val: Scalar(0x00)
                                         // mir::Constant
                                         // + span: src/lib.rs:5:28: 5:32
                                         // + ty: u8
                                         // + literal: Const { ty: u8, val: Scalar(0x00) }
        goto -> bb4;                     // bb3[1]: scope 1 at src/lib.rs:4:14: 8:6
    }

    bb4: {
        _3 = &_4;                        // bb4[0]: scope 1 at src/lib.rs:4:14: 8:6
        _0 = const <[u8; 4] as std::cmp::PartialEq>::eq(move _2, move _3) -> bb5; // bb4[1]: scope 1 at src/lib.rs:4:5: 8:6
                                         // ty::Const
                                         // + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
                                         // + val: Scalar(<ZST>)
                                         // mir::Constant
                                         // + span: src/lib.rs:4:5: 8:6
                                         // + ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}
                                         // + literal: Const { ty: for<'r, 's> fn(&'r [u8; 4], &'s [u8; 4]) -> bool {<[u8; 4] as std::cmp::PartialEq>::eq}, val: Scalar(<ZST>) }
    }

    bb5: {
        StorageDead(_3);                 // bb5[0]: scope 1 at src/lib.rs:8:5: 8:6
        StorageDead(_2);                 // bb5[1]: scope 1 at src/lib.rs:8:5: 8:6
        StorageDead(_1);                 // bb5[2]: scope 0 at src/lib.rs:9:1: 9:2
        StorageDead(_5);                 // bb5[3]: scope 0 at src/lib.rs:9:1: 9:2
        StorageDead(_4);                 // bb5[4]: scope 0 at src/lib.rs:9:1: 9:2
        return;                          // bb5[5]: scope 0 at src/lib.rs:9:2: 9:2
    }
}
ecstatic-morse

ecstatic-morse commented on Jul 10, 2019

@ecstatic-morse
Contributor

@mati865 Whether a function is marked const at the source level has no effect on whether or not it can be optimized by LLVM. Even if to_ne_bytes was not const, inlining and const propagation could conceivably reduce this to a constant.

added
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
I-slowIssue: Problems and improvements with respect to performance of generated code.
on Jul 10, 2019
ecstatic-morse

ecstatic-morse commented on Oct 23, 2019

@ecstatic-morse
Contributor

Rust now generates the following (almost optimal) assembly for the OP's code:

example::compare:
  push rax
  mov al, 1
  pop rcx
  ret

I would guess that recent improvements to const propagation in the MIR are responsible for this (thanks @wesleywiser!), but it might also have been a change in LLVM. This seems like a decent candidate for a codegen regression test, although an equivalent one might already exist.

tesuji

tesuji commented on Oct 23, 2019

@tesuji
ContributorAuthor

It is better now. But are push rax and pop rcx instructions necessary?

ecstatic-morse

ecstatic-morse commented on Oct 23, 2019

@ecstatic-morse
Contributor

I assume it is due to a difference in calling convention, but it does seem kind of silly. You'll need to ask someone more knowledgeable unfortunately.

ecstatic-morse

ecstatic-morse commented on Oct 23, 2019

@ecstatic-morse
Contributor

@lzutao A quick google leads me to believe that the push is used to align the stack to a 16-byte boundary.

4 remaining items

added
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
C-enhancementCategory: An issue proposing an enhancement or a PR with one.
on Mar 14, 2020
tesuji

tesuji commented on Feb 24, 2021

@tesuji
ContributorAuthor

This issue appears to be fixed on beta. Maybe we need a codegen test for it.

Edit: It seems that the llvm-ir generated by rustc are the same between beta and stable.
Only the optimized asm for x86 is different. Should I just write assembly codegen or just close this issue.

added
E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.
on Feb 24, 2021
nagisa

nagisa commented on May 9, 2021

@nagisa
Member

The issue seems to have re-appeared in master :(

removed
E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.
on May 9, 2021
added
C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing such
on Oct 8, 2023
JOE1994

JOE1994 commented on Feb 19, 2024

@JOE1994
Contributor

Both rustc 1.76.0 & rustc nightly (as of Feb 19 2024) generates the following asm (with -C opt-level=3)

example::compare:
        mov     al, 1
        ret
added
E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.
on Feb 20, 2024
added a commit that references this issue on May 20, 2024
65dff32
added a commit that references this issue on May 21, 2024
1d0e4af
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-codegenArea: Code generationA-const-evalArea: Constant evaluation, covers all const contexts (static, const fn, ...)C-enhancementCategory: An issue proposing an enhancement or a PR with one.C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchE-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @nikic@nagisa@mati865@jonas-schievink@JOE1994

      Issue actions

        Missed optimization on array comparison · Issue #62531 · rust-lang/rust