[RISC-V] Try using c.addi+c.beqz/c.bnez to reduce size of immediate-compare branches #158728

@arichardson

Description

Prompted by the recent change to support the proposed Zibi extension (PR #146858 and PR #127463), I looked into whether the same code size reduction could be achieved with compressed add+branch sequences. At least in some cases we should be able to emit
c.addi a0, -CMP; c.beqz a0, ... (4 bytes) instead of the current c.li a1, CMP; beq a0, a1 (6 bytes).
Interestingly, this can represent all immediates that Zibi will handle, but of course it comes with register pressure and codegen challenges that Zibi avoids.
Looking at the most common compare immediates in SPEC2017, the distribution does seem quite random, but a 6-bit immediate covers many of them (https://gist.github.com/arichardson/b88cc3d3cac1a7fec85ee1d24b463d99).

I made a draft change in https://github.com/arichardson/upstream-llvm-project/tree/2025-compressed-branch-imm, but doing this in TableGen does not seem particularly useful.
If I emit the pattern unconditionally, we end up needing a c.mv $TMP, a0 in many cases, since c.addi requires the same input and output register. In those cases we are better off with the c.li.

However, in cases where the register is dead after the comparison, using c.addi+c.beqz avoids the need for an extra register (same as Zibi). It would therefore be good to have an optimization that applies this pattern whenever possible, to assess how much Zibi actually helps code size and register pressure. Of course, Zibi can still be beneficial even at identical code size: a simple core would not need to modify a register and/or perform somewhat complex macro-op fusion.

Given the following code (https://godbolt.org/z/fn4rx1vr8):

int foo();
int bar();

int test(int num) {
    if (num == 11) {
        return foo();
    } else if (num == 16) {
        return bar();
    }
    return 1;
}

Both Clang and GCC emit two li+beq/bne pairs:

test:
        li      a1, 16
        beq     a0, a1, .LBB0_3
        li      a1, 11
        bne     a0, a1, .LBB0_4
        tail    foo
.LBB0_3:
        tail    bar
.LBB0_4:
        li      a0, 1
        ret

But since a0 is dead after the branch we should be able to save at least two bytes for the final branch quite easily:

test:
        c.li    a1,11
        beq     a0,a1,.LBB0_3
        c.addi  a0,-16
        c.beqz  a0,.LBB0_4
        c.li    a0,1
        ret
.LBB0_3:
        tail    foo
.LBB0_4:
        tail    bar

With some more effort we could track the value of a0 and save another 2 bytes to generate:

test:
        c.addi  a0,-11
        c.beqz  a0,.LBB0_3
        c.addi  a0,-5   # subtract another 5 to get to the expected -16
        c.beqz  a0,.LBB0_4
        c.li    a0,1
        ret
.LBB0_3:
        tail    foo
.LBB0_4:
        tail    bar

CC: @BoyaoWang430 @topperc @wangpc-pp
