Skip to content

Seemingly inefficient code generated to forward a parameter to a function #22891

Closed
@rprichard

Description

@rprichard
Contributor

The generated code for passing arguments larger than a machine word looks inefficient.

Test case:

#[inline(never)]
pub fn bar(x: &str) { println!("{}", x) }
pub fn foo(x: &str) { bar(x); bar(x); }

On x86_64-unknown-linux-gnu, compiling with rustc test.rs -O -C no-stack-check --crate-type dylib --emit asm, I see this code for foo:

    .section    .text._ZN3foo20hb6f131ac36a30532PaaE,"ax",@progbits
    .globl  _ZN3foo20hb6f131ac36a30532PaaE
    .align  16, 0x90
    .type   _ZN3foo20hb6f131ac36a30532PaaE,@function
_ZN3foo20hb6f131ac36a30532PaaE:
    .cfi_startproc
    pushq   %rbx
.Ltmp4:
    .cfi_def_cfa_offset 16
    subq    $16, %rsp
.Ltmp5:
    .cfi_def_cfa_offset 32
.Ltmp6:
    .cfi_offset %rbx, -16
    movq    %rdi, %rbx
    movups  (%rbx), %xmm0
    movaps  %xmm0, (%rsp)
    leaq    (%rsp), %rdi
    callq   _ZN3bar20hf21270c370b3427feaaE@PLT
    movups  (%rbx), %xmm0
    movaps  %xmm0, (%rsp)
    leaq    (%rsp), %rdi
    callq   _ZN3bar20hf21270c370b3427feaaE@PLT
    addq    $16, %rsp
    popq    %rbx
    retq
.Ltmp7:
    .size   _ZN3foo20hb6f131ac36a30532PaaE, .Ltmp7-_ZN3foo20hb6f131ac36a30532PaaE
    .cfi_endproc

foo receives the address of the &str in %rdi. It copies it into a new stack location for each call, then passes the address of that location to bar.

Could foo forward the address of the &str along without making stack copies?

If I remove one of the bar calls from foo, then the function also ought to become a tail call, but it doesn't. Tail call optimization does occur if I replace the &str types with &&str.

The calling convention for passing &str (and other arguments larger than a machine word?) seems to be:

  1. Make a copy of the argument on the stack.
  2. Pass the address of the copy in the conventional manner (in a register or on the stack).
  3. The callee may modify the copy.

i.e. We seem to be passing values both by-value and by-reference.

With the current convention, I think we could get smaller code by eliding some of the copies. If the copies were instead immutable, I think we could elide more copies.

Compiler version:

rustc 1.0.0-nightly (b47aebe3f 2015-02-26) (built 2015-02-27)
binary: rustc
commit-hash: b47aebe3fc2da06c760fd8ea19f84cbc41d34831
commit-date: 2015-02-26
build-date: 2015-02-27
host: x86_64-unknown-linux-gnu
release: 1.0.0-nightly

Activity

dotdash

dotdash commented on Feb 28, 2015

@dotdash
Contributor

I'm currently working on a PR that makes use of LLVM's byval attribute. That would at least make the code a bit shorter because the address of the copy is implicitly passed because it's a fixed offset from the stack pointer. With that, foo is translated to:

    subq    $24, %rsp
.Ltmp4:
    .cfi_def_cfa_offset 32
    movaps  32(%rsp), %xmm0
    movups  %xmm0, (%rsp)
    callq   _ZN3bar20h6b5879ee29f32fdeeaaE@PLT
    movaps  32(%rsp), %xmm0
    movups  %xmm0, (%rsp)
    callq   _ZN3bar20h6b5879ee29f32fdeeaaE@PLT
    addq    $24, %rsp
    retq

And some other improvements also happen, I'll detail them with the PR, once I get the last few bugs fixed.

In a later step, we could definitely look into additional improvements like just dropping the byval flag for functions that are known not to modify their argument.

rprichard

rprichard commented on Feb 28, 2015

@rprichard
ContributorAuthor

Good to know. Does dropping byval mean that the value is passed by-reference?

dotdash

dotdash commented on Feb 28, 2015

@dotdash
Contributor

Since the byval attribute replaces the calls to memcpy that we currently manually emit, yes, that would then completely remove the copy and make it a pass-by-reference.

rprichard

rprichard commented on Mar 1, 2015

@rprichard
ContributorAuthor

@dotdash Does your PR also fix #22924?

dotdash

dotdash commented on Mar 4, 2015

@dotdash
Contributor

The byval stuff is blocked on #5016

comex

comex commented on Mar 12, 2015

@comex
Contributor

As mentioned in a recent Discourse thread - why does it have to go through a double indirection (pointer to fat pointer to string data) at all, whether or not that can be optimized in forwarding situations? Normally a small structure passed by value would be placed in registers.

dotdash

dotdash commented on Mar 12, 2015

@dotdash
Contributor

I'm working on a patch that does that (pass fat pointers in registers), and a first iteration shows good results, but I'm still passing the fat pointers as a single struct { i8*, i64 } instead of splitting it into two arguments which probably isn't a good idea. Unfortunately, to split a single rust argument into two llvm arguments needs some refactorings, so it might take some time to be ready.

comex

comex commented on Mar 12, 2015

@comex
Contributor

Why isn't it a good idea? I think ABIs are normally good at dealing with structs passed by value, although ARM has a hazard in the case of small structs returned by value (it's slower than returning an equivalently sized scalar).

dotdash

dotdash commented on Mar 12, 2015

@dotdash
Contributor

The rust ABI isn't good at it, yet. That's why I'm working on the patch to change it.

The thing with the first iteration of my patch is that it translates fn foo(s: &str) to declare void @foo({ i8*, i64 }) instead of declare void @foo(i8*, i64). It works to some degree, because LLVM happens to pass each struct field as if it was a distinct argument, so at the asm level you get the desired results (at least on x86 and x86_64), but the LLVM function passes get almost 40% slower, which I suspect to happen due to LLVM not handling FCA arguments that well. We might also miss some optimizations that way. We dropped FCA loads/stores a while ago because of that.

comex

comex commented on Mar 13, 2015

@comex
Contributor

LLVM 'happening to pass' (i.e. must pass using standard ABIs on most architectures) structs like multiple values is what I meant by ABIs being good at it. Quite interesting that it slows down LLVM, though.

dotdash

dotdash commented on Mar 13, 2015

@dotdash
Contributor

Oh, now I see. No, LLVM doesn't automatically handle ABI compliance. It only handles calling conventions. For example, we used to pass a struct consisting of 4 u8 values (think RGBA) directly as the struct type. LLVM did pass the 4 values as 4 individual arguments. But the x86_64 SysV ABI (and by now the rust ABI, too) wants that struct to be passed as a single u32 value. And it's the frontend's (i.e. our) job to handle that.

comex

comex commented on Mar 13, 2015

@comex
Contributor

Oh... wow, I'm really surprised by that. I'd love to know what motivated that decision, considering how close LLVM IR is semantically to C.

10 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-codegenArea: Code generation

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @steveklabnik@comex@dotdash@kmcallister@rprichard

        Issue actions

          Seemingly inefficient code generated to forward a parameter to a function · Issue #22891 · rust-lang/rust