Description
The generated code for passing arguments larger than a machine word looks inefficient.
Test case:
#[inline(never)]
pub fn bar(x: &str) { println!("{}", x) }
pub fn foo(x: &str) { bar(x); bar(x); }
On x86_64-unknown-linux-gnu, compiling with rustc test.rs -O -C no-stack-check --crate-type dylib --emit asm
, I see this code for foo
:
.section .text._ZN3foo20hb6f131ac36a30532PaaE,"ax",@progbits
.globl _ZN3foo20hb6f131ac36a30532PaaE
.align 16, 0x90
.type _ZN3foo20hb6f131ac36a30532PaaE,@function
_ZN3foo20hb6f131ac36a30532PaaE:
.cfi_startproc
pushq %rbx
.Ltmp4:
.cfi_def_cfa_offset 16
subq $16, %rsp
.Ltmp5:
.cfi_def_cfa_offset 32
.Ltmp6:
.cfi_offset %rbx, -16
movq %rdi, %rbx
movups (%rbx), %xmm0
movaps %xmm0, (%rsp)
leaq (%rsp), %rdi
callq _ZN3bar20hf21270c370b3427feaaE@PLT
movups (%rbx), %xmm0
movaps %xmm0, (%rsp)
leaq (%rsp), %rdi
callq _ZN3bar20hf21270c370b3427feaaE@PLT
addq $16, %rsp
popq %rbx
retq
.Ltmp7:
.size _ZN3foo20hb6f131ac36a30532PaaE, .Ltmp7-_ZN3foo20hb6f131ac36a30532PaaE
.cfi_endproc
foo
receives the address of the &str
in %rdi
. It copies it into a new stack location for each call, then passes the address of that location to bar
.
Could foo
forward the address of the &str
along without making stack copies?
If I remove one of the bar
calls from foo
, then the function also ought to become a tail call, but it doesn't. Tail call optimization does occur if I replace the &str
types with &&str
.
The calling convention for passing &str
(and other arguments larger than a machine word?) seems to be:
- Make a copy of the argument on the stack.
- Pass the address of the copy in the conventional manner (in a register or on the stack).
- The callee may modify the copy.
i.e. We seem to be passing values both by-value and by-reference.
With the current convention, I think we could get smaller code by eliding some of the copies. If the copies were instead immutable, I think we could elide more copies.
Compiler version:
rustc 1.0.0-nightly (b47aebe3f 2015-02-26) (built 2015-02-27)
binary: rustc
commit-hash: b47aebe3fc2da06c760fd8ea19f84cbc41d34831
commit-date: 2015-02-26
build-date: 2015-02-27
host: x86_64-unknown-linux-gnu
release: 1.0.0-nightly
Activity
dotdash commentedon Feb 28, 2015
I'm currently working on a PR that makes use of LLVM's
byval
attribute. That would at least make the code a bit shorter because the address of the copy is implicitly passed because it's a fixed offset from the stack pointer. With that,foo
is translated to:And some other improvements also happen, I'll detail them with the PR, once I get the last few bugs fixed.
In a later step, we could definitely look into additional improvements like just dropping the
byval
flag for functions that are known not to modify their argument.rprichard commentedon Feb 28, 2015
Good to know. Does dropping
byval
mean that the value is passed by-reference?dotdash commentedon Feb 28, 2015
Since the
byval
attribute replaces the calls tomemcpy
that we currently manually emit, yes, that would then completely remove the copy and make it a pass-by-reference.rprichard commentedon Mar 1, 2015
@dotdash Does your PR also fix #22924?
dotdash commentedon Mar 4, 2015
The
byval
stuff is blocked on #5016comex commentedon Mar 12, 2015
As mentioned in a recent Discourse thread - why does it have to go through a double indirection (pointer to fat pointer to string data) at all, whether or not that can be optimized in forwarding situations? Normally a small structure passed by value would be placed in registers.
dotdash commentedon Mar 12, 2015
I'm working on a patch that does that (pass fat pointers in registers), and a first iteration shows good results, but I'm still passing the fat pointers as a single struct
{ i8*, i64 }
instead of splitting it into two arguments which probably isn't a good idea. Unfortunately, to split a single rust argument into two llvm arguments needs some refactorings, so it might take some time to be ready.comex commentedon Mar 12, 2015
Why isn't it a good idea? I think ABIs are normally good at dealing with structs passed by value, although ARM has a hazard in the case of small structs returned by value (it's slower than returning an equivalently sized scalar).
dotdash commentedon Mar 12, 2015
The rust ABI isn't good at it, yet. That's why I'm working on the patch to change it.
The thing with the first iteration of my patch is that it translates
fn foo(s: &str)
todeclare void @foo({ i8*, i64 })
instead ofdeclare void @foo(i8*, i64)
. It works to some degree, because LLVM happens to pass each struct field as if it was a distinct argument, so at the asm level you get the desired results (at least on x86 and x86_64), but the LLVM function passes get almost 40% slower, which I suspect to happen due to LLVM not handling FCA arguments that well. We might also miss some optimizations that way. We dropped FCA loads/stores a while ago because of that.comex commentedon Mar 13, 2015
LLVM 'happening to pass' (i.e. must pass using standard ABIs on most architectures) structs like multiple values is what I meant by ABIs being good at it. Quite interesting that it slows down LLVM, though.
dotdash commentedon Mar 13, 2015
Oh, now I see. No, LLVM doesn't automatically handle ABI compliance. It only handles calling conventions. For example, we used to pass a struct consisting of 4 u8 values (think RGBA) directly as the struct type. LLVM did pass the 4 values as 4 individual arguments. But the x86_64 SysV ABI (and by now the rust ABI, too) wants that struct to be passed as a single u32 value. And it's the frontend's (i.e. our) job to handle that.
comex commentedon Mar 13, 2015
Oh... wow, I'm really surprised by that. I'd love to know what motivated that decision, considering how close LLVM IR is semantically to C.
10 remaining items