Use explicit lifetimes for LLVM allocas #12004
Conversation
Currently, stack allocated variables live for the whole duration of the function they're allocated in. This means that LLVM can't put two allocas into the same stack space, even if they're never used at the same time. By using the LLVM intrinsics to declare the lifetime of allocas, we allow LLVM to use the same stack space for allocas that have non-overlapping lifetimes, reducing stack usage. In certain situations this even allows LLVM to remove duplicated code that previously only differed in which part of the stack it used.
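To make the stack-reuse effect concrete, here is a minimal Rust sketch (a hypothetical example, not taken from this patch): the two arrays are never live at the same time, so once their allocas carry `llvm.lifetime.start`/`llvm.lifetime.end` markers, LLVM is free to back both with a single 4096-byte stack slot instead of reserving space for each.

```
// Hypothetical example: `a` and `b` have non-overlapping lifetimes, so with
// lifetime markers on their allocas LLVM may fold them into one stack slot.
pub fn pick(flag: bool) -> u8 {
    if flag {
        let a = [0u8; 4096]; // live only in this branch
        a[0]
    } else {
        let b = [1u8; 4096]; // live only in this branch
        b[4095]
    }
}
```

Without the markers, each array gets its own slot and the frame is roughly twice as large; the same effect shows up as the 120-byte vs. 56-byte frame in the assembly diff further down.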
This is more of an RFC at this time. I'm not exactly happy with the concrete implementation, because it introduces lots of extra cleanups and LLVM IR, increasing the memory usage when compiling librustc by about 15% and the compile time by maybe 3% as well, both of which seem excessive. I guess we could drop the lifetime calls when we're in the toplevel scope of a function, but I couldn't figure out how to do that in a clean way.

That said, given this code (which is extremely favorable to this change):

```
pub fn foo(x: bool) {
    match x {
        true => println!("Foo{}", 5),
        false => println!("Foo{}", 6),
    };
}
```

this is how the resulting assembly differs:

```
--- before.s 2014-02-02 23:33:09.706317828 +0100
+++ after.s 2014-02-02 23:33:14.650273534 +0100
@@ -1,51 +1,40 @@
- .globl _ZN3foo19h81a28b9b9b1b3681ad4v0.0E
+ .globl _ZN3foo19h81ff6a854ff46f5cad4v0.0E
.align 16, 0x90
- .type _ZN3foo19h81a28b9b9b1b3681ad4v0.0E,@function
-_ZN3foo19h81a28b9b9b1b3681ad4v0.0E:
+ .type _ZN3foo19h81ff6a854ff46f5cad4v0.0E,@function
+_ZN3foo19h81ff6a854ff46f5cad4v0.0E:
.cfi_startproc
cmpq %fs:112, %rsp
ja .LBB0_2
- movabsq $120, %r10
+ movabsq $56, %r10
movabsq $0, %r11
callq __morestack
retq
.LBB0_2:
- subq $120, %rsp
+ subq $56, %rsp
.Ltmp1:
- .cfi_def_cfa_offset 128
+ .cfi_def_cfa_offset 64
movzbl %dil, %eax
cmpl $1, %eax
jne .LBB0_4
- movq $5, 112(%rsp)
- leaq _ZN3fmt11secret_show19h1cfae34b462b5821a24v0.0E(%rip), %rax
- movq %rax, 96(%rsp)
- leaq 112(%rsp), %rax
- movq %rax, 104(%rsp)
- leaq _ZN3foo15__STATIC_FMTSTR19h1e2ce5d40ae038c7ah4v0.0E(%rip), %rax
- movq %rax, 64(%rsp)
- movq $2, 72(%rsp)
- leaq 96(%rsp), %rax
- movq %rax, 80(%rsp)
- movq $1, 88(%rsp)
- leaq 64(%rsp), %rdi
+ movq $5, 48(%rsp)
jmp .LBB0_5
.LBB0_4:
- movq $6, 56(%rsp)
- leaq _ZN3fmt11secret_show19h1cfae34b462b5821a24v0.0E(%rip), %rax
- movq %rax, 40(%rsp)
- leaq 56(%rsp), %rax
- movq %rax, 48(%rsp)
- leaq _ZN3foo15__STATIC_FMTSTR19h1e2ce5d40ae038c7ah4v0.0E(%rip), %rax
- movq %rax, 8(%rsp)
- movq $2, 16(%rsp)
- leaq 40(%rsp), %rax
- movq %rax, 24(%rsp)
- movq $1, 32(%rsp)
- leaq 8(%rsp), %rdi
+ movq $6, 48(%rsp)
.LBB0_5:
+ leaq _ZN3fmt11secret_show19h49d00ba23b26f207a24v0.0E(%rip), %rax
+ movq %rax, 32(%rsp)
+ leaq 48(%rsp), %rax
+ movq %rax, 40(%rsp)
+ leaq _ZN3foo15__STATIC_FMTSTR19h96bb7cc571f17624ah4v0.0E(%rip), %rax
+ movq %rax, (%rsp)
+ movq $2, 8(%rsp)
+ leaq 32(%rsp), %rax
+ movq %rax, 16(%rsp)
+ movq $1, 24(%rsp)
+ leaq (%rsp), %rdi
callq _ZN2io5stdio12println_args19h7932d545acb66ab6ak9v0.10.preE@PLT
- addq $120, %rsp
+ addq $56, %rsp
retq
.Ltmp2:
- .size _ZN3foo19h81a28b9b9b1b3681ad4v0.0E, .Ltmp2-_ZN3foo19h81a28b9b9b1b3681ad4v0.0E
+ .size _ZN3foo19h81ff6a854ff46f5cad4v0.0E, .Ltmp2-_ZN3foo19h81ff6a854ff46f5cad4v0.0E
	.cfi_endproc
```
Neat results! Since this has a non-trivial expense I wonder if we might put it behind an experimental optimizations flag.
```
@@ -80,6 +80,8 @@ pub fn get_simple_intrinsic(ccx: @CrateContext, item: &ast::ForeignItem) -> Opti
     "bswap16" => Some(ccx.intrinsics.get_copy(&("llvm.bswap.i16"))),
     "bswap32" => Some(ccx.intrinsics.get_copy(&("llvm.bswap.i32"))),
     "bswap64" => Some(ccx.intrinsics.get_copy(&("llvm.bswap.i64"))),
+    "lifetime_start" => Some(ccx.intrinsics.get_copy(&("llvm.lifetime.start"))),
+    "lifetime_end" => Some(ccx.intrinsics.get_copy(&("llvm.lifetime.end"))),
```
I don't think you'd need to modify this location; these are the intrinsics we expose under `extern "rust-intrinsic" { ... }`.
Ah, right. I initially thought about exposing the intrinsics, then decided against it and forgot to clean up. Thanks!
I really like the result, though I find the side effects on compiler performance unfortunate.
Maybe we can do it at --opt-level=3?
This needs a lot more work. Array allocas are currently wrong (size only accounts for a single argument), and (for example) allocas for function arguments (which probably account for a lot of stack usage) don't get a proper marker for the end of their lifetime, yet. I don't expect to get enough time during the week due to $DAYJOB, but I'll revisit this during the weekend.
Closing due to inactivity, but I'd love to see this as an optimization pass!
New lints `iter_filter_is_some` and `iter_filter_is_ok`

Adds a pair of lints that check for cases of an iterator over `Result` or `Option` followed by `filter` without being followed by `map`, as that case is already covered by a different, specialized lint.

Fixes #11843

PS: I also made some minor documentation fixes in a case where a double tick (`) was included.

---

* changelog: New Lint: [`iter_filter_is_some`] [#12004](rust-lang/rust#12004)
* changelog: New Lint: [`iter_filter_is_ok`] [#12004](rust-lang/rust#12004)
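For illustration, a hedged sketch of the intended scope (hypothetical snippets, not taken from the lint's test suite): the first `filter` has no trailing `map` and is the pattern the new lints target, while the second is left to the pre-existing, more specialized lint.

```
fn lint_scope() {
    let v = vec![Some(1), None, Some(3)];

    // Targeted by the new `iter_filter_is_some` lint: `filter(Option::is_some)`
    // with no `map` following it.
    let _kept: Vec<_> = v.clone().into_iter().filter(Option::is_some).collect();

    // Not targeted by the new lints: the trailing `map` means the existing,
    // more specialized lint already covers this pattern.
    let _values: Vec<i32> = v
        .into_iter()
        .filter(Option::is_some)
        .map(Option::unwrap)
        .collect();
}
```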
Fixed ICE introduced in rust-lang#12004

Issue: in rust-lang/rust-clippy#12004, we emit a lint for `filter(Option::is_some)`. If the parent expression is a `.map`, we don't emit that lint, as there exists a more specialized lint for that case.

The ICE introduced in rust-lang/rust-clippy#12004 is a consequence of the assumption that the parent expression after a `filter` would be a method call with the `filter` call being the receiver. However, it is entirely possible to have a closure of the form

```
|| { vec![Some(1), None].into_iter().filter(Option::is_some) }
```

The previous implementation looked at the parent expression, namely the closure, and tried to check its parameters by indexing `[0]` on an empty list.

This commit is an overhaul of the lint with significantly more FP tests and checks.

Impl details:

1. We verify that the `filter` method we are in is a proper trait method, to avoid FPs.
2. We check that the parent expression is not a `map` by checking whether it exists, is a trait method, and is a method call.
3. We check that we don't have comments in the span.
4. We verify that we are in an `Iterator` of `Option` or `Result`.
5. We check the contents of the `filter`:
   1. For closures, we peel the closure. If it is not a single expression, we don't lint. We then try again by checking the peeled expression.
   2. For paths, we do a typecheck to avoid FPs for types that impl functions with the same names.
   3. For calls, we verify the type via the path, and that the param of the closure is the single argument to the call.
   4. For method calls, we verify that the receiver is the parameter of the closure. Since we only handle single, non-block expressions, the parameter can't be shadowed, so there is no FP.

This commit also adds additional FP tests.

Fixes: rust-lang#12058

Adding `@xFrednet` as you have the most context for this, since you reviewed it last time.

`@rustbot` r? `@xFrednet`

---

changelog: none (will be backported and therefore doesn't affect stable)
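As a sketch of how the closure handling in step 5.1 would play out (hypothetical snippets, assuming the single-expression check applies to the closure body as described above):

```
fn closure_cases(v: Vec<Option<i32>>) {
    // Single-expression closure: peeling yields `o.is_some()`, whose receiver is
    // the closure parameter, so the lint can fire (steps 5.1 and 5.4).
    let _single: Vec<_> = v.clone().into_iter().filter(|o| o.is_some()).collect();

    // The closure body is a block containing a statement, not a single
    // expression, so per step 5.1 the lint stays silent.
    let _block: Vec<_> = v
        .into_iter()
        .filter(|o| {
            let keep = o.is_some();
            keep
        })
        .collect();
}
```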