You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I know it's been discussed that we wanted to keep the stack trace the same, but if gc only does self recursion call optimisation the stack would roughly stay the same, the only thing that would change is the number of frames for the recursive function call.
The go team was very clear that they wanted to keep the stack traces clean and accurate, complete TCO would completely break that. So that's why self-recursino only makes sense, because 1000 000 call to Foo doesn't really help anybody anyway
It's not really a dup, I was asking there for all TCO, which there was indeed good reason not to implement. I've looked and other language will at least have self-recursion or even cyclic recursion TCOed. It could make sense for Go. I get that the whole Go2 is basically shutting this conversation down and that's fine.
It does help a lot when the args differ from frame to frame.
If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)
If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)
If tail recursion is rewritten as a loop there are 2 cases: stackless (eg. n!) or the loop keeps explicit stack (of something, eg. naive fib), replacing the (CPU) frames. So in either case the debug info about the state is at most one fmt.Printf away.
What I wanted to point out is that in a language w/o TCO, one can always rewrite the code to not depend on TCO.
This change alone is enough to make me want to use go (again) instead of erlang for many things. Do it and I will love go forever. Do it do it do it. Really. Do it.
Was going to comment (related to unsafe, elsewhere) that removing recursion is exactly the sort of error-prone hand-optimization that can also be used to hide introduced vulnerabilities. self-TCO ought to be something that a programmer can at least request -- after all, if it's done by hand, those stack frames are just as gone, and debugging is just as impeded, but it also has the chance of human error added.
Note that one of the late-caught bugs in the new SSA back-end was a mistake in the hand-de-recursing of a clever recursive algorithm. The hand-optimized code usually worked.
Many functional programming languages have excellent support for (general) TCO and it is not an impediment at all for debugging. Code compiled in debug mode can still be instrumented to keep count of self-calls and informations about lost stackframes. In functional languages like Haskell, purity from side-effects helps debugging since only the initial values are needed to know the state of every subsequent call; in Go it's not guaranteed so you lose intermediate stackframes and there's no way to recover them, but since most uses for recursive calls are in algorithms, which I hope are implemented in a mathematically sound manner, the issue is greatly reduced. Everything is moving in this direction, and Go should not be left behind.
Also, as dr2chase noted, I often find myself rewriting beautifully recursive code from their mathematical description to ugly hackish iterative style and it is not easy at all to prove they are completely equivalent until your application in production decides to run an infinite loop because the terminating condition is not met with certain inputs.
@ianlancetaylor I can see why the compiler hint would not be the best idea. Thanks for that info!
Adding a become () statement sounds good to me.
I can't find a current proposal for adding a become () statement. Perhaps, we could add a proposal for that?
TCO is necessity for people interested in writing functional programming (FP) style code, that is by its nature recursive. This might be good indicator of how much interest exists for writing FP style code.
Tail-call is not a complete slam-dunk with the existing calling conventions.
Suppose main() calls f(x) tail-calls g(x,y).
The current calling convention has main allocating a single, fixed-size frame large enough for all main's locals and large enough to pass all the parameters to all the things called by main.
In this cases, f, which takes a single parameter. F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.
It is not as simple as you might hope to "just change the calling convention" because Go runs on small stacks and checks their bounds at each stack-growth; we'd need to be sure that we got the growth and check handshakes right. There are also stack maps to consider, which I believe are currently referenced to SP -- which will change, if we modify frame sizes to accommodate changing numbers parameters.
@dr2chase What if we were to limit TCO for only functions that accept a single argument? We could use currying to translate the evaluation of a function that takes multiple arguments into a sequence of single argument functions, right?
F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.
It could leave another frame on the stack if necessary, of course, and you could use the return address to detect it. (The returned-to blocks for “tail call” frames could all have the same return address, since all they need to do is pop the frame and update the frame pointer. The size of the tail-call frame could be stored at a known offset.)
The first “tail call” in a chain of calls wouldn't be able to reuse the caller's argument space, but subsequent calls would — that is, we would still have the desired O(1) stack usage for a chain of N tail calls.
Suppose we have f that calls g that then tail calls h. Just before the tail call, we have:
+---------+
| |
| f |
| |
+---------+
|rets of g|
+---------+
|args of g|
+---------+
| |
| g |
| |
+---------+
So how do we modify the stack to have h at the bottom instead? I see two problems:
There may not be enough room to put arguments to h, as it may have larger arguments than g.
h doesn't put its results in the right place (where f expects them), if its argument size is different than g's.
Any stub would have to solve both of those problems. So we might introduce a glue routine (invisible to users):
+---------+
| |
| f |
| |
+---------+
|rets of g|
+---------+
|args of g|
+---------+
| glue |
+---------+
|rets of h|
+---------+
|args of h|
+---------+
| |
| h |
| |
+---------+
That gives us a place to put the arguments to h, and the glue code can copy back the values that h returned to the appropriate place where f expects them.
It all works, but now a sequence of tail calls grows the stack indefinitely. The question becomes: how to avoid using a glue routine most of the time? Especially in mutual co-tail-calls, I don't see how to do it. If g tail calls h, which tail calls g, which tail calls h, etc, and g and h have different arg sizes, you're going to need the glue routine on at least one of the two tail calls.
We could compile specialized versions of every function that might be tail called, taking arguments by pointer instead of directly on the stack (or some other scheme, like extra arg/return offsets). But that's a lot of extra code, especially because any tail call of a function variable means almost every function might need one of these specialized versions.
The question becomes: how to avoid using a glue routine most of the time?
That's exactly what I'm talking about: if we're about to make a tail call, we check whether the return address for the current frame is the glue handler itself. If it is, we can wipe out the stack up to and including the glue frame and replace it with a new glue frame appropriate to the function we're about to call.
If the return address is not the glue handler, then we need to allocate one and the stack will grow. But only the first argument-growing call in a chain of mutual tail-calls should lack such a frame.
I see, you want to introspect one frame up the stack to see if you can redo the glue frame.
That might work. A few complications:
All frames in Go are constant sized, so we might need a few of them, power-of-two sized like we do for reflect.call. Or maybe it can be variable-sized, with the size recorded just above the return value area?
The glue routine is now polymorphic, so it needs to be dynamically typed. The glue routine can have a few fields where it can record GC info, and the stack scanning & stack copying needs to recognize the glue routine and use its recorded GC info.
If we go for a variable-sized glue frame (instead of the powers-of-two set of fixed-frame glues), then the glue code itself needs to figure out how big its own frame is. We have frame pointers on x86 at least, so maybe that works. Not sure if we have them on all architectures. If not, we'd need to modify the calling convention to leave space that the glue frame can use (similar to how we store the link register on some architectures).
Ok, you've convinced me it is at least plausible. Certainly not easy.
I think that the tail call optimization is only useful when the programmer can reliably know that the tail call will occur. Conversely I don't think we should do tail calls unpredictably, because of the effect on stack traces. I'm going to close this in favor of #22624 which suggests one possible mechanism for deciding whether to do a tail call or not.
Activity
docmerlin commentedon Aug 19, 2016
While I would prefer full-on TCO, this would be a good start!
ghost commentedon Aug 19, 2016
The go team was very clear that they wanted to keep the stack traces clean and accurate, complete TCO would completely break that. So that's why self-recursino only makes sense, because 1000 000 call to Foo doesn't really help anybody anyway
bradfitz commentedon Aug 19, 2016
See also dup closed bug #15304
cznic commentedon Aug 19, 2016
It does help a lot when the args differ from frame to frame.
cznic commentedon Aug 19, 2016
Also, any tail self recursion can, and most of the time should, be rewritten as a loop.
ghost commentedon Aug 19, 2016
It's not really a dup, I was asking there for all TCO, which there was indeed good reason not to implement. I've looked and other language will at least have self-recursion or even cyclic recursion TCOed. It could make sense for Go. I get that the whole Go2 is basically shutting this conversation down and that's fine.
If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)
cznic commentedon Aug 19, 2016
If tail recursion is rewritten as a loop there are 2 cases: stackless (eg. n!) or the loop keeps explicit stack (of something, eg. naive fib), replacing the (CPU) frames. So in either case the debug info about the state is at most one
fmt.Printf
away.What I wanted to point out is that in a language w/o TCO, one can always rewrite the code to not depend on TCO.
ghost commentedon Aug 19, 2016
I 100% agree, but some algorithms are so much cleaner when written recursively
sovietspaceship2 commentedon Aug 19, 2016
This change alone is enough to make me want to use go (again) instead of erlang for many things. Do it and I will love go forever. Do it do it do it. Really. Do it.
dr2chase commentedon Aug 19, 2016
Was going to comment (related to unsafe, elsewhere) that removing recursion is exactly the sort of error-prone hand-optimization that can also be used to hide introduced vulnerabilities. self-TCO ought to be something that a programmer can at least request -- after all, if it's done by hand, those stack frames are just as gone, and debugging is just as impeded, but it also has the chance of human error added.
Note that one of the late-caught bugs in the new SSA back-end was a mistake in the hand-de-recursing of a clever recursive algorithm. The hand-optimized code usually worked.
sovietspaceship2 commentedon Aug 19, 2016
Many functional programming languages have excellent support for (general) TCO and it is not an impediment at all for debugging. Code compiled in debug mode can still be instrumented to keep count of self-calls and informations about lost stackframes. In functional languages like Haskell, purity from side-effects helps debugging since only the initial values are needed to know the state of every subsequent call; in Go it's not guaranteed so you lose intermediate stackframes and there's no way to recover them, but since most uses for recursive calls are in algorithms, which I hope are implemented in a mathematically sound manner, the issue is greatly reduced. Everything is moving in this direction, and Go should not be left behind.
Also, as dr2chase noted, I often find myself rewriting beautifully recursive code from their mathematical description to ugly hackish iterative style and it is not easy at all to prove they are completely equivalent until your application in production decides to run an infinite loop because the terminating condition is not met with certain inputs.
Do it.
7 remaining items
l3x commentedon Nov 7, 2017
@ianlancetaylor I can see why the compiler hint would not be the best idea. Thanks for that info!
Adding a
become ()
statement sounds good to me.I can't find a current proposal for adding a
become ()
statement. Perhaps, we could add a proposal for that?TCO is necessity for people interested in writing functional programming (FP) style code, that is by its nature recursive. This might be good indicator of how much interest exists for writing FP style code.
ianlancetaylor commentedon Nov 7, 2017
Filed issue #22624 to record the idea for Go 2.
dr2chase commentedon Nov 7, 2017
Tail-call is not a complete slam-dunk with the existing calling conventions.
Suppose main() calls f(x) tail-calls g(x,y).
The current calling convention has main allocating a single, fixed-size frame large enough for all main's locals and large enough to pass all the parameters to all the things called by main.
In this cases, f, which takes a single parameter. F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.
It is not as simple as you might hope to "just change the calling convention" because Go runs on small stacks and checks their bounds at each stack-growth; we'd need to be sure that we got the growth and check handshakes right. There are also stack maps to consider, which I believe are currently referenced to SP -- which will change, if we modify frame sizes to accommodate changing numbers parameters.
l3x commentedon Nov 8, 2017
@dr2chase What if we were to limit TCO for only functions that accept a single argument? We could use currying to translate the evaluation of a function that takes multiple arguments into a sequence of single argument functions, right?
randall77 commentedon Nov 8, 2017
@t3x The same problem occurs for single arguments, as that argument can be of arbitrary size.
bcmills commentedon Nov 8, 2017
It could leave another frame on the stack if necessary, of course, and you could use the return address to detect it. (The returned-to blocks for “tail call” frames could all have the same return address, since all they need to do is pop the frame and update the frame pointer. The size of the tail-call frame could be stored at a known offset.)
The first “tail call” in a chain of calls wouldn't be able to reuse the caller's argument space, but subsequent calls would — that is, we would still have the desired O(1) stack usage for a chain of N tail calls.
dr2chase commentedon Nov 8, 2017
We'd need to get the return values sent to the right place, too, but this is an interesting approach.
randall77 commentedon Nov 8, 2017
@bcmills Kind of, but it's not that simple.
Suppose we have
f
that callsg
that then tail callsh
. Just before the tail call, we have:So how do we modify the stack to have
h
at the bottom instead? I see two problems:h
, as it may have larger arguments thang
.h
doesn't put its results in the right place (wheref
expects them), if its argument size is different thang
's.Any stub would have to solve both of those problems. So we might introduce a
glue
routine (invisible to users):That gives us a place to put the arguments to
h
, and the glue code can copy back the values thath
returned to the appropriate place wheref
expects them.It all works, but now a sequence of tail calls grows the stack indefinitely. The question becomes: how to avoid using a glue routine most of the time? Especially in mutual co-tail-calls, I don't see how to do it. If
g
tail callsh
, which tail callsg
, which tail callsh
, etc, andg
andh
have different arg sizes, you're going to need the glue routine on at least one of the two tail calls.We could compile specialized versions of every function that might be tail called, taking arguments by pointer instead of directly on the stack (or some other scheme, like extra arg/return offsets). But that's a lot of extra code, especially because any tail call of a function variable means almost every function might need one of these specialized versions.
bcmills commentedon Nov 8, 2017
That's exactly what I'm talking about: if we're about to make a tail call, we check whether the return address for the current frame is the
glue
handler itself. If it is, we can wipe out the stack up to and including theglue
frame and replace it with a newglue
frame appropriate to the function we're about to call.If the return address is not the
glue
handler, then we need to allocate one and the stack will grow. But only the first argument-growing call in a chain of mutual tail-calls should lack such a frame.randall77 commentedon Nov 8, 2017
I see, you want to introspect one frame up the stack to see if you can redo the glue frame.
That might work. A few complications:
Ok, you've convinced me it is at least plausible. Certainly not easy.
ianlancetaylor commentedon Jan 16, 2018
I think that the tail call optimization is only useful when the programmer can reliably know that the tail call will occur. Conversely I don't think we should do tail calls unpredictably, because of the effect on stack traces. I'm going to close this in favor of #22624 which suggests one possible mechanism for deciding whether to do a tail call or not.