proposal: cmd/compile: add tail call optimization for self-recursion only #16798

New issue

Closed

proposal: cmd/compile: add tail call optimization for self-recursion only#16798

Labels

FrozenDueToAgeLanguageChangeProposalv2

Milestone

Unplanned

ghost

I know it's been discussed that we wanted to keep the stack trace the same, but if gc only does self recursion call optimisation the stack would roughly stay the same, the only thing that would change is the number of frames for the recursive function call.

docmerlin

While I would prefer full-on TCO, this would be a good start!

ghost

The go team was very clear that they wanted to keep the stack traces clean and accurate, complete TCO would completely break that. So that's why self-recursino only makes sense, because 1000 000 call to Foo doesn't really help anybody anyway

added

Contributor

See also dup closed bug #15304

bradfitz

added this to the Unplanned milestone

on Aug 19, 2016

cznic

Contributor

because 1000 000 call to Foo doesn't really help anybody anyway

It does help a lot when the args differ from frame to frame.

cznic

Contributor

Also, any tail self recursion can, and most of the time should, be rewritten as a loop.

ghost

It's not really a dup, I was asking there for all TCO, which there was indeed good reason not to implement. I've looked and other language will at least have self-recursion or even cyclic recursion TCOed. It could make sense for Go. I get that the whole Go2 is basically shutting this conversation down and that's fine.

It does help a lot when the args differ from frame to frame.

If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)

cznic

Contributor

If you write a loop you're not getting that frame by frame information back anyway (from the stack trace)

If tail recursion is rewritten as a loop there are 2 cases: stackless (eg. n!) or the loop keeps explicit stack (of something, eg. naive fib), replacing the (CPU) frames. So in either case the debug info about the state is at most one fmt.Printf away.

What I wanted to point out is that in a language w/o TCO, one can always rewrite the code to not depend on TCO.

ghost

What I wanted to point out is that in a language w/o TCO, one can always rewrite the code to not depend on TCO.

I 100% agree, but some algorithms are so much cleaner when written recursively

sovietspaceship2

This change alone is enough to make me want to use go (again) instead of erlang for many things. Do it and I will love go forever. Do it do it do it. Really. Do it.

dr2chase

Contributor

Was going to comment (related to unsafe, elsewhere) that removing recursion is exactly the sort of error-prone hand-optimization that can also be used to hide introduced vulnerabilities. self-TCO ought to be something that a programmer can at least request -- after all, if it's done by hand, those stack frames are just as gone, and debugging is just as impeded, but it also has the chance of human error added.

Note that one of the late-caught bugs in the new SSA back-end was a mistake in the hand-de-recursing of a clever recursive algorithm. The hand-optimized code usually worked.

sovietspaceship2

Many functional programming languages have excellent support for (general) TCO and it is not an impediment at all for debugging. Code compiled in debug mode can still be instrumented to keep count of self-calls and informations about lost stackframes. In functional languages like Haskell, purity from side-effects helps debugging since only the initial values are needed to know the state of every subsequent call; in Go it's not guaranteed so you lose intermediate stackframes and there's no way to recover them, but since most uses for recursive calls are in algorithms, which I hope are implemented in a mathematically sound manner, the issue is greatly reduced. Everything is moving in this direction, and Go should not be left behind.

Also, as dr2chase noted, I often find myself rewriting beautifully recursive code from their mathematical description to ugly hackish iterative style and it is not easy at all to prove they are completely equivalent until your application in production decides to run an infinite loop because the terminating condition is not met with certain inputs.

Do it.

7 remaining items

l3x

@ianlancetaylor I can see why the compiler hint would not be the best idea. Thanks for that info!

Adding a become () statement sounds good to me.

I can't find a current proposal for adding a become () statement. Perhaps, we could add a proposal for that?

TCO is necessity for people interested in writing functional programming (FP) style code, that is by its nature recursive. This might be good indicator of how much interest exists for writing FP style code.

ianlancetaylor

mentioned this

on Nov 7, 2017

proposal: Go 2: add become statement to support tail calls #22624

ianlancetaylor

Contributor

Filed issue #22624 to record the idea for Go 2.

dr2chase

Contributor

Tail-call is not a complete slam-dunk with the existing calling conventions.
Suppose main() calls f(x) tail-calls g(x,y).
The current calling convention has main allocating a single, fixed-size frame large enough for all main's locals and large enough to pass all the parameters to all the things called by main.

In this cases, f, which takes a single parameter. F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.

It is not as simple as you might hope to "just change the calling convention" because Go runs on small stacks and checks their bounds at each stack-growth; we'd need to be sure that we got the growth and check handshakes right. There are also stack maps to consider, which I believe are currently referenced to SP -- which will change, if we modify frame sizes to accommodate changing numbers parameters.

l3x

@dr2chase What if we were to limit TCO for only functions that accept a single argument? We could use currying to translate the evaluation of a function that takes multiple arguments into a sequence of single argument functions, right?

randall77

Contributor

@t3x The same problem occurs for single arguments, as that argument can be of arbitrary size.

bcmills

Contributor

F cannot tail-call g within the parameter list supplied by main because it is simply not large enough; it could extend main's frame before calling g, but after f-really-g returns to main, main expects its original frame.

It could leave another frame on the stack if necessary, of course, and you could use the return address to detect it. (The returned-to blocks for “tail call” frames could all have the same return address, since all they need to do is pop the frame and update the frame pointer. The size of the tail-call frame could be stored at a known offset.)

The first “tail call” in a chain of calls wouldn't be able to reuse the caller's argument space, but subsequent calls would — that is, we would still have the desired O(1) stack usage for a chain of N tail calls.

dr2chase

Contributor

We'd need to get the return values sent to the right place, too, but this is an interesting approach.

randall77

Contributor

@bcmills Kind of, but it's not that simple.

Suppose we have f that calls g that then tail calls h. Just before the tail call, we have:

+---------+
|         |
|    f    |
|         |
+---------+
|rets of g|
+---------+
|args of g|
+---------+
|         |
|    g    |
|         |
+---------+

So how do we modify the stack to have h at the bottom instead? I see two problems:

There may not be enough room to put arguments to h, as it may have larger arguments than g.
h doesn't put its results in the right place (where f expects them), if its argument size is different than g's.

Any stub would have to solve both of those problems. So we might introduce a glue routine (invisible to users):

+---------+
|         |
|    f    |
|         |
+---------+
|rets of g|
+---------+
|args of g|
+---------+
|  glue   |
+---------+
|rets of h|
+---------+
|args of h|
+---------+
|         |
|    h    |
|         |
+---------+

That gives us a place to put the arguments to h, and the glue code can copy back the values that h returned to the appropriate place where f expects them.

It all works, but now a sequence of tail calls grows the stack indefinitely. The question becomes: how to avoid using a glue routine most of the time? Especially in mutual co-tail-calls, I don't see how to do it. If g tail calls h, which tail calls g, which tail calls h, etc, and g and h have different arg sizes, you're going to need the glue routine on at least one of the two tail calls.

We could compile specialized versions of every function that might be tail called, taking arguments by pointer instead of directly on the stack (or some other scheme, like extra arg/return offsets). But that's a lot of extra code, especially because any tail call of a function variable means almost every function might need one of these specialized versions.

bcmills

Contributor

The question becomes: how to avoid using a glue routine most of the time?

That's exactly what I'm talking about: if we're about to make a tail call, we check whether the return address for the current frame is the glue handler itself. If it is, we can wipe out the stack up to and including the glue frame and replace it with a new glue frame appropriate to the function we're about to call.

If the return address is not the glue handler, then we need to allocate one and the stack will grow. But only the first argument-growing call in a chain of mutual tail-calls should lack such a frame.

randall77

Contributor

I see, you want to introspect one frame up the stack to see if you can redo the glue frame.
That might work. A few complications:

All frames in Go are constant sized, so we might need a few of them, power-of-two sized like we do for reflect.call. Or maybe it can be variable-sized, with the size recorded just above the return value area?
The glue routine is now polymorphic, so it needs to be dynamically typed. The glue routine can have a few fields where it can record GC info, and the stack scanning & stack copying needs to recognize the glue routine and use its recorded GC info.
If we go for a variable-sized glue frame (instead of the powers-of-two set of fixed-frame glues), then the glue code itself needs to figure out how big its own frame is. We have frame pointers on x86 at least, so maybe that works. Not sure if we have them on all architectures. If not, we'd need to modify the calling convention to leave space that the glue frame can use (similar to how we store the link register on some architectures).

Ok, you've convinced me it is at least plausible. Certainly not easy.

ianlancetaylor

Contributor

I think that the tail call optimization is only useful when the programmer can reliably know that the tail call will occur. Conversely I don't think we should do tail calls unpredictably, because of the effect on stack traces. I'm going to close this in favor of #22624 which suggests one possible mechanism for deciding whether to do a tail call or not.

ianlancetaylor

closed this as completed

on Jan 16, 2018

golang

locked and limited conversation to collaborators

added

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

Labels

FrozenDueToAgeLanguageChangeProposalv2

Type

No type

Projects

No projects

Milestone

Unplanned
Due by December 31, 2099, 58% complete

Relationships

None yet

Development

No branches or pull requests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!