Description
TL/DR
Make panic
stop "bubbling up" in the call stack when it finds a function with an error
as last result type and return the panic content as that error.
This will reduce a lot of crashes due to "nil-pointer dereferences" (among others), making the application more stable and suitable for long running tasks (like servers).
Disclaimer
The entire proposal is based on my experience and point of view. The main goal is to validate it against the Go community to see if it makes sense and to check if other people share the concerns this proposal try to address.
Motivation of this proposal
To avoid (or reduce to the minimum) unexpected application crashes.
In Go, it is pretty easy to crash the entire application by an implicit panic
(like a nil pointer dereference). It is even easier than in other languages because of the implicit dereferencing of pointers-to-structs variables.
This adds a feeling of insecurity when writing code, forcing the developer to be extra cautious and add checks in the code that increase verbosity and reduce readability.
Hopefully, this proposal will resolve that.
Proposal
When a panic
occurs, it starts "bubbling up" until it finds a function that contains a defer
with a recover
. If it is not found, the application crashes.
This crash is (almost?) always not the desired result (from my experience), as you can't recover from it. The only way is to restart the application.
To avoid crashing, if you know your function can panic, you can add the "defer-recover" construct.
However, it is not always evident whether the function you are calling can panic (even if you look at the source code), so it is really easy to forget adding the "defer-recover" construct.
This is especially true with "nil pointer dereference" errors, as they are hidden in apparantly good code and can be dormant for a long time until they appear. This can happen with any Go "pointer type": pointers, functions, maps, channel, and interfaces (not with slices).
For example:
func RefreshProduct(currentProduct *Product, stockIncrementer func(*Product) int) (*Product, error) {
if currentProduct.ID == "" {
return nil, errors.New("Product does not exists. ProductID: " + currentProduct.ID)
}
return &Product{
ID: currentProduct.ID,
Stock: stockIncrementer(currentProduct),
}, nil
}
This code could panic and make the application crash. You can follow two approaches to avoid the crash:
- The safest: use named return values and the "defer-recover" construct:
func RefreshProduct(currentProduct *Product, stockIncrementer func(*Product) int) (newProduct *Product, err error) {
defer func() {
if recoveredPanic := recover(); recoveredPanic != nil {
err = fmt.Errorf("something bad happened: %v", recoveredPanic)
}
}()
if currentProduct.ID == "" {
return nil, errors.New("Product does not exists. ProductID: " + currentProduct.ID)
}
return &Product{
ID: currentProduct.ID,
Stock: stockIncrementer(currentProduct),
}, nil
}
- The "I-know-my-panics" way: check everything that can panic:
func RefreshProduct(currentProduct *Product, stockIncrementer func(*Product) int) (newProduct *Product, err error) {
if currentProduct == nil {
return nil, errors.New("currentProduct is nil")
}
if stockIncrement == nil {
return nil, errors.New("stockIncrement is nil")
}
if currentProduct.ID == "" {
return nil, errors.New("Product does not exists. ProductID: " + currentProduct.ID)
}
return &Product{
ID: currentProduct.ID,
Stock: stockIncrementer(currentProduct),
}, nil
}
This is just a very simple example. Things get worse when you are working with interfaces holding pointers.
This proposal's goal is to be able to get the safe version of the code but without any change in the logic (as long as the function already has an error
return type)
How? With the following new rules
Rule 1: Whenever there is a panic
(implicit or explicit), follow the current "bubbling-up" rules plus the addition of rule number 2.
Rule 2: When a function with an error
as last return type is found, stop panicking and return the error with the contents that would have returned recover()
. The rest of the return values are filled with the zero value as expected.
With this addition, when we see a function like this (for example):
func storeTo(store io.Writer, contents io.Reader) error {
// ...
}
We are completely sure that the only way that function can finish is by returning nil
or an error
. No unexpected panics
are leaked.
Implementation details
I don't have enough knowledge about the Go source code to provide implementation details.
My guesses are that implementing this proposal should not be too hard. It could be summarized as the following:
Whenever a function with an
error
type as the last result type is found, add the following implicitdefer
:
defer func() {
if recoveredPanic := recover(); recoveredPanic != nil {
err = fmt.Errorf("something bad happened: %v", recoveredPanic)
}
}()
I guess there could be complications around the err
variable, as it should be the named error
return value.
Surely there are better ways to implement this, but the goal now is to know the opinion of the community to first verify if the proposal makes sense.
Backward compatibility
I would say that, for 99% of the cases, this change will be backward compatible.
That 1% of the cases are those programs that are using the "panic-defer-recover" construct for doing actual logic control flow.
Of course, those numbers are totally made up. Maybe there are more cases, but I prefer first to focus on validating the idea with the community first,
FAQ
1.- Would this only work for "nil-pointer dereference" panics
?
No, it would work for all kind of panics, like index out of bounds
panics.
2.- What to do when the user explicitly panics?
This will work for both implicit and explicit panics. It is true that when your function can return an error, doing:
panic("bad thing")
would behave in the same way as:
return errors.New("bad thing")
But the question here is, if your function can already return an error, Why would you panic instead of returning the error?
I'm really asking. I don't know if there is a case where it is preferable to panic than to return an error
3.- What happens if there is a defer that calls recover
and the function has an error return type too?
It would behave in the same way as today: the defer will be executed, the panic will be recovered, and the function will return as if no panic happened.
The "conversion" from panic
to an error only happens if the panics "leaks" the function.
4.- What if no function with an error result type is found?
The application crashes 😞 . But this is what should happen, as no "error recovery code" was found in the whole call stack, so the only thing the application can do is to kill itself.
Activity
bcmills commentedon Jul 1, 2019
A
panic
generally indicates a programmer error that should be detected during testing and code review.Turning that
panic
into a run-time error makes the panic, reducing the likelihood that tests will reveal it: tests that panic fail, whereas tests that provoke an error-return within some (possibly deeply nested) internal function call may or may not surface that error in any way that is visible to thetesting
package.Moreover, unlike other run-time errors, a captured nil-panic won't include additional details that may be helpful in diagnosing the bug. And there are some cases where a function with an
error
in the return signature is documented to always benil
(#20803 (comment)), so call sites for those cases would in many cases silently swallow panics — and, especially in such cases, can result in unexpectedly corrupted data (see also #28150).I would certainly love to see improvements that address the root causes of nil-panics in Go, but not at the expense of detecting actual bugs. Instead, I would rather see improvements to fuzzing (#19109), more nil-safety in the type system (#28133 and others), and/or static analysis tools.
ianlancetaylor commentedon Jul 1, 2019
Apart from whether this change is a good idea or not, I don't think we can make this kind of change in the Go language today. It would cause too many existing, correct, programs to silently break in cases where they occasionally, but rarely, panic.
alvaroloes commentedon Jul 2, 2019
@bcmills
Fair enough. To be honest, I agree with you about addressing the root cause.
This proposal was the end result of a mental process to try to come up with an idea to do some changes in the type system to make it impossible to access the content of a type that can be nil (like "Optional"s in other languages).
You can see some tries in the following comments (and replies):
After the discussion, my feeling was that there is a lot of resistance to doing changes (like the one discussed in the first comments about the zero value), so I tried to come with a "compromise solution" to "nil-pointer dereferences" errors to reduce crashes.
But, yeah, I'd rather prefer to see a solution where the compiler ensures there won't be any nil-pointer dereference.
In any case, I would say that nil-safety should be addressed sooner rather than later. I understand that testing and code-reviews can catch nil errors, but they are not foolproof (unlike a compiler-based solution). If they were, there wouldn't be nil errors in production code, and there are.
I'll keep giving this a thought to see if we can get to a good solution 🙂
@ianlancetaylor
Thanks for your comment. Could you please elaborate a bit? (I'm genuinely asking)
I understood some cases that Bryan wrote, but I'm having a hard time to understand why a program that could crash with a
panic
due to a nil-pointer dereference is more correct than a program that handles that error through the normal error-handling flow.Could you please put an example?
ianlancetaylor commentedon Jul 2, 2019
@alvaroloes Interesting, as I have the exact opposite intuition. I don't think I can show a useful small example, as I am explicitly talking about cases that happen rarely. In case it wasn't clear, I'm talking about large existing programs that were written before this change to the language. For some of those programs, a
nil
pointer dereference will indicate that some data structure was built incorrectly. Those programs naturally have many functions that return anerror
value. If anil
pointer panic converts into an error return, then some caller will see anil
pointer dereference turn into a normal error. But this may be a case where errors are expected, say because it is contacting some other server over a network. The program will log the error and continue. But now it is working with an invalid data structure, and some operation on the data structure was interrupted midstream and so that overall data structure may now be completely corrupt. Before this change the program would have crashed, permitting it to be analyzed and restarted. After this change, the program's behavior will silently change with no warning. It will now continue running with a corrupt data structure, leading to unpredictable future chaos.alvaroloes commentedon Jul 3, 2019
Thank you, Ian, for the explanation. I still don't see it clear why an application would continue normally after an error has occurred (and just log it).
In any case, I prefer to close this issue and dedicate the time and effort to another one that fixes the root cause of nil panic.
Thank you all for your time and energy for reading and commenting on this issue!
lootch commentedon Jul 3, 2019
alvaroloes commentedon Jul 3, 2019
I agree with you @lootch, and I understand that better or worse code can be written. For example, one could use panic-defer-recover as logic control flow and create a correct program that does its job.
The real deal of all changes to the language that are not fully backward compatible is that the ratio "value/(downsides * affected applications)" must be pretty high. It is a bit difficult to demonstrate that for this proposal, so the justification only comes from subjective opinions (which is not enough).
As I said, I'll keep thinking of another better proposal. Thanks for your comment.
[thinking out-loud]
[/thinking out-loud]
lootch commentedon Jul 3, 2019
ianlancetaylor commentedon Jul 3, 2019
@alvaroloes We can aim for a better paradigm, if we believe it is better, but my point is that we can't silently change the behavior of existing code. We can introduce new mechanisms, we can remove old mechanisms, but we can't silently change old mechanisms to new ones. There is too much existing code for that.
alvaroloes commentedon Jul 4, 2019
Thank you both for the comments.
Gotcha. Let's keep trying. I'll keep spending a bit of my spare time to find some "mechanism" (with a positive value/downside ratio) to avoid those implicit panics.
4 remaining items