Skip to content

returning a pointer to a local should be a compile error #5725

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Jun 27, 2020 · 9 comments
Closed

returning a pointer to a local should be a compile error #5725

ghost opened this issue Jun 27, 2020 · 9 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@ghost
Copy link

ghost commented Jun 27, 2020

Maybe duplicate, but I couldn't find it anywhere.

Consider the following code:

const Point = struct {
    x: f32,
    y: f32,
};
// Using #1717 syntax -- this proposal does not depend on #1717
const init = fn (_x: f32, _y: f32) Point {
    var inst = Point { .x = _x, .y = _y };
    return inst;
};

No problems here, copy elision (should) put inst in the right place. Now consider this alternative:

const init = fn (_x: f32, _y: f32) *Point {
    var inst = Point { .x = _x, .y = _y };
    return &inst;
};

Someone who doesn't know all the details of RLS might do this. Problem is, frame is now explicitly stack-allocated, in a frame which is invalidated as soon as init returns. This is comptime-detectable undefined behaviour. Zig should not compile this code.

(There are reasons not to allow this at comptime either -- just look at the headache that is #5718 (comment). Also, breaking from blocks permits a similar problem.)

@ghost ghost mentioned this issue Jun 27, 2020
@Vexu Vexu added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Jun 27, 2020
@Vexu Vexu added this to the 0.7.0 milestone Jun 27, 2020
@Vexu
Copy link
Member

Vexu commented Jun 27, 2020

There has already been some work done on this (see ir_analyze_instruction_return in src/ir.cpp).

@ghost
Copy link
Author

ghost commented Jun 27, 2020

Ah, I thought there might have been. I'll keep this open for explication's sake, though. Is the plan to disallow this at comptime as well?

@ghost ghost changed the title Proposal: Returning a pointer to a local should be a compile error Returning a pointer to a local should be a compile error Jul 7, 2020
@ghost ghost changed the title Returning a pointer to a local should be a compile error Compiler intelligence: Returning a pointer to a local should be a compile error Jul 15, 2020
@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 14, 2020
@andrewrk andrewrk added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Oct 14, 2020
@mikdusan
Copy link
Member

mikdusan commented Jan 4, 2021

from zig weekly 2020/12/16

Martin: I think there’s another issue that we should consider that might help us decide this.
Compiler intelligence: Returning a pointer to a local should be a compile error (#5725)

This is a case where there is a large spectrum of how smart the compiler could be about detection.  But the problem is that the definition of “what compiles” comes from the language spec, and we’ve said in the past that keeping the spec simple is an important goal.  This is one of the major use cases for warnings in C/C++ - the spec defines what compiles, and warnings do the “smarter” checks that are too complicated to standardize.

Andrew: Idea, what about the following rule:  If a compiler can statically detect illegal behavior, it is allowed to be a compile error.  By definition, illegal behavior represents an invalid Zig program, so there is no valid zig program that this would fail to compile.

Josh: This is not actually very useful.  In the case of returning a stack local, the compiler would need to prove:
- the function is called in all cases
- the function returns a stack local
- the returned pointer is dereferenced

This needs to happen in all cases of running the program in order to statically prove UB.  The `--help` flag in the Zig compiler codebase would effectively prevent this analysis from finding any problems in it, because it can’t prove that most functions are actually executed.

It can’t be done at function level, because this is a totally valid callback:
fn callback() void {
  // this should not be called on my inputs
  unreachable;
}


But maybe things like this should be warnings.

Andrew:  This is a good realization, but I want to think more about it.  If we did include warnings, we should at least have the compiler default treat warnings as errors.  But I’m not prepared to decide on this immediately.  Let’s come back to it in the future.

If the compiler can prove something statically that is an error, let's emit an error. For unused variables, let's reduce the surface:

  • leave globals out of it
  • identify unused locals only if there is no comptime control flow in or of any nested block

How effectual will this be? If it's not a compile-speed killer and relatively straight forward, why not? Whether this is a warning or error, is a different question.

On the subject of returning local pointers:

Josh: This is not actually very useful.  In the case of returning a stack local, the compiler would need to prove:
- the function is called in all cases
- the function returns a stack local
- the returned pointer is dereferenced

Against the first point, it doesn't matter if the function is called in all cases. We already allow bogus things in lazy functions that would blow up if ever compiled. This should not be a prerequisite for detecting errors.

Against the third point, we could knowingly let local pointers escape with attribution:

fn getEndOfStackPtr() *noderef const u8 { ... }

which lets zig know and enforce that pointer can never be dereferenced. Thus it is possible to return but never deref. In the other case, if the compiler can prove escape, that needs to be an error. And again here, it doesn't matter if the compiler can detect 100% escapes. Any % is valuable.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 Jun 4, 2021
@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. and removed enhancement Solving this issue will likely involve adding new logic or components to the codebase. frontend Tokenization, parsing, AstGen, Sema, and Liveness. labels Nov 21, 2021
@andrewrk andrewrk changed the title Compiler intelligence: Returning a pointer to a local should be a compile error returning a pointer to a local should be a compile error Nov 21, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 21, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@renatoathaydes
Copy link

I made a mistake due to this and the program worked for many cases, until a test finally observed a "corrupted" value and it took me several hours to find out this was the problem.

In my case the pointer was a little bit hidden behind a call like this:

var buffer: [3]u8{0,0,0};
/// do stuff
return std.fmt.bufPrint(&buffer, "{d}", .{code});

This made me sad... a language that allows this without warnings cannot be used to write reliable code. Hope Zig can fix this.

@matu3ba
Copy link
Contributor

matu3ba commented Dec 16, 2022

At least for C its not possible in a portable way: https://stackoverflow.com/a/16361335

You cannot do what you want in a portable way, because the C language standard does not specify the stack, 
program area,  and heap as distinct areas. Their location can depend on the processor architecture, the 
operating system, the loader, the linker, and the compiler. Trying to guess where a pointer is pointing is 
breaking the abstraction provided by C, so you probably you shouldn't be doing that.

This makes me suspect, that this should be part of tooling https://en.wikipedia.org/wiki/Dangling_pointer#Dangling_pointer_detection or static analysis with the ability to annotate intended program semantics (without static analysis can only help for simple cases or has significant runtime cost).

@gooncreeper
Copy link
Contributor

gooncreeper commented Jun 10, 2024

I don't see why we cannot make returning a pointer to a local variable undefined behavior. If this behavior is desired then it can be made explicit with @intFromPtr. Additionally, the undefined behavior can be caught at runtime by checking if the returned pointer is within the function's stack frame.

@randomcoder67
Copy link

Has this been implemented?

const std = @import("std");

fn get() *[10]u8 {
    var data : [10]u8 = undefined;
    var j: u8 = 0;
    for (0..10) |i| {
        data[i] = j + 65;
        j += 1;
    }
    return &data;
}

pub fn main() !void {
    const data = get();
    std.debug.print("{s}\n", .{data});
}

Compiles perfectly fine, but it of course broken.

Comparable c code:

#include <stdio.h>

int* get() {
	int thing[10];
	for (int i=0; i<10; i++) {
		thing[i] = i + 65;
	}
	return thing;
}

int main() {
	int* data = get();
	printf("%s\n", data);
}

Gives a warning:
10:16: warning: function returns address of local variable [-Wreturn-local-addr]

@mlugg
Copy link
Member

mlugg commented Nov 12, 2024

Has this been implemented?

No, that's why the issue remains open.

@andrewrk
Copy link
Member

I don't think it's possible to catch at compile-time. Please see this issue for catching it at runtime: #23528

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

8 participants