Skip to content

Make global mutable variables more easily auditable? #4107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
scottjmaddox opened this issue Jan 8, 2020 · 13 comments
Open

Make global mutable variables more easily auditable? #4107

scottjmaddox opened this issue Jan 8, 2020 · 13 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@scottjmaddox
Copy link

scottjmaddox commented Jan 8, 2020

The possibility to embed global mutable variables inside of struct's (inside of functions), etc. without any explicit marker that they are global could make it quite difficult (more difficult than in C) to audit Zig libraries for unsound usage of global mutable variables.

One way to improve the situation would be to require global mutable variables to be prefixed by a global keyword (or static?), just as how thread local variables are prefixed by threadlocal. This would make it very easy to grep for global to identify and audit the (hopefully small number of) global mutable variables in a particular library.

Alternatively, perhaps there could be a tool to identify all global mutable variable definitions and usages? Such a tool would be helpful even if definitions were prefixed by global, since the variable name might be overloaded elsewhere.

@hryx
Copy link
Contributor

hryx commented Jan 8, 2020

@scottjmaddox If you could provide some specifics, what use case are you trying to support? What unsound usage are you trying to catch to catch?

quite difficult (more difficult than in C)

How so?

Rather than a new tool or language change, it might already be possible to find all top-level variables with --verbose-ast or --verbose-ir. Maybe that can help you with your audit.

@fengb
Copy link
Contributor

fengb commented Jan 8, 2020

In C, all globals must be at the top scope. Zig allows you to put (effectively) global variables into structs so they can exist anywhere, including inside functions:

fn foo() void {
    const Bar = struct {
        var this_is_equivalent_to_c_static: usize = 0;
    };
}

@JesseRMeyer
Copy link

@fengb Zig's docs showcase namespaced globals as:

const std = @import("std");
const assert = std.debug.assert;

test "namespaced global variable" {
    assert(foo() == 1235);
    assert(foo() == 1236);
}

fn foo() i32 {
    const S = struct {
        var x: i32 = 1234;
    };
    S.x += 1;
    return S.x;
}

which caused my eyes to dilate from shock the first time I saw that. I do not know how a C programmer is expected to know that x is equivalent to static int x in C. What is it about a struct's namespace rules to functions here that maintains x between calls to foo()?

@andrewrk
Copy link
Member

andrewrk commented Jan 8, 2020

What is it about a struct's namespace rules to functions here that maintains x between calls to foo()?

I'm having trouble parsing this question, but maybe this will explain:

The rules are quite consistent: every file is a struct. You can put structs in structs, and you can put structs in functions. The syntax var foo ... in struct top level declarations makes a global variable.

@andrewrk
Copy link
Member

andrewrk commented Jan 8, 2020

In C, all globals must be at the top scope.

Not true!

static int foo(void) {
    static int x = 0;
    return ++x;
}

@JesseRMeyer
Copy link

JesseRMeyer commented Jan 8, 2020

The syntax var foo ... in struct top level declarations makes a global variable.

I'm having trouble reaching this conclusion from your listing of the rules. Let me see if I can uproot and externalize my thinking for you.

Assume S doesn't exist in foo() and all you're left with is little ol' x that essentially resets upon every call to foo(). So foo() always returns 1235. Now wrap x into S. Sure, a function foo() can have a struct defined in it. And foo() is declared at global scope. But what does that have to do with the lifetime of objects foo() defines? I feel like you need another rule to explain this. Why does hoisting x into S grant it persistence between calls when the mother function is declared at global scope? I mean, if that's just what happens, because S is committed to the data segment, then sure. But that's different treatment when x sat alone by itself. There is no straight line from 'wrap a variable in a struct declared in a function declared at global scope' to 'persistent variable storage' in my mind. At an intuitive level those seemed orthogonal.

Not saying the current behavior is flawed and must be changed. Just sharing the experiences of my cognition learning how the language works.

@momumi
Copy link
Contributor

momumi commented Jan 9, 2020

For me it's slightly confusing, because most other languages I'm familiar with use an explicit keyword for variables that have static/global scope while zig overloads the meaning of var based on context.

For example if I write var x = 1; inside a function body, it's a stack allocated variable. However, if I copy var x = 1; and paste inside a struct it gets allocated somewhere else even if the struct is inside a function.

Compare this to the syntax of other languages which make variables with global lifetime explicit:

// C++
void foo() {
   static int counter; // global lifetime variable
   int x; // local variable
}

struct my_struct {
   static int counter; // global lifetime scope
   int x; // member variable
};

// Java
class JavaExample {
   static int counter;
   int x;
}

// Rust
static mut x: i32 = 0; // global variable
let mut x: i32 = 0; // local variable

@scottjmaddox
Copy link
Author

scottjmaddox commented Jan 10, 2020

Having thought a bit more about this, and other commenters have already discussed this above, I think having a global or static keyword would greatly reduce the confusion about global variables. I won't repeat what has already been pointed out above, but I will reiterate that I can very easily see someone, even someone who's read the documentation, not realizing that in

fn foo() {
    struct S {
        var x:  i32 = 0;
    }
    var y:  i32 = 0;
}

x is a global variable, while y is a local variable. This is not that far off from footguns like uint32 x = -1 being accepted without complaint by C compilers (which just caused me and a coworker many hours of debugging over the past couple days).

To answer this question from @hryx:

What unsound usage are you trying to catch to catch?

There are all kinds of unsound usage of global mutable variables, particularly when it comes to multi-threading. Being able to quickly and easily locate global variables and their usage is a good way to quickly disqualify poorly written libraries as potential dependencies, or to audit libraries for potential causes of multi-threading bugs.

Edit: And not just multi-threading, but also single-threaded concurrency such as Zig's async functions.

@daurnimator daurnimator added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jan 10, 2020
@fengb
Copy link
Contributor

fengb commented Jan 10, 2020

(Tangentially related) I’m warming up to Rust’s declaration of memory location followed by optional mutability:

  • let [mut] — scope lifetime
  • static [mut] — global lifetime

Compare that to current Zig semantics:

  • const / var at top level = global
  • var in block = local
  • const is probably block local
    • it may be optimized to be global
    • but types have comptime scope
      • these can have top level decls that are global

These rules make sense to a Zig developer, but I wonder if inheriting Rust’s lifetime based declaration would be easier to learn?

@momumi
Copy link
Contributor

momumi commented Jan 10, 2020

Something else I find weird is that member fields of a struct are delimited with , while const/var fields are delimited with ;

const A = struct {
    var x: u32 = 0;
    y: u32 = 0,
}

If we use only ; and the static keyword:

const A = struct {
    static x: u32 = 0;
    y: u32 = 0;
}

which looks better imo, and its more similar to the syntax of other C-like languages. I think the argument for using commas is that it's easier to copy and paste into a struct literal, but you already have to rip of : type = anyway so it doesn't really make much of a difference. In fact I think it would make more sense to use ; in struct literals instead:

var a = A {
    .y = 10;
};

That way assigning a variable and member fields look the same everywhere. Although maybe that breaks the grammar some how?

Edit:

Maybe even get rid of the . before member fields:

var a = A {
    y = 10;
};

@fengb
Copy link
Contributor

fengb commented Jan 10, 2020

Related: #2859

@andrewrk andrewrk added this to the 0.7.0 milestone Jan 17, 2020
@ghost
Copy link

ghost commented Feb 25, 2020

Reduced example below of something I had to debug, related to this issue

I wrongly assumed that N.b would be initialized to zero each time I initialized S. Stupid in retrospect.

const std = @import("std");

const N = struct {
    b: u16 = 0,
};

const S = struct {
    var n: N = N{};

    // This is obviously the wrong way to do things, but something equivalent
    // could sneak into your code without you noticing
    fn getN(self: *@This()) *N {
        return &n;
    }
};

test "check if zero" {
    const s = S{};
    std.testing.expectEqual(@as(u16, 0), s.getN().b);
    s.getN().b = 4; // modify
}

test "check if zero again" {
    const s2 = S{}; // footgun?: behavior of `s2` not independent of prior usage of `s`
    std.testing.expectEqual(@as(u16, 0), s2.getN().b);
}

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@wooster0
Copy link
Contributor

wooster0 commented Jun 6, 2024

Not sure if this has been proposed already but how about not allowing other declarations between global variable definitions? This already applies to container fields:

a: u8,
b: u8,
var x = false;
c: u8,
x.zig:3:1: error: declarations are not allowed between container fields
var x = false;
^~~

So that would reuse an existing solution, like this:

var a: u8 = undefined;
var b: u8 = undefined;
x: bool = false,
var c: u8 = undefined;
x.zig:3:1: error: non-global variable declarations are not allowed between global variable declarations
x: bool = false,
^~~~~~~~~~~~~~~~ 

(error message wording is arguable)

It would basically force global variable definitions to have to be all in one place, which should make them easier to audit.

@andrewrk andrewrk modified the milestones: 0.14.0, 0.15.0 Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

8 participants