inline parameters #151

Closed
thejoshwolfe opened this issue May 9, 2016 · 33 comments
@thejoshwolfe
Contributor

Following up #132's discussion of inline parameters:

Zig currently supports this syntax for function template parameters, which are effectively inline parameters:

fn read_and_reverse(N: isize)(output_buffer: []u8) {
    var buf: [N]u8;
    read_something(&buf);
    reversed(buf, output_buffer);
}

This is equivalent to this:

fn read_and_reverse(inline N: isize, output_buffer: []u8) {
    ...
}

I say we abandon the separate list of template parameters, and only use inline parameters instead.

Here's why the separate list of template parameters was made in the first place:

  1. Have a separate place to declare types or other values that can be used in runtime parameter declarations and the return type declaration (e.g. fn(T: type)(x: T) -> T).
  2. Allow "baking" a function, where you get a pointer to an instantiated function that has the compile-time parameters pre-supplied and only needs the runtime parameters.

I argue that neither of these justifies the use of a separate parameter list.

Zig already has the concept that declarations are order independent in many contexts. If we apply this rule to parameter declarations, then it naturally follows that some parameter declarations may depend on other parameters, and depending on a parameter requires that it be inline. That allows fn(inline T: type, a: T) -> T and even fn(a: T, inline T: type) -> T.
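
In current Zig releases the `inline` parameter keyword discussed here is spelled `comptime`. A minimal sketch of a parameter whose type depends on a compile-time parameter in the same list, assuming a recent compiler; the name `maxOf` is illustrative and not part of the proposal:

```zig
const std = @import("std");

// `T` is a compile-time parameter declared in the ordinary parameter list.
// The types of `a`, `b`, and the return value all depend on it.
fn maxOf(comptime T: type, a: T, b: T) T {
    return if (a > b) a else b;
}

test "type parameter shares the ordinary parameter list" {
    try std.testing.expectEqual(@as(u32, 7), maxOf(u32, 3, 7));
    try std.testing.expectEqual(@as(i8, -1), maxOf(i8, -1, -5));
}
```

This sketch only covers the case where the type parameter comes first; it makes no claim about the reversed ordering mentioned above.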

Baking functions with inline parameters should be no different than baking functions with runtime parameters. You can already do this for any function:

fn max_u32(a: u32, b: u32) -> u32 {
    max(u32)(a, b)
}

or even simply:

fn at_least_10(a: u32) -> u32 {
    max(u32)(a, 10)
}

This is equivalent to "baking" a function, except that it additionally requires declaring a function and duplicating the parameter list and return type. If we really want more convenient syntax for pre-supplying function parameters, we can add a builtin function for this purpose in the future (how about something like: @bake(max, u32, var, 10)).
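
For comparison, here is a sketch of both approaches in current Zig syntax, where `inline` parameters are spelled `comptime`. `@bake` does not exist; the hypothetical `atLeast` helper below only approximates the idea by returning a specialised function from a compile-time call. All names are assumptions made for illustration:

```zig
const std = @import("std");

// Plain wrapper: the second argument is pre-supplied by hand.
fn atLeast10(a: u32) u32 {
    return if (a > 10) a else 10;
}

// Hypothetical stand-in for `@bake`: evaluated at compile time, it returns
// a function with `T` and `floor` already fixed.
fn atLeast(comptime T: type, comptime floor: T) fn (T) T {
    return struct {
        fn call(a: T) T {
            return if (a > floor) a else floor;
        }
    }.call;
}

// The "baked" function needs only the runtime argument.
const atLeast10Baked = atLeast(u32, 10);

test "wrapper and baked function agree" {
    try std.testing.expectEqual(atLeast10(3), atLeast10Baked(3));
    try std.testing.expectEqual(atLeast10(42), atLeast10Baked(42));
}
```

The wrapper repeats the parameter list, as noted above. The helper avoids that at the cost of an extra level of indirection in the source.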

@thejoshwolfe thejoshwolfe added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label May 9, 2016
@andrewrk andrewrk added this to the 0.1.0 milestone May 10, 2016
@andrewrk
Member

So a generic data structure:

struct List(T: type) {
    items: []T,
    // ...
}

Is kind of syntactic sugar for a function like this:

fn List(T: type) -> type {
    return struct {
        items: []T,
        // ...
    }
}

Assuming that we had anonymous struct type declarations in expressions.

But the reason we might not want these is for a self referencing struct or a circular referencing struct, like this:

struct Node {
    item: i32,
    next: &Node,
}

It's not clear how this would work if it were of the form:

const Node = struct {
    item: i32,
    next: ???,
};

@tsbockman

It's not clear how this would work if it were of the form:

const Node = struct {
    item: i32,
    next: ???,
};

You just need the equivalent of D's typeof(this). It's super useful when writing generic data structures.

(You might want to pick a more compact symbol though, like This.)
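
Zig later gained a builtin along these lines, `@This()`, which evaluates to the innermost enclosing container type. A minimal sketch in current syntax, assuming a recent compiler (the optional pointer type `?*const Self` is a choice made for this example):

```zig
const std = @import("std");

const Node = struct {
    // `@This()` lets the struct refer to itself without knowing the
    // name it will eventually be bound to.
    const Self = @This();

    item: i32,
    next: ?*const Self,
};

test "anonymous struct referring to itself" {
    const tail = Node{ .item = 2, .next = null };
    const head = Node{ .item = 1, .next = &tail };
    try std.testing.expectEqual(@as(i32, 2), head.next.?.item);
}
```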

@andrewrk
Member

andrewrk commented Jun 19, 2016

That's a reasonable idea. In Rust it's Self I believe. Having all structs be anonymous might be elegant with how it would work with generics. For example:

const Node = struct {
    item: i32,
    next: &Self,
};

pub const ListOf_i32 = List(i32);

pub inline fn List(inline T: type) -> type {
    SmallList(T, 8)
}

pub inline fn SmallList(inline T: type, inline STATIC_SIZE: isize) -> type {
    struct {
        items: []T,
        length: isize,
        prealloc_items: [STATIC_SIZE]T,
    }
}

fn f() {
    var list: List(i32) = undefined;
    list.length = 10;
}
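
The same idea in the syntax of current Zig releases, where the keyword is `comptime` and a generic container is an ordinary function returning `type`. This is a simplified sketch: the `items` slice is omitted and the default field values are additions for the example:

```zig
const std = @import("std");

pub fn SmallList(comptime T: type, comptime static_size: usize) type {
    return struct {
        prealloc_items: [static_size]T = undefined,
        length: usize = 0,
    };
}

pub fn List(comptime T: type) type {
    return SmallList(T, 8);
}

test "generic container built from a type-returning function" {
    var list: List(i32) = .{};
    list.length = 10;
    try std.testing.expectEqual(@as(usize, 8), list.prealloc_items.len);
    // Calls with the same compile-time arguments yield the same type.
    try std.testing.expect(List(i32) == SmallList(i32, 8));
}
```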

Compare to generics for functions:

pub fn parse_u32(buf: []u8, radix: u8) -> %u32 {
    parse_unsigned(u32, buf, radix)
}

pub error InvalidChar;
pub error Overflow;

pub fn parse_unsigned(inline T: type, buf: []u8, radix: u8) -> %T {
    var x: T = 0;

    for (buf) |c| {
        const digit = char_to_digit(c);

        if (digit >= radix) {
            return error.InvalidChar;
        }

        // x *= radix
        if (@mul_with_overflow(T, x, radix, &x)) {
            return error.Overflow;
        }

        // x += digit
        if (@add_with_overflow(T, x, digit, &x)) {
            return error.Overflow;
        }
    }

    return x;
}

@andrewrk
Member

const A = struct {
    b: &B,
};
const B = struct {
    a: &A,
};

This can work, but we need some advanced dependency resolution strategies. Currently this would result in an error, because top level dependency A depends on itself via B.

To make this work I think what we would do is have a pointer type not trigger a dependency. We would always create a forward decl type for pointer types, and then go back and resolve them in a second pass.
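
For reference, current Zig releases do accept mutually referential structs declared as constants. The sketch below only demonstrates that behaviour and makes no claim about how the compiler resolves the dependency internally; the optional pointer fields are an assumption made so the example can terminate the chain:

```zig
const std = @import("std");

const A = struct {
    b: ?*const B,
};
const B = struct {
    a: ?*const A,
};

test "structs that point at each other" {
    const a = A{ .b = null };
    const b = B{ .a = &a };
    try std.testing.expect(b.a.?.b == null);
}
```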

@andrewrk andrewrk changed the title inline parameters inline parameters and all containers are anonymous Jun 19, 2016
@andrewrk
Member

This will affect how debug symbols work. Currently if you do:

struct Foo {
    a: i32,
}
const Bar = Foo;

Foo gets a debug info type, but Bar does not. (See #41)

With all containers being anonymous...

const Foo = struct {
    a: i32,
};
const Bar = Foo;

...now there isn't much of a distinction between Foo and Bar. Both of them should be aliases for the anonymous struct, if that's possible. Otherwise we should create multiple definitions of the anonymous struct, one for each of the aliases. Then when instantiating a variable, the debug info should use the correct alias for the type.

@thejoshwolfe
Contributor Author

how does this affect how typedefs would work? i believe the point of zig typedefs is to make structurally equivalent types that are incompatible. are we giving up on that feature?

@andrewrk
Member

Good point. I guess to take this all the way it would make typedefs look like:

const Inches = type Meters;

In other words a typedef becomes an expression.

@andrewrk
Member

That being said, no change to typedefs is required by this change.

const Meters = f32;
type Inches = Meters;

This could still work independent of this change.

@thejoshwolfe
Contributor Author

what would happen here:

const Foo = struct { a: u32, };
const Boo = struct { a: u32, };

const Faz = type Foo;
const Fazz = Faz;
const Baz = type Boo;
const Bar = type Boo;

one idea: the equivalence classes would look like:

  • Foo, Boo
  • Faz, Fazz
  • Baz
  • Bar

so effectively, a type expression "instantiates" a new type that does not participate in structural equivalence testing.

@andrewrk
Member

With your proposal would the field names participate in the structural equivalence testing? E.g. are these equivalent?

const Foo = struct { a: u32, };
const Boo = struct { b: u32, };

It's more complicated to do structural equivalence testing than to treat all independent declarations as incompatible. C does not do structural equivalence testing. What's the use case for it?

@andrewrk
Member

I'm changing this issue back to "inline parameters" only. All structs being anonymous is a huge can of worms that should be opened separately.

@andrewrk andrewrk changed the title inline parameters and all containers are anonymous inline parameters Jul 17, 2016
@thejoshwolfe
Contributor Author

This issue has returned to simply the proposal to move the separate list of function template parameters into the normal parameter list, in any position, with the keyword inline.

current syntax:

pub fn slice_eql(T: type)(a: []const T, b: []const T) -> bool {
    ...
}
pub const eql = slice_eql(u8);

proposed syntax:

pub fn slice_eql(inline T: type, a: []const T, b: []const T) -> bool {
    ...
}
pub fn eql(a: []const u8, b: []const u8) -> bool {
    slice_eql(u8, a, b)
}

@kiljacken

I feel like the original / current syntax fits better if the type parameters are only usable as types. If they, however, had some sort of sized runtime value, the proposed syntax would seem better.

@thejoshwolfe
Contributor Author

thejoshwolfe commented Jul 18, 2016

Another application of an inline parameter would be using it as the size of an array. for example:

(EDIT: added more code and more explanation in comments)

fn half_assed_contains_duplicates(inline T: type, array: []const T, inline buf_size: usize) -> bool {
    var buf: [buf_size]T = undefined;
    for (array) |element, i| {
        for (buf[0...min(usize, i, buf_size)]) |other| {
            if (element == other) return true;
        }
        buf[i%buf_size] = element;
    }
    false
}

#attribute("test")
fn test_half_assed_contains_duplicates() {
    const array = []u32{1, 2, 3, 4, 1, 6, 7, 8};
    // the repeated 1 sits 4 positions after the first, so a 3-element buffer has already evicted it
    assert(half_assed_contains_duplicates(u32, array, 3) == false);
    assert(half_assed_contains_duplicates(u32, array, 4) == true);

    // array.len works because array is a compile-time constant, and so is its .len by extension
    assert(half_assed_contains_duplicates(u32, array, array.len) == true);

    assert(other_function(array));
}

#static_eval_enable(false)
fn other_function(array: []u32) -> bool {
    // this would be an error because array.len is not known at compile time:
    //half_assed_contains_duplicates(u32, array, array.len);
    true
}

Inline parameters are guaranteed to be known at compile time, which means you can declare fixed-size arrays with sizes that depend on inline parameters.
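
A sketch of the same function in the syntax of current Zig releases, where `inline` parameters are written `comptime`. The function name is changed for the example, and it assumes `buf_size` is greater than zero:

```zig
const std = @import("std");

// Reports whether any element repeats a value seen within the previous
// `buf_size` elements. The ring buffer's length is a compile-time parameter.
fn containsRecentDuplicate(comptime T: type, array: []const T, comptime buf_size: usize) bool {
    var buf: [buf_size]T = undefined;
    for (array, 0..) |element, i| {
        // Only the slots written so far hold valid values.
        const filled = if (i < buf_size) i else buf_size;
        for (buf[0..filled]) |other| {
            if (element == other) return true;
        }
        buf[i % buf_size] = element;
    }
    return false;
}

test "buffer size decides whether the duplicate is still remembered" {
    const array = [_]u32{ 1, 2, 3, 4, 1, 6, 7, 8 };
    try std.testing.expect(!containsRecentDuplicate(u32, &array, 3));
    try std.testing.expect(containsRecentDuplicate(u32, &array, 4));
    // array.len is known at compile time, so it can feed a comptime parameter.
    try std.testing.expect(containsRecentDuplicate(u32, &array, array.len));
}
```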

@kiljacken what do you mean by "some sort of sized runtime value"?

@kiljacken

Ahh, I see, so the inline keyword is for guaranteeing that the value is known at compile time? If that is the case I agree that the proposed syntax is useful.

As for the "some sort of sized runtime value", I was thinking of Blow's jai, where types are usable as values, somewhat like an enum entry.

@fsaintjacques
Contributor

fsaintjacques commented Oct 22, 2016

I'm a bit late. Please consider reverting this change. While it does reduce the function declaration complexity by having a single parameter list, consider the caller's POV:

  • old syntax, some_fn(u64, 8)(a, b, c)
  • rust, some_fn::<u64, 8>(a, b, c)
  • scala, some_fn[u64, 8](a, b, c)
  • new syntax, some_fn(a, u64, b, 8, c) (is 8 an inline parameter? maybe, maybe not)

The first three directly hint to the reader what's happening; the last one doesn't.

Having the compile-time required parameters separated from the runtime parameters is also great for readability, one can read the definition in a single pass without keeping state in his head.

I feel that this change impedes readability. Maybe I'm missing something else from a semantic POV?

@andrewrk andrewrk reopened this Oct 27, 2016
@fsaintjacques
Contributor

I expect that this will force developers to create a non-enforced convention where all inline parameters must precede runtime parameters, a bit like the C/C++ "inputs, inputs-outputs, outputs" convention.

@andrewrk
Member

Re-opening to keep the discussion alive.

It is true that some_fn(u64, 8)(a, b, c) gives the reader more information than some_fn(a, u64, b, 8, c).

Playing devil's advocate here, I'll call into question whether that extra information helps the readability of the code. What would one do with this information? The compiler catches the error when the programmer fails to provide a compile-time value for an inline parameter.

Here's an example function use case: printf.

fn printf(out_stream: &OutStream, inline format: []u8, args: []var) -> %usize {}

Note: this function prototype won't work until I finish doing more work in the IR branch and add support for args: []var.

Usage would be something like stdout.printf("format %s string %d", []var {arg1, arg2});. As opposed to stdout.printf("format %s string %d")([]var {arg1, arg2});
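
Zig's standard library formatting functions ended up with this shape: the format string is a compile-time parameter in the same list as the runtime arguments. A small sketch using `std.fmt.bufPrint`, assuming a recent release where format specifiers are written with braces (`{s}`, `{d}`) rather than the `%` style shown above:

```zig
const std = @import("std");

test "format string is a comptime parameter among runtime arguments" {
    var buf: [64]u8 = undefined;
    // The second argument must be known at compile time; the buffer and
    // the argument tuple are ordinary runtime values.
    const out = try std.fmt.bufPrint(&buf, "format {s} string {d}", .{ "abc", 42 });
    try std.testing.expectEqualStrings("format abc string 42", out);
}
```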

@fsaintjacques
Contributor

fsaintjacques commented Oct 27, 2016

I find myself reading code much more often than compiling, either by reviewing code or by browsing on github. I'd say that optimizing for reading is a must.

@andrewrk
Member

You're preaching to the choir about reading code being more important than writing it. That's an explicit design principle of Zig. So the nature of the counter argument here is - does it improve readability to have inline parameters separate? How so?

@fsaintjacques
Contributor

fsaintjacques commented Oct 27, 2016

-readability: it doesn't require you to read (and remember) the function definition to understand that parameters are assigned at compile time. The use case would be reading the implementation of something in a .c file and not needing to jump into the .h of another struct/function.

-uniformisation: it enforces the ordering, which will inevitably happen via general consensus, aka "good practices".

@andrewrk
Member

-readability: it doesn't require you to read (and remember) the function definition to understand that parameters are assigned at compile time, when looking at the callees.

Does the knowledge of whether a parameter type is expected to be compile-time known help readability? Perhaps so.

uniformisation: it enforces the ordering, which will inevitably happen via general consensus, aka "good practices".

Fair argument.

I'm curious if @thejoshwolfe has any thoughts on this matter?

@andrewrk
Member

Ah yes I remember one reason for using inline parameters.

Generic member functions were awkward before. See #141.

@fsaintjacques
Contributor

I don't understand the simplification. Are you saying that this syntax is awkward:

const range_func: fn.(&Rand, i16, i16)->i16 = if (preferences.inclusivity) rand.range_inclusive(i16) else rand.range_exclusive(i16);

If so, I believe it can be corrected by simply changing the characters () into [] or <>. I'd go with the angle brackets since they match Java, C++ and Rust. Thus there would be no confusion when "currying" in-place parameters as in this example.

@thejoshwolfe
Contributor Author

This issue's impact on readability is very hard for me to understand. Is it more readable? or less readable? in what contexts is each one better? I don't know, but here are some thoughts.

There seems to be an assumption that old syntax like this: f(a, b)(c, d) implies that a and b are compile-time known and c and d are not. This is not the case. Normal runtime functions can return function pointers, which can then be called immediately, resulting in syntax like the above. Furthermore, it's either possible or it should be possible (sorry, didn't have time to research this yet) to declare a function that must always be run at compile time. That means that in both old and new syntax, code like this var x: u32 = f(a, b); does not guarantee anything about whether a and b must be known at compile time.

In summary, neither old nor new caller syntax tells you if any of the parameters need to be known at compile time.

I'd also like to address some ideas about different bracket operators. We can't use [] or {} because those already have meaning: array dereference and struct initialization respectively. That leaves <> among the proposed operators so far, and I really don't like the idea. I know that C++, Java, and many other languages use <> as grouping operators, but it's pretty terrible that those operators are also used for comparison. Consider these expressions in C++: a<b<c> and a<(b<c)>. Not only is this hard for a hand-written recursive-descent parser to parse, but it's also hard for humans to read. For this reason, I've been very negative toward using <> as grouping operators.

So now let's talk about whether there is any value in knowing if a parameter must be known at compile time. I will say that if I were writing some zig code in an IDE, I would expect the IDE to color or italicize or indicate in some way what expressions were compile-time known, and even tell me what the values were when I hover over them. That being said, I can't really say why I think it's so important to me. I also want my IDE to distinguish between variables, constants, types, methods, etc., although it's not zig's responsibility to communicate that through the design of the language. I believe that it's useful information, but I can't say why, and it doesn't seem very critical.

I think the biggest argument in favor of inline parameters (new syntax) is that it's elegant. Consider the case where a function is declared to take a parameter that has 0 size, like an argument of type void (for example the value parameter in the put method for a hashtable being used as a hashset). A 0-sized type is effectively always inlined, and it's omitted from the emitted LLVM function. Any other parameter that's actually inlined is also omitted from the emitted function (and can have other effects on the function too). The difference between a runtime parameter and a template parameter doesn't need to be very significant, and it's even conceivable that the compiler might automatically inline some parameters that were declared as runtime parameters (for example, a parameter of type bool). Zig will probably not do that kind of optimization automatically, but consider Jonathan Blow's argued use case where a human is manually adding and removing the inline keyword from various function parameters (for example on bool parameters) to tweak the performance impact late in a development cycle. If all parameters, runtime and compile-time, are declared with the same syntax interleaved, then they're easier to edit.

Now I know I just made a writability argument during a readability discussion, but we can't completely ignore writability. Readability is more important, but I'm still not sure if the new syntax is actually any less readable.

@fsaintjacques
Contributor

fsaintjacques commented Oct 28, 2016

More often than not, I will read code outside of an IDE, e.g. github. I don't expect the syntax parser to analyse and read definitions. And let's be realistic, zig will not have a supported feature complete IDE in the next 5-10 years minimum.

OTOH, I expect the average programmer with experience in Java, C++, C# to immediately pickup the semantics behind

const closure = myFunction<u32, 8>;

whether or not they know the intricacies of the language. It's not a question of whether it is harder or easier to read. It's that the programmer captures the semantics of compile-time parameters in a single glance without having to read the definition, something that the proposed change fails to achieve.

Another thing comes to mind: it was previously possible to curry compile-time parameters like this (from the previous comment's example):

const range_func: fn.(&Rand, i16, i16)->i16 = rand.range_inclusive(i16)

Am I mistaken by assuming that the new syntax does not allow such construct that was previously allowed (without having to define a wrapper function that captures compile time parameters)?

What I wish from a language is that any programmer can drop into the code and start reading without astonishment. To achieve this goal, we somewhat need to accept existing convention (<>) and follow a model of least surprise. I feel the inline keyword breaks this assumption of least surprise.

@andrewrk
Member

andrewrk commented Oct 28, 2016

More often than not, I will read code outside of an IDE, e.g. github. I don't expect the syntax parser to analyse and read definitions. And let's be realistic, zig will not have a supported feature complete IDE in the next 5-10 years minimum.

While Zig is designed to be highly IDE friendly, I agree with you here, and I further agree that "this would be readable in an IDE" is a faulty argument. Although I think @thejoshwolfe didn't mean to make that argument; I took it more along the lines of, let's put some facts down that we can all agree on, and then figure out the best thing to do.

It's that the programmer captures the semantics of compile-time parameters in a single glance without having to read the definition, something that the proposed change fails to achieve.

Here are some of the real world use cases for these inline parameters, from zig std library:

    var list = List(i32).init(&debug.global_allocator);
    const answer = rand.rangeUnsigned(u8, 0, 100) + 1;
        %return in_stream.readIntLe(u64)
        const version = %return st.self_exe_stream.readInt(st.elf.is_big_endian, u16);
        const byte_count = %return math.mulOverflow(usize, @sizeOf(T), n);
    // caveat not working yet
    %%stdout.printf("a number: %d a string: %s\n", []var { foo, bar });

It seems to me that in practice, this way of expressing generics is quite readable, perhaps even more readable than a separate argument list or angle brackets.

There's also the point about angle brackets that @thejoshwolfe made:

Consider these expressions in C++: a<b<c> and a<(b<c)>. Not only is this hard for a hand-written recursive-descent parser to parse, but it's also hard for humans to read. For this reason, I've been very negative toward using <> as grouping operators.

And then regarding currying (const range_func: fn.(&Rand, i16, i16)->i16 = rand.range_inclusive(i16)):

With inline parameters we don't need currying. We can eliminate this from the language. Defining a wrapper function to specify some parameters as constants is easy to read and easy to write. The only downside is it's a tiny bit more typing, which I find to be a weak argument against it.

What I wish from a language is that any programmer can drop into the code and start reading without astonishment. To achieve this goal, we somewhat need to accept existing convention (<>) and follow a model of least surprise. I feel the inline keyword breaks this assumption of least surprise.

I agree with your goal whole-heartedly and I agree with your premise about least surprise. However I do not agree with your conclusion that the inline keyword breaks this assumption of least surprise.

In general, I think you are making reasonable arguments and I am trying to represent the other side of the issue so that we can make the best decision here.

@andrewrk
Member

andrewrk commented Oct 28, 2016

One more note: we have conventions that make it clear when a type is being passed as a parameter. It's open for discussion whether these conventions should be enforced by the compiler.

@fsaintjacques
Contributor

Can we agree that if zig follows the established (good or bad) convention of angle brackets for compile-time type arguments, i.e. struct MyType<T>; and fn myFunction<T>(fst: T, snd: T); most (C#, C++, Java, Rust, even PHP!) developers will instinctively understand without reading zig's grammar:

  • No need to explain a new parameter keyword.
  • No need to suggest the user to follow an informal convention (use snake_case and put all inline before non-inline).

This is the crux of my readability argument: ease the integration of new users coming in. If by elegant we mean the grammar is shorter, then yes, we lose elegance.

Sorry for the bike shedding.

@andrewrk
Member

Yes I agree with that statement. And I don't think you have to apologize and I don't think this is bike shedding. I appreciate the perspective you're presenting. I also agree that shorter grammar isn't necessarily more readable.

@andrewrk
Member

I feel pretty confident about this design decision and I'm going to close this issue.

@ddevault

+1 to angle brackets

-1 to f(a, b)(c, d), which is ambiguous with calling an f that returns a function pointer that is immediately invoked.

@jibal

jibal commented Jan 31, 2022

As a newcomer to Zig, I want to thank you for what I think was an excellent design decision, and I believe the readability arguments against it were mistaken. <> and D's ! and even [] in Scala/Julia/etc. are distracting clutter and suggest "here be dragons", especially for people coming from C or dynamic languages who have heard horror stories about C++'s "template programming" or Scala's "higher-kinded types" being relegated to experts who have learned the finer details about a whole other type language embedded in the language. Uniformity and simplicity aid readability.

And the division between comptime and runtime parameters is artificial and largely an implementation issue, especially for numeric "template parameters"--the reader doesn't care whether comptime-known values are being used for things that can only be done at comptime like sizing arrays or types or doing inline loops, or whether they are passed to parameters that it so happens can also be determined at runtime--at the call site they are known to be comptime-known; that they occur inside <> is not useful information because it's not a reliable indicator that something is a comptime value--most comptime-known values aren't passed to comptime parameters.

And as a rank beginner I'm mostly writing generic functions in Zig that use @TypeOf(param) rather than passing explicit type parameters, so whether a function is generic doesn't depend on special markers like <>/!/[], so establishing this uniformity isn't just about readability but also about power and consistency. And I follow the convention that functions that return types have capitalized names, which gives the reader the information they actually need.
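
A minimal sketch of the `@TypeOf(param)` style described above, assuming a recent Zig release. The function `double` is a made-up example; it is generic without taking any explicit type parameter:

```zig
const std = @import("std");

// `anytype` accepts any argument; `@TypeOf(x)` recovers its type so the
// return type can follow the argument type.
fn double(x: anytype) @TypeOf(x) {
    return x + x;
}

test "generic function without an explicit type parameter" {
    try std.testing.expectEqual(@as(u8, 42), double(@as(u8, 21)));
    try std.testing.expectEqual(@as(f32, 3.0), double(@as(f32, 1.5)));
}
```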

The unification of comptime and runtime programming in Zig makes it much easier to think about code, both in the writing and the reading--writability often is readability when it's at a higher level than just saving keystrokes.

That Zig allows for generic functions that take comptime values or determine the type info of their parameters and then return types as values (or check conformity and issue @CompileError as appropriate) takes only a few seconds to "get" forever, and the immediate thought is "wow, there's no limit to what I can do with this". Yes that can be abused, but this is a low-level systems programming language ... as Doug Gwyn said of UNIX, "[it] was not designed to stop you from doing stupid things, because that would also stop you from doing clever things". Yet Zig does stop you from doing many stupid things that are clearly stupid, while still providing powerful general software construction tools.
