Description
Preface
The idea behind this suggestion came out of the discussion in this issue.
I should mention that `(...)` is already used for calling constructors, grouping expressions, and initializing lists.
Now, consider the following ambiguities:
- `(1)` as an expression: is it parentheses around a value, or an array of one item?
- `()` is not an empty array; in variable declarations it calls the default constructor.
- `(1, 2)` in declarations: is it the arguments of a constructor, or an array of two items?
x0: std::vector<int> = (1, 2);
Yes, I know that `std::vector` has a poorly designed API, but I ask myself: why would Cpp2 (like Cpp1) allow libraries to have this ambiguity in the first place?
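For readers less familiar with the Cpp1 side of this, the `std::vector` ambiguity can be reproduced in today's C++. The following is a minimal sketch, not Cpp2 code; the helper names `make_with_parens`, `make_with_braces` and `check_vector_ambiguity` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <vector>

// Parentheses pick the (count, value) constructor: one element equal to 2.
inline std::vector<int> make_with_parens() {
    return std::vector<int>(1, 2);
}

// Braces pick the std::initializer_list constructor: the elements 1 and 2.
inline std::vector<int> make_with_braces() {
    return std::vector<int>{1, 2};
}

// Illustrative check (hypothetical helper) that the two spellings differ.
inline void check_vector_ambiguity() {
    assert(make_with_parens().size() == 1);
    assert(make_with_braces().size() == 2);
}
```

The same two arguments produce containers of different sizes depending only on the bracket style, which is the kind of ambiguity a dedicated array literal syntax is meant to rule out.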
Giving array literals a distinct syntax would resolve all three ambiguities. I suggest using `[...]` for array literals:
x0: = [1, 2, 3];
Also, nested `(...)`s, `;`s, etc. would create multidimensional arrays, because they do not create a new array; they are for mathematical grouping (they group expressions and change operator precedence). Nested `[...]`s, on the other hand, would create jagged arrays, because each one creates a new array:
// Multidimensional Array
x0: = [(1, 2, 3), (4, 5, 6)];
// Or alternatively one of the following syntaxes:
// x0: = [1, 2, 3; 4, 5, 6];
// x0: = [(1, 2, 3); (4, 5, 6);];
r0: = x0[0, 1] == 2; // true
// Jagged Array
x1: = [[1, 2, 3], [4, 5, 6]];
r1: = x1[0][1] == 2; // true
I am not currently suggesting support for multidimensional arrays, but it is a possibility to consider in the future.
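To make the multidimensional versus jagged distinction concrete, here is a rough C++ sketch of the two layouts. It is not a proposal for how Cpp2 would lower these literals; the `Grid2x3` type and the helpers `make_grid`, `make_jagged` and `check_layouts` are hypothetical names chosen for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-shape 2x3 grid: a single contiguous buffer,
// indexed with a (row, column) pair, like x0[0, 1] in the text.
struct Grid2x3 {
    std::array<int, 6> cells;
    int at(std::size_t row, std::size_t col) const { return cells[row * 3 + col]; }
};

inline Grid2x3 make_grid() { return Grid2x3{{1, 2, 3, 4, 5, 6}}; }

// Jagged layout: every row is its own array, indexed like x1[0][1].
// Rows are independent objects, so they could have different lengths.
inline std::vector<std::vector<int>> make_jagged() {
    return {{1, 2, 3}, {4, 5, 6}};
}

// Illustrative check (hypothetical helper) mirroring r0 and r1 above.
inline void check_layouts() {
    assert(make_grid().at(0, 1) == 2);
    assert(make_jagged()[0][1] == 2);
}
```

Both lookups return the same element here; the difference is that the grid is one object with a fixed shape, while the jagged version is an array of separate arrays.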
Suggestion Detail
Three options are available instead of `()` for array literals:
- `<...>` is already used for template parameters/arguments. It's not a good choice, because:
  - It has no known relation to arrays.
  - It looks like the less-than and greater-than operators. Because of this similarity, it is a poor fit for arrays of mathematical expressions, such as a vector of boolean values, e.g. `<a < b, c > 2>`.
- `[...]` is already used for accessing items of an array. It seems to be a good choice.
- `{...}` is already used for function/statement blocks and type definitions. It can be considered a good choice.
Now it's time to compare `[...]` and `{...}` for array literals:
x0: /*...*/ = [1];
x1: /*...*/ = {1};
OK. Both of them look good. So what if we want to write an empty array?
x0: /*...*/ = [];
x1: /*...*/ = {};
`[]` is clearly an empty array, but `{}` can be either an empty function/statement block or an empty array, depending on the declaration. For example:
x0 : std::vector<int> = {}; // empty array
x1 : () -> void = {} // empty statement block
x2 : () -> std::vector<int> = {} // ERROR: It doesn't work, although visually it's the same as above.
x3: = call(: () -> std::vector<int> = {}); // SURPRISE! It works, although visually it's the same as above.
`{}` is visually surprising and inconsistent across `x1`, `x2` and `x3`, although they look the same:
- In the `x0` declaration, `{}` is an empty array.
- In the `x1` declaration, `{}` is an empty statement block.
- But in the `x2` declaration, `{}` is an error, because it must end with `;`.
- But in the `x3` declaration, `{}` is an empty array!
So `[...]` is more expressive than `{...}` for array literals.
Now let's consider this situation in the following example:
x0: = [1, 2, 3][1]; // It's equal to 2
The first `[...]` creates an array, and the second `[...]` accesses an item from it. A sequence of `[...]`s is not ambiguous, because it behaves like a sequence of parentheses:
x0: = (call() + something)(1);
The first `(...)` groups the operands of `operator+`, and the second `(...)` calls `operator()` on the result.
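Both patterns already have close analogues in today's C++, which suggests that chaining brackets would read naturally. The sketch below is only an analogy under that assumption: the `Adder` type and the helpers `second_of_temporary_array` and `call_grouped_sum` are hypothetical, and a braced `std::array` temporary stands in for the proposed `[1, 2, 3]` literal.

```cpp
#include <array>
#include <cassert>

// Hypothetical callable type, used only for this illustration.
struct Adder {
    int base;
    int operator()(int x) const { return base + x; }
};

// operator+ combines two Adders into a new callable.
inline Adder operator+(Adder a, Adder b) { return Adder{a.base + b.base}; }

// The braces build a temporary array; the brackets then index it.
inline int second_of_temporary_array() {
    return std::array<int, 3>{1, 2, 3}[1];
}

// The first pair of parentheses groups operator+; the second calls operator().
inline int call_grouped_sum() {
    return (Adder{1} + Adder{2})(3);
}
```

In both functions the second bracket pair applies to the value produced by the first, so a reader resolves the chain from left to right without needing extra context.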
Your Questions
Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?
Yes. If a bad API design can suddenly change the meaning of code, it's going to be a security vulnerability. This suggestion is a way to prevent it by separating arrays from constructors and expressions.
Will your feature suggestion automate or eliminate X% of current C++ guidance literature?
Yes. Users would no longer need to learn whether user-defined constructors are ambiguous with initializer lists, because the ambiguous situation cannot arise at all. It also allows more API choices.
Considered Alternatives
One alternative was to make it a syntax error when a type has a constructor that is ambiguous with an initializer list. However, this approach wouldn't fix the bad API design of `std::vector`.
Another alternative was a little more complicated. The idea was to favor constructors over initializer lists, and to treat a parenthesized, comma-separated list as an initializer list:
x0: = (1, 2); // x0 is an initializer list
With the help of unnamed variable declaration and indirect initialization, it could be used like this:
// `: = (1, 2)` is an initializer list
x0: std::vector<int> = : = (1, 2);
// This calls the constructor to create a vector of one element with value 2.
x1: std::vector<int> = (1, 2);
But I gave up on this idea, because it would encourage unnamed variable declarations more than necessary.
Finally, I considered using a literal template syntax:
// `(1, 2)<int>` is an initializer list
x0: std::vector<int> = (1, 2)<int>;
// `list` is a user-defined literal suffix which creates an initializer list
x1: std::vector<int> = (1, 2)list;
// This calls the constructor to create a vector of one element with value 2.
x2: std::vector<int> = (1, 2);
But I gave up on this idea too, because `(1, 2)<int>` would always require specifying the type, and `(1, 2)list` would make user-defined literal suffixes look like constructors.
Edits
- I've added one more alternative solution that I had considered.