[SUGGESTION] Array Literals with the support of both Multidimensional and Jagged Arrays #424

Closed
@msadeqhe

Description

Preface

The idea behind this suggestion came from the discussion in this issue.

I have to mention that (...) is already used for calling constructors, grouping expressions, and initializing lists.

Now, consider the following ambiguities:

  • (1) as an expression: is it parentheses around a value, or an array of one item?
  • () is not an empty array; in variable declarations it calls the default constructor.
  • (1, 2) in declarations: is it the argument list of a constructor, or an array of two items?
x0: std::vector<int> = (1, 2);

Yes, I know that std::vector has a bad API design, but I ask myself: why would Cpp2 (like Cpp1) allow libraries to have this ambiguity in the first place?
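For readers less familiar with the Cpp1 side of this, here is a minimal sketch of the std::vector behaviour I mean. The function names are made up for illustration; only the std::vector constructors themselves are real.

```cpp
#include <cassert>
#include <vector>

// Parentheses select the (count, value) constructor.
inline std::vector<int> make_with_parens() {
    std::vector<int> v(1, 2);   // one element whose value is 2
    return v;
}

// Braces select the std::initializer_list constructor.
inline std::vector<int> make_with_braces() {
    std::vector<int> v{1, 2};   // two elements: 1 and 2
    return v;
}

// Same two arguments, two different results.
inline void check_vector_ambiguity() {
    assert(make_with_parens().size() == 1);
    assert(make_with_braces().size() == 2);
}
```

The only difference between the two functions is the bracket style, which is exactly the kind of ambiguity a dedicated array literal syntax would avoid.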

Giving array literals a distinct syntax would resolve all three ambiguities. I suggest using [...] for array literals:

x0: = [1, 2, 3];

Nested (...)s (or ;s, etc.) would create multidimensional arrays: they do not create a new array, they only group items, just as parentheses group expressions and change operator precedence. Nested [...]s, on the other hand, would create jagged arrays, because each [...] creates a new array:

// Multidimensional Array
x0: = [(1, 2, 3), (4, 5, 6)];
// Or alternatively, one of the following syntaxes:
// x0: = [1, 2, 3; 4, 5, 6];
// x0: = [(1, 2, 3); (4, 5, 6);];
r0: = x0[0, 1] == 2; // true

// Jagged Array
x1: = [[1, 2, 3], [4, 5, 6]];
r1: = x1[0][1] == 2; // true

I am not currently suggesting support for multidimensional arrays, but it is a possibility to consider in the future.
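To make the difference between the two layouts concrete, here is a rough Cpp1 sketch. The grid type is a hypothetical helper written only for this illustration (it is not part of the suggestion or of any library), and I made the second jagged row shorter on purpose to show that rows may differ in length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rectangular 2D array stored in one contiguous
// block and indexed with (row, col).
template <typename T, std::size_t Rows, std::size_t Cols>
struct grid {
    std::array<T, Rows * Cols> data;

    constexpr T const& operator()(std::size_t r, std::size_t c) const {
        return data[r * Cols + c];   // row-major offset
    }
};

inline void demo_multidimensional_vs_jagged() {
    // Multidimensional: one block of storage, every row has 3 items.
    grid<int, 2, 3> x0{{1, 2, 3, 4, 5, 6}};
    assert(x0(0, 1) == 2);
    assert(x0(1, 2) == 6);

    // Jagged: each row is its own array, so lengths can differ.
    std::vector<std::vector<int>> x1{{1, 2, 3}, {4, 5}};
    assert(x1[0][1] == 2);
    assert(x1[1].size() == 2);
}
```

Here x0(0, 1) plays the role of x0[0, 1] from the suggested syntax, and x1[0][1] is the same chained indexing as in the jagged example.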

Suggestion Detail

Three options are available instead of () for array literals:

  • <...> is already used for template parameters/arguments. It's not a good choice, because:
    • It doesn't have any known relation to arrays.
    • It looks like the less-than and greater-than operators. Because of this similarity, it's a poor choice for arrays of mathematical expressions, such as a vector of boolean values, e.g. <a < b, c > 2>.
  • [...] is already used for accessing items of an array. It seems to be a good choice.
  • {...} is already used for function/statement blocks and type definitions. It can be considered a good choice.

Now it's time to compare [...] and {...} for array literals:

x0: /*...*/ = [1];
x1: /*...*/ = {1};

OK. Both of them look good. So what if we want to write an empty array?

x0: /*...*/ = [];
x1: /*...*/ = {};

[] is clearly an empty array, but {} can be either an empty function/statement block or an empty array, depending on the declaration. For example:

x0         :       std::vector<int> = {};  // empty array
x1         : () -> void             = {}   // empty statement block
x2         : () -> std::vector<int> = {}   // ERROR: It doesn't work, although visually it's the same as above.
x3: = call(: () -> std::vector<int> = {}); // SURPRISE! It works, although visually it's the same as above.

{} is visually surprising and inconsistent for x1, x2 and x3, although they look the same:

  • In the x0 declaration, {} is an empty array.
  • In the x1 declaration, {} is an empty statement block.
  • But in the x2 declaration, {} is an error, because it must end with ;.
  • But in the x3 declaration, {} is an empty array!

So [...] is more expressive than {...} for array literals.
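Cpp1 has a comparable double meaning for {}, which may help to see the point. This is only an analogy written in Cpp1, not a translation of the Cpp2 declarations above, and the function name is made up for illustration.

```cpp
#include <cassert>
#include <vector>

inline void demo_empty_braces() {
    // Here {} is an empty initializer: the vector has no elements.
    std::vector<int> x0 = {};
    assert(x0.empty());

    // Here {} is an empty function body: the lambda does nothing.
    auto x1 = []() -> void {};
    x1();

    // Here the outer braces are a function body, and the inner {} is
    // an empty initializer for the returned vector.
    auto x2 = []() -> std::vector<int> { return {}; };
    assert(x2().empty());
}
```

The reader has to look at the surrounding declaration to know which meaning applies, whereas [] would have only one.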

Now let's consider this situation in the following example:

x0: = [1, 2, 3][1]; // It's equal to 2

The first [...] creates an array, and the second [...] accesses an item from it. A sequence of [...]s is not ambiguous, because its behaviour is similar to that of parentheses:

x0: = (call() + something)(1);

The first (...) groups the operands of operator+, and the second (...) calls operator() on the result.
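The closest Cpp1 equivalent I can think of for [1, 2, 3][1] is indexing a temporary std::array. A minimal sketch, with function names made up for illustration:

```cpp
#include <array>
#include <cassert>

inline int second_of_temporary_array() {
    // std::array<int, 3>{1, 2, 3} builds a temporary array; the
    // trailing [1] then indexes into it, and the value is copied out
    // before the temporary is destroyed.
    return std::array<int, 3>{1, 2, 3}[1];
}

inline void check_second_of_temporary_array() {
    assert(second_of_temporary_array() == 2);
}
```

As with the suggested syntax, the first pair of brackets constructs the array and the second pair accesses an item, and the parser has no trouble telling them apart.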

Your Questions

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

Yes. If a bad API design can suddenly change the meaning of code, it can become a security vulnerability. This suggestion prevents that by separating arrays from constructors and expressions.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes. There would be no need to learn whether user-defined constructors are ambiguous with initializer lists, because the ambiguity is prevented completely. It also allows more API choices.

Considered Alternatives

One alternative was to make it a syntax error for a type to have a constructor that is ambiguous with an initializer list. However, this approach wouldn't fix the bad API design of std::vector.

Another alternative was a little more complicated. The idea was to favor constructors over initializer lists, and to treat a parenthesized comma-separated list as an initializer list:

x0: = (1, 2); // x0 is an initializer list

With the help of unnamed variable declaration and indirect initialization, it could be used like this:

// `: = (1, 2)` is an initializer list
x0: std::vector<int> = : = (1, 2);

// This calls the constructor to create a vector of one element with value 2.
x1: std::vector<int> = (1, 2);

But I gave up on this idea, because it would encourage unnamed variable declarations more than necessary.
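For comparison, Cpp1 already allows something close to this indirect initialization, because auto deduces std::initializer_list from a braced list. A small sketch, with the function name made up for illustration:

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

inline void demo_named_initializer_list() {
    // `auto` with a braced list deduces std::initializer_list<int>, so
    // the list exists as a value before any constructor is chosen.
    auto il = {1, 2};
    std::vector<int> x0 = il;    // two elements: 1 and 2
    assert(x0.size() == 2);
    assert(x0[0] == 1 && x0[1] == 2);

    // Parentheses still call the (count, value) constructor.
    std::vector<int> x1(1, 2);   // one element whose value is 2
    assert(x1.size() == 1);
    assert(x1[0] == 2);
}
```

It works, but it needs an extra named or unnamed object just to say "this is a list", which is the same overhead that made me drop the idea.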

Finally, I considered using literal template syntax:

// `(1, 2)<int>` is an initializer list
x0: std::vector<int> = (1, 2)<int>;

// `list` is a user-defined literal suffix which creates an initializer list
x1: std::vector<int> = (1, 2)list;

// This calls the constructor to create a vector of one element with value 2.
x2: std::vector<int> = (1, 2);

But I gave up on this idea too, because (1, 2)<int> would require always specifying the type, and (1, 2)list would make user-defined literal suffixes look like constructors.

Edits

  • I've added one more alternative solution that I had considered.
