[BUG] comma-operator is irrelevant when parsing template-argument-list #103

Closed
seanbaxter opened this issue Oct 30, 2022 · 28 comments
Labels
bug Something isn't working

Comments

@seanbaxter

seanbaxter commented Oct 30, 2022

https://github.com/hsutter/cppfront/wiki/Design-note%3A-Unambiguous-parsing#a-first-match-wins-there-are-no-comma-expressions-and-a-relational-comparison-in-a-template-argument-must-be-parenthesized

In Cpp2 my intent is to make this a non-problem (famous last words!) by addressing this in two chunks:

  • Cpp2 has no comma operator, so that removes the possibility of b,c being a comma-expression.
  • Cpp2 requires a relational comparison expression in a template-argument-list to be parenthesized. (*)

In C++, b,c in a template-argument-list cannot be a comma-expression because the grammar sets the template-argument production to conditional-expression, which is higher precedence than expression, which is where the comma is parsed. The notes concerning comma-expression are irrelevant.
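The grammar point can be checked directly in today's C++. This is a minimal sketch using two hypothetical templates, `two_args` and `one_arg`, which are not from the design note: an unparenthesized comma always separates template arguments, and a comma-expression can only appear inside parentheses.

```cpp
// Hypothetical templates, for illustration only.
template <int A, int B>
struct two_args { static constexpr int sum = A + B; };

template <int A>
struct one_arg { static constexpr int value = A; };

// The unparenthesized comma separates two template arguments.
static_assert(two_args<1, 2>::sum == 3, "two arguments: 1 and 2");

// A comma-expression is only possible inside parentheses;
// its value is the right-hand operand.
static_assert(one_arg<(1, 2)>::value == 2, "one argument: the comma-expression (1, 2)");
```

Some compilers warn that the left operand of the comma in `(1, 2)` has no effect; that does not change how the argument list is parsed.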

More generally, I don't see how context insensitivity is achieved. What is changed, other than requiring parens around comparisons in the template-argument-list?

f(temp_or_object<a, b>(c));

Is this disallowed Cpp2 syntax? It's definitely ambiguous.

@seanbaxter seanbaxter added the bug Something isn't working label Oct 30, 2022
@seanbaxter
Author

seanbaxter commented Oct 31, 2022

Ah there's another problem in your parsing document.

if (auto e = expression(false)) {   // disallow unparenthesized relational comparisons in template args
    term.arg = std::move(e);
}
else if (auto i = id_expression()) {
    term.arg = std::move(i);
}

The id_expression case is unreachable, because id-expression is a terminal of expression. This code doesn't do anything to disambiguate.
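A toy model of the reachability argument, assuming (as the comment above does) that everything accepted by id_expression is also accepted by expression. The parsers below are hypothetical stand-ins, not cppfront's code; they only reproduce the shape of the two-branch test.

```cpp
#include <string_view>

// Hypothetical toy parsers: each returns true if the whole input
// matches that production.
constexpr bool is_ident_char(char ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '_';
}
constexpr bool is_digit(char ch) { return ch >= '0' && ch <= '9'; }

constexpr bool id_expression(std::string_view s) {
    if (s.empty() || !is_ident_char(s[0])) return false;
    for (char ch : s)
        if (!is_ident_char(ch) && !is_digit(ch)) return false;
    return true;
}

constexpr bool is_number(std::string_view s) {
    if (s.empty()) return false;
    for (char ch : s)
        if (!is_digit(ch)) return false;
    return true;
}

// An expression can bottom out in an id-expression, so it accepts a superset.
constexpr bool expression(std::string_view s) {
    return id_expression(s) || is_number(s);
}

enum class Branch { expression, id_expression, none };

// Same shape as the quoted snippet: try expression first, then id_expression.
constexpr Branch template_argument(std::string_view s) {
    if (expression(s))         return Branch::expression;
    else if (id_expression(s)) return Branch::id_expression;  // never taken
    return Branch::none;
}

static_assert(template_argument("foo") == Branch::expression, "identifier goes to the first branch");
static_assert(template_argument("42")  == Branch::expression, "number goes to the first branch");
static_assert(template_argument("+")   == Branch::none,       "neither branch matches");
```

Under that assumption no input can produce `Branch::id_expression`, which is the sense in which the second branch is dead.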

@filipsajdak
Contributor

Any example of code that will suffer from that implementation?

@seanbaxter
Author

f(temp_or_object<a, b>(c)); suffers from it. The grammar isn't context-free.

@gregmarr
Contributor

gregmarr commented Nov 1, 2022

Since it took me a while to spot the ambiguity here (at least partially because I was reading temp_or_object as temporary instead of template), just wanted to confirm that these are the possible interpretations and what parens would be required to disambiguate them:

f((object_name < a), (b > (c)));
f((template_name<a, b>(c)));
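Both readings are valid in today's C++, and which one applies depends on what the name refers to. The sketch below uses hypothetical definitions (the namespaces, `f`, and the values are made up for illustration) to show the same token sequence compiling both ways.

```cpp
// Hypothetical definitions: the token sequence f(temp_or_object<a, b>(c))
// parses differently depending on what temp_or_object names.

namespace as_template {
    template <int A, int B>
    constexpr int temp_or_object(int c) { return A + B + c; }

    constexpr int f(int x) { return x; }

    constexpr int a = 1, b = 2, c = 3;
    // One argument: a call to the function template instantiated with <a, b>.
    constexpr int result = f(temp_or_object<a, b>(c));   // f(1 + 2 + 3)
}

namespace as_object {
    constexpr int f(bool lt, bool gt) { return (lt ? 10 : 0) + (gt ? 1 : 0); }

    constexpr int temp_or_object = 0, a = 1, b = 2, c = 3;
    // Two arguments: (temp_or_object < a) and (b > (c)).
    constexpr int result = f(temp_or_object<a, b>(c));   // f(0 < 1, 2 > 3)
}

static_assert(as_template::result == 6,  "template reading");
static_assert(as_object::result   == 10, "comparison reading");
```

A C++ parser has to look up `temp_or_object` before it can decide how to treat the `<`, which is the context dependence under discussion.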

@filipsajdak
Contributor

Ah, now I get it.

@filipsajdak
Contributor

filipsajdak commented Nov 3, 2022

I am thinking about simplifying this. Maybe it would be good to introduce a rule that operators need to be separated with whitespace and if you want to write a template you need to put < directly after the id-expression.

a < b; // correct form of calling < operator
a< b; // error
a<b;   // error

temp<int> i1; // ok
temp<
  int
> i2; // ok
temp <int> i3; // error

I thought about it because I was confused reading your example, and my second thought was that we should support syntax that can be easily interpreted by humans and by compilers. So maybe we can limit the syntax to only acceptable forms and make it clear what we are looking at.
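One way to read the proposed rule is as a purely lexical classification of each `<` by the whitespace around it. The sketch below is an assumption about how that could work, not anything cppfront implements; `classify_less` and `LessThan` are hypothetical names.

```cpp
#include <cstddef>
#include <string_view>

enum class LessThan { template_open, comparison, error };

constexpr bool is_space(char ch) {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
}

// Classifies the '<' at src[pos] using only the neighbouring characters.
constexpr LessThan classify_less(std::string_view src, std::size_t pos) {
    const bool space_before = pos > 0 && is_space(src[pos - 1]);
    const bool space_after  = pos + 1 < src.size() && is_space(src[pos + 1]);
    if (!space_before) return LessThan::template_open;  // temp<int>, or temp< followed by a newline
    if (space_after)   return LessThan::comparison;     // a < b
    return LessThan::error;                             // temp <int>
}

static_assert(classify_less("a < b;", 2)         == LessThan::comparison,    "");
static_assert(classify_less("temp<int> i1;", 4)  == LessThan::template_open, "");
static_assert(classify_less("temp <int> i3;", 5) == LessThan::error,         "");
```

Under this reading, `a<b;` and `a< b;` are classified as template openers and would be rejected later, when no matching `>` is found, which matches the "error" annotations in the examples above.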

@seanbaxter
Author

That may be a valid engineering solution. I don't care one way or another. But making C++ white space sensitive would be received as a provocation. You'd set off a never-ending amount of bickering.

@filipsajdak
Contributor

filipsajdak commented Nov 3, 2022

Well, it is all about having fewer things to learn, and most coding guidelines already have such rules.

For sure that is something that will increase the readability of the code and will make writing tools easier as there will be fewer ambiguity cases.

That will save us from the code like:

return a*pa**ppa**; // 8

which would be forbidden in favor of

return a * pa* * ppa**; // 8

and I would love to avoid that kind of ambiguity and puzzle-like code.

@gregmarr
Contributor

gregmarr commented Nov 3, 2022

Previously C++ had some whitespace sensitivity requirements, namely that std::vector<std::vector<long>> had to be written as std::vector<std::vector<long> > for it to parse properly. That requirement was removed in C++11. So on the one hand, there is precedent for whitespace sensitivity. On the other hand, there is also precedent for a desire to remove whitespace sensitivity.

Personally I prefer having whitespace around binary operators, and not having whitespace between a unary operator and its operand. I also prefer not having whitespace between template and < and between my_funcname and (, but I know others prefer having the space there, so I'm not sure that's a method that people can get behind unless there is literally no other option than changing the template bracket type to be something that is already only used in a balanced form.

@seanbaxter
Author

Please don't give prefix, infix and postfix operators whitespace-sensitive semantics.

Look at the SPECS project from the 90s, which is a context-free reskin of C++. Start there and move it forward to include modern features. Will have to reimagine the declarator syntax, cast-expression, etc. Don't expect it to resemble C++. Can't just wave your hands over C++ and declare it to be context-free, which is what cppfront does currently.

I don't think a context-free reskin of C++ is very useful. It presents a big training barrier with almost no benefit. OTOH, nailing down the semantics of the parameter passing directives would be super useful. Designing memory safety types (ref and mutref) would be super useful.

@gregmarr
Contributor

gregmarr commented Nov 3, 2022

Please don't give prefix, infix and postfix operators whitespace-sensitive semantics.

I think I'm leaning that way as well.

It presents a big training barrier with almost no benefit.

I disagree on the "almost no benefit" here based on how difficult parsing is here. Now, I'm not generally a tool writer, though in a past life I was primary author on some language parsers, so I feel the pain of tool writers having to create context-sensitive parsers. If this lowers the burdens for tool writers and also makes it easier for people to understand, then it's a plus. I have learned multiple programming languages on my own over the last 40 years, I've been writing C++ professionally for 25+ years, have been following standards processes and language development, attending C++ conferences for more than a decade, and it still took me a relatively long time to see the ambiguity in f(temp_or_object<a, b>(c));. I can't imagine now what it's like for new coders having to learn it.

Having said that, I do appreciate the discussion and alternate points of view, so keep it coming.

@seanbaxter
Author

If this lowers the burdens for tool writers

It doesn't really matter for compiler writers. C++ is hard to translate for different reasons. I'm not saying no value, but compared to parameter directives or memory safety...?

@jcanizales

Will have to reimagine the declarator syntax, cast-expression, etc. Don't expect it to resemble C++

I mean that's already the case for cppfront. Declarations use a colon and the type goes after the identifier, the recommended way to cast is to use as (and it's not clear old-style parenthesized casts should be allowed, given that the current C++ guidelines suggest avoiding them). If you look at the example files, it only resembles C++ in the places where that resemblance doesn't compromise the other goals of the project. And that makes sense: if resemblance had priority, why try a new syntax at all?

It doesn't really matter for compiler writers.

Per his comments on the design of the language, that's not Herb's experience (and he should know a thing or two about it 🙂 ), but more importantly not all tool writers are writing compilers. Some are writing refactoring tools. Some are writing intellisense-like plugins. Some are writing simple syntax highlighters. Some are writing static analyzers. None of them have to translate C++, but still are impeded by its syntax.

@jcanizales

I do agree though that being context-free won't be the main driver for adoption: better tools are IMO just a side benefit. Care has to be taken to not end up more verbose than C++, for example (except in cases where the benefit is very clearly obvious to the user). I think SPECS failed in that respect. Swift and Kotlin had that easy: Java and Objective-C were super cumbersome to start with, and just the ability to call existing libraries with a modern syntax was a relief. (The latter, btw, I think is crucial for adoption too: using C++ libraries within cppfront should be seamless, should feel and look like cppfront, and should feel like an improvement).

@switch-blade-stuff

I agree with the general sentiment that required whitespace is a no-go. People have different code styles, and even if some of them may raise eyebrows, requiring that everyone stick to one general style goes a bit outside the realm of a formal language standard (to me, at least).

@filipsajdak
Contributor

filipsajdak commented Nov 3, 2022

I am looking at this from the perspective of the goal of making the language 10x safer.

If one of the steps is to require extra space (that is already required by coding guidelines and good practices) I think that this is worth the price.

@jcanizales

jcanizales commented Nov 3, 2022

I think making progress in https://github.com/hsutter/cppfront#2017-reflection-generation-and-metaclasses is also important for deciding this. Because if the result of that development is that we end up with ( ) as a valid way to execute compile-time functions (that is, to instantiate templates), then we can just let go of < > and the problems they carry.

@BenHanson

I think making progress in https://github.com/hsutter/cppfront#2017-reflection-generation-and-metaclasses is also important for deciding this. Because if the result of that development is that we end up with ( ) as a valid way to execute compile-time functions (that is, to instantiate templates), then we can just let go of < > and the problems they carry.

I like the idea that metaclasses could be used to prototype new features/syntax. I've no idea if it could handle the cppfront syntax or not though.

Seeing as Circle has already cracked running arbitrary code at compile time, a programmable compiler seems like the obvious next step. I'd be interested to know just how much effort that would take and how much it interests Sean.

@seanbaxter
Author

Metaclasses have nothing to do with the grammar. Whitespaces have nothing to do with safety. The grammar is broken. No reason to involve these other issues.

@jcanizales

The choice of < > vs ( ), [ ], ![ ], <[ ]>, { } etc. etc. affects the grammar. If for the "regularization" and "unification" columns of the "meta" row of the roadmap we end up wanting to support something other than < > for compile time functions, then the discussion of how to best deal with < > turns out being moot. That's why I mention it.

Every other modern popular language uses < > for templates. But also their compilers aren't Turing complete machines like C++'s is, and they don't see that as a desirable feature, like C++ does.

@filipsajdak
Contributor

I was not clear about the connection between additional spaces and safety. Spaces do not guarantee safety. My point was that e.g. AUTOSAR C++14 coding guidelines or MISRA were created for safety-related systems and they introduce rules that reduce ambiguity and increase readability for developers. The intention is to make it easy to reason about the code.

While I was looking for examples of coding guidelines on that topic, I found that they (at least the code examples I have found) are pretty consistent in using spaces before and after binary operators.

@neumannt

neumannt commented Nov 8, 2022

f(temp_or_object<a, b>(c)); suffers from it. The grammar isn't context-free.

The grammar is context-free but not LR(1). For a detailed discussion see issue 50. Basically the grammar says that if it can be parsed as a template expression it is a template expression. If you don't want that you have to add parentheses.

I have written a parsing expression grammar for C++2 that you can use with a parser generator, thus it is certainly context free. The only sad thing is that the grammar is nearly, but not quite LR(1). If you are willing to use a slightly different syntax for templates (e.g., a![b,c] instead of a<b,c>) you can make it LR(1). I have a grammar for that, too, but Herb did not want to change the template syntax. Thus we have to live with something that cannot be parsed with bounded lookahead... But at least it is context free.
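The "if it can be parsed as a template expression, it is one" rule can be sketched as a scan that starts at the `<` after an identifier and looks for a closing `>` outside parentheses. This is a deliberately simplified toy, not the PEG grammar mentioned above and not cppfront's parser: `classify_after_name` is a hypothetical name, it works on raw characters rather than tokens, and it ignores nested angle brackets.

```cpp
#include <cstddef>
#include <string_view>

enum class Reading { template_id, comparison };

// rest starts at the '<' that follows an identifier.
constexpr Reading classify_after_name(std::string_view rest) {
    int paren_depth = 0;
    for (std::size_t i = 1; i < rest.size(); ++i) {
        const char ch = rest[i];
        if (ch == '(') {
            ++paren_depth;
        } else if (ch == ')') {
            if (paren_depth == 0) break;   // closes an enclosing group: no '>' was found
            --paren_depth;
        } else if (ch == ';') {
            break;                         // end of statement: no '>' was found
        } else if (ch == '>' && paren_depth == 0) {
            return Reading::template_id;   // first match wins
        }
    }
    return Reading::comparison;
}

static_assert(classify_after_name("<a, b>(c));") == Reading::template_id, "");
static_assert(classify_after_name("<(a > b)>")   == Reading::template_id, "");
static_assert(classify_after_name("< b)")        == Reading::comparison,  "");
```

The scan has no fixed bound on how far it must read before deciding, which is consistent with the remark that the grammar cannot be parsed with bounded lookahead.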

@seanbaxter
Author

Basically the grammar says that if it can be parsed as a template expression it is a template expression.

Can't believe that's the line this project is taking.

@neumannt

neumannt commented Nov 9, 2022

Basically the grammar says that if it can be parsed as a template expression it is a template expression.

Can't believe that's the line this project is taking.

I would have preferred something else, too. While the behavior is technically speaking well defined (due to the semantics of the PEG grammar), I consider it surprising behavior. And it requires quite some teaching to explain to programmers why they have to add parentheses if they do not want that behavior.

Of course the problem is the alternatives. The traditional C++ approach is context dependent, e.g., the parser recognizes that an identifier is a template type. Which is a nightmare and prevents order independent parsing. Or we change the template syntax to something unambiguous (like, e.g., a![b,c]), but then C++2 templates look different from C++1 templates. I understand why Herb does not want that, but nevertheless I would very much prefer a new syntax to that surprising parsing behavior.

@seanbaxter
Author

seanbaxter commented Nov 14, 2022

I just made the jump in Circle to ![] for template-argument-list. There are other benefits than just disambiguation, like the ability to use it with callables and drop the .template operator() nonsense.

https://twitter.com/seanbax/status/1592153661438054406

It's a bad choice to keep C++'s original syntax while moving forward.

@hsutter
Owner

hsutter commented Dec 26, 2022

Catching up: As Sean knows I'm open to changing the template syntax (when Sean saw this in 2021, I was using [ ] for templates), and I continue to be open to new information and changing it if there's reason to do so. But right now I feel my reasoning in the parsing design note still holds.

@seanbaxter originally asked:

f(temp_or_object<a, b>(c));
Is this disallowed Cpp2 syntax? It's definitely ambiguous.

In Cpp2 this is unambiguous: it instantiates temp_or_object with a and b.

I understand the objection that for f<(a>b)> some people think that requiring parentheses to mean a comparison used in a template argument is somehow ugly. I'll accept that's their opinion and will keep it in mind to see if I keep hearing that. But "YMMV"... I think the parens are nice and clear, requiring them makes this code both actually and visually unambiguous (both of which are important), and even if it were ugly it's an uncommon use case. I think this is much better than today's C++, and is not repeating the mistakes of today's syntax any more than languages like C# did so when they use < > for generics with happy results.
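For comparison, the parenthesized form is already valid in today's C++, where an unparenthesized `>` inside a template-argument-list is taken as the end of the list. A minimal sketch with a hypothetical function template `pick` and made-up values:

```cpp
// Hypothetical function template, for illustration only.
template <bool Cond>
constexpr int pick() { return Cond ? 1 : 0; }

constexpr int a = 3, b = 2;

// The parentheses make the comparison a single template argument.
static_assert(pick<(a > b)>() == 1, "3 > 2 is true");
static_assert(pick<(a < b)>() == 0, "3 < 2 is false");

// Without parentheses, pick<a > b>() does not compile in C++:
// the first '>' closes the template-argument-list.
```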

@jcanizales

If for the "regularization" and "unification" columns of the "meta" row of the roadmap we end up wanting to support something other than < > for compile time functions, then the discussion of how to best deal with < > turns out being moot.

Right, I've given that a lot of thought. Super briefly: There's already a parallel today between < > which I think of as compile-time parameter lists, and ( ) run-time parameter lists. They already overlap in that you can put some (NTTP) values in both, such as a<int,10>(20, 30). I thought about unifying them, and/but one thing you need to get all the way is full support for "types as values," and you also need to keep the core must-have of Cpp2 that sets it apart from all the other experiments which is 100% interoperability with today's C++... that means that any inventive solution that unifies those lists must still be able to seamlessly consume/invoke/use today's C++ templates including the existing standard library, and those currently have a strong distinction between compile-time parameter lists and run-time parameter lists.
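The overlap between the two kinds of parameter list can be shown with a function shaped like the a<int,10>(20, 30) example. The definitions below are hypothetical; the thread does not say what `a` does.

```cpp
// Hypothetical function template shaped like a<int,10>(20, 30).
// T and N are compile-time parameters; x and y are run-time parameters.
template <typename T, int N>
constexpr T a(T x, T y) { return static_cast<T>(N) + x + y; }

// The same value passed through the run-time parameter list instead.
constexpr int a_runtime(int n, int x, int y) { return n + x + y; }

static_assert(a<int, 10>(20, 30) == 60,    "10 supplied as a non-type template parameter");
static_assert(a_runtime(10, 20, 30) == 60, "10 supplied as an ordinary argument");
```

Both calls produce the same value, but only in the template form is `N` a constant inside the function body (usable, for instance, as an array bound), which is part of the distinction existing C++ templates rely on.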

@gregmarr

I have learned multiple programming languages on my own over the last 40 years, I've been writing C++ professionally for 25+ years, have been following standards processes and language development, attending C++ conferences for more than a decade, and it still took me a relatively long time to see the ambiguity in f(temp_or_object<a, b>(c));. I can't imagine now what it's like for new coders having to learn it.

Yup. In Cpp2 the deliberate choice is that there is no ambuguity (<-- that was a typo, I was trying to write "ambiguity," but it was such a great typo that now I'm intentionally leaving it there).

@jcanizales

more importantly not all tool writers are writing compilers. Some are writing refactoring tools. Some are writing intellisense-like plugins. Some are writing simple syntax highlighters.

Bingo.

Again, thanks for the input, and for understanding that I'm going to agree to disagree on this one and continue the experiment on this grammatical path for now -- but, as always, staying open to more data and experience.

@hsutter hsutter closed this as completed Dec 26, 2022
@hsutter
Owner

hsutter commented Dec 26, 2022

I've also tweaked the design note's wording to make it clear that the comma operator issue is a visual ambiguity issue. Today's comma operator is still a factor IMO because it arises as a visual ambiguity today in @neumannt's foo<1,2> a; example on that page.

@jcanizales

There's already a parallel today between < > which I think of as compile-time parameter lists, and ( ) run-time parameter lists. They already overlap in that you can put some (NTTP) values in both, such as a<int,10>(20, 30). I thought about unifying them, and/but one thing you need to get all the way is full support for "types as values," and you also need to keep the core must-have of Cpp2 that sets it apart from all the other experiments which is 100% interoperability with today's C++... that means that any inventive solution that unifies those lists must still be able to seamlessly consume/invoke/use today's C++ templates including the existing standard library, and those currently have a strong distinction between compile-time parameter lists and run-time parameter lists.

While I don't have a proposal for this, I have this thought.

VHDL is a language that also has "compile-time code" and "runtime code", and IMO does it well because it was a design goal (rather than a fortunate accident like in C++). To wit:

for i in 0 to 7 loop  -- this is executed at compile time
    my_bus(i + 1) <= my_bus(i);  -- this is emitted 8 times
end loop;

Caveat for us: Because it describes electronic logic, the runtime part of VHDL is declarative, while the compile-time part is procedural (Pascal-like). Both parts of the language look different. I think that distinction eased the design a lot. C++ is (apparently) going the opposite direction: From having an OO runtime language and a mix of functional/declarative compile-time language (function and class declarations, and templates), towards making both languages the same. I think it is the right direction for C++. But it's also harder.
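A rough C++17 analogue of the VHDL loop above, offered as a sketch rather than an exact translation: the "loop" is a pack expansion resolved at compile time, and the body is emitted once per index. The names `shift_up` and `shift_up_impl` are hypothetical, and copying the old values first is an assumption made to mimic signal assignment, where every right-hand side reads the value from before the update.

```cpp
#include <array>
#include <cstddef>
#include <utility>

template <std::size_t... I>
constexpr std::array<int, 9> shift_up_impl(std::array<int, 9> bus,
                                           std::index_sequence<I...>) {
    const std::array<int, 9> old = bus;   // every assignment reads the old values
    ((bus[I + 1] = old[I]), ...);         // expanded into 8 assignments at compile time
    return bus;
}

// Shifts elements 0..7 into positions 1..8; element 0 keeps its value.
constexpr std::array<int, 9> shift_up(std::array<int, 9> bus) {
    return shift_up_impl(bus, std::make_index_sequence<8>{});
}

constexpr std::array<int, 9> shifted =
    shift_up(std::array<int, 9>{{10, 11, 12, 13, 14, 15, 16, 17, 18}});

static_assert(shifted[0] == 10 && shifted[1] == 10 && shifted[8] == 17, "");
```

Here the compile-time part (the index pack) and the run-time part (the assignments) use visibly different syntax, which is the contrast the paragraph above draws with VHDL.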

This way of looking at it might help us support templates seamlessly in a unified compile-time / runtime syntax. The trick would be to first bring what C++ templates can do to the runtime language:

  • Fully-specializing a class template is already covered: It's the same as constructing an object by giving all the constructor arguments.
  • For partial specialization, we would need to support "currying of partial construction", which is a useful thing in itself too.
  • Automatic matching of the most-specialized template. It's a bit like overload resolution of constructors, but from an overload set that is only known at runtime. I think this is similar enough to runtime matching with is (or this library, etc.). It seems to be generally useful too, given that some part of it is on its way to the language already.
  • Automatic deduction of template types from passed function arguments: This one is different in that it's an interaction between the runtime and compile-time languages. So maybe it stays as its own separate thing.

Off the top of my head, I don't know if there are more features that would need to be "ported"?


8 participants