[SUGGESTION] Expose string interpolation outside string literals

**DISCLAIMERS TO SET EXPECTATIONS**

This suggestion is inspired by [this pull request](https://github.com/hsutter/cppfront/pull/251). The exact syntax used here to achieve string interpolation outside string literals is not important; it can be anything that fits into C++2.

Currently we have 4 kinds of string literals:
- String literals (non-raw)
- Interpolated string literals (non-raw)
- Raw string literals
- Raw interpolated string literals

Does it resemble [this video](https://9gag.com/gag/aA3yPA9) about C++1 initialization? Maybe a little.

C++2 can use the following literal instead of the above literals:
- Raw string literals

But how can we get the behavior of the other string literals? We don't need any other kind of string literal to achieve those features. C++2 can mix raw string literals with capture expressions. C++2 already considers using capture expressions (`...$`) inside interpolated string literals (`"..."`); it can go further and join capture expressions with string literals instead of introducing a new kind of string literal for that.

Currently this is what we have in C++2:
```
x := "I'm waiting for (name)$ to come home.";
```
> Result:
> I'm waiting for NAME to come home.
> ^

Now, the above line can be written like this:
```
x := "I'm waiting for "name$" to come home.";
```
> Result:
> I'm waiting for NAME to come home.
> ^

They look similar, and they do the same thing with a different approach.

The first one uses an interpolated string literal, but the second one joins the raw string literal `"I'm waiting for "`, the capture expression `name$`, and the raw string literal `" to come home."`.
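
As a rough illustration of that joining, here is a minimal C++17 sketch of what such an expression might lower to. The names `join_group` and `waiting_message` are hypothetical and are not part of cppfront; the only assumption is that every element of the group can be written to a stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: concatenates every element of a combination group.
// Anything streamable with operator<< is accepted, so both string pieces
// and captured values can be passed.
template <typename... Parts>
std::string join_group(const Parts&... parts) {
    std::ostringstream out;
    (out << ... << parts);  // C++17 fold expression, left to right
    return out.str();
}

// What `"I'm waiting for "name$" to come home."` might lower to.
std::string waiting_message(const std::string& name) {
    return join_group("I'm waiting for ", name, " to come home.");
}
```

Calling `waiting_message("NAME")` would produce the same text as the interpolated literal shown earlier.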

On the other hand, currently this is how we use escape sequences such as `\n` in C++2:
```
x := "Message:\nuse uppercase letter.";
```
> Result:
> Message:
> use uppercase letter.
> ^

So, how can we use escape sequences such as `\n` if we won't have non-raw string literals? This is how it can be written:
```
x := "Message:"n'"use uppercase letter.";
```
> Result:
> Message:
> use uppercase letter.
> ^

They look a little different, but they do the same thing with a different approach.

The first one uses a non-raw string literal, but the second one joins the raw string literal `Message:`, the escape sequence `n'`, and the raw string literal `use uppercase letter.`. As you can see, `\n` is not obvious in the first one but `n'` is obvious in the second one.
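
To make the `n'` idea concrete, a compiler could map the name in front of the `'` to a character. The following C++17 sketch is only an assumption about how that lookup might look; `escape_value` is a hypothetical name, and the table covers just the three escapes mentioned in this suggestion (`n`, `r`, `t`).

```cpp
#include <optional>
#include <string_view>

// Hypothetical lookup for parameterless escapes written as `x'`.
// `name` is the text in front of the apostrophe, e.g. "n" for `n'`.
std::optional<char> escape_value(std::string_view name) {
    if (name == "n") return '\n';  // new-line
    if (name == "r") return '\r';  // carriage return
    if (name == "t") return '\t';  // horizontal tab
    return std::nullopt;           // not a known escape sequence
}
```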

**DESCRIBE DETAILS**

I suggest having only raw string literals (without a prefix, just `"..."`):
```
x := "(raw)$n' string literal\n";
```
> Result:
> (raw)$n' string literal\n
> ^

And instead of having interpolated string literals, C++2 can automatically join string literals `"..."` and capture expressions `...$`. I don't know what the best name is for a sequence of `"..."` and `...$`, but for now I will use the name combination group.

Also, escape sequences which don't have parameters will be written in `...'` notation instead of `\...` notation, but escape sequences which have parameters will be written in capture expression notation. For example, the escape sequence `\o{nnn}` will be written as `o(nnn)$`.

Capture expressions `...$` and escape sequences `...'` must be outside string literals `"..."`, otherwise they will not be evaluated. For example:
```
a := "First line variable$n'Second line"; // n' is not new-line
```
> Result:
> First line variable$n'Second line
> ^
```
b := "First line "variable$n'"Second line"; // n' is new-line
```
> Result:
> First line VARIABLE
> Second line
> ^
```
c := "First line"n'; // n' is new-line
```
> Result:
> First line
> 
> ^

Since `"` is the only special character, it can be escaped by doubling it (`""`) inside string literals. In other words, two string literals will be joined together and a `"` character will be inserted between them:
```
d := "I write "" character.";
```
> Result:
> I write " character.
> ^
```
x := "This is quote: """n'"End.";
```
> Result:
> This is quote: "
> End.
> ^
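
The doubling rule can be sketched in C++17 as a small decoding step. This is only an illustration under an assumption: the lexer has already cut out the text between the outermost quotes of one literal, and `decode_raw_body` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: given the text between the outermost quotes of a raw
// literal, turn every doubled quote ("") into a single quote character.
// Everything else, including backslashes, is copied unchanged.
std::string decode_raw_body(std::string_view body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        out += body[i];
        if (body[i] == '"' && i + 1 < body.size() && body[i + 1] == '"') {
            ++i;  // skip the second quote of the pair
        }
    }
    return out;
}
```

Because the literal is raw, a sequence such as `\n` passes through this step untouched.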

The combination group should follow the prefix of the first string literal. In the following example, `left`, `middle$` and `right` have the same prefix `u8`:
```
x := u8"left"middle$"right";
```
> Result:
> leftMIDDLEright
> ^

Also it's possible to have a suffix:
```
x := "Where is my "object$"?"n'"Next to the door!"s;
y := "Where is my "object$"?"n'"Next to the door!"_user_defined_suffix;
```

But some rules should be followed:
1. There shouldn't be any white-space (except new-lines, which I explain later) between them. This rule reminds the programmer that the elements belong to one combination group, and it makes it easier to spot spaces inside string literals (where they are significant):
  ```
  x := "You are "name$;
  ```
  > Result:
  > You are NAME
  > ^
  ```
  y := "You are "  name$; // Compiler ERROR!
  ```
2. String literals can be broken into multiple lines instead of using escape sequence `n'`:
  ```
  x := "first "x$" // This is not a comment, here is inside the string literal
  second "y$" // This is not a comment either, here is inside the string literal
  last "z$; // But this is a comment, here is outside the string literal
  ```
  > Result:
  > first X // This is not a comment, here is inside the string literal
  > second Y // This is not a comment either, here is inside the string literal
  > last Z
  > ^
3. White-space can appear before and after a combination group on each line, but not in the middle of it. Therefore, when a combination group is broken into several lines, it's possible to align them:
  ```
  x :=       "a "variable$" b"    ;
  ```
  > Result:
  > a VARIABLE b
  > ^
  ```
  y := "a"  variable$  "b"; // Compiler ERROR!
  
  a := "first "x$n' // This is a comment, here is outside the string literal
       "second "y$n' // This is a comment, here is outside the string literal
       "last "z$  ; // This is a comment, here is outside the string literal
  ```
  > Result:
  > first X
  > second Y
  > last Z
  > ^
  ```
  b := "first "x$  n' // Compiler ERROR!
       "second "  y$n' // Compiler ERROR!
       "last "  z$; // Compiler ERROR!
  ```
4. The combination group should contain at least one string literal. This restriction is only for clarification and can be removed later if you want to make it more relaxed, but removing this restriction means `n'` and other escape sequences can be used everywhere:
  ```
  a := n'; // Compiler ERROR!
  b := variable$; // OK, this is a normal capture expression in current C++2
  c := variable$n'; // Compiler ERROR!
  d := variable$n'"text"; // OK, this contains at least one string literal
  ```
5. The combination group must start with a string literal if it has a string literal prefix:
  ```
  a := u8n'; // Compiler ERROR!
  
  b := u8""n';
  // = u8"\n"; // This is what we do in current C++
  ```
  > Result:
  > 
  > ^
  ```
  c := u8n'"second line"; // Compiler ERROR!
  d := u8""n'"second line";
  // Notice how it feels "n'" is nested inside ""n'"second line".
  ```
  > Result:
  > 
  > second line
  > ^
6. Escape sequences may not appear as the first element in the combination group. In other words, the first element in the combination group must be either a string literal or a capture expression (See NOTE 1):
  ```
  a := n'"string"; // Compiler ERROR!
  b := ""n'"string";
  c := variable$"suffix";
  d := (2 * 2)$" apples";
  ```
7. (OPTIONAL) The combination group must end with a string literal if it has a suffix. This restriction is only for clarity and can be removed later if you want to relax it. There would be no conflict without this restriction because the `...'` and `...$` notations are postfix:
  ```
  c := "string"n's; // Compiler ERROR!
  d := "string"n'""s;
  // Notice how it feels "n'" is nested inside "string"n'"".
  ```
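
Rules 4 to 7 can be summarized as a small check over an already tokenized group. The C++17 sketch below is only an illustration: `Kind`, `Element` and `check_group` are hypothetical names, and the white-space rules 1 to 3 are assumed to be enforced earlier by the lexer.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class Kind { StringLiteral, Capture, Escape };

struct Element {
    Kind kind;
    std::string text;  // e.g. "left", "name", "n"
};

// Returns an error message, or std::nullopt when the group is accepted.
std::optional<std::string> check_group(const std::vector<Element>& group,
                                       bool has_prefix, bool has_suffix) {
    if (group.empty()) return "empty combination group";
    // A lone `variable$` is an ordinary capture expression, not a group.
    if (group.size() == 1 && group.front().kind == Kind::Capture &&
        !has_prefix && !has_suffix) {
        return std::nullopt;
    }
    if (group.front().kind == Kind::Escape)
        return "rule 6: an escape sequence cannot come first";
    if (has_prefix && group.front().kind != Kind::StringLiteral)
        return "rule 5: a prefixed group must start with a string literal";
    bool has_literal = false;
    for (const Element& e : group)
        if (e.kind == Kind::StringLiteral) has_literal = true;
    if (!has_literal)
        return "rule 4: the group needs at least one string literal";
    if (has_suffix && group.back().kind != Kind::StringLiteral)
        return "rule 7: a suffixed group must end with a string literal";
    return std::nullopt;
}
```

With this sketch, `variable$n'"text"` is accepted, while `variable$n'` and `u8n'"second line"` are rejected, matching the examples in the rules above.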

In addition, string literals can be more integrated into the language, for example in [reflection and generation](https://github.com/hsutter/cppfront/wiki/Design-note%3A-Capture):
```
(func.name()+"_wrapper")$: (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}
```
If C++2 allowed a string literal to be used directly in a function name, it could be changed to:
```
func.name()$"_wrapper": (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}
```
Alternatively, instead of rules 4, 5 and 6, we can have one rule: the first element of a combination group should always be a string literal. In that case, for reflection and generation we have to write this:
```
""func.name()$"_wrapper": (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}
```

**Will your feature suggestion add new keywords, new operators or ...?**

Yes, it adds new syntax for escape sequences with `...'` notation. It is for escape sequences only, and `...'` is not an operator; it is like a character literal `'...'` or a string literal `"..."`, which are not operators either. Using `...'` notation for escape sequences is compatible with the current language syntax.

Also, a feature has to be added to automatically join raw string literals, capture expressions and escape sequences together.

The `...'` notation is a good choice for escape sequences as it resembles character literals `'...'`, and of course it doesn't conflict with them. In addition, the `...'` notation doesn't conflict with character literal prefixes either, because the C++2 compiler and programmers can easily distinguish them:
```
x := ""n'; // n' is new-line
x := n'; // n' is new-line. This works if C++2 doesn't restrict this usage.
y := u'x'; // u' is a character literal prefix
```

**NOTE 1:** Briefly, `'...'` is a character literal, `x'` is an escape sequence and `x'...'` is a character literal with a prefix, but there is a corner case if `x` is defined both as an escape sequence and as a character literal prefix:
```
x := x'x'; // x' is a character literal prefix
y := x'x'""; // x' is an escape sequence, but it's still a little ambiguous
```
It's still a little ambiguous, therefore escape sequences shouldn't be the first element of combination group (see rule 6):
```
y := ""x'x';
```

**Will your feature suggestion increase code verbosity or readability?**

It depends. Sometimes it may increase code verbosity.

Consider how the following line in current C++2:
```
x := "first\nand (second)$\nand last\n";
```
> Result:
> first
> and SECOND
> and last
> 
> ^

is different from the following line:
```
x := "first"n'"and "second$n'"and last"n';
```
> Result:
> first
> and SECOND
> and last
> 
> ^

More examples:
```
x := "First name: "first$n'"Last name: "last$n'"Age: "age$n'"Sex: "sex$n';
```
> Result:
> First name: FIRST
> Last name: LAST
> Age: AGE
> Sex: SEX
> 
> ^
```
y := "First name: "first$"
Last name: "last$"
Age: "age$"
Sex: "sex$"
";
```
> Result:
> First name: FIRST
> Last name: LAST
> Age: AGE
> Sex: SEX
> 
> ^
```
z := "First name: "first$n'
     "Last name: "last$n'
     "Age: "age$n'
     "Sex: "sex$n';
```
> Result:
> First name: FIRST
> Last name: LAST
> Age: AGE
> Sex: SEX
> 
> ^

All `x`, `y`, `z` have the same value.

More on capture expressions:
```
a := "I saw "name.to_uppercase()$" yesterday!";
```
> Result:
> I saw NAME yesterday!
> ^
```
b := "The result of 2 * 2 is "(2 * 2)$", and you knew it.";
```
> Result:
> The result of 2 * 2 is 4, and you knew it.
> ^

**Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?**

No.

**Will your feature suggestion _automate or eliminate_ X% of current C++ guidance literature?**

Yes. It will do so in the following ways:
- Unifying
  - If we only have raw string literals, all questions about which kind of string literal to use, and when, will be gone.
- Simplicity
  - Having only raw string literals is simpler and easier to teach than teaching students about the differences between string literals and why we have 4 of them.
  - Teachers don't need to explain why interpolated string literals are similar to capture expressions, because it will be obvious right in the syntax.
- Integration
  Interpolated string literals will be integrated into the language outside of literals. This will open up powerful new features that can be used instead of introducing new escape sequences for string literals, for example:
  ```
  x := "This is the answer in octal: "o(127)$;
  ```
  > Result:
  > This is the answer in octal: 87
  Notice that in the above line we don't need a special `\o...` escape sequence; instead we use an `o()` function. In this way, escape sequences will be reformed into capture expressions.
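
For illustration, such an `o()` function could be an ordinary function. The C++17 sketch below is a guess at its behavior, not a definition from this suggestion: it assumes `o` reads the decimal digits of its argument as an octal numeral, which is consistent with the `87` shown in the result above.

```cpp
#include <stdexcept>

// Hypothetical o(): interprets the decimal digits of `digits` as an octal
// numeral and returns its value, so the digits 1, 2, 7 give 1*64 + 2*8 + 7.
constexpr int o(int digits) {
    int value = 0;
    int place = 1;
    while (digits > 0) {
        const int d = digits % 10;
        if (d > 7) throw std::invalid_argument("not an octal digit");
        value += d * place;
        place *= 8;
        digits /= 10;
    }
    return value;
}

static_assert(o(127) == 87, "octal 127 is decimal 87");
```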

There are 2 ways to describe how to read a combination group of `"..."` and `...$` in this example:
```
x := "My book is named "book$" and I bought it in "year$" when I was young";
```
> Result:
> My book is named BOOK and I bought it in YEAR when I was young
> ^

The first way is: Think about `"`s as an on/off switch! Now, let's read the above example in this way:
1. The first `"` will start a string literal: `"My book is named `
2. The second `"` will start a capture expression: `"book$`
3. The third `"` will again start a string literal: `" and I bought it in `
4. The fourth `"` will again start a capture expression: `"year$`
5. The fifth `"` will again start a string literal: `" when I was young`
6. The last `"` will end everything: `";`

The second way is: Think about `"`s as nested expressions! Now, let's read the above example in this way:
1. The first `"` opens the combination group.
2. `"book$"` is the first nested expression.
3. `"year$"` is the second nested expression.
4. The last `"` closes the combination group.

How can the compiler determine whether a `"` or `...$` or `...'` is the last part of the combination group? If the number of `"` characters counted so far is even, and a space, an operator or `;` follows, then it has to be the end of the combination group.
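
The even-count rule can be sketched as a scanner that treats each `"` as an on/off switch. This C++17 sketch is a simplification under stated assumptions: `group_end` is a hypothetical name, only single-line groups are handled (the multi-line forms from rules 2 and 3 are not), "operator" is reduced to a few terminator characters, and quotes nested inside a parenthesized capture are ignored.

```cpp
#include <cstddef>
#include <string_view>

// Returns the index one past the last character of the combination group
// that starts at `start` (which should point at its first character).
std::size_t group_end(std::string_view src, std::size_t start) {
    bool in_literal = false;  // true while an odd number of quotes was seen
    int paren_depth = 0;      // depth inside a capture such as (2 * 2)$
    std::size_t i = start;
    for (; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"' && paren_depth == 0) {
            in_literal = !in_literal;  // flip the on/off switch
            continue;
        }
        if (in_literal) continue;  // everything inside a literal is text
        if (c == '(') { ++paren_depth; continue; }
        if (c == ')' && paren_depth > 0) { --paren_depth; continue; }
        if (paren_depth > 0) continue;  // spaces inside (...) are allowed
        // Quote count is even here: white-space or a terminator ends it.
        if (c == ' ' || c == '\t' || c == '\n' ||
            c == ';' || c == ',' || c == ')') {
            break;
        }
    }
    return i;
}
```

A doubled `""` flips the switch twice, so it never ends the group by itself, which matches the on/off reading described above.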

**Describe alternatives you've considered.**

Originally I considered using escape sequences `\...` directly beside string literals `"..."` and capture expressions `...$`, but I gave up on this idea because of the following example:
```
x := "first"\nsecond$\n"last"; //OOPS!
```

Because it wouldn't work for `\nsecond$`, I had to add an extra character like `'` to separate them:
```
x := "first"\n'second$\n"last";
```
> Result:
> first
> SECOND
> last
> ^

This approach would make it complicated by having both `\...` and a separator `'` when needed.

After that, I decided to change the escape sequence `\...` to be a postfix `...\`:
```
x := "first"n\second$n\"last";
```
> Result:
> first
> SECOND
> last
> ^

This could be a good solution, but it would be better if C++2 could just treat all escape sequences as capture expressions:
```
x := "first"n$second$n$"last";
```
> Result:
> first
> SECOND
> last
> ^

However, using `...$` notation for escape sequences has a drawback: `\n`, `\r`, ... would have to become variable names like `n$`, `r$`, ..., which could prevent programmers from having local variables with those names.

In the end, using `'` for escape sequences seems more natural: it is different from capture expressions, so it doesn't conflict with user-defined variables such as `n`, `r` and `t`:
```
x := "first"n'second$n'"last";
```
> Result:
> first
> SECOND
> last
> ^

Originally the syntax was `q$` to escape the `"` character (when `...$` was the notation for escape sequences), but later it changed to a doubled `""`, which is visually more natural:
```
a := "This is "q$" character."; // Before the change
b := "This is "" character."; // After the change
```
> Result:
> This is " character.
> ^
