[SUGGESTION] Expose string interpolation outside string literals #271

Closed
msadeqhe opened this issue Mar 11, 2023 · 6 comments

msadeqhe commented Mar 11, 2023

DISCLAIMERS TO SET EXPECTATIONS

This suggestion is inspired by this pull request. The exact syntax for achieving string interpolation outside string literals is not important; it can be anything that fits into C++2.

Currently we have 4 kinds of string literal:

  • String literals (non-raw)
  • Interpolated string literals (non-raw)
  • Raw string literals
  • Raw interpolated string literals

Does it resemble this video about C++1 initialization? Maybe a little.

C++2 can use the following literal instead of the above literals:

  • Raw string literals

But how can we get the other kinds of string literal? We don't need them to achieve those features. C++2 can mix raw string literals with capture expressions. C++2 already uses capture expressions (...$) inside interpolated string literals ("..."); it can go further and join the two instead of introducing a new kind of string literal for interpolation.

Currently this is what we have in C++2:

x := "I'm waiting for (name)$ to come home.";

Result:
I'm waiting for NAME to come home.
^

Now, the above line can be written like this:

x := "I'm waiting for "name$" to come home.";

Result:
I'm waiting for NAME to come home.
^

They look similar and do the same thing, but with a different approach.

The first one uses an interpolated string literal, whereas the second one joins the raw string literal "I'm waiting for ", the capture expression name$ and the raw string literal " to come home.".

On the other hand, currently this is how we use escape sequences such as \n in C++2:

x := "Message:\nuse uppercase letter.";

Result:
Message:
use uppercase letter.
^

So, how can we use escape sequences such as \n if we don't have non-raw string literals? This is how it can be written:

x := "Message:"n'"use uppercase letter.";

Result:
Message:
use uppercase letter.
^

They look a little different, but they do the same thing with a different approach.

The first one uses a non-raw string literal, whereas the second one joins the raw string literal "Message:", the escape sequence n' and the raw string literal "use uppercase letter.". As you can see, \n does not stand out in the first one, but n' does in the second one.

DESCRIBE DETAILS

I suggest having only raw string literals (without a prefix, just "..."):

x := "(raw)$n' string literal\n";

Result:
(raw)$n' string literal\n
^

And instead of having interpolated string literals, C++2 can automatically join string literals "..." and capture expressions ...$. I don't know what the best name is for a sequence of "..." and ...$, but for now I'll pick the name combination group.

Also, escape sequences which don't have parameters will be written in ...' notation instead of \... notation, but escape sequences which have parameters will be written in capture expression notation. For example, the escape sequence \o{nnn} will be written as o(nnn)$.

Capture expressions ...$ and escape sequences ...' must be outside string literals "...", otherwise they will not be evaluated. For example:

a := "First line variable$n'Second line"; // n' is not new-line

Result:
First line variable$n'Second line
^

b := "First line "variable$n'"Second line"; // n' is new-line

Result:
First line VARIABLE
Second line
^

c := "First line"n'; // n' is new-line

Result:
First line

^

Since " is the only special character, it can be escaped by doubling it ("") inside a string literal. In other words, two string literals are joined together and a " character is inserted between them:

d := "I write "" character.";

Result:
I write " character.
^

x := "This is quote: """n'"End.";

Result:
This is quote: "
End.
^
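The joining behaviour described so far can be modelled outside the compiler. The following Python sketch is only an illustration of the proposal, not an implementation of it: the function name evaluate_group, the ESCAPES table and the env dictionary are all hypothetical. It assumes captures are plain identifiers (no parenthesised expressions), ignores prefixes and suffixes, and does not handle a group that continues on the next line.

```python
import re

# Hypothetical table of parameterless escape sequences in the proposed x' notation.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}

_IDENT = re.compile(r"[A-Za-z_]\w*")


def evaluate_group(group, env):
    """Evaluate one combination group; env maps capture names to their values."""
    out = []
    i = 0
    while i < len(group):
        if group[i] == '"':
            # Raw string literal: copy verbatim; a doubled "" yields one quote.
            i += 1
            while True:
                if i >= len(group):
                    raise SyntaxError("unterminated string literal")
                if group[i] == '"':
                    if group[i + 1:i + 2] == '"':
                        out.append('"')
                        i += 2
                        continue
                    i += 1  # single quote: end of this literal
                    break
                out.append(group[i])
                i += 1
            continue
        # Outside a literal only name$ (capture) and name' (escape) are allowed,
        # so any white-space here is rejected, as the proposal requires.
        match = _IDENT.match(group, i)
        if not match:
            raise SyntaxError(f"unexpected character {group[i]!r} at offset {i}")
        name = match.group()
        i = match.end()
        marker = group[i:i + 1]
        if marker == "$":
            out.append(str(env[name]))
        elif marker == "'":
            out.append(ESCAPES[name])
        else:
            raise SyntaxError(f"expected $ or ' after {name!r}")
        i += 1
    return "".join(out)


print(evaluate_group('"I\'m waiting for "name$" to come home."', {"name": "NAME"}))
# I'm waiting for NAME to come home.
```

Because everything between a pair of quotes is copied verbatim, text such as variable$n' inside a literal is left untouched, which matches example a above.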

The combination group should follow the prefix of the first string literal. In the following example, left, middle$ and right have the same prefix u8:

x := u8"left"middle$"right";

Result:
leftMIDDLEright
^

Also it's possible to have a suffix:

x := "Where is my "object$"?"n'"Next to the door!"s;
y := "Where is my "object$"?"n'"Next to the door!"_user_defined_suffix;

But some rules should be followed:

  1. There shouldn't be any white-space (except new-lines, which I explain later) between the elements of a combination group. This rule reminds programmers that they are in a combination group, and it makes it easier to spot the spaces inside string literals (which matter):
x := "You are "name$;

Result:
You are NAME
^

y := "You are "  name$; // Compiler ERROR!
  2. String literals can be broken into multiple lines instead of using escape sequence n':
x := "first "x$" // This is not a comment, here is inside the string literal
second "y$" // This is not a comment either, here is inside the string literal
last "z$; // But this is a comment, here is outside the string literal

Result:
first X // This is not a comment, here is inside the string literal
second Y // This is not a comment either, here is inside the string literal
last Z
^

  3. White-space can appear before and after a combination group on each line, but not in the middle of it. Therefore, when a combination group is broken across several lines, the lines can be aligned:
x :=       "a "variable$" b"    ;

Result:
a VARIABLE b
^

y := "a"  variable$  "b"; // Compiler ERROR!

a := "first "x$n' // This is a comment, here is outside the string literal
     "second "y$n' // This is a comment, here is outside the string literal
     "last "z$  ; // This is a comment, here is outside the string literal

Result:
first X
second Y
last Z
^

b := "first "x$  n' // Compiler ERROR!
     "second "  y$n' // Compiler ERROR!
     "last "  z$; // Compiler ERROR!
  4. The combination group should contain at least one string literal. This restriction is only for clarification and can be removed later if you want to make it more relaxed, but removing this restriction means n' and other escape sequences can be used everywhere:
a := n'; // Compiler ERROR!
b := variable$; // OK, this is a normal capture expression in current C++2
c := variable$n'; // Compiler ERROR!
d := variable$n'"text"; // OK, this contains at least one string literal
  5. The combination group must start with a string literal if it has a string literal prefix:
a := u8n'; // Compiler ERROR!

b := u8""n';
// = u8"\n"; // This is what we do in current C++

Result:

^

c := u8n'"second line"; // Compiler ERROR!
d := u8""n'"second line";
// Notice how it feels "n'" is nested inside ""n'"second line".

Result:

second line
^

  6. Escape sequences may not appear as the first element in the combination group. In other words, the first element in the combination group must be either a string literal or a capture expression (See NOTE 1):
a := n'"string"; // Compiler ERROR!
b := ""n'"string";
c := variable$"suffix";
d := (2 * 2)$" apples";
  7. (OPTIONAL) The combination group must end with a string literal if it has a suffix. This restriction is only for clarification and can be removed later if you want to make it more relaxed. There is no conflict without this restriction, because the ...' and ...$ notations are postfix:
c := "string"n's; // Compiler ERROR!
d := "string"n'""s;
// Notice how it feels "n'" is nested inside "string"n'"".
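
The ordering rules in the list above (at least one string literal, a prefixed group starts with a literal, no escape sequence first, a suffixed group ends with a literal) can be checked mechanically once a group has been split into elements. This is a minimal sketch under that assumption: check_group and the element labels "literal", "capture" and "escape" are hypothetical names, and the white-space rules are not covered here.

```python
def check_group(kinds, has_prefix=False, has_suffix=False):
    """Return the ordering rules that a combination group violates.

    kinds lists the elements in source order, each one of
    "literal", "capture" or "escape".
    """
    kinds = list(kinds)
    if not kinds:
        raise ValueError("a combination group needs at least one element")
    problems = []
    # A lone capture such as variable$ is an ordinary capture expression.
    if "literal" not in kinds and kinds != ["capture"]:
        problems.append("needs a string literal")
    if has_prefix and kinds[0] != "literal":
        problems.append("prefixed group must start with a string literal")
    if kinds[0] == "escape":
        problems.append("escape sequence cannot come first")
    if has_suffix and kinds[-1] != "literal":
        problems.append("suffixed group must end with a string literal")
    return problems


print(check_group(["capture", "escape", "literal"]))  # variable$n'"text"
# []
print(check_group(["escape", "literal"]))             # n'"string"
# ['escape sequence cannot come first']
```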

In addition, string literals can be integrated more deeply into the language. Take this reflection and generation example:

(func.name()+"_wrapper")$: (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}

If C++2 allowed a string literal to be used directly as a function name, it could be changed to:

func.name()$"_wrapper": (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}

Alternatively, instead of rules 4, 5 and 6, we can have one rule: the first element of a combination group must always be a string literal. With that rule, the reflection and generation example has to be written like this:

""func.name()$"_wrapper": (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}

Will your feature suggestion add new keywords, new operators or ...?

Yes, it adds new syntax for escape sequences with ...' notation. It is for escape sequences only, and ...' is not an operator; it is like a character literal '...' or a string literal "...", which are not operators either. Using ...' notation for escape sequences is compatible with the current language syntax.

Also a feature has to be added to automatically join raw string literals and capture expressions and escape sequences together.

The ...' notation is a good choice for escape sequences, as it resembles character literals '...' and of course it doesn't conflict with them. In addition, the ...' notation doesn't conflict with character literal prefixes either, because the C++2 compiler and programmers can easily distinguish them:

x := ""n'; // n' is new-line
x := n'; // n' is new-line. This works if C++2 doesn't restrict this usage.
y := u'x'; // u' is a character literal prefix

NOTE 1: Briefly, '...' is a character literal, x' is an escape sequence and x'...' is a prefixed character literal, but there is a corner case if x is defined both as an escape sequence and as a character literal prefix:

x := x'x'; // x' is a character literal prefix
y := x'x'""; // x' is an escape sequence, but it's still a little ambiguous

It's still a little ambiguous, therefore escape sequences shouldn't be the first element of a combination group (see rule 6):

y := ""x'x';

Will your feature suggestion increase code verbosity or readability?

It depends. Sometimes it may increase code verbosity.

Consider how the following line in current C++2:

x := "first\nand (second)$\nand last\n";

Result:
first
and SECOND
and last

^

is different from the following line:

x := "first"n'"and "second$n'"and last"n';

Result:
first
and SECOND
and last

^

More examples:

x := "First name: "first$n'"Last name: "last$n'"Age: "age$n'"Sex: "sex$n';

Result:
First name: FIRST
Last name: LAST
Age: AGE
Sex: SEX

^

y := "First name: "first$"
Last name: "last$"
Age: "age$"
Sex: "sex$"
";

Result:
First name: FIRST
Last name: LAST
Age: AGE
Sex: SEX

^

z := "First name: "first$n'
     "Last name: "last$n'
     "Age: "age$n'
     "Sex: "sex$n';

Result:
First name: FIRST
Last name: LAST
Age: AGE
Sex: SEX

^

All x, y, z have the same value.

More on capture expressions:

a := "I saw "name.to_uppercase()$" yesterday!";

Result:
I saw NAME yesterday!
^

b := "The result of 2 * 2 is "(2 * 2)$", and you knew it.";

Result:
The result of 2 * 2 is 4, and you knew it.
^

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

No.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes, in the following ways:

  • Unifying
    • If we only have raw string literals, all questions about which kind of string literal to use when will be gone.
  • Simplicity
    • Having only raw string literals is simpler and easier to teach than explaining to students the differences between string literals and why we have 4 of them.
    • Teachers don't need to explain why interpolated string literals are similar to capture expressions, because it will be obvious right in the syntax.
  • Integration
    • Interpolated string literals will be integrated into the language outside of literals. This opens up powerful new features that can be used instead of introducing new escape sequences for string literals, for example:
    x := "This is the answer in octal: "o(127)$;
    

    Result:
    This is the answer in octal: 87
    In the above line, we don't need a special \o... escape sequence; instead we use an o() function. In this way, escape sequences are reformed into capture expressions.
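
As a quick check of the arithmetic in the octal example: reading the digits 127 as an octal number gives 87, which is the value shown in the result above. The proposal does not say whether o() would yield that number or the character with that code, so the snippet below (plain Python, unrelated to any C++2 API) simply shows both.

```python
value = int("127", 8)  # interpret the digits 127 as an octal number
print(value)           # 87
print(chr(value))      # W
```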

There are two ways to describe how to read a combination group of "..." and ...$ in this example:

x := "My book is named "book$" and I bought it in "year$" when I was young";

Result:
My book is named BOOK and I bought it in YEAR when I was young
^

The first way is to think of each " as an on/off switch. Now, let's read the above example in this way:

  1. The first " will start a string literal: "My book is named
  2. The second " will start a capture expression: "book$
  3. The third " will again start a string literal: " and I bought it in
  4. The fourth " will again start a capture expression: "year$
  5. The fifth " will again start a string literal: " when I was young
  6. The last " will end everything: ";

The second way is to think of the " characters as delimiting nested expressions. Now, let's read the above example in this way:

  1. The first " opens the combination group.
  2. "book$" is the first nested expression.
  3. "year$" is the second nested expression.
  4. The last " closes the combination group.

How can the compiler determine whether a ", ...$ or ...' is the last part of the combination group? If the number of " characters counted so far is even, and a space, an operator or ; follows, then it has to be the end of the combination group.
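
The even-quote test above can be sketched as a small scanner. This is only an illustration: group_end and TERMINATORS are hypothetical names, the set of terminating characters is an assumed sample of "space, operator or ;", and tracking parenthesis depth (so that a capture like (2 * 2)$ is not cut short) is an addition that the rule as stated does not mention. Groups that continue on the next line are not handled.

```python
# Assumed sample of characters that may follow a combination group.
TERMINATORS = set(" \t\n;,+-*/=<>)")


def group_end(source, start=0):
    """Return the index just past the combination group that begins at start."""
    quotes = 0  # number of " characters seen so far
    depth = 0   # nesting depth of parentheses outside string literals
    for i in range(start, len(source)):
        ch = source[i]
        if ch == '"':
            quotes += 1
        elif quotes % 2 == 0:
            # An even count means we are outside every string literal.
            if ch == "(":
                depth += 1
            elif ch == ")" and depth > 0:
                depth -= 1
            elif depth == 0 and ch in TERMINATORS:
                return i
    return len(source)


line = 'x := "You are "name$;'
begin = line.index('"')
print(line[begin:group_end(line, begin)])
# "You are "name$
```

Note that a doubled "" inside a literal adds two to the count, so the parity test still works for escaped quotes.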

Describe alternatives you've considered.

Originally I considered using escape sequences \... directly beside string literals "..." and capture expressions ...$, but I gave up on this idea because of the following example:

x := "first"\nsecond$\n"last"; //OOPS!

Because it wouldn't work for \nsecond$, I had to place an extra character like ' to separate them:

x := "first"\n'second$\n"last";

Result:
first
SECOND
last
^

This approach would make things complicated, by having both \... and a separator ' when needed.

After that, I decided to change the escape sequence \... to a postfix ...\:

x := "first"n\second$n\"last";

Result:
first
SECOND
last
^

This could be a good solution, but it would be better if C++2 could just treat all escape sequences as capture expressions:

x := "first"n$second$n$"last";

Result:
first
SECOND
last
^

However, the ...$ notation for escape sequences leaves room for improvement: \n, \r, ... would have to become variable names such as n$, r$, ..., which could prevent programmers from having local variables with those names.

In the end, using ' for escape sequences seems more natural: it is distinct from capture expressions, so it doesn't conflict with user-defined variables such as n, r and t:

x := "first"n'second$n'"last";

Result:
first
SECOND
last
^

Originally the syntax for escaping the " character was q$ (when ...$ was the notation for escape sequences), but it was later changed to a doubled "", which is visually more natural:

a := "This is "q$" character."; // Before the change
b := "This is "" character."; // After the change

Result:
This is " character.
^

@AbhinavK00

AbhinavK00 commented Mar 11, 2023

I kind of like this approach. It feels minimalistic (which is a good thing IMHO) but can be a bit hard to grasp at first, especially when the current approach is familiar to cpp programmers. But I still think that it could be integrated into cpp2 with some minor tweaks.
You should also specify the output of the various examples to better convey how this works.

@msadeqhe
Author

Thanks. Now I've written the output of examples.

@msadeqhe
Author

I've changed the suggested feature to use ...' notation instead of ...$ notation for escape sequences.

@AbhinavK00

' can get confused with char, no? I think backslash is already good enough, and I don't see any reason not to use it.

@msadeqhe
Author

msadeqhe commented Mar 12, 2023

Thanks, you're right, and I intentionally chose ' to suggest that it's a kind of character. I've updated the original suggestion with the following explanation:

Will your feature suggestion add new keywords, new operators or ...?

Yes, it adds new syntax for escape sequences with ...' notation. It is for escape sequences only, and ...' is not an operator; it is like a character literal '...' or a string literal "...", which are not operators either. Using ...' notation for escape sequences is compatible with the current language syntax.

Also a feature has to be added to automatically join raw string literals and capture expressions and escape sequences together.

The ...' notation is a good choice for escape sequences, as it resembles character literals '...' and of course it doesn't conflict with them. In addition, the ...' notation doesn't conflict with character literal prefixes either, because the C++2 compiler and programmers can easily distinguish them:

x := ""n'; // n' is new-line
x := n'; // n' is new-line. This works if C++2 doesn't restrict this usage.
y := u'x'; // u' is a character literal prefix

NOTE 1: Briefly, '...' is a character literal, x' is an escape sequence and x'...' is a prefixed character literal, but there is a corner case if x is defined both as an escape sequence and as a character literal prefix:

x := x'x'; // x' is a character literal prefix
y := x'x'""; // x' is an escape sequence, but it's still a little ambiguous

It's still a little ambiguous, therefore escape sequences shouldn't be the first element of a combination group (see rule 6):

y := ""x'x';

If it feels ...' notation is not a good choice, we can examine other alternative notations.

@msadeqhe
Author

msadeqhe commented Mar 13, 2023

I should think more about it to be simpler and more familiar to programmers 😰 😁. Thanks everyone.

msadeqhe closed this as not planned Mar 13, 2023