[SUGGESTION] Less Noised Interpolated Raw String Literals #300

Closed
msadeqhe opened this issue Mar 27, 2023 · 4 comments
@msadeqhe
msadeqhe commented Mar 27, 2023

Currently, the $R prefix marks interpolated raw string literals in C++2, e.g. $R"(text)".

Template String Literals

Before I explain my suggestion, I should mention that C++1 has no empty character literal, so '' is a syntax error there. Instead of leaving it a syntax error, I suggest making '' the start and end of interpolated raw string literals:

// := $R"(This is an (interpolated)$ (raw)$ string literal.)";
x1 := ''This is an (interpolated)$ (raw)$ string literal.'';

// := $R"(Escape sequences such as \n do not work.)";
x2 := ''Escape sequences such as \n do not work.'';

// := $R"(Also a single quote ' doesn't close it.)";
x3 := ''Also a single quote ' doesn't close it.'';

To write multiple consecutive single quotes ' inside an interpolated raw string literal, we can start and end it with more ' characters than the longest run inside:

x1 := ''This ' doesn't close it.'';
x2 := '''This '' doesn't close it.''';
x3 := ''''This ''' doesn't close it.'''';
x4 := '''''This '''' doesn't close it.''''';
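The post does not spell out the exact matching rule, so the following is only a minimal C++17 sketch of one reading of the examples: the opening delimiter is the longest run of ' characters (at least two), and the literal closes at the first run of the same length. The names `ScannedLiteral` and `scan_quote_literal` are hypothetical, and interpolation is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Result of scanning one ''...'' literal (all names here are illustrative).
struct ScannedLiteral {
    std::string content;  // text between the delimiters, taken verbatim
    std::size_t quotes;   // number of ' characters in each delimiter
    std::size_t end;      // index just past the closing delimiter
};

// Scans a ''-delimited literal that starts at src[pos].
std::optional<ScannedLiteral> scan_quote_literal(std::string_view src, std::size_t pos = 0) {
    assert(pos <= src.size());

    // Opening delimiter: the longest run of ' starting at pos (assumed rule).
    std::size_t n = 0;
    while (pos + n < src.size() && src[pos + n] == '\'') ++n;
    if (n < 2) return std::nullopt;  // a lone ' would start a character literal

    // Closing delimiter: the first n consecutive ' after the opening run.
    const std::size_t begin = pos + n;
    const std::size_t close = src.find(std::string(n, '\''), begin);
    if (close == std::string_view::npos) return std::nullopt;  // unterminated

    return ScannedLiteral{std::string(src.substr(begin, close - begin)), n, close + n};
}
```

Under this reading the content can never be empty, since the opening run swallows every leading quote, and scanning can resume at `end` to pick up an adjacent literal.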

NOTE 1: If they are placed side by side with normal string literals, the two will be concatenated (because both are interpolated). That is the other reason why I suggest '' instead of """ (as in languages such as C#): the two kinds stay easy to tell apart when they are adjacent. To have ' as a character at the beginning or end of the content of an interpolated raw string literal, we concatenate it with a normal string literal, instead of allowing optional separators between the content and the quotes such as new-lines (as in C#) or white-space (as in Markdown). More examples:

// := $R"(This is an )" + $R"(interpolated raw)" + $R"( string literal)";
x1 := ''This is an '' + ''interpolated raw'' + '' string literal'';

// It's a compiler error in current C++2.
// I don't know if it's acceptable for $R"(...)" to be concatenated together.
// := $R"(This is an )"$R"(interpolated raw)"$R"( string literal)";
x2 := ''This is an ''''interpolated raw'''' string literal'';
// ''This is an '' + ''interpolated raw'' + '' string literal''

// NOTE 1
// := $R"('This text is quoted inside ' characters.')";
x3 := "'"''This text is quoted inside ' characters.''"'";
// "'" + ''This text is quoted inside ' characters.'' + "'"

Finally, I have to mention that interpolated raw string literals are not really raw, so maybe we should define a new term for them, something like Interpolated Non-escape-sequenced String Literals. We can simply call them Template String Literals.

Just like character literals, interpolated raw string literals cannot be empty, because multiple ' characters (such as '' or ''') always form the beginning of a string literal:

// It starts with '''' and must have content and end with ''''
x0 := ''''; // Compiler ERROR!

x1 := "";   // OK.

My suggestion in a nutshell

  • These two syntaxes will be for interpolated string literals (non-raw or raw) in C++2:
// Interpolated Non-raw String Literal
x1 := "text";

// Interpolated Raw String Literal
x2 := ''text'';
// or '''text'''
// or more ''''...
  • In addition, this syntax will be for non-interpolated string literals (raw only) in C++2:
// Non-Interpolated Raw String Literal
x3 := R"(text)";
// or R"x(text)x"
// or more R"xx...

Why do I suggest this change?

R"(text)" is a powerful raw string literal, but most of the time we just want to disable escape sequences and write single quotes ' and double quotes " directly inside a string literal. We can simply call these Template String Literals. Using '' is more readable and takes less typing than $R"( to start an interpolated raw string literal.

Also, '' is a syntax error in C++1 because C++ has no empty character literal, so we can use this otherwise unused syntax. Programmers won't ask why '' doesn't work (someone may think it should be a null character); they simply learn that '' is the start and end of an interpolated raw (non-escape-sequenced) string literal.

We can categorize string literals this way: '' is visually similar to ", and both ''text'' (without escape sequences) and "text" (with escape sequences) are interpolated string literals because they have no prefix. R"(...)" is a non-interpolated, non-escape-sequenced (real raw) string literal because it is prefixed with R and has parentheses for more complex text.

Is there any experience, data or working implementation available?

My suggestion is similar to raw string literals in C#, but C# 11 uses at least three double quotes to start and end a raw string literal, e.g. """A raw string literal in C# 11""". In my suggestion the first and last new-lines of the content are not ignored, because C++2 can concatenate side-by-side interpolated string literals (see NOTE 1 and the example code), whereas in C# we have to put the content on a separate line if we want it to start with ". Apart from that, everything is the same.

My suggestion is also similar to inline code in Markdown, which uses at least one backtick ` instead of two single quotes ''. Apart from that, everything is the same.

I should also mention that Python has triple single quotes ''' and triple double quotes """ for multi-line string literals.

Experience from C#, Python and Markdown can be reviewed.

@AbhinavK00

The only thing I find problematic about this proposal is how we escape the ' character.
My question is, do we really need interpolated raw strings?
We could definitely do with only pure raw strings and simple strings which have escape character and interpolate.

@msadeqhe

msadeqhe commented Mar 27, 2023

> My question is, do we really need interpolated raw strings?

Interpolated Raw String Literals have already been added to C++2 with the notation $R"(text)".

We don't need to escape the ' character, because it doesn't end the string literal:

x1 := ''We don't need to escape ' character.'';
x2 := '''Also double '' doesn't need to be escaped.''';
x3 := ''''Even triple ''' doesn't need to be escaped.'''';

But if we want a string that is only a single ' character, we use a normal string:

x0 := "'";

Also, if we want a ' character at the beginning or end, we again use normal strings:

// := $R"('text')";
x0 := "'"''text''"'";
// "'" + ''text'' + "'"

For the above example, C# instead allows optional new-lines so that the content can start and end with a " character:

var v1 = """
         "The content of this string starts and ends with a quote"
         """;

In a nutshell, instead of allowing optional new-lines (as in C#), C++2 will concatenate ''...'' and "..." when they are placed side by side. Both are interpolated string literals; ''...'' does not process escape sequences but "..." does:

// := "\n" + $R"(text\n(var)$')";
x0 := "\n"''text\n(var)$''"'";
// "\n" + ''text\n(var)$'' + "'"

Already in C++1, multiple string literals "..." placed side by side are concatenated. The same will apply to ''...'' in C++2.
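To make the concatenation rule concrete, here is a rough C++17 sketch that evaluates a run of adjacent "..." and ''...'' literals into one string. It is not cppfront code: `concat_adjacent` is a hypothetical helper, interpolation is ignored, only a few escape sequences are decoded, and allowing blanks between literals is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Evaluates adjacent literals such as "'"''text''"'" into a single string.
// Returns std::nullopt if the input is not a well-formed run of literals.
std::optional<std::string> concat_adjacent(std::string_view src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == ' ') { ++i; continue; }  // assumption: blanks may separate literals
        if (src[i] == '"') {
            // Normal literal: ends at the next unescaped ", escapes are decoded.
            ++i;
            bool closed = false;
            while (i < src.size()) {
                char c = src[i++];
                if (c == '"') { closed = true; break; }
                if (c == '\\') {
                    if (i == src.size()) return std::nullopt;
                    const char e = src[i++];
                    // Only \n and \t are translated; \" \' \\ map to themselves.
                    c = (e == 'n') ? '\n' : (e == 't') ? '\t' : e;
                }
                out += c;
            }
            if (!closed) return std::nullopt;
        } else if (src[i] == '\'') {
            // ''...'' literal: n >= 2 opening quotes, closed by the first run of n.
            std::size_t n = 0;
            while (i + n < src.size() && src[i + n] == '\'') ++n;
            if (n < 2) return std::nullopt;
            const std::size_t begin = i + n;
            const std::size_t close = src.find(std::string(n, '\''), begin);
            if (close == std::string_view::npos) return std::nullopt;
            out.append(src.substr(begin, close - begin));  // copied verbatim, no escapes
            i = close + n;
        } else {
            return std::nullopt;  // anything else is outside this sketch
        }
    }
    assert(i == src.size());
    return out;
}
```

With this sketch, `"\n"''text\n(var)$''"'"` yields a real new-line, then the characters `text\n(var)$` unchanged, then a single quote.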

@msadeqhe

msadeqhe commented Mar 27, 2023

Alternative Suggestion

This is an alternative suggestion, in case you don't think the original suggestion is well suited for C++2.

First, I have to mention that we would need to force programmers to use an escape sequence for " in character literals, so '"' would change to '\"'. This is because '" will be the starting quote and "' the ending quote of interpolated raw string literals:

// :char = '"';
c1 :char = '\"';

// := $R"(text)";
x1 := '"text"';

Multiple ' characters can be added outside '"..."'. This works because '' is not a valid character literal in C++ (an empty character literal is a syntax error). For example:

x1 := '"This " doesn't close it."';
x2 := ''"This "' doesn't close it."'';
x3 := '''"This "'' doesn't close it."''';
x4 := ''''"This "''' doesn't close it."'''';

By changing the notation of interpolated raw string literals from ''...'' (the original suggestion) to '"..."' (the alternative suggestion), a single ' or " character can also be written inside the string literal, and we don't need normal string literals for that:

// := $R"(')";
x1 := '"'"';

// := $R"(")";
x2 := '"""';

Also, the content of the string literal can have ' or " characters at the beginning or end:

// := $R"('This text is quoted inside ' characters.')";
x1 := '"'This text is quoted inside ' characters.'"';

// := $R"("This text is quoted inside " characters.")";
x2 := '""This text is quoted inside " characters.""';

And it can be empty:

// := $R"()";
x0 := '""';
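The examples above suggest a simple matching rule for the alternative notation, sketched here in C++17 under stated assumptions: the opening delimiter is k single quotes (k >= 1) followed by ", and the literal closes at the first " followed by k single quotes. `scan_alt_literal` is a hypothetical name and interpolation is again ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Scans one '"..."' literal that starts at src[0] and returns its content.
std::optional<std::string> scan_alt_literal(std::string_view src) {
    // Count the k leading ' characters; the next character must be ".
    std::size_t k = 0;
    while (k < src.size() && src[k] == '\'') ++k;
    if (k == 0 || k == src.size() || src[k] != '"') return std::nullopt;

    // The closing delimiter mirrors the opening one: " followed by k quotes.
    const std::size_t begin = k + 1;
    const std::string closer = '"' + std::string(k, '\'');
    const std::size_t close = src.find(closer, begin);
    if (close == std::string_view::npos) return std::nullopt;

    assert(close >= begin);
    // Unlike ''...'', the content may be empty or start and end with ' or ".
    return std::string(src.substr(begin, close - begin));
}
```

Note that a plain character literal '"' has no closing "' after its opening '", so this scanner rejects it, which matches the requirement to write '\"' instead.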

These string literals can also be placed side by side, and C++2 will concatenate them. However, that would be ugly and there is no need for it. Compare how much more readable y0 is than x0:

// It's a compiler error in current C++2.
// I don't know if it's acceptable for $R"(...)" to be concatenated together.
// := "'"$R"(text)""'";
x0 := "'"'"text"'"'";
// "'" + '"text"' + "'";

// := $R"('text')";
y0 := '"'text'"';

The whole point of my suggestion is to have a simpler way to write interpolated raw string literals; the exact syntax doesn't matter.

This part is not suggested, but it can be an option

Also the notation of non-interpolated raw string literals may change from R"chars(text)chars" to N'chars"text"srahc'. In this way we could have the following string literals:

// Interpolated Non-raw String Literal
x1 := "text";

// Interpolated Raw String Literal
x2 := '"text"';
// := ''"text"'';
// := '''"text"''';
// := ''''...
// := 'chars"text"srahc';

// Non-interpolated Raw String Literal
x3 := N'"text"';
// := N'''"text"''';
// := N'chars"text"srahc';

The chars in N'chars"text"srahc' can be any characters, but if a character is an opening bracket, the corresponding closing bracket must appear at the end, and the order of chars is reversed at the end:

// := R"x[(text)x[";
x0 := N'x["text"]x';

Interpolated raw string literals may also use additional characters in place of extra ' characters, just like the example above:

x0 := 'chars"text"srahc';

But chars should not contain ' or ". For example:

// := $R"x[(text)x[";
x0 := 'x["text"]x';
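The mirroring of chars can be sketched as a small C++17 function. `mirror_delimiter` is a hypothetical name, and the set of bracket pairs it swaps is an assumption, since the post only says "open bracket".

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Returns the closing form of the opening delimiter characters `chars`,
// or std::nullopt if `chars` contains ' or ".
std::optional<std::string> mirror_delimiter(std::string_view chars) {
    std::string out;
    // Walk the opening characters back to front so their order is reversed.
    for (auto it = chars.rbegin(); it != chars.rend(); ++it) {
        switch (*it) {
            case '\'': case '"': return std::nullopt;  // not allowed in chars
            case '(': out += ')'; break;
            case '[': out += ']'; break;
            case '{': out += '}'; break;
            case '<': out += '>'; break;  // assumption: < counts as a bracket
            default:  out += *it; break;  // everything else is copied as-is
        }
    }
    assert(out.size() == chars.size());
    return out;
}
```

For example, the opening chars `x[` give the closing chars `]x`, as in 'x["text"]x'. How a closing bracket inside the opening chars should be treated is not specified in the post, so this sketch copies it unchanged.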

In addition, C++2 could have a non-interpolated non-raw string literal:

// Non-interpolated Non-raw String Literal
x4 := N"text";

@msadeqhe

OK, I feel I need to summarize my final suggestion in a new issue, so everyone can get it directly without too much thinking and reading.

@msadeqhe closed this as not planned on Mar 28, 2023