
[SUGGESTION] Less Noised Interpolated Raw String Literals #300

Closed as not planned
@msadeqhe


Currently, the $R prefix is used to quote interpolated raw string literals in C++2, e.g. $R"(text)".

Template String Literals

Before explaining my suggestion, I should mention that C++1 has no empty character literal, so '' is a syntax error there. Instead of leaving it a syntax error, I suggest making '' the start and end of interpolated raw string literals:

// := $R"(This is an (interpolated)$ (raw)$ string literal.)";
x1 := ''This is an (interpolated)$ (raw)$ string literal.'';

// := $R"(Escape sequences such as \n do not work.)";
x2 := ''Escape sequences such as \n do not work.'';

// := $R"(Also a single quote ' doesn't close it.)";
x3 := ''Also a single quote ' doesn't close it.'';

To write a run of single quotes ' inside an interpolated raw string literal, start and end the literal with more ' characters than the longest run in its content:

x1 := ''This ' doesn't close it.'';
x2 := '''This '' doesn't close it.''';
x3 := ''''This ''' doesn't close it.'''';
x4 := '''''This '''' doesn't close it.''''';
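
The delimiter rule in these examples can be sketched as a small scanner. This is only an illustration under stated assumptions, not how cppfront lexes anything: it assumes the opening run of quotes is read greedily, that the first later run of at least that many quotes closes the literal, and that shorter runs are ordinary content. The names ScanResult and scan_template_literal are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Result of scanning one literal. `ok` is false when the text at `pos`
// is not a well-formed literal under the assumed rules.
struct ScanResult {
    bool ok = false;
    std::string_view content{};  // text between the delimiters
    std::size_t next = 0;        // index just past the closing delimiter
};

// Hypothetical helper: scan one ''...'' literal starting at src[pos].
constexpr ScanResult scan_template_literal(std::string_view src,
                                           std::size_t pos = 0) {
    // Count the opening run of single quotes (assumed greedy).
    std::size_t n = 0;
    while (pos + n < src.size() && src[pos + n] == '\'') ++n;
    if (n < 2) return {};  // fewer than two quotes: not this kind of literal

    const std::size_t begin = pos + n;
    if (begin >= src.size()) return {};  // only quotes, no content (e.g. '''')

    // The first run of at least n quotes closes the literal;
    // shorter runs are kept as content.
    std::size_t i = begin;
    while (i < src.size()) {
        if (src[i] != '\'') { ++i; continue; }
        std::size_t run = 0;
        while (i + run < src.size() && src[i + run] == '\'') ++run;
        if (run >= n) {
            return {true, src.substr(begin, i - begin), i + n};
        }
        i += run;
    }
    return {};  // unterminated literal
}
```

Under these assumptions, calling the function again at the returned `next` index reads a literal that is placed directly after the first one, because only the first n quotes of a longer run are consumed as the closing delimiter.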

NOTE 1: When they are placed side by side, they are concatenated with normal string literals (because both kinds are interpolated). That is the other reason I suggest '' instead of """ (as in languages such as C#): the two kinds stay easy to tell apart when they sit next to each other. To have ' as the first or last character of the content of an interpolated raw string literal, we concatenate it with a normal string literal, instead of allowing optional separators such as new-lines (as in C#) or white-space (as in Markdown) between the content and the quotes. More examples:

// := $R"(This is an )" + $R"(interpolated raw)" + $R"( string literal)";
x1 := ''This is an '' + ''interpolated raw'' + '' string literal'';

// It's a compiler error in current C++2.
// I don't know if it's acceptable for $R"(...)" to be concatenated together.
// := $R"(This is an )"$R"(interpolated raw)"$R"( string literal)";
x2 := ''This is an ''''interpolated raw'''' string literal'';
// ''This is an '' + ''interpolated raw'' + '' string literal''

// NOTE 1
// := $R"('This text is quoted inside ' characters.')";
x3 := "'"''This text is quoted inside ' characters.''"'";
// "'" + ''This text is quoted inside ' characters.'' + "'"
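
For comparison, today's C++ (C++1) already concatenates adjacent string literals, including a mix of ordinary and raw ones, so the x1 and x3 cases above have a direct C++1 analogue. The sketch below only shows that existing C++1 behavior; it says nothing about what current C++2 accepts, and the variable names are made up for the example.

```cpp
#include <string_view>

// Adjacent literals are joined by the compiler; ordinary and raw
// literals can be mixed. This mirrors x3: "'" + raw text + "'".
constexpr std::string_view pieces =
    "'" R"(This text is quoted inside ' characters.)" "'";
constexpr std::string_view whole =
    R"('This text is quoted inside ' characters.')";
static_assert(pieces == whole);

// Three adjacent raw literals, mirroring x1/x2.
constexpr std::string_view joined =
    R"(This is an )" R"(interpolated raw)" R"( string literal)";
static_assert(joined == "This is an interpolated raw string literal");
```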

Finally, I should mention that interpolated raw string literals are not truly raw. Maybe we should define a new term for them, something like Interpolated Non-escape-sequenced String Literals. Or we can simply call them Template String Literals.

Just like character literals, interpolated raw string literals cannot be empty, because a run of ' characters (such as '' or ''') always forms the beginning of a string literal:

// It starts with '''', so it must have content and end with ''''
x0 := ''''; // Compiler ERROR!

x1 := "";   // OK.

My suggestion in a nutshell

  • These two syntaxes will be for interpolated string literals (non-raw or raw) in C++2:
// Interpolated Non-raw String Literal
x1 := "text";

// Interpolated Raw String Literal
x2 := ''text'';
// or '''text'''
// or more ''''...
  • In addition, this syntax will be for non-interpolated string literals (raw only) in C++2:
// Non-Interpolated Raw String Literal
x3 := R"(text)";
// or R"x(text)x"
// or more R"xx...
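
As a reminder of what the R"(...)" form in the last bullet does in C++1 today, here is a minimal sketch with made-up variable names. It shows that escape sequences are not processed inside a raw literal and that a custom delimiter such as x lets the content contain )" itself.

```cpp
#include <string_view>

// Ordinary literal: \n is one newline character, so 3 characters total.
constexpr std::string_view escaped = "a\nb";

// Raw literal: backslash and n stay as two characters, so 4 in total.
constexpr std::string_view raw = R"(a\nb)";

// Custom delimiter x: only )x" closes the literal, so )" is content.
constexpr std::string_view custom = R"x(say )" here)x";

static_assert(escaped.size() == 3);
static_assert(raw.size() == 4);
static_assert(custom == "say )\" here");
```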

Why do I suggest this change?

R"(text)" is a powerful raw string literal, but most of the time we just want to disable escape sequences and be able to write single quotes ' and double quotes " directly inside a string literal. That is what the Template String Literals described above are for. Using '' is more readable, more convenient, and less typing than $R"( to start an interpolated raw string literal.

Also, '' is a syntax error in C++1 because C++ has no empty character literal, so we can reuse this currently unused syntax. Programmers won't ask why '' doesn't work (someone might think it should be a null character); they simply learn that '' is the start and end of an interpolated raw (non-escape-sequenced) string literal.

We can categorize string literals so that '' is visually similar to ": both ''text'' (without escape sequences) and "text" (with escape sequences) are interpolated string literals because they have no prefix, whereas R"(...)" is a non-interpolated, non-escape-sequenced (truly raw) string literal because it is prefixed with R and uses parentheses for more complex text.

Is there any experience, data or working implementation available?

My suggestion is similar to raw string literals in the C# programming language, but C# 11 uses at least three double quotes to start and end raw string literals, e.g. """A raw string literal in C# 11""". In my suggestion the first and last new-lines of the content are not ignored, because C++2 can concatenate side-by-side interpolated string literals (see NOTE 1 and the example code), whereas in C# we have to put the content on a separate line if we want it to start with ". Apart from that, everything is the same.

My suggestion is also similar to inline code in the Markdown language, which uses at least one backtick ` instead of two single quotes ''. Apart from that, everything is the same.

I should also mention that Python has triple single quotes ''' and triple double quotes """ for multi-line string literals.

So experience from the C#, Python and Markdown languages can be reviewed.
