
[SUGGESTION] Quote string literals with backticks #289

Closed as not planned
@msadeqhe

Description

This suggestion is a rework of this issue. The syntax of my suggestion is not important; it can be anything that fits into C++2. I know I must keep things simple and obvious, so I try to minimize concepts and keep the syntax as familiar to programmers as possible. We can have the following kinds of string literals:

  • String literals
  • Interpolated string literals
  • Raw string literals
  • Raw interpolated string literals (soon)
  • A new format for string literals (maybe in the future)

In the future, C++2 may introduce even more kinds of string literals. Is this reminiscent of this video about C++1 initialization? Maybe a little. But one kind of string literal is enough, because it can serve as the fundamental building block for producing all the others. In other words, C++2 can internally join multiple raw string literals, escape sequences, captures, and other language expressions to produce the string value.

I want to suggest a radical change to string literals, starting from the very beginning: how they are written. Three common symbols are suitable for quoting string literals: double quote ", single quote ', and backtick `. In written English, double quotes " and single quotes ' are used far more often than backticks `; an analysis is available here, and it is interesting because it shows that double quote " is used even more frequently than single quote '.

Therefore backtick ` is an appropriate choice for the only escape character in string literals, because it is not a common punctuation mark in English or in most other languages. It was also mainly designed for typewriters, as described here, which may be why markup languages such as Markdown use backtick ` to mark inline code inside normal text. It is also worth noting that JavaScript uses backtick ` for template literals, and D and Go use it for raw string literals. So backtick ` should be the only character with special behavior in string literals.

String literals are quoted with backticks `, and their contents are taken verbatim: escape sequences and captures are not recognized unless they are placed inside a nested pair of backticks `. Captures may need extra parentheses around an expression, or when an escape sequence is adjacent to them. For example:

// "text"
   `text`

// "first\nsecond\nlast"
   `first`\n`second`\n`last`

// "You bought this (object)$ yesterday."
   `You bought this `object$` yesterday.`

// "I know 2 * 2 is (2 * 2)$."
   `I know 2 * 2 is `(2 * 2)$`.`

// "Name: (user)$, Age: (age)$"
   `Name: `user$`, Age: `age$``

// "Name: (user)$\nAge: (age)$"
   `Name: `(user)$\n`Age: `age$``

// "Name: \t(name)$\nAge: \t(age)$"
   `Name: `\t(name)$\n`Age: `\t(age)$``
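To make the idea of joining pieces concrete, here is one possible lowering of `Name: `(user)$\n`Age: `age$`` into ordinary C++. It is only a sketch under stated assumptions: each capture is assumed to be converted to text with stream insertion (operator<<), and the helpers interpolate and describe are hypothetical names, not cppfront's actual code generation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: stream every piece into one string, in order.
// Raw literal pieces, escape sequences and captured values all arrive as
// plain arguments; captured values are converted with operator<<.
template <typename... Parts>
std::string interpolate(Parts&&... parts) {
    std::ostringstream out;
    (out << ... << std::forward<Parts>(parts));  // C++17 left fold
    return out.str();
}

// One possible lowering of  `Name: `(user)$\n`Age: `age$``
std::string describe(const std::string& user, int age) {
    return interpolate("Name: ", user,  // literal `Name: `, capture (user)$
                       '\n',            // escape sequence \n
                       "Age: ", age,    // literal `Age: `, capture age$
                       "");             // trailing empty literal ``
}
```

Each element of the string expression becomes one argument, in order, so a new kind of element added later would only need to produce one more argument.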

To write a backtick ` inside a string literal, write two backticks ``. Adjacent string literals are concatenated, but they must be separated by whitespace; otherwise they are read as a single string literal containing a doubled backtick ``. For example:

// "This is a backtick `"
   `This is a backtick ```

// "User-name"
   `User-name`

// "User`-`name"
   `User``-``name`   //--> No whitespace between them.

// "User""-""name"
   `User` `-` `name` //--> Whitespace between them.

// "User"    "-"    "name"
   `User`    `-`    `name`

In a nutshell, `User``-``name` is not equal to `User` `-` `name`.
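The doubled-backtick and whitespace rules can be checked mechanically. The sketch below lexes a run of raw literals only (no escape sequences or captures) and returns their concatenated value; the name lex_raw_literals and the choice to throw std::invalid_argument on malformed input are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical sketch: lex one or more raw backtick literals separated by
// whitespace and return their concatenated value.
//   ``  inside a literal           -> one literal backtick
//   `   followed by anything else  -> closes the literal
std::string lex_raw_literals(const std::string& src) {
    std::string value;
    std::size_t i = 0;
    bool seen_literal = false;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++i; continue; }  // separator
        if (src[i] != '`') throw std::invalid_argument("expected opening backtick");
        ++i;  // skip the opening backtick
        for (;;) {
            if (i >= src.size()) throw std::invalid_argument("unterminated literal");
            if (src[i] != '`') { value += src[i++]; continue; }
            if (i + 1 < src.size() && src[i + 1] == '`') {  // doubled: a literal backtick
                value += '`';
                i += 2;
            } else {  // single: end of this literal
                ++i;
                break;
            }
        }
        seen_literal = true;
    }
    if (!seen_literal) throw std::invalid_argument("no literal found");
    return value;
}
```

A doubled backtick is consumed greedily as a literal backtick, and a single backtick followed by anything else closes the literal, which is why the two spellings above produce different values.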

The goal of my suggestion is to keep string literals simple to teach and familiar to programmers. That's why I keep the \ symbol for escape sequences such as \n, even though my suggestion would allow removing or changing it.

String Expression

As you can see, the syntax is similar to current C++2: programmers place nested backtick expressions inside string literals. However, it can also be viewed a little differently, as I'll explain in the next paragraph.

Consider the string literal `Name: `(user)$\n`Age: `age$``. Let's call it a string expression: a sequence of the following elements, in order, which must both start and end with a string literal:

  • string literal `Name: `
  • capture (user)$
  • escape sequence \n
  • string literal `Age: `
  • capture age$
  • an empty string literal ``
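The decomposition above is mechanical enough to sketch as a small lexer. The C++ below is a hypothetical illustration, not cppfront code: the names Kind, Element and split_string_expression are made up, captures are assumed to be either name$ or (expr)$, and escape sequences are assumed to be a backslash plus one character (so forms like \x{6e} are out of scope).

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

enum class Kind { Literal, Capture, Escape };

struct Element {
    Kind kind;
    std::string text;  // literal text, capture expression, or escape such as "\\n"
};

// Hypothetical sketch: split one string expression into its elements.
// It must start and end with a literal; between literals it accepts
// escapes (\x), captures (name$ or (expr)$) and further literals.
std::vector<Element> split_string_expression(const std::string& src) {
    std::vector<Element> out;
    std::size_t i = 0;
    auto fail = [](const char* msg) { throw std::invalid_argument(msg); };

    auto read_literal = [&] {
        if (i >= src.size() || src[i] != '`') fail("expected a literal");
        std::string text;
        for (++i;;) {
            if (i >= src.size()) fail("unterminated literal");
            if (src[i] != '`') { text += src[i++]; continue; }
            if (i + 1 < src.size() && src[i + 1] == '`') { text += '`'; i += 2; continue; }
            ++i;  // a single backtick closes the literal
            break;
        }
        out.push_back({Kind::Literal, text});
    };

    read_literal();  // must start with a literal
    while (i < src.size()) {
        char c = src[i];
        if (c == '`') {
            read_literal();
        } else if (c == '\\') {  // escape sequence: backslash plus one character
            if (i + 1 >= src.size()) fail("dangling backslash");
            out.push_back({Kind::Escape, src.substr(i, 2)});
            i += 2;
        } else if (c == '(') {  // parenthesized capture: (expr)$
            std::size_t start = i;
            int depth = 0;
            do {
                if (i >= src.size()) fail("unbalanced parentheses");
                if (src[i] == '(') ++depth;
                else if (src[i] == ')') --depth;
                ++i;
            } while (depth > 0);
            if (i >= src.size() || src[i] != '$') fail("expected $ after capture");
            out.push_back({Kind::Capture, src.substr(start + 1, i - start - 2)});
            ++i;  // skip $
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {  // name$
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            if (i >= src.size() || src[i] != '$') fail("expected $ after capture");
            out.push_back({Kind::Capture, src.substr(start, i - start)});
            ++i;  // skip $
        } else {
            fail("unexpected character between literals");
        }
    }
    if (out.back().kind != Kind::Literal) fail("must end with a literal");
    return out;
}
```

Splitting `Name: `(user)$\n`Age: `age$`` this way yields the six elements listed above, in the same order.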

A string expression can have one of the encoding prefixes L, u8, u, or U, and it can also have a suffix:

// u8 is the prefix and s is the suffix
// u8"Name: (user)$\nAge: (age)$"s
   u8`Name: `(user)$\n`Age: `age$``s
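Since the prefix and suffix sit outside the outermost backticks, a lexer can peel them off before looking at the string expression itself. A minimal sketch, assuming the expression body runs from the first backtick to the last one and that a suffix is a plain identifier such as s; Affixes and split_affixes are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

struct Affixes {
    std::string prefix;  // "", "L", "u8", "u" or "U"
    std::string body;    // the string expression itself, backticks included
    std::string suffix;  // identifier suffix such as "s", or ""
};

// Hypothetical sketch: split a token into encoding prefix, body and suffix.
Affixes split_affixes(const std::string& token) {
    std::size_t first = token.find('`');
    std::size_t last = token.rfind('`');
    if (first == std::string::npos || first == last)
        throw std::invalid_argument("no string expression found");
    Affixes a{token.substr(0, first), token.substr(first, last - first + 1),
              token.substr(last + 1)};
    if (a.prefix != "" && a.prefix != "L" && a.prefix != "u8" &&
        a.prefix != "u" && a.prefix != "U")
        throw std::invalid_argument("unknown encoding prefix: " + a.prefix);
    for (char c : a.suffix)  // the suffix must be a plain identifier
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            throw std::invalid_argument("invalid suffix: " + a.suffix);
    return a;
}
```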

But that's not enough without character literals.

A string literal is a sequence of characters, which is why I also have to consider character literals. As before, character literals can contain escape sequences, but the notation becomes c`...`. For example:

  • 'n' becomes c`n`
  • '\n' becomes c`\n`
  • '\x{6e}' becomes c`\x{6e}`
  • '' has no meaning in C++2, because character literals cannot be empty, and c`` is the backtick ` itself.

Character literals placed side by side are not concatenated. Multi-character literals must have the prefix b, so 'ABCD' becomes b`ABCD`: because multi-character literals have a different underlying type (int rather than char), they should be visually different. For example:

x1 := c`A` c`B` c`C` c`D`; // ERROR!
x2 := c`A`  `B`  `C`  `D`; // ERROR!
x3 := c`ABCD`; // ERROR!
x4 := b`ABCD`; // OK.
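The character literal rules above can be sketched as a small validator. This is an illustration under assumptions: only the escapes \n, \t, \\ and \x{...} are decoded, the names CharKind, CharLiteral and parse_char_literal are hypothetical, and the x1 and x2 errors (several literals side by side) belong to the parser, so they are not modeled here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class CharKind { Single, Multi };

struct CharLiteral {
    CharKind kind;
    std::string chars;  // decoded characters (exactly one for c`...`)
};

// Hypothetical sketch of the c`...` and b`...` rules.
CharLiteral parse_char_literal(const std::string& tok) {
    if (tok == "c``") return {CharKind::Single, "`"};  // c`` is the backtick itself
    if (tok.size() < 3 || (tok[0] != 'c' && tok[0] != 'b') || tok[1] != '`' || tok.back() != '`')
        throw std::invalid_argument("not a character literal");
    std::string body = tok.substr(2, tok.size() - 3), chars;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') { chars += body[i]; continue; }
        if (++i >= body.size()) throw std::invalid_argument("dangling backslash");
        switch (body[i]) {
            case 'n': chars += '\n'; break;
            case 't': chars += '\t'; break;
            case '\\': chars += '\\'; break;
            case 'x': {  // \x{hex}
                std::size_t close = body.find('}', i);
                if (i + 1 >= body.size() || body[i + 1] != '{' || close == std::string::npos)
                    throw std::invalid_argument("malformed \\x{...}");
                chars += static_cast<char>(std::stoi(body.substr(i + 2, close - i - 2), nullptr, 16));
                i = close;
                break;
            }
            default: throw std::invalid_argument("unsupported escape");
        }
    }
    if (tok[0] == 'c' && chars.size() != 1)
        throw std::invalid_argument("c`...` needs exactly one character");
    if (tok[0] == 'b' && chars.size() < 2)
        throw std::invalid_argument("b`...` needs two or more characters");
    return {tok[0] == 'c' ? CharKind::Single : CharKind::Multi, chars};
}
```

Rejecting c`ABCD` outright, rather than silently treating it as a multi-character literal, is what keeps the two underlying types visually distinct.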

Other notations for character literals are possible, but my recommended notation c`...` has two benefits:

  • An empty character literal is impossible: c`` is simply the backtick ` itself (similar to doubled backticks inside string literals).
  • Backtick ` alone is enough for both string literals and character literals. If C++2 also uses underscore _ (or backtick `) instead of single quote ' as the digit separator, e.g. 1'500'444 becomes 1_500_444 (as in Python) or 1`500`444, then double quotes " and single quotes ' can be reserved for future use, either as new operators or as new literals.
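Lexing underscore-separated digits is straightforward. A minimal sketch, assuming the separator may only appear between two digits (the same restriction Python places on _ and C++14 places on ') and ignoring overflow; parse_with_separator is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical sketch: read a decimal integer that uses a separator between
// digits, e.g. 1_500_444 (default '_') or 1`500`444 (sep = '`').
long long parse_with_separator(const std::string& text, char sep = '_') {
    long long value = 0;
    bool prev_digit = false;
    for (char c : text) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            value = value * 10 + (c - '0');
            prev_digit = true;
        } else if (c == sep && prev_digit) {
            prev_digit = false;  // a separator must be followed by a digit
        } else {
            throw std::invalid_argument("bad digit sequence: " + text);
        }
    }
    if (!prev_digit) throw std::invalid_argument("must end with a digit: " + text);
    return value;
}

// Today's C++ (since C++14) spells the same value with single quotes.
static_assert(1'500'444 == 1500444);
```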

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

No.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes, in the following ways:

  • Unifying
    • If there is only one kind of string literal, all questions about when to use which kind go away.
  • Simplicity
    • Having only one kind of string literal is simpler and easier to teach than explaining the differences between several kinds and why we have n of them.
  • Integration
    • Interpolated string literals become integrated into the language, so new features can be added without introducing a new escape character (such as \ or ()$) for each one. Beyond escape sequences and captures, further kinds of expressions can be added later, all reached through the same single backtick `. A single backtick ` may end the string literal, may stand for a literal backtick (when doubled as ``), or may introduce an escape sequence (`\...`), a capture (`...$`), or a combination of them.

Will your feature suggestion remove unnecessary syntax or concepts?

Yes, although my suggestion is a little verbose. Backtick ` is used for quoting both string literals and character literals. Also, if we use underscore _ or backtick ` as the digit separator (e.g. 1_500_444 or 1`500`444), C++2 becomes free to use double quotes " and single quotes ' either as new operators or as new literals. For example:

x := n' * m";

Also, the escape sequences \' and \" for quotes are no longer needed, and no \` escape sequence is needed for the backtick.

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
