pod and comments: Note escape vs quote #23264

Merged on May 10, 2025 (3 commits).

8 changes: 4 additions & 4 deletions pod/perldiag.pod
@@ -2602,8 +2602,8 @@ and perl's F</dev/null> emulation was unable to create an empty temporary file.
(W regexp)(F) A character class range must start and end at a literal
character, not another character class like C<\d> or C<[:alpha:]>. The "-"
in your false range is interpreted as a literal "-". In a C<(?[...])>
construct, this is an error, rather than a warning. Consider quoting
the "-", "\-". The S<<-- HERE> shows whereabouts in the regular expression
construct, this is an error, rather than a warning. Consider escaping
the "-" as "\-". The S<<-- HERE> shows whereabouts in the regular expression
the problem was discovered. See L<perlre>.

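A minimal Perl sketch of the false-range case above, assuming `use warnings` is in effect. The exact warning text can vary between perl versions, so only its "False [] range" prefix is relied on. The patterns are built from strings so they compile at run time, where a `__WARN__` handler can catch the warning; `warnings_for` is a hypothetical helper, not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $pattern at run time and return the
# warnings raised while compiling it, along with the compiled regex.
sub warnings_for {
    my ($pattern) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my $re = qr/$pattern/;
    return (\@caught, $re);
}

# A range can't start at a class like \d, so in "[\d-z]" the "-" is
# taken literally and perl warns about a false range.
my ($false_warns, $false_re) = warnings_for('[\d-z]');

# Escaping the "-" states the intent explicitly, so nothing is reported.
my ($escaped_warns, $escaped_re) = warnings_for('[\d\-z]');

# Either way the class matches a digit, a literal "-", or "z".
for my $re ($false_re, $escaped_re) {
    my @hits = grep { $_ =~ $re } ('-', '5', 'z', 'y');
    print "$re matches: @hits\n";
}
print 'false range warnings: ', scalar @$false_warns,
      ', escaped warnings: ', scalar @$escaped_warns, "\n";
```

Both compiled classes match the same characters; escaping only removes the ambiguity that triggers the warning.
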
=item Fatal VMS error (status=%d) at %s, line %d
@@ -5453,7 +5453,7 @@ S<<-- HERE> in m/%s/
(F) Within regular expression character classes ([]) the syntax beginning
with "[." and ending with ".]" is reserved for future extensions. If you
need to represent those character sequences inside a regular expression
character class, just quote the square brackets with the backslash: "\[."
character class, just escape the square brackets with the backslash: "\[."
and ".\]". The S<<-- HERE> shows whereabouts in the regular expression the
problem was discovered. See L<perlre>.

@@ -5463,7 +5463,7 @@ S<<-- HERE> in m/%s/
(F) Within regular expression character classes ([]) the syntax beginning
with "[=" and ending with "=]" is reserved for future extensions. If you
need to represent those character sequences inside a regular expression
character class, just quote the square brackets with the backslash: "\[="
character class, just escape the square brackets with the backslash: "\[="
and "=\]". The S<<-- HERE> shows whereabouts in the regular expression the
problem was discovered. See L<perlre>.

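The two reserved forms described above can be checked with a sketch like the following. It assumes the patterns are built from strings, so that the fatal error is raised at run time and can be trapped with `eval`; `compiles` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $pattern compiles, false if compiling it
# dies. Building the pattern from a string delays compilation to run
# time, so eval can trap the fatal error.
sub compiles {
    my ($pattern) = @_;
    my $ok = eval { my $re = qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

# "[." ... ".]" and "[=" ... "=]" inside a class are reserved: fatal.
my $reserved_dot = compiles('[[.a.]]');
my $reserved_eq  = compiles('[[=a=]]');

# With the square brackets escaped, they are ordinary class members.
my $escaped_dot = compiles('[\[.a.\]]');
my $escaped_eq  = compiles('[\[=a=\]]');

print "reserved: $reserved_dot $reserved_eq\n";
print "escaped:  $escaped_dot $escaped_eq\n";
```
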
5 changes: 5 additions & 0 deletions pod/perlfunc.pod
@@ -6536,6 +6536,11 @@ the C<\Q> escape in double-quoted strings.

If EXPR is omitted, uses L<C<$_>|perlvar/$_>.

The motivation behind this is to make all characters in EXPR match their
literal selves. Otherwise any metacharacters in it could trigger
their "magic" matching behaviors. The characters this function has been
applied to are said to be "quoted" or "escaped".

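A short illustration of that motivation, using made-up sample strings: the same text behaves as a mini-pattern when used directly, and as a literal once passed through `quotemeta`.

```perl
use strict;
use warnings;

my $needle = 'a.b*c';              # "." and "*" are metacharacters
my $quoted = quotemeta $needle;    # each metacharacter gets a backslash

print "$quoted\n";                 # prints a\.b\*c

# Unquoted, "." matches any character and "b*" means "zero or more b".
print "unquoted matches 'aXc'\n" if 'aXc' =~ /$needle/;

# Quoted, every character must match itself literally.
print "quoted matches 'aXc'\n"   if 'aXc' =~ /$quoted/;
print "quoted matches 'a.b*c'\n" if 'a.b*c' =~ /$quoted/;
```
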
quotemeta (and C<\Q> ... C<\E>) are useful when interpolating strings into
regular expressions, because by default an interpolated variable will be
considered a mini-regular expression. For example:
30 changes: 26 additions & 4 deletions pod/perlintro.pod
@@ -584,10 +584,32 @@ the meantime, here's a quick cheat sheet:
^ start of string
$ end of string

Quantifiers can be used to specify how many of the previous thing you
want to match on, where "thing" means either a literal character, one
of the metacharacters listed above, or a group of characters or
metacharacters in parentheses.
Note that in the above, C<$> doesn't match a dollar sign. Similarly
C<.>, C<\>, C<[>, C<]>, C<(>, C<)>, and C<^> don't match the characters
you might expect. These are called "metacharacters". In contrast, the
characters C<a>, C<e>, C<i>, C<o>, and C<u>, for example, are not
metacharacters. They match themselves literally. Metacharacters
normally match something that isn't their literal value. There are a few
more metacharacters than the ones above. Some of the quantifier
metacharacters are given below, and the full list is in
L<perlre/Metacharacters>.

To make a metacharacter match its literal value, you "escape" (or "quote")
it, by preceding it with a backslash. Hence, C<\$> does match a dollar sign,
and C<\\> matches a literal backslash.

Note also that above, the string C<\s>, for example, doesn't match a
backslash followed by the letter C<s>. In this case, preceding the
non-metacharacter C<s> with a backslash turns it into something that
doesn't match its literal value. Such a sequence is called an "escape
sequence". L<perlrebackslash> documents all of the current ones.

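A warning is raised (under C<use warnings>) if you escape a letter that
isn't part of a currently defined escape sequence, such as C<\y>. A
backslash before a non-alphanumeric character that isn't a
metacharacter, such as C<\,>, is harmless: it simply matches that
character.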

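The matches below sketch the rules just described; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $price = 'Total: $5';
my $path  = 'C:\temp';    # single quotes: a real backslash before "t"

# "$" is a metacharacter (end of string); "\$" matches a dollar sign.
print "found a dollar sign\n" if $price =~ /\$5/;

# "\\" matches one literal backslash.
print "found a backslash\n"   if $path =~ /C:\\temp/;

# "\s" is an escape sequence: it matches whitespace, not "\" then "s".
print "'a b' matches\n"       if 'a b'  =~ /a\sb/;
print "'a\\sb' matches\n"     if 'a\sb' =~ /a\sb/;   # not printed
```
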
You can specify how many of the previous thing you want to match on by
using quantifiers (where "thing" means one of: a literal character, one
of the constructs listed above, or a group of either of them in
parentheses).

* zero or more of the previous thing
+ one or more of the previous thing
65 changes: 48 additions & 17 deletions pod/perlre.pod
@@ -1348,34 +1348,61 @@ their punctuation character equivalents, however at the trade-off that you
have to tell perl when you want to use them.
X</p> X<p modifier>

=head2 Quoting metacharacters
=head2 Quoting (escaping) metacharacters

Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>. Unlike some other regular expression languages, there
are no backslashed symbols that aren't alphanumeric. So anything
that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
always
interpreted as a literal character, not a metacharacter. This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
use for a pattern. Simply quote all non-"word" characters:
To cause a metacharacter to match its literal self, you precede it with
a backslash. Unlike some other regular expression languages, any
sequence consisting of a backslash followed by a non-alphanumeric
matches that non-alphanumeric, literally. So things like C<\\>, C<\(>,
C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> are always interpreted as the
literal character that follows the backslash.

$pattern =~ s/(\W)/\\$1/g;
(That's not true when an alphanumeric character is preceded by a
backslash. There are a few such "escape sequences", like C<\w>, which have
special matching behaviors in Perl. All such are currently limited to
ASCII-range alphanumerics.)

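The rule that a backslash before any non-alphanumeric yields that character literally can be checked exhaustively over printable ASCII. This is a sketch that covers only code points 33 through 126; the count of 32 characters follows from that range.

```perl
use strict;
use warnings;

# Every printable ASCII character that isn't a letter or digit.
my @others = grep { !/[A-Za-z0-9]/ } map { chr($_) } 33 .. 126;

my @failures;
for my $c (@others) {
    my $escaped = "\\$c";    # a backslash followed by the character
    push @failures, $c unless $c =~ /\A$escaped\z/;
}
print scalar(@others), " characters checked, ",
      scalar(@failures), " failed\n";

# A backslash before an alphanumeric is different: "\d" is an escape
# sequence meaning "a digit", so it does not match the letter "d".
print "'d' does not match /\\d/\n" unless 'd' =~ /\d/;
```
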
The best method to escape metacharacters is to use the
C<L<quotemeta()|perlfunc/quotemeta>> function, or the equivalent, but the
more flexible, and often more convenient, C<\Q> metaquoting escape
sequence:

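$pattern = quotemeta $pattern;

This replaces C<$pattern> with a copy in which the metacharacters are
escaped (C<quotemeta> returns the escaped string; it does not modify its
argument). You can then do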

$string =~ s/$pattern/foo/;

and be assured that any metacharacters in C<$pattern> will match their
literal selves. If you instead use C<\Q>, like:

$string =~ s/\Q$pattern/foo/;

(If C<use locale> is set, then this depends on the current locale.)
Today it is more common to use the C<L<quotemeta()|perlfunc/quotemeta>>
function or the C<\Q> metaquoting escape sequence to disable all
metacharacters' special meanings like this:
you don't need a separately escaped copy of C<$pattern>. Further, there
is an additional escape sequence, C<\E>, that can be combined with C<\Q>
to escape only the portions of the pattern you choose:

/$unquoted\Q$quoted\E$unquoted/
$string =~ s/$unquoted\Q$quoted\E$unquoted/foo/;

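The three forms described above, side by side, with a made-up `$string` and `$pattern`. Because `quotemeta` returns the escaped string rather than changing its argument, the sketch assigns its result to a new variable.

```perl
use strict;
use warnings;

my $string  = 'price: 1+1 (approx)';
my $pattern = '1+1 (approx)';    # "+", "(" and ")" are metacharacters

# quotemeta returns an escaped copy; assign it to use it.
my $quoted = quotemeta $pattern;
(my $copy1 = $string) =~ s/$quoted/foo/;

# \Q escapes the interpolated text in place, so no separate copy is needed.
(my $copy2 = $string) =~ s/\Q$pattern/foo/;

# \Q ... \E escapes only part of the pattern: "^\w+: " stays a regex,
# while the interpolated $pattern is matched literally.
(my $copy3 = $string) =~ s/^\w+: \Q$pattern\E\z/foo/;

print "$copy1\n$copy2\n$copy3\n";
```
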
Beware that if you put literal backslashes (those not inside
interpolated variables) between C<\Q> and C<\E>, double-quotish
backslash interpolation may lead to confusing results. If you
I<need> to use literal backslashes within C<\Q...\E>,
consult L<perlop/"Gory details of parsing quoted constructs">.

C<quotemeta()> and C<\Q> are fully described in L<perlfunc/quotemeta>.
In older code, you may see something like this:

$pattern =~ s/(\W)/\\$1/g;
$string =~ s/$pattern/foo/;

This simply adds backslashes before all non-"word" characters to disable
any special meanings they might have. (If S<C<use locale>> is in
effect, the current locale can affect the results.) This paradigm is
inadequate for Unicode.

C<quotemeta()> and C<\Q> are more fully described in
L<perlfunc/quotemeta>.

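For ASCII-only input outside S<C<use locale>>, the older `s/(\W)/\\$1/g` idiom and `quotemeta` should produce identical strings. This sketch checks that on a few made-up samples and deliberately sticks to ASCII, where the two agree.

```perl
use strict;
use warnings;

# Made-up ASCII samples containing assorted metacharacters.
my @samples = ('a.b*c', '1+1 (approx)', 'C:\temp\*.txt', 'x{2,3}|y$');

my %same;
for my $s (@samples) {
    # The older idiom: put a backslash before every non-"word" character.
    (my $old = $s) =~ s/(\W)/\\$1/g;

    # For ASCII input outside "use locale", quotemeta should agree.
    $same{$s} = $old eq quotemeta($s) ? 1 : 0;

    # Either escaped form matches the sample literally.
    print "$s: ", ($same{$s} ? 'same' : 'different'),
          ($s =~ /\A$old\z/ ? ', matches itself' : ''), "\n";
}
```
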
=head2 Extended Patterns

@@ -3384,6 +3411,10 @@ Subroutine call to a named capture group. Equivalent to C<< (?&I<NAME>) >>.

=back

=head2 Quoting metacharacters

This section has been replaced by L</Quoting (escaping) metacharacters>.

=head1 BUGS

There are a number of issues with regard to case-insensitive matching
14 changes: 7 additions & 7 deletions pod/perlrebackslash.pod
@@ -90,8 +90,8 @@ as C<Not in [].>
\o{} Octal escape sequence.
\p{}, \pP Match any character with the given Unicode property.
\P{}, \PP Match any character without the given property.
\Q Quote (disable) pattern metacharacters till \E. Not
in [].
\Q Quote (disable) pattern metacharacters till \E.
(Also called "escape".) Not in [].
\r Return character.
\R Generic new line. Not in [].
\s Match any whitespace character.
@@ -350,11 +350,11 @@ them, until either the end of the pattern or the next occurrence of
C<\E>, whichever comes first. They provide functionality similar to what
the functions C<lc> and C<uc> provide.

C<\Q> is used to quote (disable) pattern metacharacters, up to the next
C<\E> or the end of the pattern. C<\Q> adds a backslash to any character
that could have special meaning to Perl. In the ASCII range, it quotes
every character that isn't a letter, digit, or underscore. See
L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
C<\Q> is used to quote or escape (disable) pattern metacharacters, up to
the next C<\E> or the end of the pattern. C<\Q> adds a backslash to any
character that could have special meaning to Perl. In the ASCII range,
it quotes every character that isn't a letter, digit, or underscore.
See L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
code points. Using this ensures that any character between C<\Q> and
C<\E> will be matched literally, not interpreted as a metacharacter by
the regex engine.
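
The ASCII-range rule stated above can be verified directly over all 128 ASCII code points. This sketch assumes no S<C<use locale>> and compares both `quotemeta` and `\Q` interpolation in a double-quoted string against the expected result.

```perl
use strict;
use warnings;

my @mismatches;
for my $cp (0 .. 127) {
    my $c = chr $cp;

    # Expected: a backslash before anything that isn't [A-Za-z0-9_].
    my $want = $c =~ /[A-Za-z0-9_]/ ? $c : "\\$c";

    # quotemeta() and \Q in a double-quoted string should both agree.
    push @mismatches, $cp
        unless quotemeta($c) eq $want && "\Q$c\E" eq $want;
}
print scalar(@mismatches), " mismatches among 128 ASCII code points\n";

# In a pattern, \Q without \E runs to the end: "." is literal here.
print "literal dot ok\n" if 'a.b' =~ /^a\Q.b/ && 'axb' !~ /^a\Q.b/;
```
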
2 changes: 1 addition & 1 deletion pod/perlreref.pod
@@ -318,7 +318,7 @@ Captured groups are numbered according to their I<opening> paren.
fc Foldcase a string

pos Return or set current match position
quotemeta Quote metacharacters
quotemeta Quote metacharacters (escape their normal meaning)
reset Reset m?pattern? status
study Analyze string for optimizing matching

2 changes: 1 addition & 1 deletion pod/perlretut.pod
@@ -187,7 +187,7 @@ C<"["> respectively; other gotchas apply.
The significance of each of these will be explained
in the rest of the tutorial, but for now, it is important only to know
that a metacharacter can be matched as-is by putting a backslash before
it:
it. This is called "escaping" or "quoting" it. Some examples:

"2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
"2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +
4 changes: 2 additions & 2 deletions pp.c
@@ -5082,7 +5082,7 @@ PP(pp_quotemeta)
else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(s, s + len)) {
if (
#ifdef USE_LOCALE_CTYPE
/* In locale, we quote all non-ASCII Latin1 chars.
/* In locale, we escape all non-ASCII Latin1 chars.
* Otherwise use the quoting rules */

IN_LC_RUNTIME(LC_CTYPE)
@@ -5116,7 +5116,7 @@ PP(pp_quotemeta)
}
}
else {
/* For non UNI_8_BIT (and hence in locale) just quote all \W
/* For non UNI_8_BIT (and hence in locale) just escape all \W
* including everything above ASCII */
while (len--) {
if (!isWORDCHAR_A(*s))
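
The branches shown in `pp_quotemeta` can be observed from Perl code. This is a sketch that assumes no `use locale`; U+00E9 is used only as a convenient upper Latin-1 letter, and the expected behavior is inferred from the branches and comments in the diff above.

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # make the default explicit

my $e_acute  = "\xE9";          # U+00E9, stored as a single native byte
my $upgraded = $e_acute;
utf8::upgrade($upgraded);       # same character, UTF-8 encoded internally

# Non-UTF-8 string, no unicode_strings, no locale: this takes the
# "just escape all \W including everything above ASCII" branch.
my $native_default = quotemeta $e_acute;

# UTF-8 string: the downgradeable branch applies the Unicode quoting
# rules, which leave a letter such as U+00E9 unescaped.
my $utf8_default = quotemeta $upgraded;

my $native_unicode_strings;
{
    use feature 'unicode_strings';
    # Under the feature, the non-UTF-8 string gets the same rules.
    $native_unicode_strings = quotemeta $e_acute;
}

printf "lengths: default native %d, default UTF-8 %d, unicode_strings %d\n",
    length $native_default, length $utf8_default,
    length $native_unicode_strings;
```
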