Commit 7a66694

pod and comments: Note escape vs quote
Fixes #15221

The documentation and comments were misleading, conflating quoting a metacharacter with escaping it. Since \Q stands for quote, we have to continue to use that terminology. This commit clarifies that the two terms are often equivalent. It also adds detail about quotemeta and \Q.
1 parent ce6e6e6 commit 7a66694

7 files changed (+65 -31 lines)

pod/perldiag.pod

Lines changed: 4 additions & 4 deletions
@@ -2602,8 +2602,8 @@ and perl's F</dev/null> emulation was unable to create an empty temporary file.
 (W regexp)(F) A character class range must start and end at a literal
 character, not another character class like C<\d> or C<[:alpha:]>. The "-"
 in your false range is interpreted as a literal "-". In a C<(?[...])>
-construct, this is an error, rather than a warning. Consider quoting
-the "-", "\-". The S<<-- HERE> shows whereabouts in the regular expression
+construct, this is an error, rather than a warning. Consider escaping
+the "-" as "\-". The S<<-- HERE> shows whereabouts in the regular expression
 the problem was discovered. See L<perlre>.
 
 =item Fatal VMS error (status=%d) at %s, line %d
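
A small Perl sketch of the advice in this diagnostic. The sample strings and the $word_like name are made up for illustration; the point is only that writing the "-" as "\-" keeps it literal, so no false range is formed next to C<\w>.

```perl
use strict;
use warnings;

# Written as [\w-.], the "-" would follow \w, which cannot be a range
# endpoint, giving the "false range" described above. Escaping the hyphen
# makes it an ordinary member of the class.
my $word_like = qr/\A[\w\-.]+\z/;

my @accepted = grep { $_ =~ $word_like } ('foo-bar', 'v1.2', 'two words');
print "$_\n" for @accepted;    # foo-bar, then v1.2 ("two words" has a space)
```
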
@@ -5453,7 +5453,7 @@ S<<-- HERE> in m/%s/
 (F) Within regular expression character classes ([]) the syntax beginning
 with "[." and ending with ".]" is reserved for future extensions. If you
 need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[."
+character class, just escape the square brackets with the backslash: "\[."
 and ".\]". The S<<-- HERE> shows whereabouts in the regular expression the
 problem was discovered. See L<perlre>.
 
@@ -5463,7 +5463,7 @@ S<<-- HERE> in m/%s/
 (F) Within regular expression character classes ([]) the syntax beginning
 with "[=" and ending with "=]" is reserved for future extensions. If you
 need to represent those character sequences inside a regular expression
-character class, just quote the square brackets with the backslash: "\[="
+character class, just escape the square brackets with the backslash: "\[="
 and "=\]". The S<<-- HERE> shows whereabouts in the regular expression the
 problem was discovered. See L<perlre>.
 

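A minimal Perl sketch of the escaping these two entries recommend, assuming a made-up class that should accept the characters "[", ".", "x", and "]". Because the inner square brackets are written as "\[" and "\]", the reserved "[." ... ".]" syntax never appears unescaped.

```perl
use strict;
use warnings;

# Written unescaped as [[.x.]], the inner "[." ... ".]" would be the
# reserved syntax described above. Escaping the inner square brackets
# makes them ordinary members of the class, which now holds "[", ".",
# "x", and "]".
my $class = qr/\A[\[.x.\]]+\z/;

my $ok      = '[.x.]' =~ $class ? 1 : 0;    # 1: every character is in the class
my $rejects = 'y'     =~ $class ? 1 : 0;    # 0: "y" is not in the class
print "$ok $rejects\n";
```
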
pod/perlfunc.pod

Lines changed: 5 additions & 0 deletions
@@ -6536,6 +6536,11 @@ the C<\Q> escape in double-quoted strings.
 
 If EXPR is omitted, uses L<C<$_>|perlvar/$_>.
 
+The motivation behind this is to make all characters in EXPR match their
+literal selves. Otherwise any metacharacters in it could trigger
+their "magic" matching behaviors. The characters this function has been
+applied to are said to be "quoted" or "escaped".
+
 quotemeta (and C<\Q> ... C<\E>) are useful when interpolating strings into
 regular expressions, because by default an interpolated variable will be
 considered a mini-regular expression. For example:
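
As a rough sketch of the motivation in the added paragraph, the following compares interpolating a string as-is with interpolating its quotemeta result. The sample text "a.b*c" and the variable names are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my $text   = 'a.b*c';            # arbitrary text containing "." and "*"
my $quoted = quotemeta $text;    # 'a\.b\*c'

# Interpolated as-is, "." and "*" keep their "magic" meanings:
# "." matches the X and "b*" matches "bbb".
my $loose  = 'aXbbbc'  =~ /$text/   ? 1 : 0;    # 1
# After quotemeta, every character matches only its literal self.
my $strict = 'aXbbbc'  =~ /$quoted/ ? 1 : 0;    # 0
my $exact  = 'xa.b*cx' =~ /$quoted/ ? 1 : 0;    # 1

print "$quoted $loose $strict $exact\n";
```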

pod/perlre.pod

Lines changed: 45 additions & 16 deletions
@@ -1350,32 +1350,61 @@ X</p> X<p modifier>
 
 =head2 Quoting metacharacters
 
-Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
-C<\w>, C<\n>. Unlike some other regular expression languages, there
-are no backslashed symbols that aren't alphanumeric. So anything
-that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
-always
-interpreted as a literal character, not a metacharacter. This was
-once used in a common idiom to disable or quote the special meanings
-of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-"word" characters:
+(Also known as "escaping".)
 
-    $pattern =~ s/(\W)/\\$1/g;
+To cause a metacharacter to match its literal self, you precede it with
+a backslash. Unlike some other regular expression languages, any
+sequence consisting of a backslash followed by a non-alphanumeric
+matches that non-alphanumeric, literally. So things like C<\\>, C<\(>,
+C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> are always interpreted as the
+literal character that follows the backslash.
+
+(That's not true when an alphanumeric character is preceded by a
+backslash. There are a few such "escape sequences", like C<\w>, which have
+special matching behaviors in Perl. All such are currently limited to
+ASCII-range alphanumerics.)
+
+The best method to escape metacharacters is to use the
+C<L<quotemeta()|perlfunc/quotemeta>> function, or its equivalent, the
+more flexible and often more convenient C<\Q> metaquoting escape
+sequence:
+
+    $pattern = quotemeta $pattern;
+
+This changes C<$pattern> so that the metacharacters are quoted. You can
+then do:
+
+    $string =~ s/$pattern/foo/;
 
-(If C<use locale> is set, then this depends on the current locale.)
-Today it is more common to use the C<L<quotemeta()|perlfunc/quotemeta>>
-function or the C<\Q> metaquoting escape sequence to disable all
-metacharacters' special meanings like this:
+and be assured that any metacharacters in C<$pattern> will match their
+literal selves. If you instead use C<\Q>, like:
 
-    /$unquoted\Q$quoted\E$unquoted/
+    $string =~ s/\Qpattern/foo/;
+
+you don't have to have a separate C<$pattern> variable. Further, there
+is an additional escape sequence, C<\E>, that can be combined with C<\Q>
+to allow you to escape whatever portions of the pattern you desire:
+
+    $string =~ s/$unquoted\Q$quoted\E$unquoted/foo/;
 
 Beware that if you put literal backslashes (those not inside
 interpolated variables) between C<\Q> and C<\E>, double-quotish
 backslash interpolation may lead to confusing results. If you
 I<need> to use literal backslashes within C<\Q...\E>,
 consult L<perlop/"Gory details of parsing quoted constructs">.
 
-C<quotemeta()> and C<\Q> are fully described in L<perlfunc/quotemeta>.
+In older code, you may see something like this:
+
+    $pattern =~ s/(\W)/\\$1/g;
+    $string =~ s/$pattern/foo/;
+
+This simply adds backslashes before all non-"word" characters to disable
+any special meanings they might have. (If S<C<use locale>> is in
+effect, the current locale can affect the results.) This paradigm is
+inadequate for Unicode.
+
+C<quotemeta()> and C<\Q> are more fully described in
+L<perlfunc/quotemeta>.
 
 =head2 Extended Patterns
 

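A sketch of the perlre recommendations in runnable form. The names $num, $op, and $expr are hypothetical stand-ins for the $unquoted and $quoted placeholders in the text; the last substitution reproduces the older "\W" idiom for comparison with quotemeta.

```perl
use strict;
use warnings;

my $num  = '\d+';     # hypothetical: meant to act as a regex (like $unquoted)
my $op   = '+';       # hypothetical: meant to match literally (like $quoted)
my $expr = '12+34';

# \Q...\E escapes only the interpolated $op; both copies of $num stay regexes.
my $with_q = $expr =~ /\A$num\Q$op\E$num\z/ ? 1 : 0;    # 1

# Without \Q, "+" lands right after "\d+" and turns it into the possessive
# "\d++", so the pattern no longer contains a literal "+" and fails.
my $without_q = $expr =~ /\A$num$op$num\z/ ? 1 : 0;     # 0

# Substitution using \Q directly, leaving $op itself unchanged.
(my $replaced = $expr) =~ s/\Q$op\E/ plus /;              # '12 plus 34'

# The older idiom: put a backslash before every non-"word" character.
(my $old_style = $op) =~ s/(\W)/\\$1/g;                   # '\+'

print "$with_q $without_q [$replaced] $old_style\n";
```

For a plain ASCII value like "+", the old idiom and quotemeta give the same result; the text above notes where they diverge (locales and Unicode).
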
pod/perlrebackslash.pod

Lines changed: 7 additions & 7 deletions
@@ -90,8 +90,8 @@ as C<Not in [].>
  \o{}              Octal escape sequence.
  \p{}, \pP         Match any character with the given Unicode property.
  \P{}, \PP         Match any character without the given property.
- \Q                Quote (disable) pattern metacharacters till \E. Not
-                   in [].
+ \Q                Quote (disable) pattern metacharacters till \E.
+                   (Also called "escape".) Not in [].
  \r                Return character.
  \R                Generic new line. Not in [].
  \s                Match any whitespace character.
@@ -350,11 +350,11 @@ them, until either the end of the pattern or the next occurrence of
 C<\E>, whichever comes first. They provide functionality similar to what
 the functions C<lc> and C<uc> provide.
 
-C<\Q> is used to quote (disable) pattern metacharacters, up to the next
-C<\E> or the end of the pattern. C<\Q> adds a backslash to any character
-that could have special meaning to Perl. In the ASCII range, it quotes
-every character that isn't a letter, digit, or underscore. See
-L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
+C<\Q> is used to quote or escape (disable) pattern metacharacters, up to
+the next C<\E> or the end of the pattern. C<\Q> adds a backslash to any
+character that could have special meaning to Perl. In the ASCII range,
+it quotes every character that isn't a letter, digit, or underscore.
+See L<perlfunc/quotemeta> for details on what gets quoted for non-ASCII
 code points. Using this ensures that any character between C<\Q> and
 C<\E> will be matched literally, not interpreted as a metacharacter by
 the regex engine.
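
To make the last sentence concrete, here is a small Perl check. The string of metacharacters is an arbitrary choice: interpolated between \Q and \E, a string made only of metacharacters matches exactly itself, and each of its characters gains a backslash.

```perl
use strict;
use warnings;

# An arbitrary string made only of ASCII regex metacharacters.
my $meta = '.*+?()[]{}|^$\\';    # in single quotes, \\ is one backslash

# Interpolated between \Q and \E, each of them is escaped, so the pattern
# matches exactly this text and nothing else.
my $same  = $meta     =~ /\A\Q$meta\E\z/ ? 1 : 0;    # 1
my $other = 'abcdefg' =~ /\A\Q$meta\E\z/ ? 1 : 0;    # 0

# The same escaping in a double-quoted string: a backslash before each
# of the 14 characters, since none of them is a letter, digit, or "_".
my $escaped = "\Q$meta\E";
print "$same $other $escaped\n";
```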

pod/perlreref.pod

Lines changed: 1 addition & 1 deletion
@@ -318,7 +318,7 @@ Captured groups are numbered according to their I<opening> paren.
    fc          Foldcase a string
 
    pos         Return or set current match position
-   quotemeta   Quote metacharacters
+   quotemeta   Quote metacharacters (escape their normal meaning)
    reset       Reset m?pattern? status
    study       Analyze string for optimizing matching
 

pod/perlretut.pod

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ C<"["> respectively; other gotchas apply.
 The significance of each of these will be explained
 in the rest of the tutorial, but for now, it is important only to know
 that a metacharacter can be matched as-is by putting a backslash before
-it:
+it. This is called "escaping" or "quoting" it. Some examples:
 
     "2+2=4" =~ /2+2/;    # doesn't match, + is a metacharacter
     "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +

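The escaping described in this tutorial passage can be checked with a short script. The sample strings other than "2+2=4" are made-up examples of other metacharacters escaped the same way.

```perl
use strict;
use warnings;

# Each sample string is paired with a pattern whose metacharacters are
# escaped, so every one of those characters matches only itself.
my %tests = (
    '2+2=4' => qr/2\+2/,      # \+ is an ordinary "+"
    'f(x)'  => qr/f\(x\)/,    # \( and \) are ordinary parentheses, not a group
    'a[0]'  => qr/a\[0\]/,    # \[ and \] do not start a character class
    '3.14'  => qr/3\.14/,     # \. matches only a dot, not any character
);

my @matched = grep { $_ =~ $tests{$_} } sort keys %tests;
print "matches: $_\n" for @matched;

# Unescaped, "+" is a quantifier: /2+2/ needs "22" somewhere in the string.
my $unescaped = "2+2=4" =~ /2+2/ ? 1 : 0;
print "unescaped /2+2/ matches: $unescaped\n";    # 0
```
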
pp.c

Lines changed: 2 additions & 2 deletions
@@ -5082,7 +5082,7 @@ PP(pp_quotemeta)
             else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(s, s + len)) {
                 if (
 #ifdef USE_LOCALE_CTYPE
-                    /* In locale, we quote all non-ASCII Latin1 chars.
+                    /* In locale, we escape all non-ASCII Latin1 chars.
                      * Otherwise use the quoting rules */
 
                     IN_LC_RUNTIME(LC_CTYPE)
@@ -5116,7 +5116,7 @@ PP(pp_quotemeta)
             }
         }
     }
     else {
-        /* For non UNI_8_BIT (and hence in locale) just quote all \W
+        /* For non UNI_8_BIT (and hence in locale) just escape all \W
          * including everything above ASCII */
         while (len--) {
             if (!isWORDCHAR_A(*s))
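
For readers who don't follow the C, here is a pure-Perl sketch of what the else branch in the second hunk does. The helper name escape_non_word_bytes is hypothetical and this is not how perl implements quotemeta; it only mirrors the isWORDCHAR_A test, which accepts ASCII letters, digits, and underscore.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the C loop in the else branch: put a
# backslash before every byte that is not an ASCII word character, i.e.
# not [A-Za-z0-9_], which is the set isWORDCHAR_A accepts.
sub escape_non_word_bytes {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^A-Za-z0-9_])/\\$1/g;
    return $out;
}

# On ASCII input the sketch agrees with the built-in quotemeta.
my @differ = grep { escape_non_word_bytes(chr $_) ne quotemeta(chr $_) } 0 .. 127;
print(@differ ? "differs at: @differ\n" : "agrees on all ASCII\n");

# A byte above ASCII is never an ASCII word character, so it is escaped too.
my $high = escape_non_word_bytes("\xE9");
printf "%vd\n", $high;    # 92.233: a backslash followed by byte 0xE9
```

The comparison with quotemeta is limited to ASCII on purpose; which branch perl takes for non-ASCII input depends on the string's encoding, locale, and features, as perlfunc/quotemeta describes.
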
