From 7bd98b9bc9ae75166fb7532fc7da11bab06d86aa Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Wed, 13 Jan 2021 15:10:43 -0700
Subject: [PATCH] perlretut: Grammar, clarifications, white-space

---
 pod/perlretut.pod | 52 +++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index cb4654f552b6..9b198e282549 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -20,8 +20,10 @@ expressions will allow you to manipulate text with surprising ease.
 What is a regular expression? At its most basic, a regular expression
 is a template that is used to determine if a string has certain
 characteristics. The string is most often some text, such as a line,
-sentence, web page, or even a whole book, but less commonly it could be
-some binary data as well.
+sentence, web page, or even a whole book, but it doesn't have to be. It
+could be binary data, for example. Biologists often use Perl to look
+for patterns in long DNA sequences.
+
 Suppose we want to determine if the text in variable, C<$var> contains
 the sequence of characters S>
 (blanks added for legibility). We can write in Perl
@@ -29,8 +31,9 @@ the sequence of characters S>
     $var =~ m/mushroom/
 
 The value of this expression will be TRUE if C<$var> contains that
-sequence of characters, and FALSE otherwise. The portion enclosed in
-C<'E'> characters denotes the characteristic we are looking for.
+sequence of characters anywhere within it, and FALSE otherwise. The
+portion enclosed in C<'E'> characters denotes the characteristic we
+are looking for.
 We use the term I for it. The process of looking to see if the
 pattern occurs in the string is called I, and the C<"=~">
 operator along with the C tell Perl to try to match the pattern
@@ -60,7 +63,7 @@ many examples. The first part of the tutorial will progress from the
 simplest word searches to the basic regular expression concepts. If
 you master the first part, you will have all the tools needed to solve
 about 98% of your needs. The second part of the tutorial is for those
-comfortable with the basics and hungry for more power tools. It
+comfortable with the basics, and hungry for more power tools. It
 discusses the more advanced regular expression operators and
 introduces the latest cutting-edge innovations.
 
@@ -135,7 +138,7 @@ And finally, the C default delimiters for a match can be changed
 to arbitrary delimiters by putting an C<'m'> out front:
 
     "Hello World" =~ m!World!; # matches, delimited by '!'
-    "Hello World" =~ m{World}; # matches, note the matching '{}'
+    "Hello World" =~ m{World}; # matches, note the paired '{}'
     "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
     # '/' becomes an ordinary char
 
@@ -151,7 +154,7 @@ Let's consider how different regexps would match C<"Hello World">:
     "Hello World" =~ /oW/; # doesn't match
     "Hello World" =~ /World /; # doesn't match
 
-The first regexp C doesn't match because regexps are
+The first regexp C doesn't match because regexps are by default
 case-sensitive. The second regexp matches because the substring
 S> occurs in the string S>. The space
 character C<' '> is treated like any other character in a regexp and is
@@ -169,8 +172,8 @@ always match at the earliest possible point in the string:
     "That hat is red" =~ /hat/; # matches 'hat' in 'That'
 
 With respect to character matching, there are a few more points you
-need to know about. First of all, not all characters can be used "as
-is" in a match. Some characters, called I, are
+need to know about. First of all, not all characters can be used
+"as-is" in a match. Some characters, called I, are
 generally reserved for use in regexp notation. The metacharacters are
 
     {}[]()^$.|*+?-#\
@@ -832,8 +835,8 @@ Counting the opening parentheses to get the correct number for a
 backreference is error-prone as soon as there is more than one
 capturing group. A more convenient technique became available
 with Perl 5.10: relative backreferences. To refer to the immediately
-preceding capture group one now may write C<\g{-1}>, the next but
-last is available via C<\g{-2}>, and so on.
+preceding capture group one now may write C<\g-1> or C<\g{-1}>, the next but
+last is available via C<\g-2> or C<\g{-2}>, and so on.
 
 Another good reason in addition to readability and maintainability
 for using relative backreferences is illustrated by the following example,
@@ -1970,10 +1973,11 @@ C<\x>I (without curly braces and I are two hex digits) doesn't
 go further than 255. (Starting in Perl 5.14, if you're an octal fan,
 you can also use C<\o{oct}>.)
 
-    /\x{263a}/; # match a Unicode smiley face :)
+    /\x{263a}/;   # match a Unicode smiley face :)
+    /\x{ 263a }/; # Same
 
 B: In Perl 5.6.0 it used to be that one needed to say C<use
-utf8> to use any Unicode features. This is no more the case: for
+utf8> to use any Unicode features. This is no longer the case: for
 almost all Unicode processing, the explicit C pragma is not needed.
 (The only case where it matters is if your Perl script is in Unicode
 and encoded in UTF-8, then an explicit C is needed.)
@@ -2050,16 +2054,16 @@ C<\p{Mark}>, meaning things like accent marks.
 
 The Unicode C<\p{Script}> and C<\p{Script_Extensions}> properties are
 used to categorize every Unicode character into the language script it
-is written in. (C is an improved version of
-C