-
Notifications
You must be signed in to change notification settings - Fork 578
regex (?<name>...) capture-to-var paren, new $^N magic variable #4142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
From [email protected]SUMMARY: 2) I created a new magic variable 3) If this is of interest, is $^N a good name? Feeling envy for highly advanced :-) languages like Java and Visual Basic, (?<myval>\d+) (sets $myval to the digits captured), I thought I'd use regex overloading (\d+) (?{ (capture to normal parens, then use $+ to access it and assign to the variable) This works fine except for a few details, such as being able to have So, I created a new magic variable, Now, converting to (\d+) (?{ works even with nesting. This is very nice. I can send the patches if wanted. But if wanted, what is a good variable name? For those interested, I've appended my package to allow this, and a short There are still "issues" with my overloading -- the variables named are not So, it really would be nice if named captures were officially supported. I Jeffrey Here is a short test script: ---snip------------------------------------------------------------------ my $areacode; ## both lexical "My number is 408-555-1212." =~ m{ print "phone number is: ($areacode) $exchange-$number\n"; my $fullnumber; "My number is 408-555-1212." =~ m{ if ($fullnumber ne "$areacode-$exchange-$number") Here is the package: ---snip------------------------------------------------------------------ ## ## our $OpenParens; ## needed for matching nested parens my $NestedParenGuts = qr{ (?{ local $OpenParens = 0 }) ## counts the number of nested opens waiting to close (?> ## escaped stuff # another opening paren # a closing paren, if we're expecting any ## $re =~ s{ return $re; ## return mangled regex 1; Perl Info
|
From [Unknown Contact. See original ticket]Well, since there has been no objection, here is the patch. Inline Patchdiff -u -r .orig/embedvar.h ./embedvar.h
--- .orig/embedvar.h Wed Jun 20 11:35:50 2001
+++ ./embedvar.h Tue Jun 26 12:16:35 2001
@@ -113,6 +113,7 @@
#define PL_regint_start (vTHX->Tregint_start)
#define PL_regint_string (vTHX->Tregint_string)
#define PL_reginterp_cnt (vTHX->Treginterp_cnt)
+#define PL_reglastcloseparen (vTHX->Treglastcloseparen)
#define PL_reglastparen (vTHX->Treglastparen)
#define PL_regnarrate (vTHX->Tregnarrate)
#define PL_regnaughty (vTHX->Tregnaughty)
@@ -821,6 +822,7 @@
#define PL_regint_start (aTHXo->interp.Tregint_start)
#define PL_regint_string (aTHXo->interp.Tregint_string)
#define PL_reginterp_cnt (aTHXo->interp.Treginterp_cnt)
+#define PL_reglastcloseparen (aTHXo->interp.Treglastcloseparen)
#define PL_reglastparen (aTHXo->interp.Treglastparen)
#define PL_regnarrate (aTHXo->interp.Tregnarrate)
#define PL_regnaughty (aTHXo->interp.Tregnaughty)
@@ -1518,6 +1520,7 @@
#define PL_regint_start (aTHX->Tregint_start)
#define PL_regint_string (aTHX->Tregint_string)
#define PL_reginterp_cnt (aTHX->Treginterp_cnt)
+#define PL_reglastcloseparen (aTHX->Treglastcloseparen)
#define PL_reglastparen (aTHX->Treglastparen)
#define PL_regnarrate (aTHX->Tregnarrate)
#define PL_regnaughty (aTHX->Tregnaughty)
@@ -1654,6 +1657,7 @@
#define PL_Tregint_start PL_regint_start
#define PL_Tregint_string PL_regint_string
#define PL_Treginterp_cnt PL_reginterp_cnt
+#define PL_Treglastcloseparen PL_reglastcloseparen
#define PL_Treglastparen PL_reglastparen
#define PL_Tregnarrate PL_regnarrate
#define PL_Tregnaughty PL_regnaughty
diff -u -r .orig/gv.c ./gv.c
--- .orig/gv.c Mon Jun 25 08:08:15 2001
+++ ./gv.c Tue Jun 26 12:35:10 2001
@@ -895,6 +895,7 @@
case '\006': /* $^F */
case '\010': /* $^H */
case '\011': /* $^I, NOT \t in EBCDIC */
+ case '\016': /* $^N */
case '\020': /* $^P */
case '\024': /* $^T */
if (len > 1)
@@ -1764,6 +1765,7 @@
case '\010': /* $^H */
case '\011': /* $^I, NOT \t in EBCDIC */
case '\014': /* $^L */
+ case '\016': /* $^N */
case '\020': /* $^P */
case '\023': /* $^S */
case '\024': /* $^T */
diff -u -r .orig/mg.c ./mg.c
--- .orig/mg.c Mon Jun 25 07:48:03 2001
+++ ./mg.c Tue Jun 26 12:36:16 2001
@@ -435,6 +435,13 @@
goto getparen;
}
return 0;
+ case '\016': /* ^N */
+ if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+ paren = rx->lastcloseparen;
+ if (paren)
+ goto getparen;
+ }
+ return 0;
case '`':
if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
if (rx->startp[0] != -1) {
@@ -655,6 +662,14 @@
case '+':
if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
paren = rx->lastparen;
+ if (paren)
+ goto getparen;
+ }
+ sv_setsv(sv,&PL_sv_undef);
+ break;
+ case '\016': /* ^N */
+ if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+ paren = rx->lastcloseparen;
if (paren)
goto getparen;
}
diff -u -r .orig/perlapi.h ./perlapi.h
--- .orig/perlapi.h Wed Jun 20 11:35:50 2001
+++ ./perlapi.h Tue Jun 26 12:16:35 2001
@@ -802,6 +802,8 @@
#define PL_regint_string (*Perl_Tregint_string_ptr(aTHXo))
#undef PL_reginterp_cnt
#define PL_reginterp_cnt (*Perl_Treginterp_cnt_ptr(aTHXo))
+#undef PL_reglastcloseparen
+#define PL_reglastcloseparen (*Perl_Treglastcloseparen_ptr(aTHXo))
#undef PL_reglastparen
#define PL_reglastparen (*Perl_Treglastparen_ptr(aTHXo))
#undef PL_regnarrate
diff -u -r .orig/regexec.c ./regexec.c
--- .orig/regexec.c Thu Jun 21 07:16:49 2001
+++ ./regexec.c Tue Jun 26 12:28:14 2001
@@ -147,7 +147,7 @@
if (paren_elems_to_push < 0)
Perl_croak(aTHX_ "panic: paren_elems_to_push < 0");
-#define REGCP_OTHER_ELEMS 5
+#define REGCP_OTHER_ELEMS 6
SSCHECK(paren_elems_to_push + REGCP_OTHER_ELEMS);
for (p = PL_regsize; p > parenfloor; p--) {
/* REGCP_PARENS_ELEMS are pushed per pairs of parentheses. */
@@ -159,6 +159,7 @@
/* REGCP_OTHER_ELEMS are pushed in any case, parentheses or no. */
SSPUSHINT(PL_regsize);
SSPUSHINT(*PL_reglastparen);
+ SSPUSHINT(*PL_reglastcloseparen);
SSPUSHPTR(PL_reginput);
#define REGCP_FRAME_ELEMS 2
/* REGCP_FRAME_ELEMS are part of the REGCP_OTHER_ELEMS and
@@ -192,6 +193,7 @@
assert(i == SAVEt_REGCONTEXT); /* Check that the magic cookie is there. */
i = SSPOPINT; /* Parentheses elements to pop. */
input = (char *) SSPOPPTR;
+ *PL_reglastcloseparen = SSPOPINT;
*PL_reglastparen = SSPOPINT;
PL_regsize = SSPOPINT;
@@ -1871,6 +1873,7 @@
PL_regstartp = prog->startp;
PL_regendp = prog->endp;
PL_reglastparen = &prog->lastparen;
+ PL_reglastcloseparen = &prog->lastcloseparen;
prog->lastparen = 0;
PL_regsize = 0;
DEBUG_r(PL_reg_starttry = startpos);
@@ -2562,6 +2565,7 @@
cache_re(re);
state.ss = PL_savestack_ix;
*PL_reglastparen = 0;
+ *PL_reglastcloseparen = 0;
PL_reg_call_cc = &state;
PL_reginput = locinput;
@@ -2619,6 +2623,7 @@
PL_regendp[n] = locinput - PL_bostr;
if (n > *PL_reglastparen)
*PL_reglastparen = n;
+ *PL_reglastcloseparen = n;
break;
case GROUPP:
n = ARG(scan); /* which paren pair */
diff -u -r .orig/regexp.h ./regexp.h
--- .orig/regexp.h Sun Apr 22 09:12:37 2001
+++ ./regexp.h Tue Jun 26 12:10:53 2001
@@ -37,6 +37,7 @@
I32 prelen; /* length of precomp */
U32 nparens; /* number of parentheses */
U32 lastparen; /* last paren matched */
+ U32 lastcloseparen; /* last paren matched */
U32 reganch; /* Internal use only +
Tainted information used by regexec? */
regnode program[1]; /* Unwarranted chumminess with compiler. */
diff -u -r .orig/thrdvar.h ./thrdvar.h
--- .orig/thrdvar.h Mon Apr 30 05:29:37 2001
+++ ./thrdvar.h Tue Jun 26 12:12:52 2001
@@ -182,6 +182,7 @@
PERLVAR(Tregstartp, I32 *) /* Pointer to startp array. */
PERLVAR(Tregendp, I32 *) /* Ditto for endp. */
PERLVAR(Treglastparen, U32 *) /* Similarly for lastparen. */
+PERLVAR(Treglastcloseparen, U32 *) /* Similarly for lastcloseparen. */
PERLVAR(Tregtill, char *) /* How far we are required to go. */
PERLVAR(Tregcompat1, char) /* used to be regprev1 */
PERLVAR(Treg_start_tmp, char **) /* from regexec.c */ |
From [Unknown Contact. See original ticket]Inline Patchdiff -ru pod/.orig/perlretut.pod pod/perlretut.pod
--- pod/.orig/perlretut.pod Tue Jun 12 18:39:57 2001
+++ pod/perlretut.pod Fri Jun 29 23:51:13 2001
@@ -710,9 +710,12 @@
/(ab(cd|ef)((gi)|j))/;
1 2 34
-so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'.
-For convenience, perl sets C<$+> to the highest numbered C<$1>, C<$2>,
-... that got assigned.
+so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'. For
+convenience, perl sets C<$+> to the string held by the highest numbered
+C<$1>, C<$2>, ... that got assigned (and, somewhat related, C<$^N> to the
+value of the C<$1>, C<$2>, ... most-recently assigned; i.e. the C<$1>,
+C<$2>, ... associated with the rightmost closing parenthesis used in the
+match).
Closely associated with the matching variables C<$1>, C<$2>, ... are
the B<backreferences> C<\1>, C<\2>, ... . Backreferences are simply
diff -ru pod/.orig/perltoc.pod pod/perltoc.pod
--- pod/.orig/perltoc.pod Tue Jun 26 08:00:08 2001
+++ pod/perltoc.pod Fri Jun 29 23:45:32 2001
@@ -904,7 +904,7 @@
$EVAL_ERROR, $@, $PROCESS_ID, $PID, $$, $REAL_USER_ID, $UID, $<,
$EFFECTIVE_USER_ID, $EUID, $>, $REAL_GROUP_ID, $GID, $(,
$EFFECTIVE_GROUP_ID, $EGID, $), $PROGRAM_NAME, $0, $[, $], $COMPILING, $^C,
-$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H, $INPLACE_EDIT, $^I, $^M,
+$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H, $INPLACE_EDIT, $^I, $^M, $^N,
$OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
0x100, 0x200, $LAST_REGEXP_CODE_RESULT, $^R, $EXCEPTIONS_BEING_CAUGHT, $^S,
$BASETIME, $^T, $PERL_VERSION, $^V, $WARNING, $^W, ${^WARNING_BITS},
diff -ru pod/.orig/perlvar.pod pod/perlvar.pod
--- pod/.orig/perlvar.pod Mon Jun 18 07:42:24 2001
+++ pod/perlvar.pod Sat Jun 30 00:04:05 2001
@@ -180,14 +180,29 @@
=item $+
-The last bracket matched by the last search pattern. This is useful if
-you don't know which one of a set of alternative patterns matched. For
-example:
+The text matched by the last bracket of the last successful search pattern.
+This is useful if you don't know which one of a set of alternative patterns
+matched. For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
(Mnemonic: be positive and forward looking.)
This variable is read-only and dynamically scoped to the current BLOCK.
+
+=item $^N
+
+The text matched by the used group most-recently closed (i.e. the group
+with the rightmost closing parenthesis) of the last successful search
+pattern. This is primarly used inside C<(?{...})> blocks for examining text
+recently matched. For example, to effectively capture text to a variable
+(in addition to C<$1>, C<$2>, etc.), replace C<(...)> with
+
+ (?:(...)(?{ $var = $^N }))
+
+By setting and then using C<$var> in this way relieves you from having to
+worry about exactly which numbered set of parentheses they are.
+
+This variable is dynamically scoped to the current BLOCK.
=item @LAST_MATCH_END
--- t/op/.orig/pat.t Thu Jun 28 20:17:51 2001
+++ t/op/pat.t Sat Jun 30 00:52:21 2001
@@ -6,7 +6,7 @@
$| = 1;
-print "1..639\n";
+print "1..660\n";
BEGIN {
chdir 't' if -d 't';
@@ -1854,3 +1854,38 @@
print "not " unless " " =~ /[[:print:]]/;
print "ok 639\n";
+##
+## Test basic $^N usage outside of a regex
+##
+$x = "abcdef";
+$T="ok 640\n";if ($x =~ /cde/ and not defined $^N) {print $T} else {print "not $T"};
+$T="ok 641\n";if ($x =~ /(cde)/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 642\n";if ($x =~ /(c)(d)(e)/ and $^N eq "e") {print $T} else {print "not $T"};
+$T="ok 643\n";if ($x =~ /(c(d)e)/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 644\n";if ($x =~ /(foo)|(c(d)e)/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 645\n";if ($x =~ /(c(d)e)|(foo)/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 646\n";if ($x =~ /(c(d)e)|(abc)/ and $^N eq "abc") {print $T} else {print "not $T"};
+$T="ok 647\n";if ($x =~ /(c(d)e)|(abc)x/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 648\n";if ($x =~ /(c(d)e)(abc)?/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 649\n";if ($x =~ /(?:c(d)e)/ and $^N eq "d" ) {print $T} else {print "not $T"};
+$T="ok 650\n";if ($x =~ /(?:c(d)e)(?:f)/ and $^N eq "d" ) {print $T} else {print "not $T"};
+$T="ok 651\n";if ($x =~ /(?:([abc])|([def]))*/ and $^N eq "f" ){print $T} else {print "not $T"};
+$T="ok 652\n";if ($x =~ /(?:([ace])|([bdf]))*/ and $^N eq "f" ){print $T} else {print "not $T"};
+$T="ok 653\n";if ($x =~ /(([ace])|([bd]))*/ and $^N eq "e" ){print $T} else {print "not $T"};
+{
+ $T="ok 654\n";if($x =~ /(([ace])|([bdf]))*/ and $^N eq "f" ){print $T} else {print "not $T"};
+}
+## test to see if $^N is automatically localized -- it should now
+## have the value set in test 653
+$T="ok 655\n";if ($^N eq "e" ){print $T} else {print "not $T"};
+
+##
+## Now test inside (?{...})
+##
+$T="ok 656\n";if ($x =~ /a([abc])(?{$y=$^N})c/ and $y eq "b" ){print $T} else {print "not $T"};
+$T="ok 657\n";if ($x =~ /a([abc]+)(?{$y=$^N})d/ and $y eq "bc"){print $T} else {print "not $T"};
+$T="ok 658\n";if ($x =~ /a([abcdefg]+)(?{$y=$^N})d/ and $y eq "bc"){print $T} else {print "not $T"};
+$T="ok 659\n";if ($x =~ /(a([abcdefg]+)(?{$y=$^N})d)(?{$z=$^N})e/ and $y eq "bc" and $z eq "abcd")
+ {print $T} else {print "not $T"};
+$T="ok 660\n";if ($x =~ /(a([abcdefg]+)(?{$y=$^N})de)(?{$z=$^N})/ and $y eq "bc" and $z eq "abcde")
+ {print $T} else {print "not $T"}; |
From @jhiThanks, applied. (base/lex.t need tweaking, too, since it assumed
|
From @jhiThanks, applied. |
From [Unknown Contact. See original ticket]On Jun 30, Jeffrey Friedl said:
Does that make $^N equal to substr(???, If so, you might want to include that in the docs for @- and/or @+. |
From [Unknown Contact. See original ticket]On Jul 26, Jeff 'japhy/Marillion' Pinyan said:
Hmm, that should be substr(???, I just found gross misinformation in perlvar.pod. Patch after sig. -- Inline Patch--- pod/perlvar.pod.old Thu Jul 26 19:58:37 2001
+++ pod/perlvar.pod Thu Jul 26 20:00:11 2001
@@ -472,10 +472,8 @@
successful submatches in the currently active dynamic scope.
C<$-[0]> is the offset into the string of the beginning of the
entire match. The I<n>th element of this array holds the offset
-of the I<n>th submatch, so C<$+[1]> is the offset where $1
-begins, C<$+[2]> the offset where $2 begins, and so on.
-You can use C<$#-> to determine how many subgroups were in the
-last successful match. Compare with the C<@+> variable.
+of the I<n>th submatch, so C<$-[1]> is the offset where $1
+begins, C<$-[2]> the offset where $2 begins, and so on.
After a match against some variable $var:
|
From [Unknown Contact. See original ticket]01-07-27 01.55, skrev Jeff 'japhy/Marillion' Pinyan på jeffp@crusoe.net
Aren't @- and @+ deprecated? -- |
From [Unknown Contact. See original ticket]On Jul 27, Arthur Bergman said:
Whoa. When did that happen? They just got here. I'd think they're -- |
From @jhiThanks, applied. |
@cwest - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#7190 (status was 'resolved')
Searchable as RT7190$
The text was updated successfully, but these errors were encountered: