Skip to content

regex (?<name>...) capture-to-var paren, new $^N magic variable #4142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
p5pRT opened this issue Jun 26, 2001 · 11 comments
Closed

regex (?<name>...) capture-to-var paren, new $^N magic variable #4142

p5pRT opened this issue Jun 26, 2001 · 11 comments

Comments

@p5pRT
Copy link

p5pRT commented Jun 26, 2001

Migrated from rt.perl.org#7190 (status was 'resolved')

Searchable as RT7190$

@p5pRT
Copy link
Author

p5pRT commented Jun 26, 2001

From [email protected]

SUMMARY​:
  1) Gee, it'd be nice to support (?<name>...) "named capture" parens
  within regular expressions.

  2) I created a new magic variable $^N, similar to $+.

  3) If this is of interest, is $^N a good name?

Feeling envy for highly advanced :-) languages like Java and Visual Basic,
whose regex languages allows named captures a'la

  (?<myval>\d+)

(sets $myval to the digits captured), I thought I'd use regex overloading
to convert this syntaxt to

  (\d+) (?{ $varname = $+ })

(capture to normal parens, then use $+ to access it and assign to the variable)

This works fine except for a few details, such as being able to have
capturing parens nested within, since $+ refers to the *hightest-numbered*
set of parens used so far, not the most-recently *closed* set of parens
used so far.

So, I created a new magic variable, $^N, that pretty much parallels $+
except it does indeed refer to the most-recently *closed* set of parens.

Now, converting to

  (\d+) (?{ $varname = $^N })

works even with nesting. This is very nice.

I can send the patches if wanted. But if wanted, what is a good variable name?
I picked $^N simply because I saw it was free.

For those interested, I've appended my package to allow this, and a short
test program.

There are still "issues" with my overloading -- the variables named are not
checked at runtime ('use strict' doesn't save you from referring to a
nonexistant variable), and the variables are not "protected", so even if
during a match a variable is set, it won't be unset if the match later
fails.

So, it really would be nice if named captures were officially supported. I
spent some hours digging through regcomp and regexec, and succeeded only in
killing massive amounts of neurons )-​:, so I don't think I'll be able to
add it.

  Jeffrey

Here is a short test script​:

---snip------------------------------------------------------------------
  use strict;
  use warnings;
  use Regex​::SupportNamedCapture;

  my $areacode; ## both lexical
  our $exchange; ## and global
  my $number; ## variables work fine.

  "My number is 408-555-1212." =~ m{
  \b
  (?<areacode>\d\d\d)
  -
  (?<exchange>\d\d\d)
  -
  (?<number>\d\d\d\d \b)
  }x;

  print "phone number is​: ($areacode) $exchange-$number\n";

  my $fullnumber;

  "My number is 408-555-1212." =~ m{
  \b
  (?<fullnumber>
  (?<areacode>\d\d\d)
  -
  (?<exchange>\d\d\d)
  -
  (?<number>\d\d\d\d \b)
  )
  }x;

  if ($fullnumber ne "$areacode-$exchange-$number")
  {
  print "Bummer, you don't have \$^N support​: fullnumber is [$fullnumber]\n";
  }
---snip------------------------------------------------------------------

Here is the package​:

---snip------------------------------------------------------------------
package Regex​::SupportNamedCapture;

##
## This package allows regular expressions to have named captures, a'la
##
## (?<varname>...)
## sets $varname to the result of what's matched by the /.../
##
## If your Perl supports the $^N "most-recently-closed-paren text" variable,
## the /.../ part may itself contain capturing parens. Otherwise, it
## shouldn't.
##
## jfriedl@​yahoo.com
## 6/2001
##
use strict;
use warnings;
use re 'eval';
use overload;
sub import { overload​::constant 'qr' => \&convert }

##
## Test to see if my proposed $^N is supported.
## Set $GutsResult to $^N if so, set to $+ if not.
##
my $GutsResult = do {
  no warnings;
  "1" =~ m/(1)/;
  if ($^N) {
  '$^N';
  } else {
  '$+';
  }
};

our $OpenParens; ## needed for matching nested parens

my $NestedParenGuts = qr{

  (?{ local $OpenParens = 0 }) ## counts the number of nested opens waiting to close

  (?>
  (?>
  ## stuff not parens, not escaped
  [^()\\]+

  ## escaped stuff
  | (?s​: \\. )

  # another opening paren
  | \( (?{ $OpenParens++ })

  # a closing paren, if we're expecting any
  | (?(?{ $OpenParens }) (?{ $OpenParens-- }) \))
  )*
  )
}x;

##
## Mimics named capturing parens by converting something like
## (?<varname>...)
## to
## (?​: (...) (?{ $varname = $^N }) )
##
##
sub convert
{
  my $re = shift; ## regex to mangle

  $re =~ s{
  (?<! \\ ) # an unescaped...
  \(\? # "(?"
  < # '<'
  (\w+) # $1 - an identifier
  > # '>'
  ($NestedParenGuts) # $2 - regex guts
  \) # ')'
  }{
  my $id = '$' . $1;
  my $guts = convert($2);
  "(?​:($guts)(?{ $id=$GutsResult }))";
  }exg;

  return $re; ## return mangled regex
}

1;
---snip------------------------------------------------------------------

Perl Info

Flags:
    category=core
    severity=wishlist

Site configuration information for perl v5.7.1:

Configured by jfriedl at Sun Jun 17 23:29:39 PDT 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration:
  Platform:
    osname=linux, osvers=2.4.5, archname=i686-linux
    uname='linux fummy.telocity.com 2.4.5 #3 smp mon jun 4 22:43:14 pdt 2001 i686 unknown '
    config_args='-e -s -O -D optimize=-O2 -g'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='cc', ccflags ='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lposix -lcrypt -lutil
    libc=/lib/libc-2.2.3.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    DEVEL10654


@INC for perl v5.7.1:
    /home/jfriedl/lib/perl
    /home/jfriedl/lib/perl/yahoo
    /usr/local/lib/perl5/5.7.1/i686-linux
    /usr/local/lib/perl5/5.7.1
    /usr/local/lib/perl5/site_perl/5.7.1/i686-linux
    /usr/local/lib/perl5/site_perl/5.7.1
    /usr/local/lib/perl5/site_perl/5.6.1/i686-linux
    /usr/local/lib/perl5/site_perl/5.6.1
    /usr/local/lib/perl5/site_perl/5.6.0/i686-linux
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl
    .


Environment for perl v5.7.1:
    HOME=/home/jfriedl
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/usr/local/pgsql/lib:/home/jfriedl/src/rvplayer5.0
    LOGDIR (unset)
    PATH=/home/jfriedl/bin:/home/jfriedl/common/bin:.:/usr/local/pgsql/bin:/usr/local/bin:/usr/X11R6/bin:/bin:/usr/bin:/usr/sbin:/sbin:/home/jfriedl/src/rvplayer5.0:/usr/local/prod/bin:/usr/local/java/bin
    PERLLIB=/home/jfriedl/lib/perl:/home/jfriedl/lib/perl/yahoo
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh

@p5pRT
Copy link
Author

p5pRT commented Jun 29, 2001

From [Unknown Contact. See original ticket]

Well, since there has been no objection, here is the patch.
  Jeffrey


Inline Patch
diff -u -r .orig/embedvar.h ./embedvar.h
--- .orig/embedvar.h	Wed Jun 20 11:35:50 2001
+++ ./embedvar.h	Tue Jun 26 12:16:35 2001
@@ -113,6 +113,7 @@
 #define PL_regint_start		(vTHX->Tregint_start)
 #define PL_regint_string	(vTHX->Tregint_string)
 #define PL_reginterp_cnt	(vTHX->Treginterp_cnt)
+#define PL_reglastcloseparen	(vTHX->Treglastcloseparen)
 #define PL_reglastparen		(vTHX->Treglastparen)
 #define PL_regnarrate		(vTHX->Tregnarrate)
 #define PL_regnaughty		(vTHX->Tregnaughty)
@@ -821,6 +822,7 @@
 #define PL_regint_start		(aTHXo->interp.Tregint_start)
 #define PL_regint_string	(aTHXo->interp.Tregint_string)
 #define PL_reginterp_cnt	(aTHXo->interp.Treginterp_cnt)
+#define PL_reglastcloseparen	(aTHXo->interp.Treglastcloseparen)
 #define PL_reglastparen		(aTHXo->interp.Treglastparen)
 #define PL_regnarrate		(aTHXo->interp.Tregnarrate)
 #define PL_regnaughty		(aTHXo->interp.Tregnaughty)
@@ -1518,6 +1520,7 @@
 #define PL_regint_start		(aTHX->Tregint_start)
 #define PL_regint_string	(aTHX->Tregint_string)
 #define PL_reginterp_cnt	(aTHX->Treginterp_cnt)
+#define PL_reglastcloseparen	(aTHX->Treglastcloseparen)
 #define PL_reglastparen		(aTHX->Treglastparen)
 #define PL_regnarrate		(aTHX->Tregnarrate)
 #define PL_regnaughty		(aTHX->Tregnaughty)
@@ -1654,6 +1657,7 @@
 #define PL_Tregint_start	PL_regint_start
 #define PL_Tregint_string	PL_regint_string
 #define PL_Treginterp_cnt	PL_reginterp_cnt
+#define PL_Treglastcloseparen	PL_reglastcloseparen
 #define PL_Treglastparen	PL_reglastparen
 #define PL_Tregnarrate		PL_regnarrate
 #define PL_Tregnaughty		PL_regnaughty
diff -u -r .orig/gv.c ./gv.c
--- .orig/gv.c	Mon Jun 25 08:08:15 2001
+++ ./gv.c	Tue Jun 26 12:35:10 2001
@@ -895,6 +895,7 @@
     case '\006':	/* $^F */
     case '\010':	/* $^H */
     case '\011':	/* $^I, NOT \t in EBCDIC */
+    case '\016':        /* $^N */
     case '\020':	/* $^P */
     case '\024':	/* $^T */
 	if (len > 1)
@@ -1764,6 +1765,7 @@
     case '\010':   /* $^H */
     case '\011':   /* $^I, NOT \t in EBCDIC */
     case '\014':   /* $^L */
+    case '\016':   /* $^N */
     case '\020':   /* $^P */
     case '\023':   /* $^S */
     case '\024':   /* $^T */
diff -u -r .orig/mg.c ./mg.c
--- .orig/mg.c	Mon Jun 25 07:48:03 2001
+++ ./mg.c	Tue Jun 26 12:36:16 2001
@@ -435,6 +435,13 @@
 		goto getparen;
 	}
 	return 0;
+    case '\016': /* ^N */
+	if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+	    paren = rx->lastcloseparen;
+	    if (paren)
+		goto getparen;
+	}
+	return 0;
     case '`':
 	if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
 	    if (rx->startp[0] != -1) {
@@ -655,6 +662,14 @@
     case '+':
 	if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
 	    paren = rx->lastparen;
+	    if (paren)
+		goto getparen;
+	}
+	sv_setsv(sv,&PL_sv_undef);
+	break;
+    case '\016':		/* ^N */
+	if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+	    paren = rx->lastcloseparen;
 	    if (paren)
 		goto getparen;
 	}
diff -u -r .orig/perlapi.h ./perlapi.h
--- .orig/perlapi.h	Wed Jun 20 11:35:50 2001
+++ ./perlapi.h	Tue Jun 26 12:16:35 2001
@@ -802,6 +802,8 @@
 #define PL_regint_string	(*Perl_Tregint_string_ptr(aTHXo))
 #undef  PL_reginterp_cnt
 #define PL_reginterp_cnt	(*Perl_Treginterp_cnt_ptr(aTHXo))
+#undef  PL_reglastcloseparen
+#define PL_reglastcloseparen	(*Perl_Treglastcloseparen_ptr(aTHXo))
 #undef  PL_reglastparen
 #define PL_reglastparen		(*Perl_Treglastparen_ptr(aTHXo))
 #undef  PL_regnarrate
diff -u -r .orig/regexec.c ./regexec.c
--- .orig/regexec.c	Thu Jun 21 07:16:49 2001
+++ ./regexec.c	Tue Jun 26 12:28:14 2001
@@ -147,7 +147,7 @@
     if (paren_elems_to_push < 0)
 	Perl_croak(aTHX_ "panic: paren_elems_to_push < 0");
 
-#define REGCP_OTHER_ELEMS 5
+#define REGCP_OTHER_ELEMS 6
     SSCHECK(paren_elems_to_push + REGCP_OTHER_ELEMS);
     for (p = PL_regsize; p > parenfloor; p--) {
 /* REGCP_PARENS_ELEMS are pushed per pairs of parentheses. */
@@ -159,6 +159,7 @@
 /* REGCP_OTHER_ELEMS are pushed in any case, parentheses or no. */
     SSPUSHINT(PL_regsize);
     SSPUSHINT(*PL_reglastparen);
+    SSPUSHINT(*PL_reglastcloseparen);
     SSPUSHPTR(PL_reginput);
 #define REGCP_FRAME_ELEMS 2
 /* REGCP_FRAME_ELEMS are part of the REGCP_OTHER_ELEMS and
@@ -192,6 +193,7 @@
     assert(i == SAVEt_REGCONTEXT); /* Check that the magic cookie is there. */
     i = SSPOPINT; /* Parentheses elements to pop. */
     input = (char *) SSPOPPTR;
+    *PL_reglastcloseparen = SSPOPINT;
     *PL_reglastparen = SSPOPINT;
     PL_regsize = SSPOPINT;
 
@@ -1871,6 +1873,7 @@
     PL_regstartp = prog->startp;
     PL_regendp = prog->endp;
     PL_reglastparen = &prog->lastparen;
+    PL_reglastcloseparen = &prog->lastcloseparen;
     prog->lastparen = 0;
     PL_regsize = 0;
     DEBUG_r(PL_reg_starttry = startpos);
@@ -2562,6 +2565,7 @@
 		    cache_re(re);
 		    state.ss = PL_savestack_ix;
 		    *PL_reglastparen = 0;
+		    *PL_reglastcloseparen = 0;
 		    PL_reg_call_cc = &state;
 		    PL_reginput = locinput;
 
@@ -2619,6 +2623,7 @@
 	    PL_regendp[n] = locinput - PL_bostr;
 	    if (n > *PL_reglastparen)
 		*PL_reglastparen = n;
+	    *PL_reglastcloseparen = n;
 	    break;
 	case GROUPP:
 	    n = ARG(scan);  /* which paren pair */
diff -u -r .orig/regexp.h ./regexp.h
--- .orig/regexp.h	Sun Apr 22 09:12:37 2001
+++ ./regexp.h	Tue Jun 26 12:10:53 2001
@@ -37,6 +37,7 @@
 	I32 prelen;		/* length of precomp */
 	U32 nparens;		/* number of parentheses */
 	U32 lastparen;		/* last paren matched */
+	U32 lastcloseparen;	/* last paren matched */
 	U32 reganch;		/* Internal use only +
 				   Tainted information used by regexec? */
 	regnode program[1];	/* Unwarranted chumminess with compiler. */
diff -u -r .orig/thrdvar.h ./thrdvar.h
--- .orig/thrdvar.h	Mon Apr 30 05:29:37 2001
+++ ./thrdvar.h	Tue Jun 26 12:12:52 2001
@@ -182,6 +182,7 @@
 PERLVAR(Tregstartp,	I32 *)		/* Pointer to startp array. */
 PERLVAR(Tregendp,	I32 *)		/* Ditto for endp. */
 PERLVAR(Treglastparen,	U32 *)		/* Similarly for lastparen. */
+PERLVAR(Treglastcloseparen, U32 *)	/* Similarly for lastcloseparen. */
 PERLVAR(Tregtill,	char *)		/* How far we are required to go. */
 PERLVAR(Tregcompat1,	char)		/* used to be regprev1 */
 PERLVAR(Treg_start_tmp,	char **)	/* from regexec.c */

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2001

From [Unknown Contact. See original ticket]

Inline Patch
diff -ru pod/.orig/perlretut.pod pod/perlretut.pod
--- pod/.orig/perlretut.pod	Tue Jun 12 18:39:57 2001
+++ pod/perlretut.pod	Fri Jun 29 23:51:13 2001
@@ -710,9 +710,12 @@
     /(ab(cd|ef)((gi)|j))/;
      1  2      34
 
-so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'.
-For convenience, perl sets C<$+> to the highest numbered C<$1>, C<$2>,
-... that got assigned.
+so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'. For
+convenience, perl sets C<$+> to the string held by the highest numbered
+C<$1>, C<$2>, ... that got assigned (and, somewhat related, C<$^N> to the
+value of the C<$1>, C<$2>, ... most-recently assigned; i.e. the C<$1>,
+C<$2>, ... associated with the rightmost closing parenthesis used in the
+match).
 
 Closely associated with the matching variables C<$1>, C<$2>, ... are
 the B<backreferences> C<\1>, C<\2>, ... .  Backreferences are simply
diff -ru pod/.orig/perltoc.pod pod/perltoc.pod
--- pod/.orig/perltoc.pod	Tue Jun 26 08:00:08 2001
+++ pod/perltoc.pod	Fri Jun 29 23:45:32 2001
@@ -904,7 +904,7 @@
 $EVAL_ERROR, $@, $PROCESS_ID, $PID, $$, $REAL_USER_ID, $UID, $<,
 $EFFECTIVE_USER_ID, $EUID, $>, $REAL_GROUP_ID, $GID, $(,
 $EFFECTIVE_GROUP_ID, $EGID, $), $PROGRAM_NAME, $0, $[, $], $COMPILING, $^C,
-$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H, $INPLACE_EDIT, $^I, $^M,
+$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H, $INPLACE_EDIT, $^I, $^M, $^N,
 $OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
 0x100, 0x200, $LAST_REGEXP_CODE_RESULT, $^R, $EXCEPTIONS_BEING_CAUGHT, $^S,
 $BASETIME, $^T, $PERL_VERSION, $^V, $WARNING, $^W, ${^WARNING_BITS},
diff -ru pod/.orig/perlvar.pod pod/perlvar.pod
--- pod/.orig/perlvar.pod	Mon Jun 18 07:42:24 2001
+++ pod/perlvar.pod	Sat Jun 30 00:04:05 2001
@@ -180,14 +180,29 @@
 
 =item $+
 
-The last bracket matched by the last search pattern.  This is useful if
-you don't know which one of a set of alternative patterns matched.  For
-example:
+The text matched by the last bracket of the last successful search pattern.
+This is useful if you don't know which one of a set of alternative patterns
+matched. For example:
 
     /Version: (.*)|Revision: (.*)/ && ($rev = $+);
 
 (Mnemonic: be positive and forward looking.)
 This variable is read-only and dynamically scoped to the current BLOCK.
+
+=item $^N
+
+The text matched by the used group most-recently closed (i.e. the group
+with the rightmost closing parenthesis) of the last successful search
+pattern. This is primarly used inside C<(?{...})> blocks for examining text
+recently matched. For example, to effectively capture text to a variable
+(in addition to C<$1>, C<$2>, etc.), replace C<(...)> with
+
+     (?:(...)(?{ $var = $^N }))
+
+By setting and then using C<$var> in this way relieves you from having to
+worry about exactly which numbered set of parentheses they are.
+
+This variable is dynamically scoped to the current BLOCK.
 
 =item @LAST_MATCH_END
 
--- t/op/.orig/pat.t	Thu Jun 28 20:17:51 2001
+++ t/op/pat.t	Sat Jun 30 00:52:21 2001
@@ -6,7 +6,7 @@
 
 $| = 1;
 
-print "1..639\n";
+print "1..660\n";
 
 BEGIN {
     chdir 't' if -d 't';
@@ -1854,3 +1854,38 @@
 print "not " unless " " =~ /[[:print:]]/;
 print "ok 639\n";
 
+##
+## Test basic $^N usage outside of a regex
+##
+$x = "abcdef";
+$T="ok 640\n";if ($x =~ /cde/ and not defined $^N)         {print $T} else {print "not $T"};
+$T="ok 641\n";if ($x =~ /(cde)/          and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 642\n";if ($x =~ /(c)(d)(e)/      and $^N eq   "e") {print $T} else {print "not $T"};
+$T="ok 643\n";if ($x =~ /(c(d)e)/        and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 644\n";if ($x =~ /(foo)|(c(d)e)/  and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 645\n";if ($x =~ /(c(d)e)|(foo)/  and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 646\n";if ($x =~ /(c(d)e)|(abc)/  and $^N eq "abc") {print $T} else {print "not $T"};
+$T="ok 647\n";if ($x =~ /(c(d)e)|(abc)x/ and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 648\n";if ($x =~ /(c(d)e)(abc)?/  and $^N eq "cde") {print $T} else {print "not $T"};
+$T="ok 649\n";if ($x =~ /(?:c(d)e)/      and $^N eq  "d" ) {print $T} else {print "not $T"};
+$T="ok 650\n";if ($x =~ /(?:c(d)e)(?:f)/ and $^N eq  "d" ) {print $T} else {print "not $T"};
+$T="ok 651\n";if ($x =~ /(?:([abc])|([def]))*/ and $^N eq  "f" ){print $T} else {print "not $T"};
+$T="ok 652\n";if ($x =~ /(?:([ace])|([bdf]))*/ and $^N eq  "f" ){print $T} else {print "not $T"};
+$T="ok 653\n";if ($x =~ /(([ace])|([bd]))*/    and $^N eq  "e" ){print $T} else {print "not $T"};
+{
+ $T="ok 654\n";if($x =~ /(([ace])|([bdf]))*/   and $^N eq  "f" ){print $T} else {print "not $T"};
+}
+## test to see if $^N is automatically localized -- it should now
+## have the value set in test 653
+$T="ok 655\n";if ($^N eq  "e" ){print $T} else {print "not $T"};
+
+##
+## Now test inside (?{...})
+##
+$T="ok 656\n";if ($x =~ /a([abc])(?{$y=$^N})c/      and $y eq "b" ){print $T} else {print "not $T"};
+$T="ok 657\n";if ($x =~ /a([abc]+)(?{$y=$^N})d/     and $y eq "bc"){print $T} else {print "not $T"};
+$T="ok 658\n";if ($x =~ /a([abcdefg]+)(?{$y=$^N})d/ and $y eq "bc"){print $T} else {print "not $T"};
+$T="ok 659\n";if ($x =~ /(a([abcdefg]+)(?{$y=$^N})d)(?{$z=$^N})e/ and $y eq "bc" and $z eq "abcd")
+              {print $T} else {print "not $T"};
+$T="ok 660\n";if ($x =~ /(a([abcdefg]+)(?{$y=$^N})de)(?{$z=$^N})/ and $y eq "bc" and $z eq "abcde")
+              {print $T} else {print "not $T"};

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2001

From @jhi

Thanks, applied. (base/lex.t need tweaking, too, since it assumed
$^N is an unused variable...)

-----------------------------------------------------------------

diff -u -r .orig/embedvar.h ./embedvar.h
--- .orig/embedvar.h Wed Jun 20 11​:35​:50 2001
+++ ./embedvar.h Tue Jun 26 12​:16​:35 2001
@​@​ -113,6 +113,7 @​@​
#define PL_regint_start (vTHX->Tregint_start)
#define PL_regint_string (vTHX->Tregint_string)
#define PL_reginterp_cnt (vTHX->Treginterp_cnt)
+#define PL_reglastcloseparen (vTHX->Treglastcloseparen)
#define PL_reglastparen (vTHX->Treglastparen)
#define PL_regnarrate (vTHX->Tregnarrate)
#define PL_regnaughty (vTHX->Tregnaughty)
@​@​ -821,6 +822,7 @​@​
#define PL_regint_start (aTHXo->interp.Tregint_start)
#define PL_regint_string (aTHXo->interp.Tregint_string)
#define PL_reginterp_cnt (aTHXo->interp.Treginterp_cnt)
+#define PL_reglastcloseparen (aTHXo->interp.Treglastcloseparen)
#define PL_reglastparen (aTHXo->interp.Treglastparen)
#define PL_regnarrate (aTHXo->interp.Tregnarrate)
#define PL_regnaughty (aTHXo->interp.Tregnaughty)
@​@​ -1518,6 +1520,7 @​@​
#define PL_regint_start (aTHX->Tregint_start)
#define PL_regint_string (aTHX->Tregint_string)
#define PL_reginterp_cnt (aTHX->Treginterp_cnt)
+#define PL_reglastcloseparen (aTHX->Treglastcloseparen)
#define PL_reglastparen (aTHX->Treglastparen)
#define PL_regnarrate (aTHX->Tregnarrate)
#define PL_regnaughty (aTHX->Tregnaughty)
@​@​ -1654,6 +1657,7 @​@​
#define PL_Tregint_start PL_regint_start
#define PL_Tregint_string PL_regint_string
#define PL_Treginterp_cnt PL_reginterp_cnt
+#define PL_Treglastcloseparen PL_reglastcloseparen
#define PL_Treglastparen PL_reglastparen
#define PL_Tregnarrate PL_regnarrate
#define PL_Tregnaughty PL_regnaughty
diff -u -r .orig/gv.c ./gv.c
--- .orig/gv.c Mon Jun 25 08​:08​:15 2001
+++ ./gv.c Tue Jun 26 12​:35​:10 2001
@​@​ -895,6 +895,7 @​@​
case '\006'​: /* $^F */
case '\010'​: /* $^H */
case '\011'​: /* $^I, NOT \t in EBCDIC */
+ case '\016'​: /* $^N */
case '\020'​: /* $^P */
case '\024'​: /* $^T */
if (len > 1)
@​@​ -1764,6 +1765,7 @​@​
case '\010'​: /* $^H */
case '\011'​: /* $^I, NOT \t in EBCDIC */
case '\014'​: /* $^L */
+ case '\016'​: /* $^N */
case '\020'​: /* $^P */
case '\023'​: /* $^S */
case '\024'​: /* $^T */
diff -u -r .orig/mg.c ./mg.c
--- .orig/mg.c Mon Jun 25 07​:48​:03 2001
+++ ./mg.c Tue Jun 26 12​:36​:16 2001
@​@​ -435,6 +435,13 @​@​
goto getparen;
}
return 0;
+ case '\016'​: /* ^N */
+ if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+ paren = rx->lastcloseparen;
+ if (paren)
+ goto getparen;
+ }
+ return 0;
case '`'​:
if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
if (rx->startp[0] != -1) {
@​@​ -655,6 +662,14 @​@​
case '+'​:
if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
paren = rx->lastparen;
+ if (paren)
+ goto getparen;
+ }
+ sv_setsv(sv,&PL_sv_undef);
+ break;
+ case '\016'​: /* ^N */
+ if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
+ paren = rx->lastcloseparen;
if (paren)
goto getparen;
}
diff -u -r .orig/perlapi.h ./perlapi.h
--- .orig/perlapi.h Wed Jun 20 11​:35​:50 2001
+++ ./perlapi.h Tue Jun 26 12​:16​:35 2001
@​@​ -802,6 +802,8 @​@​
#define PL_regint_string (*Perl_Tregint_string_ptr(aTHXo))
#undef PL_reginterp_cnt
#define PL_reginterp_cnt (*Perl_Treginterp_cnt_ptr(aTHXo))
+#undef PL_reglastcloseparen
+#define PL_reglastcloseparen (*Perl_Treglastcloseparen_ptr(aTHXo))
#undef PL_reglastparen
#define PL_reglastparen (*Perl_Treglastparen_ptr(aTHXo))
#undef PL_regnarrate
diff -u -r .orig/regexec.c ./regexec.c
--- .orig/regexec.c Thu Jun 21 07​:16​:49 2001
+++ ./regexec.c Tue Jun 26 12​:28​:14 2001
@​@​ -147,7 +147,7 @​@​
if (paren_elems_to_push < 0)
Perl_croak(aTHX_ "panic​: paren_elems_to_push < 0");

-#define REGCP_OTHER_ELEMS 5
+#define REGCP_OTHER_ELEMS 6
SSCHECK(paren_elems_to_push + REGCP_OTHER_ELEMS);
for (p = PL_regsize; p > parenfloor; p--) {
/* REGCP_PARENS_ELEMS are pushed per pairs of parentheses. */
@​@​ -159,6 +159,7 @​@​
/* REGCP_OTHER_ELEMS are pushed in any case, parentheses or no. */
SSPUSHINT(PL_regsize);
SSPUSHINT(*PL_reglastparen);
+ SSPUSHINT(*PL_reglastcloseparen);
SSPUSHPTR(PL_reginput);
#define REGCP_FRAME_ELEMS 2
/* REGCP_FRAME_ELEMS are part of the REGCP_OTHER_ELEMS and
@​@​ -192,6 +193,7 @​@​
assert(i == SAVEt_REGCONTEXT); /* Check that the magic cookie is there. */
i = SSPOPINT; /* Parentheses elements to pop. */
input = (char *) SSPOPPTR;
+ *PL_reglastcloseparen = SSPOPINT;
*PL_reglastparen = SSPOPINT;
PL_regsize = SSPOPINT;

@​@​ -1871,6 +1873,7 @​@​
PL_regstartp = prog->startp;
PL_regendp = prog->endp;
PL_reglastparen = &prog->lastparen;
+ PL_reglastcloseparen = &prog->lastcloseparen;
prog->lastparen = 0;
PL_regsize = 0;
DEBUG_r(PL_reg_starttry = startpos);
@​@​ -2562,6 +2565,7 @​@​
cache_re(re);
state.ss = PL_savestack_ix;
*PL_reglastparen = 0;
+ *PL_reglastcloseparen = 0;
PL_reg_call_cc = &state;
PL_reginput = locinput;

@​@​ -2619,6 +2623,7 @​@​
PL_regendp[n] = locinput - PL_bostr;
if (n > *PL_reglastparen)
*PL_reglastparen = n;
+ *PL_reglastcloseparen = n;
break;
case GROUPP​:
n = ARG(scan); /* which paren pair */
diff -u -r .orig/regexp.h ./regexp.h
--- .orig/regexp.h Sun Apr 22 09​:12​:37 2001
+++ ./regexp.h Tue Jun 26 12​:10​:53 2001
@​@​ -37,6 +37,7 @​@​
I32 prelen; /* length of precomp */
U32 nparens; /* number of parentheses */
U32 lastparen; /* last paren matched */
+ U32 lastcloseparen; /* last paren matched */
U32 reganch; /* Internal use only +
Tainted information used by regexec? */
regnode program[1]; /* Unwarranted chumminess with compiler. */
diff -u -r .orig/thrdvar.h ./thrdvar.h
--- .orig/thrdvar.h Mon Apr 30 05​:29​:37 2001
+++ ./thrdvar.h Tue Jun 26 12​:12​:52 2001
@​@​ -182,6 +182,7 @​@​
PERLVAR(Tregstartp, I32 *) /* Pointer to startp array. */
PERLVAR(Tregendp, I32 *) /* Ditto for endp. */
PERLVAR(Treglastparen, U32 *) /* Similarly for lastparen. */
+PERLVAR(Treglastcloseparen, U32 *) /* Similarly for lastcloseparen. */
PERLVAR(Tregtill, char *) /* How far we are required to go. */
PERLVAR(Tregcompat1, char) /* used to be regprev1 */
PERLVAR(Treg_start_tmp, char **) /* from regexec.c */

@p5pRT
Copy link
Author

p5pRT commented Jun 30, 2001

From @jhi

Thanks, applied.

@p5pRT
Copy link
Author

p5pRT commented Jul 26, 2001

From [Unknown Contact. See original ticket]

On Jun 30, Jeffrey Friedl said​:

+so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'. For
+convenience, perl sets C<$+> to the string held by the highest numbered
+C<$1>, C<$2>, ... that got assigned (and, somewhat related, C<$^N> to the
+value of the C<$1>, C<$2>, ... most-recently assigned; i.e. the C<$1>,
+C<$2>, ... associated with the rightmost closing parenthesis used in the
+match).

Does that make $^N equal to

  substr(???, $-[-1], $+[-1] - $-[-1]);

If so, you might want to include that in the docs for @​- and/or @​+.

@p5pRT
Copy link
Author

p5pRT commented Jul 26, 2001

From [Unknown Contact. See original ticket]

On Jul 26, Jeff 'japhy/Marillion' Pinyan said​:

On Jun 30, Jeffrey Friedl said​:

+so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'. For
+convenience, perl sets C<$+> to the string held by the highest numbered
+C<$1>, C<$2>, ... that got assigned (and, somewhat related, C<$^N> to the
+value of the C<$1>, C<$2>, ... most-recently assigned; i.e. the C<$1>,
+C<$2>, ... associated with the rightmost closing parenthesis used in the
+match).

Does that make $^N equal to

substr(???, $-[-1], $+[-1] - $-[-1]);

Hmm, that should be substr(???, $-[-1], $+[$#-] - $-[-1]). Just in case
my logic with $^N is wrong, I've not included it in the following
patch. Oh, speaking of the following patch...

I just found gross misinformation in perlvar.pod. Patch after sig.

--
Jeff "japhy" Pinyan japhy@​pobox.com http​://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk? http​://www.perlmonks.com/ http​://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc. http​://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter. Brother #734
** Manning Publications, Co, is publishing my Perl Regex book **

Inline Patch
--- pod/perlvar.pod.old Thu Jul 26 19:58:37 2001
+++ pod/perlvar.pod     Thu Jul 26 20:00:11 2001
@@ -472,10 +472,8 @@
 successful submatches in the currently active dynamic scope.
 C<$-[0]> is the offset into the string of the beginning of the
 entire match.  The I<n>th element of this array holds the offset
-of the I<n>th submatch, so C<$+[1]> is the offset where $1
-begins, C<$+[2]> the offset where $2 begins, and so on.
-You can use C<$#-> to determine how many subgroups were in the
-last successful match.  Compare with the C<@+> variable.
+of the I<n>th submatch, so C<$-[1]> is the offset where $1
+begins, C<$-[2]> the offset where $2 begins, and so on.
 
 After a match against some variable $var:
 

@p5pRT
Copy link
Author

p5pRT commented Jul 26, 2001

From [Unknown Contact. See original ticket]

01-07-27 01.55, skrev Jeff 'japhy/Marillion' Pinyan pÃ¥ jeffp@​crusoe.net
följande​:

If so, you might want to include that in the docs for @​- and/or @​+.

Aren't @​- and @​+ deprecated?

--
Arthur

@p5pRT
Copy link
Author

p5pRT commented Jul 27, 2001

From [Unknown Contact. See original ticket]

On Jul 27, Arthur Bergman said​:

01-07-27 01.55, skrev Jeff 'japhy/Marillion' Pinyan på jeffp@​crusoe.net
följande​:

If so, you might want to include that in the docs for @​- and/or @​+.

Aren't @​- and @​+ deprecated?

Whoa. When did that happen? They just got here. I'd think they're
totally UN-deprecated. You can get at $` and $&amp; and $' by using the
offsets they hold.

--
Jeff "japhy" Pinyan japhy@​pobox.com http​://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk? http​://www.perlmonks.com/ http​://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc. http​://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter. Brother #734
** Manning Publications, Co, is publishing my Perl Regex book **

@p5pRT
Copy link
Author

p5pRT commented Jul 29, 2001

From @jhi

Thanks, applied.

@p5pRT
Copy link
Author

p5pRT commented May 9, 2003

@cwest - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant