regular-expression parser does not see '(' character #8167

Closed
p5pRT opened this issue Oct 25, 2005 · 10 comments

p5pRT commented Oct 25, 2005

Migrated from rt.perl.org#37527 (status was 'rejected')

Searchable as RT37527$

p5pRT commented Oct 25, 2005

From [email protected]

Created by [email protected]

Running perl on both Debian stable and Debian unstable. I'm using
-MO=Deparse to pre-parse scripts (this bug report is _NOT_ about
Deparse; it only generates the Perl code).

Perl code:
  if ($line =~ m[^\[([A-Za-z0-9_/:\-<>\.=~]+)\]$]) {
  ...
  }

The following regular expressions don't work either:
  m[\\\[(\[x\]+)\\\]]
  m[\\[(\[x\]+)\\]]
  m[\[(\[x\]+)\]]

However, this does work (note the extra '('):
  m[^\[(((([A-Za-z0-9_/:\-<>\.=~]+\]$]

When I replace m[..] with m{..}, everything works fine.
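A minimal sketch of the m{..} workaround, wrapped in a hypothetical helper `bracket_name` (not part of the report) and fed two made-up sample lines. With `{}` as the delimiters, `\[` and `\]` are not delimiter escapes, so the backslashes reach the regex engine and match literal brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the outer brackets of a line
# such as "[name]", or undef if the line does not have that shape.
# The character class is the one from the report.
sub bracket_name {
    my ($line) = @_;
    # With {} delimiters, \[ and \] are ordinary regex escapes, so they
    # match literal '[' and ']' characters.
    return $line =~ m{^\[([A-Za-z0-9_/:\-<>\.=~]+)\]$} ? $1 : undef;
}

# Made-up sample lines: one in the expected shape, one without brackets.
for my $line ('[foo/bar:baz]', 'foo/bar') {
    my $name = bracket_name($line);
    printf "%s => %s\n", $line, defined $name ? $name : 'no match';
}
```

The first line should print `[foo/bar:baz] => foo/bar:baz` and the second `foo/bar => no match`.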

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.4:

Configured by Debian Project at Tue Mar  8 20:31:23 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.4 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-9)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.4
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.4:
    /etc/perl
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    .


Environment for perl v5.8.4:
    HOME=/home/bas
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/bas/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin:/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

p5pRT commented Oct 25, 2005

From [email protected]

On Oct 25, via RT and bas@quarantainenet.nl said:

> if ($line =~ m[^\[([A-Za-z0-9_/:\-<>\.=~]+)\]$]) {

This is a bug that has to do with using m[...] and trying to include a
non-character-class-starting [ in the regex.

This m[abc[def]ghi] is the same as /abc[def]ghi/. Likewise,
m[abc\[def\]ghi] is ALSO the same as /abc[def]ghi/. The backslashing
process makes it impossible for you to get a [ to mean "match a [" easily.
Here's a work-around:

  m[abc[\[]ghi]

That's like /abc\[ghi/, except that [\[] is really a char class of one
char, '['.

It's got to do with how perl handles backslashes when the character being
backslashed is the delimiting character of the string.

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://www.perlmonks.org/ % have long ago been overpaid?
http://princeton.pm.org/ % -- Meister Eckhart
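A short script comparing ways to match a literal `[` inside a bracket-delimited pattern, assuming only core Perl. None of the three relies on a backslash in front of the delimiter surviving the quote parser: japhy's one-character class, a hex escape (`\x5b` is `[`), and switching to `{}` delimiters. The two sample strings are illustrative only.

```perl
use strict;
use warnings;

my $hit  = 'abc[ghi';    # contains a literal '['
my $miss = 'abcdghi';    # same length, no '['

my %patterns = (
    # japhy's workaround: a one-character class holding '['
    char_class => qr[abc[\[]ghi],
    # \x5b is the code point of '[', but it is not the delimiter character
    hex_escape => qr[abc\x5bghi],
    # with {} delimiters, \[ is an ordinary regex escape
    braces     => qr{abc\[ghi},
);

for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    printf "%-10s hit=%s miss=%s\n", $name,
        ($hit  =~ $re ? 'yes' : 'no'),
        ($miss =~ $re ? 'yes' : 'no');
}
```

Each line should report `hit=yes miss=no`.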

p5pRT commented Oct 25, 2005

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Oct 25, 2005

From @rgs

Jeff 'japhy' Pinyan wrote:

> It's got to do with how perl handles backslashes when the character being
> backslashed is the delimiting character of the string.

Yes, and I'm under the impression it's properly documented in
perlop, "Gory details of parsing quoted constructs". Although
not very clearly...
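To make the rule concrete, this sketch models the backslash-removal step that perlop describes, as it behaved for the perl 5.8.4 in this report. The helper `strip_delim_escapes` is hypothetical and far simpler than perl's real tokenizer (it ignores nesting and everything after the delimiter pass). Newer perl releases may handle backslashed delimiters in patterns differently, so treat it as a model of the documented rule rather than of any particular perl.

```perl
use strict;
use warnings;

# Hypothetical model of the "remove backslashes before delimiters" step:
# "\<open>" and "\<close>" lose their backslash, while "\\" and every other
# escape are copied through unchanged. Nesting is ignored for simplicity.
sub strip_delim_escapes {
    my ($body, $open, $close) = @_;
    my @chars = split //, $body;
    my $out   = '';
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq '\\' && $i + 1 < @chars) {
            my $next = $chars[++$i];    # consume the escaped character too
            $out .= ($next eq $open || $next eq $close) ? $next : $c . $next;
        }
        else {
            $out .= $c;
        }
    }
    return $out;
}

# The body of the reporter's m[...], exactly as written between the brackets.
my $body    = '^\[([A-Za-z0-9_/:\-<>\.=~]+)\]$';
my $pattern = strip_delim_escapes($body, '[', ']');
print "regex engine sees: $pattern\n";

# '[(' now opens a character class that swallows the '(', so the ')' after
# the '+' has no partner and the pattern fails to compile.
my $ok = eval { my $re = qr/$pattern/; 1 };
if ($ok) {
    print "compiled\n";
}
else {
    print "compile error: $@";
}
```

For the reporter's pattern the model yields `^[([A-Za-z0-9_/:\-<>\.=~]+)]$`: the `(` has become a member of a character class, which is the "parser does not see '('" symptom from the title.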

p5pRT commented Oct 25, 2005

@rgs - Status changed from 'open' to 'rejected'

p5pRT closed this as completed Oct 25, 2005

p5pRT commented Oct 26, 2005

From [email protected]

On 25/10/05 16:37, japhy@perlmonk.org via RT wrote:

> It's got to do with how perl handles backslashes when the character being
> backslashed is the delimiting character of the string.

I understand now. Problem solved by patching B::Deparse, I'll send
Stephen McCamant the patch. Thanks for your fast response, and sorry for
filing a false bug report.

=====
#!/usr/bin/perl -w
use strict;
my $s = 'aa(abbb{3}c[ccd)d]d';
# () delimiters: balanced parens, backslashed parens, hex escapes \x28/\x29
$s =~ m((a.*b.*)) and print "'$&'\n" or print "no match\n";
$s =~ m(\(a.*b.*\)) and print "'$&'\n" or print "no match\n";
$s =~ m(\x28a.*b.*\x29) and print "'$&'\n" or print "no match\n";
print "\n";
# [] delimiters: the same three spellings for '[' and ']'
$s =~ m[[c.*d.*]] and print "'$&'\n" or print "no match\n";
$s =~ m[\[c.*d.*\]] and print "'$&'\n" or print "no match\n";
$s =~ m[\x5bc.*d.*\x5d] and print "'$&'\n" or print "no match\n";
print "\n";
# {} delimiters: the same three spellings for '{' and '}'
$s =~ m{b{3}} and print "'$&'\n" or print "no match\n";
$s =~ m{b\{3\}} and print "'$&'\n" or print "no match\n";
$s =~ m{b\x7b3\x7d} and print "'$&'\n" or print "no match\n";

--
Bas van Sisseren <bas@quarantainenet.nl>
Quarantainenet

p5pRT commented Oct 26, 2005

From @rgs

Bas van Sisseren wrote:

> I understand now. Problem solved by patching B::Deparse, I'll send
> Stephen McCamant the patch. Thanks for your fast response, and sorry for
> filing a false bug report.

No problem; but send the patch here too. (B::Deparse being a core module.)

p5pRT commented Oct 26, 2005

From [email protected]

On 26/10/05 13:22, Rafael Garcia-Suarez via RT wrote:

> No problem; but send the patch here too. (B::Deparse being a core module.)

Ok, no problem. I've added an extra check in the balanced_delim function
(which is used for checking whether a specific delimiter can be used).

When a '\'.<delimiter character> in the string is found, $fail is set to 1.

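The code which breaks with the unpatched B::Deparse:

use B::Deparse;
# -p: print extra parentheses; -sC: cuddle elsif, else and continue blocks
my $deparse = B::Deparse->new('-p', '-sC');
# Turn the sub back into source text and compile that text again; with the
# unpatched module the regenerated regex fails to compile and $@ is set.
eval "sub ".$deparse->coderef2text( sub { /^\[([a-z\/]+)\]$/ } );
warn $@ if $@;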

--
Bas van Sisseren <bas@quarantainenet.nl>
Quarantainenet

p5pRT commented Oct 26, 2005

From [email protected]

b__deparse_re-fix.patch
--- B/Deparse.pm.orig	2005-10-17 18:17:01.000000000 +0200
+++ B/Deparse.pm	2005-10-26 10:22:25.000000000 +0200
@@ -3367,14 +3367,16 @@
 sub balanced_delim {
     my($str) = @_;
     my @str = split //, $str;
-    my($ar, $open, $close, $fail, $c, $cnt);
+    my($ar, $open, $close, $fail, $c, $cnt, $last_bs);
     for $ar (['[',']'], ['(',')'], ['<','>'], ['{','}']) {
 	($open, $close) = @$ar;
-	$fail = 0; $cnt = 0;
+	$fail = 0; $cnt = 0; $last_bs = 0;
 	for $c (@str) {
 	    if ($c eq $open) {
+		$fail = 1 if $last_bs;
 		$cnt++;
 	    } elsif ($c eq $close) {
+		$fail = 1 if $last_bs;
 		$cnt--;
 		if ($cnt < 0) {
 		    # qq()() isn't ")("
@@ -3382,6 +3384,7 @@
 		    last;
 		}
 	    }
+	    $last_bs = $c eq '\\';
 	}
 	$fail = 1 if $cnt != 0;
 	return ($open, "$open$str$close") if not $fail;
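For experimenting outside B::Deparse, the patched check can be lifted into a standalone function. This is a sketch under stated assumptions: the name `pick_bracket_delim` is hypothetical, the line elided between the two hunks is assumed to set `$fail`, and returning an empty list when no pair fits stands in for whatever fallback the real caller of `balanced_delim` uses. It is not the B::Deparse source.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the patched check. For each bracket
# pair, in the patch's order, count nesting over the string and reject the
# pair if it ever goes negative, ends unbalanced, or appears right after a
# backslash. Returns ($open, "$open$str$close"), or an empty list when no
# pair fits (an assumption: the real fallback is not shown in the patch).
sub pick_bracket_delim {
    my ($str) = @_;
    my @chars = split //, $str;
    for my $pair (['[', ']'], ['(', ')'], ['<', '>'], ['{', '}']) {
        my ($open, $close) = @$pair;
        my ($fail, $cnt, $last_bs) = (0, 0, 0);
        for my $c (@chars) {
            if ($c eq $open) {
                $fail = 1 if $last_bs;    # "\[" would lose its backslash
                $cnt++;
            }
            elsif ($c eq $close) {
                $fail = 1 if $last_bs;
                $cnt--;
                if ($cnt < 0) {           # qq()() isn't ")("
                    $fail = 1;            # assumed content of the elided line
                    last;
                }
            }
            $last_bs = $c eq '\\';
        }
        $fail = 1 if $cnt != 0;
        return ($open, "$open$str$close") unless $fail;
    }
    return;
}

# The regex from the reproduction: [] is rejected because of "\[" and "\]".
my ($open, $quoted) = pick_bracket_delim('^\[([a-z\/]+)\]$');
print "m$quoted\n";    # prints m(^\[([a-z\/]+)\]$)
```

Because `[]` is rejected on the escaped brackets, the pattern is re-quoted as `m(...)`, where `\[` and `\]` are no longer delimiter escapes.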

p5pRT commented Oct 31, 2005

From @rgs

Bas van Sisseren wrote:

> Ok, no problem. I've added an extra check in the balanced_delim function
> (which is used for checking whether a specific delimiter can be used).
>
> When a '\'.<delimiter character> in the string is found, $fail is set to 1.

Thanks, I applied your change to the development copy of perl
as change #25934.
