Bug in split, when EXPRESSION only contains PATTERN #6931

Closed
p5pRT opened this issue Nov 17, 2003 · 3 comments

p5pRT commented Nov 17, 2003

Migrated from rt.perl.org#24507 (status was 'resolved')

Searchable as RT24507$

p5pRT commented Nov 17, 2003

From [email protected]

split / /; (or split /a/;, or similar) produces the empty list if the
expression consists only of text matching PATTERN (once or multiple times).

The documentation on split says the following:
"Empty leading (or trailing) fields are produced when there are positive
width matches at the beginning (or end) of the string" and "split(/ /) will
give you as many null initial fields as there are leading spaces".
Therefore I expect that split /a/,'aa'; should produce 2 empty strings as
the result list, not the empty list. The documentation does not mention
that empty strings are generated only if further characters that do not
match PATTERN follow.

For some test cases please see test program below.

This bug has existed for a long time (I think at least since early 5.6.1)
and still persists in 5.8.1.

###################### start test program ###############################
#!/usr/local/bin/perl
# test script to show split bug (description see end of program)
# Wolf-Dietrich Moeller, 2003-11-14, <mailto:wolf-dietrich.moeller@siemens.com>
# tested on Perl 5.8.0_806 and 5.8.1_807 Win32 ActiveState,
# also on Perl 5.6.1_633 ActiveState Win32,
# and even older Perl 5.6.1 under Apache webserver and freeBSD (source distribution)
# output is (command line and CGI script):
#######################################################
# -- split(/a/)
# 01: in=undefined @val=0
# 02: in='' @val=0
# 03: in='a' @val=0
# 04: in='aa' @val=0
# 05: in=' a' @val=1 value(length): ' ' (1),
# 06: in='a ' @val=2 value(length): '' (0), ' ' (1),
# 07: in='aa ' @val=3 value(length): '' (0), '' (0), ' ' (1),
# -- split(/ /)
# 08: in=undefined @val=0
# 09: in='' @val=0
# 10: in=' ' @val=0
# 11: in='  ' @val=0
# 12: in='a ' @val=1 value(length): 'a' (1),
# 13: in=' a' @val=2 value(length): '' (0), 'a' (1),
# 14: in='  a' @val=3 value(length): '' (0), '' (0), 'a' (1),
#######################################################
use strict;
binmode STDOUT;
print "Content-Type: text/plain\x0D\x0A\x0D\x0A";
#
my @val;
my $j = 0;
print "# -- split(/a/)\x0D\x0A";
for (undef,'','a','aa',' a','a ','aa ') {
    if (length(++$j) < 2) { $j = '0'.$j }   # zero-pad the test number
    @val = split(/a/);                      # splits $_ on /a/
    print '# ',$j,': in=',(defined($_)?'\''.$_.'\'':'undefined'),' @val=',scalar @val;
    if (@val) {
        print ' value(length): ';
        for (@val) { print '\'',$_,'\' (',length($_),'), ' }
    }
    print "\x0D\x0A";
}
#
print "# -- split(/ /)\x0D\x0A";
for (undef,'',' ','  ','a ',' a','  a') {
    if (length(++$j) < 2) { $j = '0'.$j }   # zero-pad the test number
    @val = split(/ /);                      # splits $_ on a single space
    print '# ',$j,': in=',(defined($_)?'\''.$_.'\'':'undefined'),' @val=',scalar @val;
    if (@val) {
        print ' value(length): ';
        for (@val) { print '\'',$_,'\' (',length($_),'), ' }
    }
    print "\x0D\x0A";
}
#
print join "\x0D\x0A",
'#',
'# error in lines 3 + 4 (and 10 + 11)',
'# there should be one or two empty strings in @val according to doc on split:',
'# "Empty leading (or trailing) fields are produced when there are positive',
'# width matches at the beginning (or end) of the string" and',
'# "split(/ /) will give you as many null initial fields as there are leading',
'# spaces". There is no mentioning that this is valid only if there are further',
'# characters not equal PATTERN following (as in line 6 + 7 and 13 + 14).',
'#';
######################## end test program ####################


Dr. Wolf-Dietrich Moeller
Siemens AG, CT IC 3, D-81730 München
Corporate Technology Department Security
Mch P, Tel. +49 89 636-53391, Fax -48000
mailto:wolf-dietrich.moeller@siemens.com
Intranet https://security.ct.siemens.de/

p5pRT commented Nov 18, 2003

From [email protected]

On Nov 17, Moeller Wolf-Dietrich said:

> split / /; (or split /a/; , or similar) produces the empty list if
> expression contains only PATTERN (once or multiple times).
>
> "Empty leading (or trailing) fields are produced when there are positive
> width matches at the beginning (or end) of the string" and "split(/ /) will
> give you as many null initial fields as there are leading spaces".
>
> Therefore I expect that split /a/,'aa'; should produce 2 empty strings as
> result list, but not the empty list. There is no mentioning that the
> generation of empty strings is valid only if there are further characters
> not equal PATTERN following.

The documentation also says:

  If LIMIT is specified and positive, splits into no more than that
  many fields (though it may split into fewer). If LIMIT is unspecified
  or zero, trailing null fields are stripped (which potential users
  of C<pop()> would do well to remember). If LIMIT is negative, it is
  treated as if an arbitrarily large LIMIT had been specified.

Therefore, split(/a/, "aaa", -1) would return ("", "", "", ""), but
split(/a/, "aaa") returns (), because all the empty fields are trailing.
You might argue that they're leading, but they're trailing.
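
The distinction above can be checked with a few lines of Perl. This is a minimal sketch, not part of the original reply; the variable names are illustrative. It compares split with no LIMIT, split with a negative LIMIT, and a string where a non-matching character follows the separators, so the empty fields count as leading.

```perl
use strict;
use warnings;

# Variable names are illustrative and do not come from the original report.
my @stripped = split(/a/, 'aaa');       # no LIMIT: trailing empty fields are stripped
my @kept     = split(/a/, 'aaa', -1);   # negative LIMIT: every field is kept
my @leading  = split(/a/, 'aab');       # empty fields precede 'b', so they are leading

printf "no LIMIT: %d field(s)\n", scalar @stripped;   # 0
printf "LIMIT -1: %d field(s)\n", scalar @kept;       # 4
printf "'aab':    %d field(s)\n", scalar @leading;    # 3: '', '', 'b'
```

Code that needs the empty fields from a string made up entirely of separators can pass -1 as LIMIT.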

--
Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
[ I'm looking for programming work. If you like my work, let me know. ]

p5pRT closed this as completed Nov 18, 2003

p5pRT commented Nov 18, 2003

@rgs - Status changed from 'new' to 'resolved'
