
regular expressions matching lines read from an in-memory scalar is extremely slow in cygwin/MSYS2 perls  #21877

Open
@Old-Green-Man

Description

With the Cygwin/MSYS2 perl builds, running a regular expression against lines read from an in-memory scalar takes orders of magnitude longer than running it against the same lines read from a disk file, whenever the regex matches something.

Steps to Reproduce
The script below demonstrates the behavior. An example input file is attached to run it with; in that file about 16% of the lines match the pattern.

#!/usr/bin/env perl

use warnings;
use strict;
use Time::HiRes qw( time );

my $file = shift @ARGV;
my ($fh, $time, $n);

# Pass 1: read lines from the disk file.
open $fh, "<", $file or die "Cannot open $file: $!";
$time = time;
$n = 0;
while(<$fh>) {
  /^ ?Query/;
  $n++;
}
printf "%f read lines from disk and do RE; n=$n.\n", time - $time;

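# Rewind, then rebuild the file contents as one in-memory string, line by line.
seek $fh, 0, 0 or die "Cannot rewind $file: $!";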
my $s = "";
while(<$fh>) {
  $s .= $_;
}
# my $s = do {
#   local $/;
#   <$fh>;
# };

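# Pass 2: read the same lines back through an in-memory filehandle.
open $fh, "<", \$s or die "Cannot open in-memory filehandle: $!";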
$time = time;
$n = 0;
while(<$fh>) {
  /^ ?Query/;
  $n++;
}
printf "%f read lines from in-memory file and do RE; n=$n.\n", time - $time;

On my cygwin system this prints:

0.122725 read lines from disk and do RE; n=570694.
27.238712 read lines from in-memory file and do RE; n=570694.

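So reading from the in-memory file is about 220 times slower (27.24 s versus 0.12 s). Both loops process the same 570,694 lines: $n counts every line read, not just the matching ones.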
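To check the report's observation that the slowdown appears when the regex actually matches, the Perl sketch below times three loops over the same in-memory handle: no regex, a regex that never matches, and the report's /^ ?Query/ pattern. It is a hypothetical diagnostic, not part of the original report: it generates synthetic data instead of using the attached file (every sixth line starts with "Query", about 16.7% matching), it builds the string with join rather than the line-by-line concatenation used above, and the names time_in_memory, $lines, $data, the "Query"/"Other" line contents and the NoSuch pattern are all made up for illustration. It has not been run on Cygwin or MSYS2.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical synthetic input (not the attached file): every 6th line
# starts with "Query", so about 16.7% of the lines match /^ ?Query/.
my $lines = 60_000;
my $data  = join '',
    map { $_ % 6 == 0 ? "Query $_ some text\n" : "Other $_ some text\n" }
    1 .. $lines;

# Hypothetical helper: read every line of $data through an in-memory
# filehandle, applying $re to each line when $re is defined.
# Returns the elapsed wall-clock seconds and the number of matching lines.
sub time_in_memory {
    my ($data, $re) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    my $start   = time();
    my $matches = 0;
    local $_;    # do not clobber the caller's $_
    while (<$fh>) {
        $matches++ if defined $re && $_ =~ $re;
    }
    my $elapsed = time() - $start;
    close $fh;
    return ($elapsed, $matches);
}

# Compare: plain reading, a regex that never matches, a regex that matches.
for my $case (
    [ 'no regex',           undef         ],
    [ 'regex, never match', qr/^ ?NoSuch/ ],
    [ 'regex, some match',  qr/^ ?Query/  ],
) {
    my ($label, $re) = @$case;
    my ($elapsed, $matches) = time_in_memory($data, $re);
    printf "%-20s %f s, matches=%d\n", $label, $elapsed, $matches;
}
```

The printed match counts (0, 0 and 10000) only confirm that each loop did the expected work; if the report's observation holds, only the third timing should be dramatically slower on an affected perl.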

Expected behavior
I would expect the two timings to be in roughly the same ballpark.

Perl configuration

Summary of my perl5 (revision 5 version 36 subversion 3) configuration:

  Platform:
    osname=cygwin
    osvers=3.4.10-1.x86_64
    archname=x86_64-cygwin-threads-multi
    uname='cygwin_nt-10.0-22631 walter 3.4.10-1.x86_64 2023-11-29 12:12 utc x86_64 cygwin '
    config_args='-des -Dprefix=/usr -Dmksymlinks -Darchname=x86_64-cygwin-threads -Dlibperl=cygperl5_36.dll -Dcc=gcc -Dld=g++ -Accflags=-ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='gcc'
    ccflags ='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv -fno-strict-aliasing'
    optimize='-O3'
    cppflags='-DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -D_GNU_SOURCE -ggdb -O2 -pipe -Wall -Werror=format-security -D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/build=/usr/src/debug/perl-5.36.3-1 -fdebug-prefix-map=/mnt/share/cygpkgs/perl/perl.x86_64/src/perl-5.36.3=/usr/src/debug/perl-5.36.3-1 -fwrapv -fno-strict-aliasing'
    ccversion=''
    gccversion='11.4.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='g++'
    ldflags =' -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'
    libpth=/usr/lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lcrypt -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lcrypt
    libc=/usr/lib/libcygwin.a
    so=dll
    useshrplib=true
    libperl=cygperl5_36.dll
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' --shared  -Wl,--enable-auto-import -Wl,--export-all-symbols -Wl,--enable-auto-image-base -fstack-protector-strong'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    Cygwin: README
    Cygwin: use auto-image-base instead of fixed DLL base address
    Cygwin: modify hints
    Cygwin: Configure correct libsearch
    Cygwin: Configure correct libpth
    Cygwin: Win32 correct UTF8 handling
  Built under cygwin
  Compiled at Nov 30 2023 21:40:29
  %ENV:
    PERL5LIB="/home/dwrice/perl"
    CYGWIN="winsymlinks:nativestrict"
  @INC:
    /home/dwrice/perl
    /usr/local/lib/perl5/site_perl/5.36/x86_64-cygwin-threads
    /usr/local/share/perl5/site_perl/5.36
    /usr/lib/perl5/vendor_perl/5.36/x86_64-cygwin-threads
    /usr/share/perl5/vendor_perl/5.36
    /usr/lib/perl5/5.36/x86_64-cygwin-threads
    /usr/share/perl5/5.36
