
open() on a reference to a multi-MB scalar, then <$fh>, causes fatal OOM in pp_readline and sv_gets, probably Win32-only #22623


Open
bulk88 opened this issue Sep 25, 2024 · 5 comments


@bulk88
Contributor

bulk88 commented Sep 25, 2024

Calling open() on a large scalar (several MB) with LF line endings, NOT CRLF, and then reading with <$fh> (aka pp_readline()) causes a memory leak and a fatal "Out of memory!" on 5.32. The 5.41 error is similar but more verbose. Tested on both 5.41.5 32-bit and Strawberry Perl 5.32.1.1 32-bit; both die with a fatal OOM.

Call stack at the point where realloc() fails and returns NULL, on 5.41.5 blead perl. Line numbers are approximate since this is a -O1/release perl build.

	perl541.dll!VMem::Realloc(void * pMem, unsigned int size) Line 199	C++
 	[Inline Frame] perl541.dll!CPerlHost::Realloc(void *) Line 60	C++
 	perl541.dll!PerlMemRealloc(IPerlMem * piPerl, void * ptr, unsigned int size) Line 303	C++
 	perl541.dll!Perl_safesysrealloc(void * where, unsigned int size) Line 307	C
 	perl541.dll!Perl_sv_grow(interpreter * my_perl, sv * const sv, unsigned int newlen) Line 1425	C
 	perl541.dll!Perl_sv_gets(interpreter * my_perl, sv * const sv, _PerlIO * * const fp, int append) Line 9116	C
 	perl541.dll!Perl_do_readline(interpreter * my_perl) Line 4215	C
 	perl541.dll!Perl_runops_standard(interpreter * my_perl) Line 41	C
 	perl541.dll!S_run_body(interpreter * my_perl, long oldscope) Line 2873	C
 	perl541.dll!perl_run(interpreter * my_perl) Line 2779	C
 	perl541.dll!RunPerl(int argc, char * * argv, char * * env) Line 202	C++
 	[Inline Frame] perl.exe!invoke_main() Line 78	C++
 	perl.exe!__scrt_common_main_seh() Line 288	C++
 	kernel32.dll!@BaseThreadInitThunk@12()	Unknown
 	ntdll.dll!___RtlUserThreadStart@8()	Unknown
 	ntdll.dll!__RtlUserThreadStart@8()	Unknown

The toxic sv_grow() call comes from this branch of sv_gets():

        if (shortbuffered) {		/* oh well, must extend */
            /* we didn't have enough room to fit the line into the target buffer
             * so we must extend the target buffer and keep going */
            cnt = shortbuffered;
            shortbuffered = 0;
            bpx = bp - (STDCHAR*)SvPVX_const(sv); /* box up before relocation */
            SvCUR_set(sv, bpx);
            /* extned the target sv's buffer so it can hold the full read-ahead buffer */
            SvGROW(sv, SvLEN(sv) + append + cnt + 2);
            bp = (STDCHAR*)SvPVX_const(sv) + bpx; /* unbox after relocation */
            continue;
        }

I THINK the 14 MB realloc size comes from:

    /* get the number of bytes remaining in the read-ahead buffer
     * on first call on a given fp this will return 0.*/
    cnt = PerlIO_get_cnt(fp);

since, for open() on $scalar, the PerlIO backend just returns the length of the whole scalar. I'm not sure that readline() having an effectively infinite buffer is the best design. In my case, in a private business app, I was calling open() on $scalar, where $scalar was a read-only mmap'ed string from File::Map. The bug does not involve File::Map, and my repro script doesn't use File::Map. More specifically, the bug is in the loops in readline() or sv_gets():

        if (gimme == G_LIST) {
            if (SvLEN(sv) - SvCUR(sv) > 20) {
                SvPV_shrink_to_cur(sv);
            }
            /* XXX on RC builds, push on stack rather than mortalize ? */
            sv = sv_2mortal(newSV(80));
            continue;
        }

I suspect that SvPV_shrink_to_cur(sv) isn't really working on Win32, or that there is a Perl-level leak lasting until the next FREETMPS/NEXTSTATE in the 2 loops (in pp_readline() and sv_gets()). Note the 14 MB allocation repeated over and over in the attached screenshots. I'm not able to fully debug this, but either the Perl SVPV shrink is broken (the block isn't actually shrunk by malloc()/HeapAlloc()/RtlHeapWhatever and stays at the full 14 MB for the rest of the malloc block's lifetime), or the 2 loops really are leaking.

Note that Win32 realloc()/HeapReAlloc() MIGHT BE totally incapable of shrinking buffers (an "optimization"). Or HeapReAlloc() adds the "released" 14 MB minus ~80 bytes of freed 4096-byte pages to the <= 16KB/64KB/128KB/512KB user-mode allocation pools ( https://www.blackhat.com/docs/us-16/materials/us-16-Yason-Windows-10-Segment-Heap-Internals-wp.pdf ). But Perl keeps asking for 14 MB chunks, and by Win32 Heap design/policy/API/ABI back compat (oh no, ABI), every 14 MB allocation must be raw VM rounded up to 4096-byte pages and must come from the kernel's raw VM page allocator. So maybe fragmentation, or the 512 KB rule, causes the OOM on 32-bit perl. I didn't test the repro on 64-bit perl.

To repeat: I don't know whether this is a Perl-level leak lasting until FREETMPS or a Win32 Heap API problem; I have the 3 hypotheses described above. Either way it's an OOM and a pretty bad bug, since it hinges on CRLF vs. LF and is pretty easy to trigger by accident. Someone could argue, though, that open() on a scalar is very rare and a poor design performance-wise, since if the whole file is already in a scalar string you might as well use index() and substr().

I did observe the realloc() length argument slowly dropping by 1, ~100, or ~500 bytes per loop iteration. The screenshot shows the OS VM allocation eventually dropping in 4 KB page steps, which confirms that the pp_readline()/sv_gets() leak is roughly O(n^2), with the size slowly dropping.

I'm leaving this crash to someone else: there are too many optimizations and tricks in pp_readline and sv_gets, and I didn't spot an obvious quick fix for the leak. My quick look also raised more design questions than answers, because of all the lvalue PV buffer swapping, save-stack use, and other optimizations in there. I hypothesized that limiting pp_readline()'s buffer to 80, 256, or ~1024 bytes might superficially stop the OOM, but would still leave leaks until NEXTSTATE/FREETMPS in the code, so I don't see an instant, simple fix. Plus there is a design debate about mallocing/memcpying the ENTIRE backing SVPV of a PerlIO open() on $scalar from rvalue to lvalue, even if that source SVPV is hundreds of MBs long! Hence this ticket.

Steps to Reproduce

Note the 32-bit perl, the 16 MB input file, the LF line endings, and binmode :raw. Since I didn't test this on Linux, this might be a Win32-only Perl bug that doesn't reproduce on Linux, assuming Perl's default record separator on Win32 is CRLF while my input is LF. I didn't test the crash script with a 16 MB CRLF input .csv. The "14 MB" input I mention was the original private CSV file that led to this bug; the 16 MB .csv generated by the repro script has the same style of input.

use strict;
use warnings;
use File::Slurp;

if (!-e "bigcsv.csv") {
  my @lines;
  for (0..0xFFFF) {              # 65536 records
    my $line = '';
    do {
      $line .= int(rand(0x8000000));
    } while (length $line < 256);
    push(@lines, substr($line, 0, 255));   # 255 chars per record
  }
  # LF-only line endings, ~16 MB in total
  write_file("bigcsv.csv", { binmode => ':raw' }, join("\x0a", @lines));
  print "made bigcsv.csv, run this script again\n";
  exit 0;
}
my $fh;
my $bin = read_file("bigcsv.csv", { binmode => ':raw' });
die "open: $!" if !open($fh, '<', \$bin);
# foreach evaluates <$fh> in list context, so all lines are read up front
foreach (<$fh>) {
  print $_;
}

5.32 fatal OOM output

C:\sources\plrl>perl crash.pl
Out of memory!

C:\sources\plrl>

Expected behavior

<$fh> returns the separate lines as strings (or undef at EOF), not a fatal OOM.

Perl configuration

The fatal OOM also happens on 32-bit 5.41.5.

C:\Users\Owner>perl -V
Summary of my perl5 (revision 5 version 32 subversion 1) configuration:

  Platform:
    osname=MSWin32
    osvers=10.0.19042.746
    archname=MSWin32-x86-multi-thread-64int
    uname='Win32 strawberry-perl 5.32.1.1 #1 Sun Jan 24 12:17:47 2021 i386'
    config_args='undef'
    hint=recommended
    useposix=true
    d_sigaction=undef
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags =' -DWIN32 -D__USE_MINGW_ANSI_STDIO -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields'
    optimize='-s -O2'
    cppflags='-DWIN32'
    ccversion=''
    gccversion='8.3.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=12
    longdblkind=3
    ivtype='long long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='long long'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='g++'
    ldflags ='-s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"'
    libpth=C:\STRAWB~1\c\lib C:\STRAWB~1\c\i686-w64-mingw32\lib C:\STRAWB~1\c\lib\gcc\i686-w64-mingw32\8.3.0
    libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=
    so=dll
    useshrplib=true
    libperl=libperl532.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs
    dlext=xs.dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags='-mdll -s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_TIMES
    HAVE_INTERP_INTERN
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_IMPLICIT_SYS
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Built under MSWin32
  Compiled at Jan 24 2021 12:22:49
  @INC:
    C:/Strawberry/perl/site/lib
    C:/Strawberry/perl/vendor/lib
    C:/Strawberry/perl/lib
C:\Users\Owner>

[Screenshots attached: leak2, leak3, leak4, leak1]

@tonycoz
Contributor

tonycoz commented Sep 25, 2024

It's part of #21877 and not Windows-specific, though it appears to cause worse performance on Windows than on Linux (probably due to Linux not committing all allocations by default).

@bulk88
Contributor Author

bulk88 commented Jan 11, 2025

Related: #21654

Copy-pasting the relevant parts of #5454 (comment) below:

Performance-wise, nothing has really changed on Windows in 10 years: I can still reproduce the pathological move of 200 KB every time the allocation size grows by 4096 bytes. Updated code:

// Assumes you define and initialize your own file-scope
// static LARGE_INTEGER g_Frequency; (e.g. with QueryPerformanceFrequency())
/* BTIME = BENCH TIME */
#define BTIMESTART do { \
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds; \
    QueryPerformanceCounter(&StartingTime)

#define BTIMEEND(label) \
    QueryPerformanceCounter(&EndingTime); \
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart; \
    /* NB: scaling by 1e9 yields nanoseconds, even though the label says "us" */ \
    ElapsedMicroseconds.QuadPart *= 1000000000; \
    ElapsedMicroseconds.QuadPart /= ((LARGE_INTEGER*)(&g_Frequency))->QuadPart; \
    printf("%-30s %10I64u us Ln %u\n", label, ElapsedMicroseconds.QuadPart, __LINE__); \
} while(0)

void
CRTAllocIncrement(UV times, bool prtmsg)
PREINIT:
    void * ptr;
    void * oldptr;
    UV i;
PPCODE:
    /* grow a CRT malloc() block one byte at a time */
    BTIMESTART;
    ptr = malloc(1);
    oldptr = ptr;
    for(i=1;i<times;i++){
        ptr = realloc(ptr, i);
        if(oldptr != ptr && prtmsg) {
            printf("ptr changed at %" UVf ", ptr=%p\n",i, ptr);
        }
        oldptr = ptr;
    }
    BTIMEEND("CRTAllocIncrement");
    free(ptr);

void
PLAllocIncrement(UV times, bool prtmsg)
PREINIT:
    void * ptr;
    void * oldptr;
    UV i;
PPCODE:
    /* same loop through perl's Newx()/Renew() wrappers */
    BTIMESTART;
    Newx(ptr, 1, char);
    oldptr = ptr;
    for(i=1;i<times;i++){
        Renew(ptr, i, char);
        if(oldptr != ptr && prtmsg) {
            printf("ptr changed at %" UVf ", ptr=%p\n",i, ptr);
        }
        oldptr = ptr;
    }
    BTIMEEND("PLAllocIncrement");
    Safefree(ptr);

void
WinAllocIncrement(UV times, bool prtmsg)
PREINIT:
    void * ptr;
    void * oldptr;
    UV i;
    HANDLE procheap;
PPCODE:
    procheap = GetProcessHeap();
    BTIMESTART;
    ptr = HeapAlloc(procheap, 0, 1);
    oldptr = ptr;
    for(i=1;i<times;i++){
        ptr = HeapReAlloc(procheap, 0, ptr, i);
        if(oldptr != ptr && prtmsg) {
            printf("ptr changed at %" UVf ", ptr=%p\n",i, ptr);
        }
        oldptr = ptr;
    }
    BTIMEEND("WinAllocIncrement");
    HeapFree(procheap, 0, ptr);

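NOTICE: Perl's Newx()/Renew() loop takes about 47% longer than the loop calling MS's native HeapAlloc()/HeapReAlloc() (the heap API underneath the CRT's malloc()); put the other way, the HeapReAlloc() loop runs in about 68% of the Newx() loop's time. This is C code against C code, with no Perl high-level data structures involved. Since BTIMEEND scales by 1e9, the figures printed below with a "us" label are really nanoseconds (about 0.99 s, 1.43 s and 0.97 s).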

C:\sources\perl5-xst>perl -Mblib -MXst -e"Xst::CRTAllocIncrement(4000000,0)"
CRTAllocIncrement               988066786 us Ln 2885

C:\sources\perl5-xst>perl -Mblib -MXst -e"Xst::PLAllocIncrement(4000000,0)"
PLAllocIncrement               1427796251 us Ln 2906

C:\sources\perl5-xst>perl -Mblib -MXst -e"Xst::WinAllocIncrement(4000000,0)"
WinAllocIncrement               971460963 us Ln 2928

C:\sources\perl5-xst>perl -Mblib -MXst -e"Xst::WinAllocIncrement(4000000,1)" | more
ptr changed at 1, ptr=000000000037F860
ptr changed at 2, ptr=000000000037F850
ptr changed at 3, ptr=000000000037F860
ptr changed at 4, ptr=000000000037F850
ptr changed at 5, ptr=000000000037F860
ptr changed at 6, ptr=000000000037F850
ptr changed at 7, ptr=000000000037F860
ptr changed at 8, ptr=000000000037F850
ptr changed at 9, ptr=000000000037F110
ptr changed at 10, ptr=000000000037F130
ptr changed at 11, ptr=000000000037F110
ptr changed at 12, ptr=000000000037F130
ptr changed at 13, ptr=000000000037F110
ptr changed at 14, ptr=000000000037F130
ptr changed at 15, ptr=000000000037F110
ptr changed at 16, ptr=000000000037F130
ptr changed at 17, ptr=000000000037F110
ptr changed at 18, ptr=000000000037F130
ptr changed at 19, ptr=000000000037F110
ptr changed at 20, ptr=000000000037F130
ptr changed at 21, ptr=000000000037F110
ptr changed at 22, ptr=000000000037F130
ptr changed at 23, ptr=000000000037F110
ptr changed at 24, ptr=000000000037F130
ptr changed at 25, ptr=00000000003758F0
ptr changed at 26, ptr=0000000000375980
ptr changed at 27, ptr=00000000003758F0
ptr changed at 28, ptr=0000000000375980
ptr changed at 29, ptr=00000000003758F0
ptr changed at 30, ptr=0000000000375980
ptr changed at 31, ptr=00000000003758F0
ptr changed at 32, ptr=0000000000375980
ptr changed at 33, ptr=00000000003758F0
ptr changed at 34, ptr=0000000000375980
ptr changed at 35, ptr=00000000003758F0
ptr changed at 36, ptr=0000000000375980
ptr changed at 37, ptr=00000000003758F0
ptr changed at 38, ptr=0000000000375980
ptr changed at 39, ptr=00000000003758F0
ptr changed at 40, ptr=0000000000375980
ptr changed at 41, ptr=000000000037FC10
ptr changed at 42, ptr=0000000000380050
ptr changed at 43, ptr=000000000037FC10
ptr changed at 44, ptr=0000000000380050
ptr changed at 45, ptr=000000000037FC10
ptr changed at 46, ptr=0000000000380050
CUT
ptr changed at 3915681, ptr=0000000002970040
ptr changed at 3919777, ptr=00000000025B0040
ptr changed at 3923873, ptr=0000000002970040
ptr changed at 3927969, ptr=00000000025B0040
...........
ptr changed at 3968929, ptr=0000000002980040
ptr changed at 3973025, ptr=00000000025B0040
ptr changed at 3977121, ptr=0000000002980040
ptr changed at 3981217, ptr=00000000025B0040
ptr changed at 3985313, ptr=0000000002980040
ptr changed at 3989409, ptr=00000000025B0040
ptr changed at 3993505, ptr=0000000002980040
ptr changed at 3997601, ptr=0000000002D50040

This is still a problem in 5.41.8/Win64/Win7. I don't know whether MS ever improved this in Win10/11.

@bulk88
Contributor Author

bulk88 commented Jan 11, 2025

[Screenshot: PL5418]
miniperl.exe 5.41.8, no threads. By chance I created this GUI layout and scrolled into this area of memory. Since this is a no-threads perl, bytes 24-32 on the right must be kernel32's malloc header, not perl's header. I see a very large amount of wasted memory. This screenshot really argues for @iabyn's short-strings experiment discussed at https://www.nntp.perl.org/group/perl.perl5.porters/2017/03/msg243827.html, but I can't find any further discussion of it.

@tonycoz
Contributor

tonycoz commented Jan 20, 2025

re: malloc performance: when doing some profiling I noticed that the UCRT malloc accounted for a remarkably small share of the processing time of a perl allocation call (the error-number saving, which requires GetLastError()/SetLastError(), was significant).

Unfortunately the wrappers add cost.

large realloc() performance: unfortunately, fixing this would require NT to implement something like mremap(). I suspect the NT memory management model doesn't allow you to remap part of a memory mapping, whereas on Linux the following succeeds:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main() {
  char *p = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);

  if (p == (void*)-1) {
    perror("mmap");
    return 1;
  }

  char *p2 = mremap(p+0x8000, 0x8000, 0x20000, MREMAP_MAYMOVE);
  if (p2 == (void *)-1) {
    perror("mremap");
    return 1;
  }

  printf("success %p %p!\n", p, p2);
  return 0;
}

@bulk88
Contributor Author

bulk88 commented Apr 15, 2025

> re: malloc performance: I'd noticed when doing some profiling that the ucrt malloc was using a remarkably low amount of processing time (the error number saving, which requires GetLastError()/SetLastError() was significant) of a perl allocation call.

Correct. I single-stepped HeapAlloc() on my Win 7 machine in LFH mode (the OS default since Vista) at a random call site allocating <= 64 bytes to count instructions: RtlAllocateHeap() was exactly 92 x64 instructions from the very beginning (mov rax, rsp; push rbp; push rsi; push rdi; push r12; sub rsp, 0E8h) to the return opcode. MS spent a huge amount of R&D money and resources on their modern (NT 6.0/6.1 and later) HeapAlloc() implementation to make it fast, and they were successful.

> Unfortunately the wrappers add cost.

I have a PR about 80% finished that turns WinPerl's Perl_get_context() into 4 to 7 instructions. The 4-instruction branch is the fast path, and 100% of perl.exe processes will take it.

> large realloc() performance: unfortunately fixing this would require that NT implemented something like mremap(), I suspect the NT memory management model doesn't allow you to remap part of a memory mapping, while on Linux the following succeeds:

I think it's sort of possible on NT, but multiple roadblocks exist.

The first one is that once an NT mmap "view"/"address space" refcounted kernel object is created, its start and end pointers can never change for the rest of that object's life. To "grow" it there are 2 choices: create a second 64 KB "address space" object adjacent to it, or create a second, longer "address space" object at a random new address and include the previous object's backing storage as the bottom half of the new one. The "backing storage object" is either the official Windows paging file or another temp file on disk, possibly an anonymous NTFS file or one flagged delete-pending (think unlinked-but-open on Unix).

The 2nd roadblock is that "address space" refcounted kernel objects come in contiguous 64 KB (65,536-byte) units, aligned at 64 KB in the address space. The 4 KB pages inside a 64 KB unit can be manipulated individually at runtime (commit -> reserved -> RO -> COW -> RW commit -> reserved, etc.), but using only the first 4 KB page of each 64 KB object and keeping the rest reserved/no-access will blow through the 2 GB/3.5 GB of i386 address space pretty quickly, and on x64 causes a system-wide GUI near-lockup/ultra-slow GUI with 100% CPU on 1 core in the "System" process. I know Linux mmap() allows address-space objects of any length in 4 KB units, aligned at 4 KB. The NT kernel only allows 64 KB units, though you can still flip the MMU bits individually on the 4 KB pages that make them up. Basically MS decided in the 1980s/1993, for performance reasons, not to "hash-table track" the 1st page-table layer on i386, which is 1024 entries long (2^10), or 2^12 if I read https://wiki.osdev.org/Paging correctly. Another quick theory: 2^4 = 16 and 4 KB * 16 = 64 KB, and 2^12 takes 12 bits away, leaving bits 13, 14, 15, 16 unused inside a U16. So WinNT picked data locality/fewest pointer derefs during a page fault as a design goal, while Unix/POSIX/Linux picked developer friendliness over performance. Raymond Chen probably has gossip on the design choice, but the reasons/rationale are irrelevant: it's not changing any time soon.

For Perl 5's purposes, if WinPerl wants to do massive grow/shrink cycles in steps of hundreds of KBs or MBs at a time, the question is why. I think this can go 2 ways:

Either there are serious algorithmic bugs that cause the buffer to be over-estimated or over-extended in the first place, and those are bugs, not the optimizations their original authors described them as.

Or the conceptual SvCUR() really used to be that big but naturally got smaller, and now that "tail" is wasted; if the waste is significant (by some criterion yet to be defined), it needs to be given back to the MS CRT/HeapAlloc/p5p's malloc.c/any OS libc/WinPerl's CPHost malloc.

If you grep the MS CRT source code, you might find a non-sequitur source comment describing an MS internal/private CI test that "verifies" the MS CRT will not SEGV/hang/deadlock with exactly 2^32-8096 or 2^32-(4096+2048) bytes of free malloc()/HeapAlloc() memory left in the process, and that MS has a "contractual obligation" to one of its very large customers to run safely and stably in that extreme environment.

I've wondered about that comment in the UCRT for a while. I can only guess it means the PC has run out of disk space, and therefore paging-file space, so NtAllocateVirtualMemory() fails; but 8 KB isn't 4 KB or 64 KB (the NT kernel MMU object sizes). Recently I thought of something else that might be going on...

My guess: some app developer or company decided that calling free() is for idiots and fools, and INTENTIONALLY NEVER free()s any malloc'ed block until malloc() returns a NULL pointer; ONLY THEN does it run its GC/object-tree-walking code and start calling free()!

In my opinion, that design is improper for the P5 interpreter to be following.

If the PP VM/C or XS VM state really needs to "shrink" a malloc block on WinPerl, WinPerl needs special-casing to malloc() a new, smaller block, copy the data over, and free() the old block, rather than relying on ptr = realloc(ptr, newlen); with a smaller newlen, since neither C nor POSIX guarantees that a shrinking realloc() happens in place.


Recent update:

It seems this HeapReAlloc()/CRT realloc() bug of refusing to "go backwards" (shrink in place) the way Unix malloc() does is MUCH more widespread inside the interpreter. I profiled miniperl.exe running /lib/unicore/mktables, and it seems 6-12% of all CPU time is spent in the combination of HeapReAlloc() and the WinPerl interpreter core trying to shrink a buffer.

And it seems to me that on my Win7 OS, for 16-128 byte blocks, HeapReAlloc() just memcpy()s the data back and forth on each call between the same 2 memory addresses in the same "size bucket" of the Win32 heap pool: it frees the old pointer and hands out a new one; on the next realloc() it frees the new pointer and hands out the old one; then back to stage 1.

Of course, it does a memcpy() as part of each realloc(), by design.

Here is 1 problematic backtrace; I included more in an attachment, and all of them are regexp-engine related.

regexp_engine_mktables_realloc_cpu_burn_call_stacks.txt

RtlReAllocateHeap	ntdll	[unknown]	0
realloc_base	ucrtbase	[unknown]	0
S_invlist_trim	miniperl	C:\sources\perl5\regcomp_invlist.c	256
Perl__invlist_intersection_maybe_complement_2nd	miniperl	C:\sources\perl5\regcomp_invlist.c	1072
S_optimize_regclass	miniperl	C:\sources\perl5\regcomp.c	12011
S_regclass	miniperl	C:\sources\perl5\regcomp.c	11250
S_regatom	miniperl	C:\sources\perl5\regcomp.c	5648
S_regpiece	miniperl	C:\sources\perl5\regcomp.c	4798
S_regbranch	miniperl	C:\sources\perl5\regcomp.c	4563
S_reg	miniperl	C:\sources\perl5\regcomp.c	4286
S_regatom	miniperl	C:\sources\perl5\regcomp.c	5669
S_regpiece	miniperl	C:\sources\perl5\regcomp.c	4798
S_regbranch	miniperl	C:\sources\perl5\regcomp.c	4563
S_reg	miniperl	C:\sources\perl5\regcomp.c	4235
S_regatom	miniperl	C:\sources\perl5\regcomp.c	5669
S_regpiece	miniperl	C:\sources\perl5\regcomp.c	4798
S_regbranch	miniperl	C:\sources\perl5\regcomp.c	4563
S_reg	miniperl	C:\sources\perl5\regcomp.c	4235
S_regatom	miniperl	C:\sources\perl5\regcomp.c	5669
S_regpiece	miniperl	C:\sources\perl5\regcomp.c	4798
S_regbranch	miniperl	C:\sources\perl5\regcomp.c	4563
S_reg	miniperl	C:\sources\perl5\regcomp.c	4235
Perl_re_op_compile	miniperl	C:\sources\perl5\regcomp.c	1781
Perl_pp_regcomp	miniperl	C:\sources\perl5\pp_ctl.c	128
Perl_runops_standard	miniperl	C:\sources\perl5\run.c	41
S_run_body	miniperl	C:\sources\perl5\win32\perl.c	2893
perl_run	miniperl	C:\sources\perl5\win32\perl.c	2806
main	miniperl	C:\sources\perl5\miniperlmain.c	137
__scrt_common_main_seh	miniperl	D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl	288
BaseThreadInitThunk	kernel32	[unknown]	0
RtlUserThreadStart	ntdll	[unknown]	0
