Skip to content

[PATCH] clean up inline string comparisons in gv_fetchpvn_flags #12565

Closed
@p5pRT

Description

@p5pRT

Migrated from rt.perl.org#115746 (status was 'rejected')

Searchable as RT115746$

Activity

p5pRT

p5pRT commented on Nov 15, 2012

@p5pRT
Author

From @bulk88

Created by @bulk88

See attached patch. There is a small chance of this code not working on
big endian and on non-ASCII platforms. I have neither to test this with.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.17.6:

Configured by Owner at Tue Nov 13 02:44:40 2012.

Summary of my perl5 (revision 5 version 17 subversion 6 patch blead 
2012-11-12.23:49:19 a36462570c37e1b6544fc8746e99db3d683e2ac1 
v5.17.5-458-ga364625) configuration:
  Snapshot of: a36462570c37e1b6544fc8746e99db3d683e2ac1
  Platform:
    osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -Od -G7 -GL 
-arch:SSE2 -DWIN32 -D_CONSOLE -DNO_STRICT  -DPERL_TEXTMODE_SCRIPTS 
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO 
-D_USE_32BIT_TIME_T',
    optimize='-MD -Zi -DNDEBUG -Od -G7 -GL -arch:SSE2',
    cppflags='-DWIN32'
    ccversion='13.10.6030', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', 
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf 
-ltcg  -libpath:"c:\perl517\lib\CORE"  -machine:x86'
    libpth="C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\lib"
    libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  
netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib 
odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib 
winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib 
oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  
version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl517.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug 
-opt:ref,icf -ltcg  -libpath:"c:\perl517\lib\CORE"  -machine:x86'

Locally applied patches:
    


@INC for perl 5.17.6:
    C:/perl517/site/lib
    C:/perl517/lib
    .


Environment for perl 5.17.6:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\perl517\bin;C:\Program Files\Microsoft Visual Studio .NET 
2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 
2003\VC7\BIN;C:\Program Files\Microsoft Visual Studio .NET 
2003\Common7\Tools;C:\Program Files\Microsoft Visual Studio .NET 
2003\Common7\Tools\bin\prerelease;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32\wbem;
    PERL_BADLANG (unset)
    SHELL (unset)


p5pRT

p5pRT commented on Nov 15, 2012

@p5pRT
Author

From @bulk88

Forgot to attach.

p5pRT

p5pRT commented on Nov 15, 2012

@p5pRT
Author

From @bulk88

0001-clean-up-inline-string-comparisons-in-gv_fetchpvn_fl.patch
From 1dfbb7f8430fcfa337023294961fc863208d0cee Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 15 Nov 2012 14:39:16 -0500
Subject: [PATCH] clean up inline string comparisons in gv_fetchpvn_flags

The tree of char comparisons is not optimized at all in GCC and Visual C
on 32 bit x86. The machine code is exactly as written in C with ==s and
&&s. This commit converts the per char comparison to larger units. From a
note in handy.h, none of Perl's fixed length integers are fixed length so
size checking macros were added and the old code left as an alternate.
To get from an odd string to an even string length, it is assumed that
name is null terminated. Each inline comparison is length checked before it
is done (existing code). The ARGVOUT NV comparison was quite difficult
since it is highly dependent on the CPU and instruction set and compiler.
Whether the compiler wants to load a pointer to NV or load a constant NV
is upto it, trying to force it became too platform specific (an const but
not static array of 2 I32s cast to a NV * was tried). This saved 0x50 to
0x100 bytes of instructions I think. A different implementation would have
been to allocate an 8 char array, zero it, strncpy, then do only
double/I64 comparisons against it rather than 8/4/2 byte int comparisons
done now.
---
 gv.c |  116 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/gv.c b/gv.c
index 9de8886..a415c20 100644
--- a/gv.c
+++ b/gv.c
@@ -1468,8 +1468,15 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 		{
 		    stash = GvHV(gv) = newHV();
 		    if (!HvNAME_get(stash)) {
+#if I32SIZE == 4
+			const char COREs [4] = "CORE";
+#endif
 			if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+			 && *(I32 *)name == *(I32 *)COREs)
+#else
 			 && strnEQ(name, "CORE", 4))
+#endif
 			    hv_name_set(stash, "CORE", 4, 0);
 			else
 			    hv_name_set(
@@ -1507,33 +1514,81 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 		if (*name == '_')
 		    global = TRUE;
 		break;
-	    case 3:
-		if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
-		    || (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
-		    || (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
-		    global = TRUE;
+	    case 3: {
+#if I32SIZE == 4
+		    /* assume we have a term null even though len is 3 */
+		    const char INCs [4] = "INC\0"; 
+		    const char ENVs [4] = "ENV\0";
+		    const char SIGs [4] = "SIG\0";
+		    if (*(I32 *)name == *(I32 *)INCs
+			|| *(I32 *)name == *(I32 *)ENVs
+			|| *(I32 *)name == *(I32 *)SIGs)
+#else
+		    if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
+			|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
+			|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
+#endif
+			 global = TRUE;
+		}
 		break;
-	    case 4:
-		if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
-		    && name[3] == 'V')
-		    global = TRUE;
+	    case 4: {
+#if I32SIZE == 4
+		    const char ARGVs [4] = "ARGV";
+		    if (*(I32 *)name == *(I32 *)ARGVs)
+#else
+		    if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
+			&& name[3] == 'V')
+#endif
+			global = TRUE;
+		}
 		break;
-	    case 5:
-		if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
-		    && name[3] == 'I' && name[4] == 'N')
-		    global = TRUE;
+	    case 5: {
+#if I32SIZE == 4
+		    const char STDIs [4] = "STDI";
+		    if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
+#else
+		    if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
+			&& name[3] == 'I' && name[4] == 'N')
+#endif
+			global = TRUE;
+		}
 		break;
-	    case 6:
-		if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
-		    &&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
-		       ||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
-		    global = TRUE;
+	    case 6: { /* s2=string [part] 2 */
+#if I32SIZE == 4 && I16SIZE == 2
+		    const char STDOUTs1 [4] = "STDO";
+		    const char STDOUTs2 [2] = "UT";
+		    const char STDERRs1 [4] = "STDE";
+		    const char STDERRs2 [2] = "RR";
+		    if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16 *)STDOUTs2)
+			|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16 *)STDERRs2))
+#else
+		    if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
+			&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
+			   ||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
+#endif
+			global = TRUE;
+		}
 		break;
-	    case 7:
-		if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
-		    && name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
-		    && name[6] == 'T')
-		    global = TRUE;
+	    case 7: {
+/* use only native machine i64s, not compiler emulated ones */
+#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
+		    const char ARGVOUTs [8] = "ARGVOUT\0";
+		    if (*(I64 *)name == *(I64 *)ARGVOUTs)
+#elif NVSIZE == 8
+		    /* least CPU and instruction set (X87, SSE2, etc) dependent way */
+		    if (*(NV *)name ==  *(NV *)"ARGVOUT")
+#elif I32SIZE == 4
+		    /* assume we have a term null even though len is 7 */
+		    const char ARGVOUTs1 [4] = "ARGV";
+		    const char ARGVOUTs2 [4] = "OUT\0";
+		    if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32 *)ARGVOUTs2)
+#else
+		    if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
+			&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
+			&& name[6] == 'T')
+#endif
+			global = TRUE;
+		}
 		break;
 	    }
 
@@ -1623,6 +1678,9 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 
     if (SvTYPE(gv) == SVt_PVGV) {
 	if (add) {
+#if I32SIZE == 4
+	    const char ISAs [4] = "ISA\0";
+#endif
 	    GvMULTI_on(gv);
 	    gv_init_svtype(gv, sv_type);
 	    if (len == 1 && stash == PL_defstash) {
@@ -1653,7 +1711,12 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 	      }
 	    }
 	    else if (len == 3 && sv_type == SVt_PVAV
+#if I32SIZE == 4
+    /* assume we have a term null even though len is 3 */
+		  && *(I32 *)name == *(I32 *)ISAs
+#else
 	          && strnEQ(name, "ISA", 3)
+#endif
 	          && (!GvAV(gv) || !SvSMAGICAL(GvAV(gv))))
 		gv_magicalize_isa(gv);
 	}
@@ -1712,8 +1775,15 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
       try_core:
 	if (len > 1 /* shortest is uc */ && HvNAMELEN_get(stash) == 4) {
 	  /* Avoid null warning: */
+#if I32SIZE == 4
+	  const char COREs [4] = "CORE";
+#endif
 	  const char * const stashname = HvNAME(stash); assert(stashname);
+#if I32SIZE == 4
+	  if (*(I32 *)stashname == *(I32 *)COREs)
+#else
 	  if (strnEQ(stashname, "CORE", 4))
+#endif
 	    S_maybe_add_coresub(aTHX_ 0, gv, name, len);
 	}
     }
-- 
1.7.9.msysgit.0

p5pRT

p5pRT commented on Nov 15, 2012

@p5pRT
Author

@bulk88 - Status changed from 'new' to 'open'

p5pRT

p5pRT commented on Nov 16, 2012

@p5pRT
Author

From @tonycoz

On Thu, Nov 15, 2012 at 11​:48​:03AM -0800, bulk 88 via RT wrote​:

Forgot to attach.

---
via perlbug​: queue​: perl5 status​: new
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=115746

  stash = GvHV(gv) = newHV();
  if (!HvNAME_get(stash)) {
+#if I32SIZE == 4
+ const char COREs [4] = "CORE";
+#endif
  if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+ && *(I32 *)name == *(I32 *)COREs)
+#else
  && strnEQ(name, "CORE", 4))
+#endif
  hv_name_set(stash, "CORE", 4, 0);
+#if I32SIZE == 4
+ /* assume we have a term null even though len is 3 */
+ const char INCs [4] = "INC\0";
+ const char ENVs [4] = "ENV\0";
+ const char SIGs [4] = "SIG\0";
+ if (*(I32 *)name == *(I32 *)INCs
+ || *(I32 *)name == *(I32 *)ENVs
+ || *(I32 *)name == *(I32 *)SIGs)
+#else
+

This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.

Tony

p5pRT

p5pRT commented on Nov 16, 2012

@p5pRT
Author

From @khwilliamson

On 11/15/2012 05​:06 PM, Tony Cook wrote​:

On Thu, Nov 15, 2012 at 11​:48​:03AM -0800, bulk 88 via RT wrote​:

Forgot to attach.

---
via perlbug​: queue​: perl5 status​: new
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=115746

                 stash = GvHV\(gv\) = newHV\(\);
                 if \(\!HvNAME\_get\(stash\)\) \{

+#if I32SIZE == 4
+ const char COREs [4] = "CORE";
+#endif
if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+ && *(I32 *)name == *(I32 *)COREs)
+#else
&& strnEQ(name, "CORE", 4))
+#endif
hv_name_set(stash, "CORE", 4, 0);
+#if I32SIZE == 4
+ /* assume we have a term null even though len is 3 */
+ const char INCs [4] = "INC\0";
+ const char ENVs [4] = "ENV\0";
+ const char SIGs [4] = "SIG\0";
+ if (*(I32 *)name == *(I32 *)INCs
+ || *(I32 *)name == *(I32 *)ENVs
+ || *(I32 *)name == *(I32 *)SIGs)
+#else
+

This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.

Tony

This thread may have some bearing​:
http​://markmail.org/message/aspjihku5iaszw5q

p5pRT

p5pRT commented on Nov 16, 2012

@p5pRT
Author

From @bulk88

On Thu Nov 15 16​:34​:10 2012, public@​khwilliamson.com wrote​:

On 11/15/2012 05​:06 PM, Tony Cook wrote​:

This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.

Tony

This thread may have some bearing​:
http​://markmail.org/message/aspjihku5iaszw5q

The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest​::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest​::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest​::MD5's repurposing of it is wrong.

So is a new def required for what I want to do or not or is
U32_ALIGNMENT_REQUIRED the correct def to use for what I want to do?
Win32's config has U32_ALIGNMENT_REQUIRED defined, so I would need to
change that.

Another question is, what is faster? I did some benchmarking, and the
numbers are all over the place +/- 30% based on the sample data. Some
things that flip which is faster, if element "IOP" is 3 or 4 characters.
If there are elements after "IOP", O1 and O2, and 20 mil or 10 mil
iterations the results flip. Each one flips the result individually,
combined the results I think are random. One benchmark run untimed was
done to in theory preload the cache. I dont think there was a difference
between a run with the prerun and no prerun loop, or it was below the
per process run time noise. On O1 new func is 0x9D, old is 0x145. On O2
old is 0x148, new is 0xB8. Each test at 10 mil takes 1 to 2.5 secs. I
think the benchmark is flawed somehow (not random (branch predictor
optimizes away), sample data (arrays and strings) are in RO memory, not
enough never match strings/not enough matching strings, no runloop/cache
invalidation between runs) but I would need to play around with it more
to figure out exactly what is going on.

The reason I bring up the issue here is,
Perl_keyword/Devel​::Tokenizer​::C does the same thing that the old code
here does. Perl_keyword is the 3rd largest function in the interp at
~15-16KB 32bit x86. It could use the same modification that I did here
but on a larger scale.

______________________________________________________
void
StrEqTree()
PREINIT​:
  int i;
  int iarr;
#define W(x) x
  static const char * arr [] =
 
  {W("ARGVOUT"), W("ZOO"), W("ENV"), W("ARGV"), W("NO"), W("STDOUT"),
  W("_"), W("SIG"), W("INC"), W("*"),
W("FOO"), W("STDIN"),
  W("STDERR"), W("ARGV"), W("INC"),
W("STDOUT"), W("STDZER"),
  // W("IOP"), W("ZIP"), W("ENV"),
W("CAT"), W("ZOOF"), W("STDIN"),
  //W("ARGVOUT"), W("ENV"), W("ARGV"), W("STDOUT"),
  // W("_"), W("SIG"), W("INC"), W("*")
 
  };
#define W(x) sizeof(x)-1
  static const char arrlen [] =
  {W("ARGVOUT"), W("ZOO"), W("ENV"), W("ARGV"), W("NO"), W("STDOUT"),
  W("_"), W("SIG"), W("INC"), W("*"),
W("FOO"), W("STDIN"),
  W("STDERR"), W("ARGV"), W("INC"),
W("STDOUT"), W("STDZER"),
  // W("IOP"), W("ZIP"), W("ENV"),
W("CAT"), W("ZOOF"), W("STDIN"),
  //W("ARGVOUT"), W("ENV"), W("ARGV"), W("STDOUT"),
  // W("_"), W("SIG"), W("INC"), W("*")
 
  };
  LARGE_INTEGER my_beg;
  LARGE_INTEGER my_end;
  LARGE_INTEGER my_freq;
  int tc;
CODE​:
  QueryPerformanceFrequency(&my_freq);

  tc=0;
  for(i=0; i < 20000000; i++){
  for(iarr=0; iarr < sizeof(arrlen); iarr++){
  tc += streqold(arr[iarr], arrlen[iarr]);
  }
  }
 
  QueryPerformanceCounter(&my_beg);
  tc=0;
  for(i=0; i < 20000000; i++){
  for(iarr=0; iarr < sizeof(arrlen); iarr++){
  tc += streqnew(arr[iarr], arrlen[iarr]);
  }
  }
  QueryPerformanceCounter(&my_end);
  printf("new time=%f, opt=" OPTIMIZE " true=%i\n",
((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart, tc);
 
  QueryPerformanceCounter(&my_beg);
  tc=0;
  for(i=0; i < 10000000; i++){
  for(iarr=0; iarr < sizeof(arrlen); iarr++){
  tc += streqold(arr[iarr], arrlen[iarr]);
  }
  }
  QueryPerformanceCounter(&my_end);
  printf("old time=%f, opt=" OPTIMIZE " true=%i\n",
((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart, tc);
___________________________________________________________

bool streqold(const char * name, STRLEN len){
#undef NVSIZE
#undef I32SIZE
#define NVSIZE 9
#define I32SIZE 5
  bool global = FALSE;

  switch (len) {
  case 1​:
  if (*name == '_')
  global = TRUE;
  break;
  case 3​: {
#if I32SIZE == 4
  /* assume we have a term null even though len is 3 */
  const char INCs [4] = "INC\0";
  const char ENVs [4] = "ENV\0";
  const char SIGs [4] = "SIG\0";
  if (*(I32 *)name == *(I32 *)INCs
  || *(I32 *)name == *(I32 *)ENVs
  || *(I32 *)name == *(I32 *)SIGs)
#else
  if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
  || (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
  || (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
  global = TRUE;
  }
  break;
  case 4​: {
#if I32SIZE == 4
  const char ARGVs [4] = "ARGV";
  if (*(I32 *)name == *(I32 *)ARGVs)
#else
  if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
  && name[3] == 'V')
#endif
  global = TRUE;
  }
  break;
  case 5​: {
#if I32SIZE == 4
  const char STDIs [4] = "STDI";
  if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
  if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
  && name[3] == 'I' && name[4] == 'N')
#endif
  global = TRUE;
  }
  break;
  case 6​: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
  const char STDOUTs1 [4] = "STDO";
  const char STDOUTs2 [2] = "UT";
  const char STDERRs1 [4] = "STDE";
  const char STDERRs2 [2] = "RR";
  if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
  || (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
  if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
  &&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
  ||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
  global = TRUE;
  }
  break;
  case 7​: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
  const char ARGVOUTs [8] = "ARGVOUT\0";
  if (*(I64 *)name == *(I64 *)ARGVOUTs)
#elif NVSIZE == 8
  /* least CPU and instruction set (X87, SSE2, etc) dependent way */
  if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
  /* assume we have a term null even though len is 7 */
  const char ARGVOUTs1 [4] = "ARGV";
  const char ARGVOUTs2 [4] = "OUT\0";
  if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
  if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
  && name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
  && name[6] == 'T')
#endif
  global = TRUE;
  }
  break;
  }
  return global;
#undef NVSIZE
#undef I32SIZE
#define NVSIZE 8
#define I32SIZE 4
}

bool streqnew(const char * name, STRLEN len){
  bool global = FALSE;

  switch (len) {
  case 1​:
  if (*name == '_')
  global = TRUE;
  break;
  case 3​: {
#if I32SIZE == 4
  /* assume we have a term null even though len is 3 */
  const char INCs [4] = "INC\0";
  const char ENVs [4] = "ENV\0";
  const char SIGs [4] = "SIG\0";
  if (*(I32 *)name == *(I32 *)INCs
  || *(I32 *)name == *(I32 *)ENVs
  || *(I32 *)name == *(I32 *)SIGs)
#else
  if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
  || (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
  || (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
  global = TRUE;
  }
  break;
  case 4​: {
#if I32SIZE == 4
  const char ARGVs [4] = "ARGV";
  if (*(I32 *)name == *(I32 *)ARGVs)
#else
  if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
  && name[3] == 'V')
#endif
  global = TRUE;
  }
  break;
  case 5​: {
#if I32SIZE == 4
  const char STDIs [4] = "STDI";
  if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
  if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
  && name[3] == 'I' && name[4] == 'N')
#endif
  global = TRUE;
  }
  break;
  case 6​: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
  const char STDOUTs1 [4] = "STDO";
  const char STDOUTs2 [2] = "UT";
  const char STDERRs1 [4] = "STDE";
  const char STDERRs2 [2] = "RR";
  if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
  || (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
  if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
  &&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
  ||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
  global = TRUE;
  }
  break;
  case 7​: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
  const char ARGVOUTs [8] = "ARGVOUT\0";
  if (*(I64 *)name == *(I64 *)ARGVOUTs)
//#elif NVSIZE == 8
// /* least CPU and instruction set (X87, SSE2, etc) dependent way */
// if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
  /* assume we have a term null even though len is 7 */
  const char ARGVOUTs1 [4] = "ARGV";
  const char ARGVOUTs2 [4] = "OUT\0";
  if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
  if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
  && name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
  && name[6] == 'T')
#endif
  global = TRUE;
  }
  break;
  }
  return global;
}
____________________________________________________________

I won't post times yet since they are so unstable from the different
combination, that there is no clear winner, and no pattern to what is
faster, and the code is probably flawed as representing the "real world".

p5pRT

p5pRT commented on Nov 16, 2012

@p5pRT
Author

From @jandubois

On Fri, Nov 16, 2012 at 12​:33 PM, bulk 88 via RT
<perlbug-followup@​perl.org> wrote​:

The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest​::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest​::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest​::MD5's repurposing of it is wrong.

Digest​::MD5 doesn't trust the config.h value because it can be
incorrect if Perl has been built on a different machine. I believe the
concrete example was Perl compiled on Solaris 8, which implicitly uses
-xmemalign=8i, but installed on Solaris 9, where cc uses -xmemalign=8s
by default. Compiling with 8i means that misaligned accesses are
trapped and interpreted, so U32_ALIGNMENT_REQUIRED is not set in
config.h.

When you then use this version of Perl on Solaris 9 and try to compile
the Digest​::MD5 module with the same compiler and compiler options, it
will now require alignment and generate SIGBUS errors for misaligned
access.

So to avoid future bug reports from users on Solaris 9+ running a Perl
built on Solaris 8, Digest​::MD5 doesn't trust config.h on this and
recomputes it at module build time.

Cheers,
-Jan

p5pRT

p5pRT commented on Nov 17, 2012

@p5pRT
Author

From @bulk88

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.

p5pRT

p5pRT commented on Nov 17, 2012

@p5pRT
Author

From @bulk88

On Fri Nov 16 18​:31​:40 2012, bulk88 wrote​:

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.

--
--
bulk88 ~ bulk88 at hotmail.com

The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).

p5pRT

p5pRT commented on Jul 30, 2013

@p5pRT
Author

From @tonycoz

On Fri Nov 16 18​:49​:50 2012, bulk88 wrote​:

On Fri Nov 16 18​:31​:40 2012, bulk88 wrote​:

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).

The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)

The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.

Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.

So I'm rejecting the patch on this ticket.

Tony

p5pRT

p5pRT commented on Oct 27, 2014

@p5pRT
Author

From @tonycoz

On Mon Jul 29 23​:26​:02 2013, tonyc wrote​:

On Fri Nov 16 18​:49​:50 2012, bulk88 wrote​:

On Fri Nov 16 18​:31​:40 2012, bulk88 wrote​:

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).

The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)

The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.

Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.

So I'm rejecting the patch on this ticket.

There's been no follow-up, so closing the ticket.

Tony

p5pRT

p5pRT commented on Oct 27, 2014

@p5pRT
Author

@tonycoz - Status changed from 'open' to 'rejected'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @p5pRT

        Issue actions

          [PATCH] clean up inline string comparisons in gv_fetchpvn_flags · Issue #12565 · Perl/perl5