[PATCH] clean up inline string comparisons in gv_fetchpvn_flags · Issue #12565

Author

From @bulk88

0001-clean-up-inline-string-comparisons-in-gv_fetchpvn_fl.patch

From 1dfbb7f8430fcfa337023294961fc863208d0cee Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Thu, 15 Nov 2012 14:39:16 -0500
Subject: [PATCH] clean up inline string comparisons in gv_fetchpvn_flags

The tree of char comparisons is not optimized at all in GCC and Visual C
on 32 bit x86. The machine code is exactly as written in C with ==s and
&&s. This commit converts the per char comparison to larger units. From a
note in handy.h, none of Perl's fixed length integers are fixed length so
size checking macros were added and the old code left as an alternate.
To get from an odd string to an even string length, it is assumed that
name is null terminated. Each inline comparison is length checked before it
is done (existing code). The ARGVOUT NV comparison was quite difficult
since it is highly dependent on the CPU and instruction set and compiler.
Whether the compiler wants to load a pointer to NV or load a constant NV
is upto it, trying to force it became too platform specific (an const but
not static array of 2 I32s cast to a NV * was tried). This saved 0x50 to
0x100 bytes of instructions I think. A different implementation would have
been to allocate an 8 char array, zero it, strncpy, then do only
double/I64 comparisons against it rather than 8/4/2 byte int comparisons
done now.
---
 gv.c |  116 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/gv.c b/gv.c
index 9de8886..a415c20 100644
--- a/gv.c
+++ b/gv.c
@@ -1468,8 +1468,15 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 		{
 		    stash = GvHV(gv) = newHV();
 		    if (!HvNAME_get(stash)) {
+#if I32SIZE == 4
+			const char COREs [4] = "CORE";
+#endif
 			if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+			 && *(I32 *)name == *(I32 *)COREs)
+#else
 			 && strnEQ(name, "CORE", 4))
+#endif
 			    hv_name_set(stash, "CORE", 4, 0);
 			else
 			    hv_name_set(
@@ -1507,33 +1514,81 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 		if (*name == '_')
 		    global = TRUE;
 		break;
-	    case 3:
-		if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
-		    || (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
-		    || (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
-		    global = TRUE;
+	    case 3: {
+#if I32SIZE == 4
+		    /* assume we have a term null even though len is 3 */
+		    const char INCs [4] = "INC\0"; 
+		    const char ENVs [4] = "ENV\0";
+		    const char SIGs [4] = "SIG\0";
+		    if (*(I32 *)name == *(I32 *)INCs
+			|| *(I32 *)name == *(I32 *)ENVs
+			|| *(I32 *)name == *(I32 *)SIGs)
+#else
+		    if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
+			|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
+			|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
+#endif
+			 global = TRUE;
+		}
 		break;
-	    case 4:
-		if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
-		    && name[3] == 'V')
-		    global = TRUE;
+	    case 4: {
+#if I32SIZE == 4
+		    const char ARGVs [4] = "ARGV";
+		    if (*(I32 *)name == *(I32 *)ARGVs)
+#else
+		    if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
+			&& name[3] == 'V')
+#endif
+			global = TRUE;
+		}
 		break;
-	    case 5:
-		if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
-		    && name[3] == 'I' && name[4] == 'N')
-		    global = TRUE;
+	    case 5: {
+#if I32SIZE == 4
+		    const char STDIs [4] = "STDI";
+		    if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
+#else
+		    if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
+			&& name[3] == 'I' && name[4] == 'N')
+#endif
+			global = TRUE;
+		}
 		break;
-	    case 6:
-		if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
-		    &&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
-		       ||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
-		    global = TRUE;
+	    case 6: { /* s2=string [part] 2 */
+#if I32SIZE == 4 && I16SIZE == 2
+		    const char STDOUTs1 [4] = "STDO";
+		    const char STDOUTs2 [2] = "UT";
+		    const char STDERRs1 [4] = "STDE";
+		    const char STDERRs2 [2] = "RR";
+		    if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16 *)STDOUTs2)
+			|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16 *)STDERRs2))
+#else
+		    if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
+			&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
+			   ||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
+#endif
+			global = TRUE;
+		}
 		break;
-	    case 7:
-		if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
-		    && name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
-		    && name[6] == 'T')
-		    global = TRUE;
+	    case 7: {
+/* use only native machine i64s, not compiler emulated ones */
+#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
+		    const char ARGVOUTs [8] = "ARGVOUT\0";
+		    if (*(I64 *)name == *(I64 *)ARGVOUTs)
+#elif NVSIZE == 8
+		    /* least CPU and instruction set (X87, SSE2, etc) dependent way */
+		    if (*(NV *)name ==  *(NV *)"ARGVOUT")
+#elif I32SIZE == 4
+		    /* assume we have a term null even though len is 7 */
+		    const char ARGVOUTs1 [4] = "ARGV";
+		    const char ARGVOUTs2 [4] = "OUT\0";
+		    if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32 *)ARGVOUTs2)
+#else
+		    if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
+			&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
+			&& name[6] == 'T')
+#endif
+			global = TRUE;
+		}
 		break;
 	    }
 
@@ -1623,6 +1678,9 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 
     if (SvTYPE(gv) == SVt_PVGV) {
 	if (add) {
+#if I32SIZE == 4
+	    const char ISAs [4] = "ISA\0";
+#endif
 	    GvMULTI_on(gv);
 	    gv_init_svtype(gv, sv_type);
 	    if (len == 1 && stash == PL_defstash) {
@@ -1653,7 +1711,12 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
 	      }
 	    }
 	    else if (len == 3 && sv_type == SVt_PVAV
+#if I32SIZE == 4
+    /* assume we have a term null even though len is 3 */
+		  && *(I32 *)name == *(I32 *)ISAs
+#else
 	          && strnEQ(name, "ISA", 3)
+#endif
 	          && (!GvAV(gv) || !SvSMAGICAL(GvAV(gv))))
 		gv_magicalize_isa(gv);
 	}
@@ -1712,8 +1775,15 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN full_len, I32 flags,
       try_core:
 	if (len > 1 /* shortest is uc */ && HvNAMELEN_get(stash) == 4) {
 	  /* Avoid null warning: */
+#if I32SIZE == 4
+	  const char COREs [4] = "CORE";
+#endif
 	  const char * const stashname = HvNAME(stash); assert(stashname);
+#if I32SIZE == 4
+	  if (*(I32 *)stashname == *(I32 *)COREs)
+#else
 	  if (strnEQ(stashname, "CORE", 4))
+#endif
 	    S_maybe_add_coresub(aTHX_ 0, gv, name, len);
 	}
     }
-- 
1.7.9.msysgit.0

p5pRT

Author

@bulk88 - Status changed from 'new' to 'open'

p5pRT

Author

From @tonycoz

On Thu, Nov 15, 2012 at 11:48:03AM -0800, bulk 88 via RT wrote:

Forgot to attach.

---
via perlbug: queue: perl5 status: new
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=115746

stash = GvHV(gv) = newHV();
if (!HvNAME_get(stash)) {
+#if I32SIZE == 4
+ const char COREs [4] = "CORE";
+#endif
if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+ && *(I32 *)name == *(I32 *)COREs)
+#else
&& strnEQ(name, "CORE", 4))
+#endif
hv_name_set(stash, "CORE", 4, 0);
+#if I32SIZE == 4
+ /* assume we have a term null even though len is 3 */
+ const char INCs [4] = "INC\0";
+ const char ENVs [4] = "ENV\0";
+ const char SIGs [4] = "SIG\0";
+ if (*(I32 *)name == *(I32 *)INCs
+ || *(I32 *)name == *(I32 *)ENVs
+ || *(I32 *)name == *(I32 *)SIGs)
+#else
+

This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.

Tony

p5pRT

Author

From @khwilliamson

On 11/15/2012 05:06 PM, Tony Cook wrote:

On Thu, Nov 15, 2012 at 11:48:03AM -0800, bulk 88 via RT wrote:

Forgot to attach.

---
via perlbug: queue: perl5 status: new
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=115746
                 stash = GvHV\(gv\) = newHV\(\);
                 if \(\!HvNAME\_get\(stash\)\) \{
+#if I32SIZE == 4
+ const char COREs [4] = "CORE";
+#endif
if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+ && *(I32 *)name == *(I32 *)COREs)
+#else
&& strnEQ(name, "CORE", 4))
+#endif
hv_name_set(stash, "CORE", 4, 0);
+#if I32SIZE == 4
+ /* assume we have a term null even though len is 3 */
+ const char INCs [4] = "INC\0";
+ const char ENVs [4] = "ENV\0";
+ const char SIGs [4] = "SIG\0";
+ if (*(I32 *)name == *(I32 *)INCs
+ || *(I32 *)name == *(I32 *)ENVs
+ || *(I32 *)name == *(I32 *)SIGs)
+#else
+

This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.

Tony

This thread may have some bearing:
http://markmail.org/message/aspjihku5iaszw5q

p5pRT

Author

From @bulk88

On Thu Nov 15 16:34:10 2012, public@khwilliamson.com wrote:

On 11/15/2012 05:06 PM, Tony Cook wrote:

This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.

Tony

This thread may have some bearing:
http://markmail.org/message/aspjihku5iaszw5q

The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest::MD5's repurposing of it is wrong.

So is a new def required for what I want to do or not or is
U32_ALIGNMENT_REQUIRED the correct def to use for what I want to do?
Win32's config has U32_ALIGNMENT_REQUIRED defined, so I would need to
change that.

Another question is, what is faster? I did some benchmarking, and the
numbers are all over the place +/- 30% based on the sample data. Some
things that flip which is faster, if element "IOP" is 3 or 4 characters.
If there are elements after "IOP", O1 and O2, and 20 mil or 10 mil
iterations the results flip. Each one flips the result individually,
combined the results I think are random. One benchmark run untimed was
done to in theory preload the cache. I dont think there was a difference
between a run with the prerun and no prerun loop, or it was below the
per process run time noise. On O1 new func is 0x9D, old is 0x145. On O2
old is 0x148, new is 0xB8. Each test at 10 mil takes 1 to 2.5 secs. I
think the benchmark is flawed somehow (not random (branch predictor
optimizes away), sample data (arrays and strings) are in RO memory, not
enough never match strings/not enough matching strings, no runloop/cache
invalidation between runs) but I would need to play around with it more
to figure out exactly what is going on.

The reason I bring up the issue here is,
Perl_keyword/Devel::Tokenizer::C does the same thing that the old code
here does. Perl_keyword is the 3rd largest function in the interp at
~15-16KB 32bit x86. It could use the same modification that I did here
but on a larger scale.

______________________________________________________
void
StrEqTree()
PREINIT:
int i;
int iarr;
#define W(x) x
static const char * arr [] =

{W("ARGVOUT"), W("ZOO"), W("ENV"), W("ARGV"), W("NO"), W("STDOUT"),
W("_"), W("SIG"), W("INC"), W("*"),
W("FOO"), W("STDIN"),
W("STDERR"), W("ARGV"), W("INC"),
W("STDOUT"), W("STDZER"),
// W("IOP"), W("ZIP"), W("ENV"),
W("CAT"), W("ZOOF"), W("STDIN"),
//W("ARGVOUT"), W("ENV"), W("ARGV"), W("STDOUT"),
// W("_"), W("SIG"), W("INC"), W("*")

};
#define W(x) sizeof(x)-1
static const char arrlen [] =
{W("ARGVOUT"), W("ZOO"), W("ENV"), W("ARGV"), W("NO"), W("STDOUT"),
W("_"), W("SIG"), W("INC"), W("*"),
W("FOO"), W("STDIN"),
W("STDERR"), W("ARGV"), W("INC"),
W("STDOUT"), W("STDZER"),
// W("IOP"), W("ZIP"), W("ENV"),
W("CAT"), W("ZOOF"), W("STDIN"),
//W("ARGVOUT"), W("ENV"), W("ARGV"), W("STDOUT"),
// W("_"), W("SIG"), W("INC"), W("*")

};
LARGE_INTEGER my_beg;
LARGE_INTEGER my_end;
LARGE_INTEGER my_freq;
int tc;
CODE:
QueryPerformanceFrequency(&my_freq);

tc=0;
for(i=0; i < 20000000; i++){
for(iarr=0; iarr < sizeof(arrlen); iarr++){
tc += streqold(arr[iarr], arrlen[iarr]);
}
}

QueryPerformanceCounter(&my_beg);
tc=0;
for(i=0; i < 20000000; i++){
for(iarr=0; iarr < sizeof(arrlen); iarr++){
tc += streqnew(arr[iarr], arrlen[iarr]);
}
}
QueryPerformanceCounter(&my_end);
printf("new time=%f, opt=" OPTIMIZE " true=%i\n",
((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart, tc);

QueryPerformanceCounter(&my_beg);
tc=0;
for(i=0; i < 10000000; i++){
for(iarr=0; iarr < sizeof(arrlen); iarr++){
tc += streqold(arr[iarr], arrlen[iarr]);
}
}
QueryPerformanceCounter(&my_end);
printf("old time=%f, opt=" OPTIMIZE " true=%i\n",
((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart, tc);
___________________________________________________________

bool streqold(const char * name, STRLEN len){
#undef NVSIZE
#undef I32SIZE
#define NVSIZE 9
#define I32SIZE 5
bool global = FALSE;

switch (len) {
case 1:
if (*name == '_')
global = TRUE;
break;
case 3: {
#if I32SIZE == 4
/* assume we have a term null even though len is 3 */
const char INCs [4] = "INC\0";
const char ENVs [4] = "ENV\0";
const char SIGs [4] = "SIG\0";
if (*(I32 *)name == *(I32 *)INCs
|| *(I32 *)name == *(I32 *)ENVs
|| *(I32 *)name == *(I32 *)SIGs)
#else
if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
global = TRUE;
}
break;
case 4: {
#if I32SIZE == 4
const char ARGVs [4] = "ARGV";
if (*(I32 *)name == *(I32 *)ARGVs)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V')
#endif
global = TRUE;
}
break;
case 5: {
#if I32SIZE == 4
const char STDIs [4] = "STDI";
if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
&& name[3] == 'I' && name[4] == 'N')
#endif
global = TRUE;
}
break;
case 6: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
const char STDOUTs1 [4] = "STDO";
const char STDOUTs2 [2] = "UT";
const char STDERRs1 [4] = "STDE";
const char STDERRs2 [2] = "RR";
if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
global = TRUE;
}
break;
case 7: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
const char ARGVOUTs [8] = "ARGVOUT\0";
if (*(I64 *)name == *(I64 *)ARGVOUTs)
#elif NVSIZE == 8
/* least CPU and instruction set (X87, SSE2, etc) dependent way */
if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
/* assume we have a term null even though len is 7 */
const char ARGVOUTs1 [4] = "ARGV";
const char ARGVOUTs2 [4] = "OUT\0";
if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
&& name[6] == 'T')
#endif
global = TRUE;
}
break;
}
return global;
#undef NVSIZE
#undef I32SIZE
#define NVSIZE 8
#define I32SIZE 4
}

bool streqnew(const char * name, STRLEN len){
bool global = FALSE;

switch (len) {
case 1:
if (*name == '_')
global = TRUE;
break;
case 3: {
#if I32SIZE == 4
/* assume we have a term null even though len is 3 */
const char INCs [4] = "INC\0";
const char ENVs [4] = "ENV\0";
const char SIGs [4] = "SIG\0";
if (*(I32 *)name == *(I32 *)INCs
|| *(I32 *)name == *(I32 *)ENVs
|| *(I32 *)name == *(I32 *)SIGs)
#else
if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
global = TRUE;
}
break;
case 4: {
#if I32SIZE == 4
const char ARGVs [4] = "ARGV";
if (*(I32 *)name == *(I32 *)ARGVs)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V')
#endif
global = TRUE;
}
break;
case 5: {
#if I32SIZE == 4
const char STDIs [4] = "STDI";
if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
&& name[3] == 'I' && name[4] == 'N')
#endif
global = TRUE;
}
break;
case 6: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
const char STDOUTs1 [4] = "STDO";
const char STDOUTs2 [2] = "UT";
const char STDERRs1 [4] = "STDE";
const char STDERRs2 [2] = "RR";
if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
global = TRUE;
}
break;
case 7: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
const char ARGVOUTs [8] = "ARGVOUT\0";
if (*(I64 *)name == *(I64 *)ARGVOUTs)
//#elif NVSIZE == 8
// /* least CPU and instruction set (X87, SSE2, etc) dependent way */
// if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
/* assume we have a term null even though len is 7 */
const char ARGVOUTs1 [4] = "ARGV";
const char ARGVOUTs2 [4] = "OUT\0";
if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
&& name[6] == 'T')
#endif
global = TRUE;
}
break;
}
return global;
}
____________________________________________________________

I won't post times yet since they are so unstable from the different
combination, that there is no clear winner, and no pattern to what is
faster, and the code is probably flawed as representing the "real world".

p5pRT

Author

From @jandubois

On Fri, Nov 16, 2012 at 12:33 PM, bulk 88 via RT
<perlbug-followup@perl.org> wrote:

The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest::MD5's repurposing of it is wrong.

Digest::MD5 doesn't trust the config.h value because it can be
incorrect if Perl has been built on a different machine. I believe the
concrete example was Perl compiled on Solaris 8, which implicitly uses
-xmemalign=8i, but installed on Solaris 9, where cc uses -xmemalign=8s
by default. Compiling with 8i means that misaligned accesses are
trapped and interpreted, so U32_ALIGNMENT_REQUIRED is not set in
config.h.

When you then use this version of Perl on Solaris 9 and try to compile
the Digest::MD5 module with the same compiler and compiler options, it
will now require alignment and generate SIGBUS errors for misaligned
access.

So to avoid future bug reports from users on Solaris 9+ running a Perl
built on Solaris 8, Digest::MD5 doesn't trust config.h on this and
recomputes it at module build time.

Cheers,
-Jan

p5pRT

Author

From @bulk88

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.

p5pRT

Author

From @bulk88

On Fri Nov 16 18:31:40 2012, bulk88 wrote:

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.

--
--
bulk88 ~ bulk88 at hotmail.com

The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).

p5pRT

Author

From @tonycoz

On Fri Nov 16 18:49:50 2012, bulk88 wrote:

On Fri Nov 16 18:31:40 2012, bulk88 wrote:

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).

The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)

The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.

Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.

So I'm rejecting the patch on this ticket.

Tony

p5pRT

Author

From @tonycoz

On Mon Jul 29 23:26:02 2013, tonyc wrote:

On Fri Nov 16 18:49:50 2012, bulk88 wrote:

On Fri Nov 16 18:31:40 2012, bulk88 wrote:

This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).

The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)

The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.

Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.

So I'm rejecting the patch on this ticket.

There's been no follow-up, so closing the ticket.

Tony

p5pRT

Author

@tonycoz - Status changed from 'open' to 'rejected'

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PATCH] clean up inline string comparisons in gv_fetchpvn_flags #12565

From @bulk88

Created by @bulk88

From @bulk88

From @bulk88

From @tonycoz

From @khwilliamson

From @bulk88

From @jandubois

From @bulk88

From @bulk88

From @tonycoz

From @tonycoz

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

[PATCH] clean up inline string comparisons in gv_fetchpvn_flags #12565

Description

Activity

p5pRT commented on Nov 15, 2012

From @bulk88

Created by @bulk88

p5pRT commented on Nov 15, 2012

From @bulk88

p5pRT commented on Nov 15, 2012

From @bulk88

p5pRT commented on Nov 15, 2012

p5pRT commented on Nov 16, 2012

From @tonycoz

p5pRT commented on Nov 16, 2012

From @khwilliamson

p5pRT commented on Nov 16, 2012

From @bulk88

p5pRT commented on Nov 16, 2012

From @jandubois

p5pRT commented on Nov 17, 2012

From @bulk88

p5pRT commented on Nov 17, 2012

From @bulk88

p5pRT commented on Jul 30, 2013

From @tonycoz

p5pRT commented on Oct 27, 2014

From @tonycoz

p5pRT commented on Oct 27, 2014

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions