You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
On Thu Nov 15 16:34:10 2012, public@khwilliamson.com wrote:
On 11/15/2012 05:06 PM, Tony Cook wrote:
This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.
Tony
This thread may have some bearing:
http://markmail.org/message/aspjihku5iaszw5q
The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest::MD5's repurposing of it is wrong.
So is a new def required for what I want to do or not or is
U32_ALIGNMENT_REQUIRED the correct def to use for what I want to do?
Win32's config has U32_ALIGNMENT_REQUIRED defined, so I would need to
change that.
Another question is, what is faster? I did some benchmarking, and the
numbers are all over the place +/- 30% based on the sample data. Some
things that flip which is faster, if element "IOP" is 3 or 4 characters.
If there are elements after "IOP", O1 and O2, and 20 mil or 10 mil
iterations the results flip. Each one flips the result individually,
combined the results I think are random. One benchmark run untimed was
done to in theory preload the cache. I dont think there was a difference
between a run with the prerun and no prerun loop, or it was below the
per process run time noise. On O1 new func is 0x9D, old is 0x145. On O2
old is 0x148, new is 0xB8. Each test at 10 mil takes 1 to 2.5 secs. I
think the benchmark is flawed somehow (not random (branch predictor
optimizes away), sample data (arrays and strings) are in RO memory, not
enough never match strings/not enough matching strings, no runloop/cache
invalidation between runs) but I would need to play around with it more
to figure out exactly what is going on.
The reason I bring up the issue here is,
Perl_keyword/Devel::Tokenizer::C does the same thing that the old code
here does. Perl_keyword is the 3rd largest function in the interp at
~15-16KB 32bit x86. It could use the same modification that I did here
but on a larger scale.
______________________________________________________
void
StrEqTree()
PREINIT:
int i;
int iarr;
#define W(x) x
static const char * arr [] =
switch (len) {
case 1:
if (*name == '_')
global = TRUE;
break;
case 3: {
#if I32SIZE == 4
/* assume we have a term null even though len is 3 */
const char INCs [4] = "INC\0";
const char ENVs [4] = "ENV\0";
const char SIGs [4] = "SIG\0";
if (*(I32 *)name == *(I32 *)INCs
|| *(I32 *)name == *(I32 *)ENVs
|| *(I32 *)name == *(I32 *)SIGs)
#else
if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
global = TRUE;
}
break;
case 4: {
#if I32SIZE == 4
const char ARGVs [4] = "ARGV";
if (*(I32 *)name == *(I32 *)ARGVs)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V')
#endif
global = TRUE;
}
break;
case 5: {
#if I32SIZE == 4
const char STDIs [4] = "STDI";
if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
&& name[3] == 'I' && name[4] == 'N')
#endif
global = TRUE;
}
break;
case 6: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
const char STDOUTs1 [4] = "STDO";
const char STDOUTs2 [2] = "UT";
const char STDERRs1 [4] = "STDE";
const char STDERRs2 [2] = "RR";
if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
global = TRUE;
}
break;
case 7: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
const char ARGVOUTs [8] = "ARGVOUT\0";
if (*(I64 *)name == *(I64 *)ARGVOUTs)
//#elif NVSIZE == 8
// /* least CPU and instruction set (X87, SSE2, etc) dependent way */
// if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
/* assume we have a term null even though len is 7 */
const char ARGVOUTs1 [4] = "ARGV";
const char ARGVOUTs2 [4] = "OUT\0";
if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
&& name[6] == 'T')
#endif
global = TRUE;
}
break;
}
return global;
}
____________________________________________________________
I won't post times yet since they are so unstable from the different
combination, that there is no clear winner, and no pattern to what is
faster, and the code is probably flawed as representing the "real world".
On Fri, Nov 16, 2012 at 12:33 PM, bulk 88 via RT
<perlbug-followup@perl.org> wrote:
The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest::MD5's repurposing of it is wrong.
Digest::MD5 doesn't trust the config.h value because it can be
incorrect if Perl has been built on a different machine. I believe the
concrete example was Perl compiled on Solaris 8, which implicitly uses
-xmemalign=8i, but installed on Solaris 9, where cc uses -xmemalign=8s
by default. Compiling with 8i means that misaligned accesses are
trapped and interpreted, so U32_ALIGNMENT_REQUIRED is not set in
config.h.
When you then use this version of Perl on Solaris 9 and try to compile
the Digest::MD5 module with the same compiler and compiler options, it
will now require alignment and generate SIGBUS errors for misaligned
access.
So to avoid future bug reports from users on Solaris 9+ running a Perl
built on Solaris 8, Digest::MD5 doesn't trust config.h on this and
recomputes it at module build time.
This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
--
--
bulk88 ~ bulk88 at hotmail.com
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).
This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).
The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)
The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.
Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.
This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).
The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)
The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.
Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.
Activity
p5pRT commentedon Nov 15, 2012
From @bulk88
Created by @bulk88
See attached patch. There is a small chance of this code not working on
big endian and on non-ASCII platforms. I have neither to test this with.
Perl Info
p5pRT commentedon Nov 15, 2012
From @bulk88
Forgot to attach.
p5pRT commentedon Nov 15, 2012
From @bulk88
0001-clean-up-inline-string-comparisons-in-gv_fetchpvn_fl.patch
p5pRT commentedon Nov 15, 2012
@bulk88 - Status changed from 'new' to 'open'
p5pRT commentedon Nov 16, 2012
From @tonycoz
On Thu, Nov 15, 2012 at 11:48:03AM -0800, bulk 88 via RT wrote:
stash = GvHV(gv) = newHV();
if (!HvNAME_get(stash)) {
+#if I32SIZE == 4
+ const char COREs [4] = "CORE";
+#endif
if (GvSTASH(gv) == PL_defstash && len == 6
+#if I32SIZE == 4
+ && *(I32 *)name == *(I32 *)COREs)
+#else
&& strnEQ(name, "CORE", 4))
+#endif
hv_name_set(stash, "CORE", 4, 0);
+#if I32SIZE == 4
+ /* assume we have a term null even though len is 3 */
+ const char INCs [4] = "INC\0";
+ const char ENVs [4] = "ENV\0";
+ const char SIGs [4] = "SIG\0";
+ if (*(I32 *)name == *(I32 *)INCs
+ || *(I32 *)name == *(I32 *)ENVs
+ || *(I32 *)name == *(I32 *)SIGs)
+#else
+
This code makes non-portable assumptions about alignment, or at least
about accessing unaligned memory as an I32.
Tony
p5pRT commentedon Nov 16, 2012
From @khwilliamson
On 11/15/2012 05:06 PM, Tony Cook wrote:
This thread may have some bearing:
http://markmail.org/message/aspjihku5iaszw5q
p5pRT commentedon Nov 16, 2012
From @bulk88
On Thu Nov 15 16:34:10 2012, public@khwilliamson.com wrote:
The description of U32_ALIGNMENT_REQUIRED sounds strange to me. It
implys that 32 bit casting and bitshift are required to manipulate
strings but that can't be the case. In Digest::MD5 there is actual use
of U32_ALIGNMENT_REQUIRED, but Digest::MD5's Makefile.PL defines it
using its own logic and with test compiles so it ignores config.h. So
globally, the U32_ALIGNMENT_REQUIRED define's description is either
wrong (probably), or Digest::MD5's repurposing of it is wrong.
So is a new def required for what I want to do or not or is
U32_ALIGNMENT_REQUIRED the correct def to use for what I want to do?
Win32's config has U32_ALIGNMENT_REQUIRED defined, so I would need to
change that.
Another question is, what is faster? I did some benchmarking, and the
numbers are all over the place +/- 30% based on the sample data. Some
things that flip which is faster, if element "IOP" is 3 or 4 characters.
If there are elements after "IOP", O1 and O2, and 20 mil or 10 mil
iterations the results flip. Each one flips the result individually,
combined the results I think are random. One benchmark run untimed was
done to in theory preload the cache. I dont think there was a difference
between a run with the prerun and no prerun loop, or it was below the
per process run time noise. On O1 new func is 0x9D, old is 0x145. On O2
old is 0x148, new is 0xB8. Each test at 10 mil takes 1 to 2.5 secs. I
think the benchmark is flawed somehow (not random (branch predictor
optimizes away), sample data (arrays and strings) are in RO memory, not
enough never match strings/not enough matching strings, no runloop/cache
invalidation between runs) but I would need to play around with it more
to figure out exactly what is going on.
The reason I bring up the issue here is,
Perl_keyword/Devel::Tokenizer::C does the same thing that the old code
here does. Perl_keyword is the 3rd largest function in the interp at
~15-16KB 32bit x86. It could use the same modification that I did here
but on a larger scale.
______________________________________________________
void
StrEqTree()
PREINIT:
int i;
int iarr;
#define W(x) x
static const char * arr [] =
{W("ARGVOUT"), W("ZOO"), W("ENV"), W("ARGV"), W("NO"), W("STDOUT"),
W("_"), W("SIG"), W("INC"), W("*"),
W("FOO"), W("STDIN"),
W("STDERR"), W("ARGV"), W("INC"),
W("STDOUT"), W("STDZER"),
// W("IOP"), W("ZIP"), W("ENV"),
W("CAT"), W("ZOOF"), W("STDIN"),
//W("ARGVOUT"), W("ENV"), W("ARGV"), W("STDOUT"),
// W("_"), W("SIG"), W("INC"), W("*")
};
#define W(x) sizeof(x)-1
static const char arrlen [] =
{W("ARGVOUT"), W("ZOO"), W("ENV"), W("ARGV"), W("NO"), W("STDOUT"),
W("_"), W("SIG"), W("INC"), W("*"),
W("FOO"), W("STDIN"),
W("STDERR"), W("ARGV"), W("INC"),
W("STDOUT"), W("STDZER"),
// W("IOP"), W("ZIP"), W("ENV"),
W("CAT"), W("ZOOF"), W("STDIN"),
//W("ARGVOUT"), W("ENV"), W("ARGV"), W("STDOUT"),
// W("_"), W("SIG"), W("INC"), W("*")
};
LARGE_INTEGER my_beg;
LARGE_INTEGER my_end;
LARGE_INTEGER my_freq;
int tc;
CODE:
QueryPerformanceFrequency(&my_freq);
tc=0;
for(i=0; i < 20000000; i++){
for(iarr=0; iarr < sizeof(arrlen); iarr++){
tc += streqold(arr[iarr], arrlen[iarr]);
}
}
QueryPerformanceCounter(&my_beg);
tc=0;
for(i=0; i < 20000000; i++){
for(iarr=0; iarr < sizeof(arrlen); iarr++){
tc += streqnew(arr[iarr], arrlen[iarr]);
}
}
QueryPerformanceCounter(&my_end);
printf("new time=%f, opt=" OPTIMIZE " true=%i\n",
((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart, tc);
QueryPerformanceCounter(&my_beg);
tc=0;
for(i=0; i < 10000000; i++){
for(iarr=0; iarr < sizeof(arrlen); iarr++){
tc += streqold(arr[iarr], arrlen[iarr]);
}
}
QueryPerformanceCounter(&my_end);
printf("old time=%f, opt=" OPTIMIZE " true=%i\n",
((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart, tc);
___________________________________________________________
bool streqold(const char * name, STRLEN len){
#undef NVSIZE
#undef I32SIZE
#define NVSIZE 9
#define I32SIZE 5
bool global = FALSE;
switch (len) {
case 1:
if (*name == '_')
global = TRUE;
break;
case 3: {
#if I32SIZE == 4
/* assume we have a term null even though len is 3 */
const char INCs [4] = "INC\0";
const char ENVs [4] = "ENV\0";
const char SIGs [4] = "SIG\0";
if (*(I32 *)name == *(I32 *)INCs
|| *(I32 *)name == *(I32 *)ENVs
|| *(I32 *)name == *(I32 *)SIGs)
#else
if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
global = TRUE;
}
break;
case 4: {
#if I32SIZE == 4
const char ARGVs [4] = "ARGV";
if (*(I32 *)name == *(I32 *)ARGVs)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V')
#endif
global = TRUE;
}
break;
case 5: {
#if I32SIZE == 4
const char STDIs [4] = "STDI";
if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
&& name[3] == 'I' && name[4] == 'N')
#endif
global = TRUE;
}
break;
case 6: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
const char STDOUTs1 [4] = "STDO";
const char STDOUTs2 [2] = "UT";
const char STDERRs1 [4] = "STDE";
const char STDERRs2 [2] = "RR";
if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
global = TRUE;
}
break;
case 7: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
const char ARGVOUTs [8] = "ARGVOUT\0";
if (*(I64 *)name == *(I64 *)ARGVOUTs)
#elif NVSIZE == 8
/* least CPU and instruction set (X87, SSE2, etc) dependent way */
if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
/* assume we have a term null even though len is 7 */
const char ARGVOUTs1 [4] = "ARGV";
const char ARGVOUTs2 [4] = "OUT\0";
if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
&& name[6] == 'T')
#endif
global = TRUE;
}
break;
}
return global;
#undef NVSIZE
#undef I32SIZE
#define NVSIZE 8
#define I32SIZE 4
}
bool streqnew(const char * name, STRLEN len){
bool global = FALSE;
switch (len) {
case 1:
if (*name == '_')
global = TRUE;
break;
case 3: {
#if I32SIZE == 4
/* assume we have a term null even though len is 3 */
const char INCs [4] = "INC\0";
const char ENVs [4] = "ENV\0";
const char SIGs [4] = "SIG\0";
if (*(I32 *)name == *(I32 *)INCs
|| *(I32 *)name == *(I32 *)ENVs
|| *(I32 *)name == *(I32 *)SIGs)
#else
if ((name[0] == 'I' && name[1] == 'N' && name[2] == 'C')
|| (name[0] == 'E' && name[1] == 'N' && name[2] == 'V')
|| (name[0] == 'S' && name[1] == 'I' && name[2] == 'G'))
#endif
global = TRUE;
}
break;
case 4: {
#if I32SIZE == 4
const char ARGVs [4] = "ARGV";
if (*(I32 *)name == *(I32 *)ARGVs)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V')
#endif
global = TRUE;
}
break;
case 5: {
#if I32SIZE == 4
const char STDIs [4] = "STDI";
if (*(I32 *)name == *(I32 *)STDIs && name[4] == 'N')
#else
if (name[0] == 'S' && name[1] == 'T' && name[2] == 'D'
&& name[3] == 'I' && name[4] == 'N')
#endif
global = TRUE;
}
break;
case 6: { /* s2=string [part] 2 */
#if I32SIZE == 4 && I16SIZE == 2
const char STDOUTs1 [4] = "STDO";
const char STDOUTs2 [2] = "UT";
const char STDERRs1 [4] = "STDE";
const char STDERRs2 [2] = "RR";
if ((*(I32 *)name == *(I32 *)STDOUTs1 && *(I16 *)&name[4] == *(I16
*)STDOUTs2)
|| (*(I32 *)name == *(I32 *)STDERRs1 && *(I16 *)&name[4] == *(I16
*)STDERRs2))
#else
if ((name[0] == 'S' && name[1] == 'T' && name[2] == 'D')
&&((name[3] == 'O' && name[4] == 'U' && name[5] == 'T')
||(name[3] == 'E' && name[4] == 'R' && name[5] == 'R')))
#endif
global = TRUE;
}
break;
case 7: {
/* use only native machine i64s, not compiler emulated ones */
#if defined(HAS_QUAD) && I64SIZE == 8 && PTRSIZE == 8
const char ARGVOUTs [8] = "ARGVOUT\0";
if (*(I64 *)name == *(I64 *)ARGVOUTs)
//#elif NVSIZE == 8
// /* least CPU and instruction set (X87, SSE2, etc) dependent way */
// if (*(NV *)name == *(NV *)"ARGVOUT")
#elif I32SIZE == 4
/* assume we have a term null even though len is 7 */
const char ARGVOUTs1 [4] = "ARGV";
const char ARGVOUTs2 [4] = "OUT\0";
if (*(I32 *)name == *(I32 *)ARGVOUTs1 && *(I32 *)&name[4] == *(I32
*)ARGVOUTs2)
#else
if (name[0] == 'A' && name[1] == 'R' && name[2] == 'G'
&& name[3] == 'V' && name[4] == 'O' && name[5] == 'U'
&& name[6] == 'T')
#endif
global = TRUE;
}
break;
}
return global;
}
____________________________________________________________
I won't post times yet since they are so unstable from the different
combination, that there is no clear winner, and no pattern to what is
faster, and the code is probably flawed as representing the "real world".
p5pRT commentedon Nov 16, 2012
From @jandubois
On Fri, Nov 16, 2012 at 12:33 PM, bulk 88 via RT
<perlbug-followup@perl.org> wrote:
Digest::MD5 doesn't trust the config.h value because it can be
incorrect if Perl has been built on a different machine. I believe the
concrete example was Perl compiled on Solaris 8, which implicitly uses
-xmemalign=8i, but installed on Solaris 9, where cc uses -xmemalign=8s
by default. Compiling with 8i means that misaligned accesses are
trapped and interpreted, so U32_ALIGNMENT_REQUIRED is not set in
config.h.
When you then use this version of Perl on Solaris 9 and try to compile
the Digest::MD5 module with the same compiler and compiler options, it
will now require alignment and generate SIGBUS errors for misaligned
access.
So to avoid future bug reports from users on Solaris 9+ running a Perl
built on Solaris 8, Digest::MD5 doesn't trust config.h on this and
recomputes it at module build time.
Cheers,
-Jan
p5pRT commentedon Nov 17, 2012
From @bulk88
This patch is also a syntax error in C++ because of cutting off the
(unnecessary) null char in the initializers so as is it can not be
applied. :-/ Other questions remain in this ticket without a conclusion.
p5pRT commentedon Nov 17, 2012
From @bulk88
On Fri Nov 16 18:31:40 2012, bulk88 wrote:
--
--
bulk88 ~ bulk88 at hotmail.com
The fix would be a INSTREQ_1,INSTREQ_2, INSTREQ_3, upto 8 I guess (after
that efficiency starts to go down hill on 64bit machine) macros since
variadic macros arent portable. That would also cut down in the large
amounts of ifdefs my patch has (and probably more if a solution is found
in this ticket).
p5pRT commentedon Jul 30, 2013
From @tonycoz
On Fri Nov 16 18:49:50 2012, bulk88 wrote:
The new code introduced does produce some nice generated code, but it's
unsafe (which could be worked around) and duplicative (which your macros
could fix.)
The main possible performance issue I can think of would be the extra
memory access (probably from the internal cache) required on platforms
that do allow unaligned memory acceses, potentially for both sides of
the comparison. This may explain the mixed results from benchmarking.
Where the name of the variable isn't one of the specials, the variable
name probably *won't* match on the first letter (except for _), so the
change won't produce a large performance win even ignoring the unaligned
accesses.
So I'm rejecting the patch on this ticket.
Tony
p5pRT commentedon Oct 27, 2014
From @tonycoz
On Mon Jul 29 23:26:02 2013, tonyc wrote:
There's been no follow-up, so closing the ticket.
Tony
p5pRT commentedon Oct 27, 2014
@tonycoz - Status changed from 'open' to 'rejected'