
proof of concept/performance test for use float #17831


Open · tonycoz wants to merge 1 commit into base: blead

Conversation

@tonycoz (Contributor) commented Jun 2, 2020

This is an attempt at #17813

I tested performance with a simple mandelbrot set generator (on my old CPU):

tony@mars:.../git/perl2$ time ./perl -Ilib ../mandel.pl

real    0m25.752s
user    0m25.612s
sys     0m0.132s
tony@mars:.../git/perl2$ time ./perl -Ilib -Mfeature=float ../mandel.pl

real    0m19.751s
user    0m19.742s
sys     0m0.004s

@tonycoz added the "do not merge" (Don't merge this PR, at least for now) label Jun 2, 2020
@toddr added the "Feature" (A New Feature.) label Jul 30, 2020
@atoomic added the "needs-work" (The pull request needs changes still) label Jul 30, 2020
@toddr added the "has conflicts" label and removed the "needs-work" label Jul 31, 2020
@tonycoz (Contributor, Author) commented Aug 3, 2020

I see two upvotes - did anyone else try benchmarking this on more useful code?

I ask to see if it's worth developing this further.

I implemented this as a feature, but it doesn't really belong there: it's not a language feature as such, so it shouldn't be enabled by a feature version bundle.

I'm hesitant to use a hints bit since we're fairly short on them.

Simply using an entry in %^H has the same problem that the indirect feature had before features were cached in cop_features - we'd be adding a hash lookup for every binop or unop generated.

Maybe it could be implemented as a feature, but not included in the all feature set, and not documented in feature.pm.
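
A minimal sketch, under loudly-labelled assumptions, of the compile-time check being weighed here: HINT_FLOAT_MATH and float_op_variant() are hypothetical and do not exist in blead. The point is the relative cost - testing a bit in PL_hints (or a feature bit cached in cop_features) is nearly free, while consulting %^H would mean a hash lookup for every binop or unop built.

    /* hypothetical hints-bit check at op-construction time */
    static OP *
    S_maybe_float_binop(pTHX_ I32 type, I32 flags, OP *left, OP *right)
    {
        if (PL_hints & HINT_FLOAT_MATH)        /* hypothetical hints bit */
            type = float_op_variant(type);     /* hypothetical op mapping */
        return newBINOP(type, flags, left, right);
    }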

@richardleach (Contributor)

This still seems worthwhile to me, but none of my useful code really uses float math, so I have nothing handy to benchmark.

@atoomic (Member) commented Aug 5, 2020

Note that we recovered some hint bits with 5d17394, so maybe it's fine to steal one bit for float?

I've not tested/benchmarked this on other code.

@tonycoz (Contributor, Author) commented Sep 15, 2020

> Note that we recovered some hint bits with 5d17394, so maybe it's fine to steal one bit for float?

That recovered only a single bit which is now assigned to the feature mask, where it belongs.

Maybe we just need another hints word.

@jkeenan (Contributor) commented Jan 26, 2021

@tonycoz, @richardleach, @atoomic, can we get an update on the status of this p.r.?

Thank you very much.
Jim Keenan

@tonycoz (Contributor, Author) commented Jan 26, 2021

It's waiting on (likely) adding another hints word.

But I think that needs to wait on reducing the cost of COPs which those are embedded into.

Right now a COP is generated for every statement, but the information in each COP typically doesn't change much except for the line number. I've looked at adding an alternative COP which only has a line number, but this will break some backward compatibility at the XS level.
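
To make the trade-off concrete, a rough sketch (illustrative only, not blead code) of what a line-number-only COP could look like; the XS-level compatibility break is exactly that existing XS code assumes every COP is a full struct cop:

    /* hypothetical slimmed-down statement marker: everything except the
     * line number (file, hints, warnings, features) is taken from the
     * most recent full COP */
    struct smallcop {
        BASEOP              /* op_next, op_ppaddr, op_type, op_flags, ... */
        line_t cop_line;    /* the only per-statement field that usually changes */
    };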

@richardleach (Contributor)

I noticed that the regular versions of these functions do:

      TARGn(left * right, 0);
      SETs( TARG );

rather than:

      SETn( left * right );

to try harder to avoid calling sv_setnv_mg.
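
A minimal sketch, for illustration only, of what a float-only op body could look like using that TARGn()/SETs() pattern; pp_nmultiply is a hypothetical name, not necessarily what the PR adds:

    /* hypothetical float-only multiply, showing the TARGn()/SETs() idiom */
    PP(pp_nmultiply)
    {
        dSP;
        dTARGET;
        NV right = SvNV(POPs);
        NV left  = SvNV(TOPs);
        /* TARGn() writes the NV straight into TARG and only falls back to
         * the slower sv_setnv()/magic path when TARG's flags require it */
        TARGn(left * right, 0);
        SETs(TARG);
        RETURN;
    }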

@demerphq (Collaborator) commented Sep 2, 2022 via email

@tonycoz (Contributor, Author) commented Sep 5, 2022

> On Tue, 26 Jan 2021 at 23:32, Tony Cook @.***> wrote: It's waiting on (likely) adding another hints word. But I think that needs to wait on reducing the cost of COPs which those are embedded into. Right now a COP is generated for every statement, but the information in each COP typically doesn't change much except for the line number. I've looked at adding an alternative COP which only has a line number, but this will break some backward compatibility at the XS level.
>
> I'd like to hear more about this as it aligns with my interest in improving the quality of our error messages. If I can do any legwork here I'd be happy to hear an appraisal of the problem to get started with. Just mail me personally. You know where. :-) Yves

I've stalled on this a bit (error: stack overflow), but I did get a "small COP" largely implemented and I don't remember getting any crashes. I still needed to update caller() to understand the new COPs.

There may have been other problems though; I wasn't comfortable with the way I was detecting whether a small COP was possible, e.g. with code like:

line1;
line2;
if (...) { #line3
   line4;
   line5;
   no strict '...';
   line7;
}
line9;
line10;

lines 1, 4, 7 and 9 needed full COPs, and I hadn't gotten to the point of checking that this was happening when it should.

Even without adding a small COP we could improve memory usage a great deal by reference counting cop_warnings, and I think cop_file on threads; these are profligate users of memory - each COP has its own copy.

@demerphq (Collaborator)

> Even without adding a small COP we could improve memory usage a great deal by reference counting cop_warnings, and I think cop_file on threads; these are profligate users of memory - each COP has its own copy.

In theory it should be pretty easy to use PL_strtab to do that if they are write-once. I will take a look. Do you have a branch for your small cop work?
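
For reference, a minimal sketch of the refcounted-string idea under discussion (roughly the shape of what later landed as RCPV); names and layout here are illustrative, not blead's:

    /* one shared, effectively write-once buffer with a count in front,
     * so every COP stores a pointer instead of carrying its own copy */
    struct shared_pv {
        UV   refcount;
        char pv[1];               /* string data follows the header */
    };

    static char *
    S_shared_pv_new(const char *s, STRLEN len)
    {
        struct shared_pv *p;
        Newxc(p, sizeof(*p) + len, char, struct shared_pv);
        p->refcount = 1;
        Copy(s, p->pv, len, char);
        p->pv[len] = 0;
        return p->pv;             /* callers bump/drop the count via the header */
    }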

@tonycoz (Contributor, Author) commented Oct 26, 2022

> Do you have a branch for your small cop work?

It's very hacky and incomplete (and probably just plain broken), but https://github.com/Perl/perl5/tree/tonyc/less-cop

@demerphq (Collaborator) commented Oct 27, 2022 via email

@iabyn (Contributor) commented Nov 7, 2022 via email

@bbrtj (Contributor) commented Nov 26, 2022

I have compiled perl from your branch and tested it on two pieces of code that use some float calculations.

The first one is Algorithm::QuadTree::PP, which uses some (not much) float math in its circular-shape-finding routine. No improvement was seen.

The second one is more math-heavy, as it tries to find all border coordinates for a line segment. The heart of the function is implemented as follows:

my $coeff_x = ($position2->[1] - $position1->[1]) / ($position2->[0] - $position1->[0]);

my $checks_for_x = sub ($pos_x) {
	state $partial = $position1->[1] - $position1->[0] * $coeff_x;
	my $pos_y = $partial + $pos_x * $coeff_x;
	return ([$pos_x, $pos_y], [$pos_x - 1, $pos_y]);
};

my $checks_for_y = sub ($pos_y) {
	state $partial = $position1->[0] - $position1->[1] / $coeff_x;
	my $pos_x = $partial + $pos_y / $coeff_x;
	return ([$pos_x, $pos_y], [$pos_x, $pos_y - 1]);
};

my @coords = (
	(map { $checks_for_x->($_) } $position1->[0] + 1 .. $position2->[0]),
	(map { $checks_for_y->($_) } $position1->[1] + 1 .. $position2->[1])
);

Those two anonymous coderefs are then run for each integer coordinate of x and y. They are called about 20 times each and the entire function runs 40 thousand times per second, but I see no improvement on the benchmark if the function starts with use feature 'float'; (I expect this feature to work in lexical scope).

I don't think I have anything else at the moment that has more float math in it.

@tonycoz (Contributor, Author) commented Nov 27, 2022

> I don't think I have anything else at the moment that has more float math in it.

I suspect sub call overhead is drowning the math costs.

From memory I used the following to benchmark it:

use strict;
my $max_iter = 100;
++$|;
for my $iy (0 .. 1000) {
  my $y = -1 + 0.002 * $iy;
  for my $ix (0 .. 1000) {
    my $x = -1 + 0.002 * $ix;
    my $i = 0;
    my $xo = $x;
    my $yo = $y;
    my $iter = 0;
    while ($xo * $xo + $yo * $yo <= 10 && ++$iter < $max_iter) {
      ($xo, $yo) = ( $xo * $xo - $yo * $yo + $x, 2 * $xo * $yo + $y);
    }
  }
  print ".";
}
print "\n";

which I probably adapted from a C sample in Imager.

@bbrtj (Contributor) commented Nov 28, 2022

> I suspect sub call overhead is drowning the math costs.

With all math commented out (but variable declarations etc. left in), it runs about 20% faster, so I assume math takes about 16% of its runtime. When benchmarking your code I see a 20-40% improvement, which would mean my code should run about 5-10% faster (taking into account that your code also spends some of its runtime assigning variables etc.). You're right, that might not be enough to show up on a benchmark.

@demerphq (Collaborator) commented Feb 8, 2023

@tonycoz - I implemented the RCPV filename and warnings bits, so we have reduced the size of COPs considerably (all together); maybe we can reconsider making the hints bits bigger now?

Anyway, this PR is old and in conflict. Maybe we should get it rebased so it can be reconsidered?

@tonycoz (Contributor, Author) commented Feb 8, 2023

I'll look at rebasing it, though probably not today.

I'll look at the extra hints word too, though I'm not sure we'll store it for eval (see where doeval_compile() initializes PL_hints).

Since I'm more familiar with features, I've implemented this as a
feature.

At this point there are no new tests.

# Conflicts:
#	ext/Opcode/Opcode.pm
#	feature.h
#	lib/B/Op_private.pm
#	lib/feature.pm
#	opcode.h
#	opnames.h
#	pp_proto.h
#	regen/feature.pl
#	regen/opcode.pl
@EdwardDanchetzNI

It looks like you ran a performance test on a mandelbrot set generator in Perl, comparing the performance of using float versus not using float. The test showed that using float improved the performance by about 6 seconds, with the script running in 19.751 seconds with the float option versus 25.752 seconds without it.

@bulk88 (Contributor) commented Oct 23, 2024

> > Do you have a branch for your small cop work?
>
> It's very hacky and incomplete (and probably just plain broken), but https://github.com/Perl/perl5/tree/tonyc/less-cop

I've seen this done in other code bases differently.

Each OP has a bitfield (0xF or 16) of how many lines it is away from the last COP. Emit a new COP every 16 lines of non-branching code.

Or an array of line numbers (not seq numbers) per sub; each line-number struct has a U32 mask, and each bit in the mask represents an OP struct that is on that line. No runtime penalty for updating line numbers, but it's O(32)-ish to find the line number during an exception. Note this requires evenly sized OPs, and P5 OPs are not equal-sized. The exception-table/JIT-peephole "sub-unit" is 32 or 64 OPs max, obviously.

Or store in the CV an array of U8 chars indexed by the op's position in the CV's OP slab, so PL_curcop's line plus the U8 offset gives the real line number.

Or have a global per-interpreter line-number offset from the last COP; OP_NEXTLINE just increments it. That's smaller than the little-COP design here, since no line number field is needed, and a fat COP can bump the line number by units other than 1.
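
A tiny sketch of that last variant, with hypothetical names (neither OP_NEXTLINE nor the counter exist in blead); a real version would keep the counter per-interpreter and reset it at every full COP:

    static line_t line_delta;     /* per-interpreter in a real implementation */

    PP(pp_nextline)
    {
        line_delta++;             /* current line = last full COP's line + delta */
        return NORMAL;
    }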

@jkeenan added the "Feature Request" and "Feature" (A New Feature.) labels and removed the "Feature" (A New Feature.) and "Feature Request" labels Dec 29, 2024
@richardleach (Contributor)

We could add the new OPs and associated machinery independently of the COP changes. Despite the inability to explicitly use float without the COP work, the peephole optimizer could swap in a new OP for its generic counterpart when an operand is a CONST NV (one that cannot be losslessly converted to an IV). In such cases, we know that a float operation will get carried out.
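
A minimal sketch of that peephole test, under assumptions: the helper name is made up, and the ithreads case (constants moved into the pad, where op_sv is NULL) is ignored. A real check would also have to decide what "cannot be losslessly converted to an IV" means for values like 2.0:

    /* crude test: a CONST operand whose value is NV-only, so any arithmetic
     * involving it will be carried out in floating point */
    static bool
    S_operand_is_pure_nv_const(const OP *o)
    {
        SV *sv;
        if (o->op_type != OP_CONST)
            return FALSE;
        sv = cSVOPx_sv(o);
        return sv && SvNOK(sv) && !SvIOK(sv);
    }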

@bulk88 (Contributor) commented Jun 13, 2025

> @tonycoz - I implemented the RCPV filename and warnings bits, so we have reduced the size of COPs considerably (all together); maybe we can reconsider making the hints bits bigger now?
>
> Anyway, this PR is old and in conflict. Maybe we should get it rebased so it can be reconsidered?

The C-language, C-level HV* front end has always been very ugly. PP OP*s usually are manipulating SV* PV HEK* strings and feeding them to the HV* API, while C99 functions are always feeding hv_fetch(hv, "TheKey", 6, FALSE, (U32_HASH)0); C strings with no precalculated U32 hash number and no char* == char* skip-the-memcmp() libc-call optimization. The 25 SV_CONST(EXISTS) macros are a step in the right direction towards fixing the C-level HV* front end, but 25^2 more SV*s are needed in that array to hold the full Perl 5 BNF grammar that lives inside libperl.so. Just run strings on libperl.so and, after ignoring UTF invlist-related C strings, you will start to see my point: all ISO C "" literal strings need to be converted at "CC time" or "PP compile time" into SV* PV HEK*s exactly once per perl process lifetime.

@bulk88 (Contributor) commented Jun 13, 2025

> We could add the new OPs and associated machinery independently of the COP changes. Despite the inability to explicitly use float without the COP work, the peephole optimizer could swap in a new OP for its generic counterpart when an operand is a CONST NV (one that cannot be losslessly converted to an IV). In such cases, we know that a float operation will get carried out.

Correct. Regardless of what is said on PerlMonks, Stack Overflow, HN and Reddit, P5 has machine types; the only limitation is that PP devs can't turn off the operator overloading and can't disable the implicit <dynamic_cast> methods. JavaScript/ECMAScript has the exact same design defect as Perl 5 - POD doesn't exist and every identifier is full-blown OOP - yet what backwards, unrefined, dangerous programming language written by one guy over a weekend are you reading my comment with?

cough cough http://fglock.github.io/Perlito/
cough cough https://webperl.zero-g.net/democode/index.html

https://web.dev/articles/performance-mystery
http://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-compilers

Now let's look at some C code:

https://github.com/v8/v8/blob/master/src/interpreter/bytecodes.h#L647

  // Return true if |bytecode| is an accumulator load without effects,
  // e.g. LdaConstant, LdaTrue, Ldar.
  static constexpr bool IsAccumulatorLoadWithoutEffects(Bytecode bytecode) {
    STATIC_ASSERT(Bytecode::kLdar < Bytecode::kLdaImmutableCurrentContextSlot);
    return bytecode >= Bytecode::kLdar &&
           bytecode <= Bytecode::kLdaImmutableCurrentContextSlot;
  }

OMG, did I just see the Chrome browser write

if (SvREADONLY(sv) && !SvMAGICAL(sv) && !SvROK(sv))  {
   ck_something_fold(op);
    op_free_something(op->op_next);
}

??? !!!

But Perl since 2014 is doing escape analysis IN THE RUNLOOP INSIDE PP_ENTERSUB!!!!

https://github.com/Perl/perl5/blob/blead/pp_hot.c#L6404

        {
            SV **svp = MARK;
            while (svp < PL_stack_sp) {
                SV *sv = *++svp;
                if (!sv)
                    continue;
                if (SvPADTMP(sv)) {
                    SV *newsv = sv_mortalcopy(sv);
                    *svp = newsv;
#ifdef PERL_RC_STACK
                    /* should just skip the mortalisation instead */
                    SvREFCNT_inc_simple_void_NN(newsv);
                    SvREFCNT_dec_NN(sv);
#endif
                    sv = newsv;
                }
                SvTEMP_off(sv);
            }
        }

perl5/pp_hot.c, line 6451 in d6f09a8:

            items = PL_stack_sp - MARK;
            if (UNLIKELY(items - 1 > AvMAX(av))) {
                SV **ary = AvALLOC(av);
                Renew(ary, items, SV*);
                AvMAX(av) = items - 1;
                AvALLOC(av) = ary;
                AvARRAY(av) = ary;
            }

            if (items)
                Copy(MARK+1,AvARRAY(av),items,SV*);
            AvFILLp(av) = items - 1;
#ifdef PERL_RC_STACK
            /* transfer ownership of the arguments' refcounts to av */
            PL_stack_sp = MARK;
#endif
        }

I'm sure there is a valid technical rationale for the above, but I really despise seeing my yellow arrow enter this block when I'm holding down F11.

https://github.com/Perl/perl5/blob/d6f09a896842e5288af5d3817756b67a919ad7ad/pp_hot.c#L6525C1-L6541C10

        else {
            SV **mark = PL_stack_base + markix;
            SSize_t items = PL_stack_sp - mark;
            while (items--) {
                mark++;
                if (*mark && SvPADTMP(*mark)) {
                    SV *oldsv = *mark;
                    SV *newsv = sv_mortalcopy(oldsv);
                    *mark = newsv;
#ifdef PERL_RC_STACK
                    /* should just skip the mortalisation instead */
                    SvREFCNT_inc_simple_void_NN(newsv);
                    SvREFCNT_dec_NN(oldsv);
#endif
                }
            }
        }

Like, for real, is @_ an AJAX socket? I thought Perl subs live in the same virtual address space for speed reasons, but I guess I was wrong: each PP sub or XSUB executes on a different Android smartphone, over an SDN VPN with HTTP/2 JSON packets over LTE.

@bulk88 (Contributor) commented Jun 13, 2025

So, reading paragraph #2 needs one minute of engineering classroom time.

In Perl-ese, I, bulk88, will say that what Google calls a JSObject, Perl calls an SvROK() or an SV* head struct. What Google calls SMIs, Perl calls SVt_IV bodyless SV*s, or perhaps Perl's equivalent of Google's SMIs is PL_sv_immortals[]/&PL_sv_yes.

Here is Google's PL_ppaddr[] array; it's easy to read:

https://github.com/v8/v8/blob/master/src/compiler/opcodes.h

The pp_foo()s with names that aggressively flaunt raw CPU machine types instead of JSObjects or SMIs do not, in my professional opinion, belong inside the P5P-distributed libperl.so VM. C89/99/C++23's liveness, escape-analysis and strict-aliasing rules, along with the inability to do runtime, real-live-end-user continuous instrumenting - where each extra 100 entry hits on a Perl sub/CV* obj triggers another pass of progressive JITting and more and more big-O, big-wall-time op_ck_foo() rounds on the PP AST - mean I think the Larry Wall day-1 Perl 5 C-level mix of high-level/low-level-ness in the pp_foo() funcs is fine. We don't want to go down to SSA and RTL and FPGA transistor wiring with a 66/110 punch tool. But we don't want each pp_foo() func or PP opcode to be a Turing-complete CPU emulator/LLM neural network framework either.

It's also inappropriate - and the ship sailed very long ago - to turn Larry's Perl 5 engine into a 9-15 MB .css or .xml or GNUmakefile file that is loaded by LLVM/V8/Dalvik/Apple Swift interpreters/compilers. Raku failed and NQP is the leftovers; Parrot, Reini Urban's project (search GH for perl11 or his GH for its name; I compiled it exactly once for Win32 but it insta-SEGVs) and RPerl have all converted Larry's C89 Perl 5 code base into a single ".css file" that is fed into a non-C compiler, with the final binary executing Perl 5 code with about 75-85% source-code and bug compatibility with P5P blead's distributed .pm and .pl files.

Other than RPerl, I don't think any Perl 5 grammar turned into a ".css" executing on a foreign JIT interp runtime has ever out-benchmarked the stock C89 P5P code base. RPerl is a clone of https://en.wikipedia.org/wiki/Asm.js but for the Perl 5 language; libperl.so is always inside the address space of an RPerl process, and all RPerl PP subs can be step-debugged by perl5db.pl. RPerl's main method of operation is: certain AST-ed/OP-treed CV*s, if they match an extremely strict "regexp", get emitted as super-clean (perfect, IMO) C89 source code, which gets sent through FooOS's vanilla CC toolchain and eventually becomes XSUBs. It's done at runtime/BEGIN{} time with no extra work by the end user.

This is my guess - I've never asked RPerl's author and he has never told me himself - but from single-stepping the FOSS RPerl code and troubleshooting it with him, I have a hunch he either whiteboarded or added provisions for, or that he or his employer would sell, a commercial version of RPerl that automatically takes PP P5 grammar subroutines, compiles them, and uploads them to an Nvidia RTX GPGPU card, from inside the P5P Perl_runops_standard() runloop.

This is pointless for 80% of Perl users and their production code, but RPerl is a death sentence for CPAN's PDL module, since RPerl allows AI/crypto-coin-mining/big-data/big-sci computing code to be written by entry-level PP devs in Perl 5 after reading the Camel book, instead of learning a foreign programming language called PDL and FFI-ing into the PDL abstract virtual machine.

If I had a personal hobby or business reason to do it, I would just write 0x500 bytes of machine code plus .rdata x64/i386 JIT assembler, throw it into leonerd's builtin.c, and then the PSC could release Perl 7 with killer new features like JIT from PP.

*Disclaimer: P5P does not accept any bug reports for SEGVs caused by broken JIT x64 created with the *main::builtin::jit:: package; if the bug reporter can't write it in GCC inline ASM without a SEGV, they can't write high-performance JIT in Perl 5 PP code either.

Labels: do not merge (Don't merge this PR, at least for now), Feature (A New Feature.), hasConflicts, Stalled
10 participants