proof of concept/performance test for use float #17831
base: blead
I see two upvotes: did anyone else try benchmarking this on more useful code? I'm asking to see whether it's worth developing this further. I implemented this as a feature, but it doesn't really belong there: it's not a language feature as such, so it shouldn't be enabled by a feature version bundle. I'm hesitant to use a hints bit, since we're fairly short on them. Simply using an entry in … Maybe it could be implemented as a feature, but not included in the …
This still seems worthwhile to me, but none of my useful code really uses float math, so I have nothing handy to benchmark.
Note that we recovered some hint bits with 5d17394. I've not tested/benchmarked this on other code.
That recovered only a single bit, which is now assigned to the feature mask, where it belongs. Maybe we just need another hints word.
@tonycoz, @richardleach, @atoomic, can we get an update on the status of this PR? Thank you very much.
It's waiting on (likely) adding another hints word. But I think that needs to wait on reducing the cost of COPs, which those are embedded into. Right now a COP is generated for every statement, but the information in each COP typically doesn't change much except for the line number. I've looked at adding an alternative COP that only has a line number, but this will break some backward compatibility at the XS level.
I noticed that the regular versions of these functions do: … rather than: …
On Tue, 26 Jan 2021 at 23:32, Tony Cook wrote:
> Right now a COP is generated for every statement, but the information in each COP typically doesn't change much except for the line number.
I'd like to hear more about this, as it aligns with my interest in improving the quality of our error messages. If I can do any legwork here, I'd be happy to hear an appraisal of the problem to get started with. Just mail me personally. You know where. :-)
Yves
I've stalled on this a bit (error: stack overflow), but I did get a "small COP" largely implemented, and I don't remember getting any crashes. I still needed to update caller() to understand the new COPs. There may have been other problems, though: I wasn't comfortable with the way I was detecting whether a small COP was possible, e.g. with code like: …
Lines 1, 4, 7 and 9 needed full COPs, and I hadn't gotten to the point of checking that this was happening when it should. Even without adding a small COP, we could improve memory usage a great deal by reference counting cop_warnings (and, I think, cop_file on threads). These are profligate users of memory, since each COP has its own copy.
In theory it should be pretty easy to use PL_strtab to do that if they are write-once. I will take a look. Do you have a branch for your small COP work?
It's very hacky and incomplete (and probably just plain broken), but: https://github.com/Perl/perl5/tree/tonyc/less-cop
On Thu, 27 Oct 2022 at 00:26, Tony Cook wrote:
> https://github.com/Perl/perl5/tree/tonyc/less-cop
Nice. For what it's worth, I've been looking at replacing cop_file with a HEK, which would allow the same code to be used to share the PV, threaded or otherwise.
Yves
On Sun, Sep 04, 2022 at 06:33:28PM -0700, Tony Cook wrote:
> Even without adding a small COP we could improve memory usage a great
> deal by reference counting cop_warnings, and I think cop_file on
> threads, these are profligate users of memory - each cop has its own
> copy.
An alternative approach, perhaps, would be to move most of the COP fields out to a separate ref-counted struct shared by each of the COPs in a sequence where those fields haven't changed, with each COP reduced to little more than cop_line plus a pointer to the new struct.
I have compiled perl from your branch and tested it on two pieces of code that use some floating-point calculations. The first one is Algorithm::QuadTree::PP, which uses some (not much) float math in its circular shape finding routine. No improvement was seen. The second one is more math-heavy, as it tries to find all border coordinates for a line segment. The heart of the function is implemented as follows:

```perl
# Excerpt from a larger function. It uses subroutine signatures and `state`,
# so it needs those features enabled (e.g. `use v5.36;`).
# $position1 and $position2 are [x, y] array refs.

# Slope of the segment (dy/dx)
my $coeff_x = ($position2->[1] - $position1->[1]) / ($position2->[0] - $position1->[0]);

# For an integer x, compute y on the line and return the two cells touching it
my $checks_for_x = sub ($pos_x) {
    state $partial = $position1->[1] - $position1->[0] * $coeff_x;
    my $pos_y = $partial + $pos_x * $coeff_x;
    return ([$pos_x, $pos_y], [$pos_x - 1, $pos_y]);
};

# For an integer y, compute x on the line and return the two cells touching it
my $checks_for_y = sub ($pos_y) {
    state $partial = $position1->[0] - $position1->[1] / $coeff_x;
    my $pos_x = $partial + $pos_y / $coeff_x;
    return ([$pos_x, $pos_y], [$pos_x, $pos_y - 1]);
};

my @coords = (
    (map { $checks_for_x->($_) } $position1->[0] + 1 .. $position2->[0]),
    (map { $checks_for_y->($_) } $position1->[1] + 1 .. $position2->[1])
);
```

Those two anonymous coderefs are then run for each integer coordinate of x and y. They are called about 20 times each, and the entire function runs 40 thousand times per second, but I see no improvement on the benchmark if the function starts with … . I don't think I have anything else at the moment that has more float math in it.
I suspect sub call overhead is drowning out the math costs. From memory, I used the following to benchmark it: …
which I probably adapted from a C sample in Imager.
With all the math commented out (but variable declarations etc. left in), it runs about 20% faster, so I assume the math takes about 16% of its runtime. When benchmarking your code I see a 20-40% improvement, which would mean my code should run about 5-10% faster (taking into account that your code also spends some of its runtime assigning variables etc.). You're right, that might not be enough to show up on a benchmark.
@tonycoz - I implemented RCPV filename and warnings bits, so we have reduced the size of COPs considerably (taken together). Maybe we can reconsider making the hints bits bigger now? Anyway, this PR is old and in conflict. Maybe we should get it rebased so it can be reconsidered?
I'll look at rebasing it, though probably not today. I'll look at the extra hints word too, though I'm not sure we'll store it for eval (see where doeval_compile() initializes PL_hints).
Since I'm more familiar with features, I've implemented this as a feature. At this point there are no new tests.

Conflicts:
- ext/Opcode/Opcode.pm
- feature.h
- lib/B/Op_private.pm
- lib/feature.pm
- opcode.h
- opnames.h
- pp_proto.h
- regen/feature.pl
- regen/opcode.pl
b1d2c2d to 4d1c8cc
It looks like you ran a performance test on a Mandelbrot set generator in Perl, comparing performance with and without the float option. The test showed that using float improved performance by about 6 seconds: the script ran in 19.751 seconds with the float option versus 25.752 seconds without it.
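Expressed in relative terms (plain arithmetic on the two timings quoted above):

$$
\begin{aligned}
\Delta t &= 25.752\,\mathrm{s} - 19.751\,\mathrm{s} = 6.001\,\mathrm{s} \\
\frac{\Delta t}{25.752\,\mathrm{s}} &\approx 0.233 \quad (\text{about 23\% less wall-clock time}) \\
\frac{25.752\,\mathrm{s}}{19.751\,\mathrm{s}} &\approx 1.30 \quad (\text{speedup})
\end{aligned}
$$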
I've seen this done differently in other code bases:

- Each OP has a 4-bit bitfield (0xF, i.e. 16 possible values) recording how many lines it is away from the last COP, and a new COP is emitted every 16 lines of non-branching code.
- Keep an array of line numbers (not seqs) per sub. Each line-number struct has a U32 mask, and each bit in the mask represents an OP struct that is on that line. There is no runtime penalty for updating line numbers, but finding the line number during an exception costs O(32)-ish. Note that this requires equal-sized OPs, and P5 OPs are not equal-sized. The exception-table/JIT-peephole "sub-unit" is obviously 32 or 64 OPs max.
- Store an array of U8 chars in the CV, indexed by the op's position in the CV's OP slab, so that PL_curcop's line plus the U8 offset gives the real line number.
- Have a global interpreter line-number offset from the last COP, which an OP_NEXTLINE op simply increments. This is smaller than the little-COP design here, since no line-number field is needed, and a fat COP can bump the line number by units other than 1.
We could add the new OPs and associated machinery independently of the COP changes. Despite the inability to explicitly …
… the C lang … C level …
Correct. Regardless of what is said on PerlMonks, SO, HN and Reddit, P5 has machine types; the only limitation is that PP devs can't turn off the operator overloading and can't disable the implicit … (cough cough) http://fglock.github.io/Perlito/ https://web.dev/articles/performance-mystery Now let's look at some C code: https://github.com/v8/v8/blob/master/src/interpreter/bytecodes.h#L647
OMG, did I just see the Chrome browser write …
??? !!! But since 2014 Perl has been doing escape analysis IN THE RUNLOOP INSIDE PP_ENTERSUB!!!! https://github.com/Perl/perl5/blob/blead/pp_hot.c#L6404
Line 6451 in d6f09a8
I'm sure there is a valid technical rationale for the above, but I really despise seeing my yellow arrow enter this block when I'm holding down F11.
Like, for real, is …
So reading paragraph #2 needs one minute of engineering-classroom time. I, bulk88, call Google's … Here is Google's … https://github.com/v8/v8/blob/master/src/compiler/opcodes.h The … It's also inappropriate, and the ship sailed very long ago, to turn Larry's Perl 5 engine into a 9-15 MB … Other than RPerl, I don't think any Perl 5 grammar turned into a … I believe RPerl's author (this is my guess: I've never asked, and he has never told me himself, but it comes from single-stepping the FOSS RPerl code and troubleshooting it with him) either whiteboarded, or added provisions for, the author or his employer selling a commercial version of RPerl that automatically takes PP P5-grammar subroutines, compiles them, and uploads them to an Nvidia RTX GPGPU card from inside the P5P … This is pointless for 80% of Perl users and their production code, but RPerl is a death sentence for CPAN's PDL module, since RPerl allows AI/crypto-coin mining/big data/big sci computing code to be written in Perl 5 by entry-level PP devs after reading the Camel book, instead of learning a foreign programming language called PDL and FFI-ing into the PDL abstract virtual machine. If I had a personal hobby or business reason to do it, I would just write 0x500 bytes of machine code + .rdata x64/i386 JIT assembler and throw it into leonerd's … *Disclaimer: P5P does not accept any bug reports for any SEGVs caused by broken JIT x64 created with …
This is an attempt at #17813.
I tested performance with a simple Mandelbrot set generator (on my old CPU): …