In my initial performance work, the big 3 improvements were local method tables (`framecode.methodtables`), reusing framedata (the `junk` mechanism), and getting rid of runtime dispatch internally within the interpreter (by manual union-splitting, a.k.a. `if isa(x, ConcreteType)...`).
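For anyone unfamiliar with manual union-splitting, here is a minimal sketch of the idea. It is not code from the interpreter: `describe` and `describe_all` are hypothetical names made up for illustration. The point is only that an explicit `isa` check lets the compiler resolve the call statically inside each branch, even though the values come out of an untyped container.

```julia
# Hypothetical illustration of manual union-splitting; none of these
# names come from JuliaInterpreter itself.
describe(x::Int)    = "integer"
describe(x::Symbol) = "symbol"
describe(x::String) = "string"
describe(x)         = "other"

function describe_all(items::Vector{Any})
    out = String[]
    for x in items
        # Elements of a Vector{Any} have no concrete type known at compile time.
        if isa(x, Int)
            # Inside this branch inference knows x::Int, so the call
            # can be resolved without runtime dispatch.
            push!(out, describe(x))
        elseif isa(x, Symbol)
            push!(out, describe(x))
        else
            # Everything else still goes through ordinary runtime dispatch.
            push!(out, describe(x))
        end
    end
    return out
end
```

The branches look redundant, but each one is a separate call site with a narrower inferred argument type, which is what makes the common cases cheap.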
Some possible future improvements:
- Recent changes diving into `Core._apply`, `Core._apply_latest`, and `invoke` hurt performance. Perhaps some of that might be clawed back by moving more of this work into `optimize!`?
- `ccall`s are slow. I just pushed a `teh/compiled_ccall` branch that contains an old attempt to create one compiled function per `ccall`. I was most interested in seeing whether it would circumvent MWE of char crash #28 (it didn't seem to), but I don't think I looked at its runtime implications (especially on a `ccall`-heavy workload). I suspect that branch is pretty close to usable, if anyone wants to pick it up and run with it.
- Specifically look for calls to `iterate` and do something special, i.e., (1) determine whether `iterate` could possibly hit a breakpoint or error (if not, then you're allowed to play tricks), (2) see if you can determine whether the loop will be long, and if so (3) pass the iterator and frame to a special `run_loop` function. That would have to compile for each iterator type, so it is worth doing only if the loop is long. But you would save the time spent diving in and out of `iterate`, and that could be quite helpful.
- Apply similar treatment to arrayrefs. Even our builtin handling of `arrayref` could be improved by separately treating small dimensions (e.g., `if nargs == 1 ... elseif nargs == 2 ... elseif nargs == 3 ... else <varargs version> end`).
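To make the last item concrete, here is a rough sketch of branching on the number of indices. It is an assumption-laden illustration, not the interpreter's actual builtin handling: `getindex_split` is a hypothetical name, and it uses plain `getindex` as a stand-in for the `arrayref` builtin so that it does not depend on the builtin's exact signature.

```julia
# Hypothetical sketch: special-case small index counts, fall back to
# splatting otherwise. `getindex` stands in for the `arrayref` builtin.
function getindex_split(A::Array, inds::Vector{Any})
    nargs = length(inds)
    if nargs == 1
        # Fixed-arity calls avoid the cost of splatting a Vector{Any}.
        return A[inds[1]::Int]
    elseif nargs == 2
        return A[inds[1]::Int, inds[2]::Int]
    elseif nargs == 3
        return A[inds[1]::Int, inds[2]::Int, inds[3]::Int]
    else
        # Varargs version for higher-dimensional indexing.
        return A[inds...]
    end
end
```

For example, `getindex_split([1 2; 3 4], Any[2, 1])` takes the two-index branch. Whether this actually wins anything would need to be checked with a profiler on a realistic workload.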
Obviously spending some time with ProfileView will be useful. I did a ton of that at the beginning and fixed most everything that I noticed. But I haven't done it in a while, and we may have regressed in places as we expanded our functionality.