Closed
Description
The current specification for #46 defines spread of a `Map` in terms of the Iterable protocol. This seems unnecessarily inefficient compared to just iterating over the keys (avoiding allocating the `MapEntry` objects).

Even better would be to call an `addAllTo` method passing the receiver map as an argument, thereby avoiding allocating the key iterator as well, but that would be a breaking change to the `Map` API.
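To make the comparison concrete, here is a sketch of the two copying strategies being contrasted (illustrative code, not from the issue; function names are mine):

```dart
// Sketch: two ways `{...from}` could be expanded (illustrative only).

Map<K, V> spreadViaEntries<K, V>(Map<K, V> from) {
  final to = <K, V>{};
  // As currently specified: iterates `entries`, which for a non-platform
  // map typically allocates one MapEntry object per element.
  for (final e in from.entries) {
    to[e.key] = e.value;
  }
  return to;
}

Map<K, V> spreadViaKeys<K, V>(Map<K, V> from) {
  final to = <K, V>{};
  // Alternative: iterate keys and look each value up, avoiding the
  // MapEntry allocations at the cost of one `[]` lookup per key.
  for (final k in from.keys) {
    to[k] = from[k] as V;
  }
  return to;
}

void main() {
  final m = {'a': 1, 'b': 2};
  print(spreadViaEntries(m)); // {a: 1, b: 2}
  print(spreadViaKeys(m)); // {a: 1, b: 2}
}
```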
Activity
lrhn commented on Feb 6, 2019
I'm less convinced that this is the best option than I am about spreading iterables.

The `entries` iterator can easily require allocation for each entry (it doesn't have to, but likely will). However, the alternatives are not amazing either:

1. `entries`, as currently specified.
2. `keys` means having to do a `map[key]` lookup afterwards. Whether that's more or less efficient than allocating entry objects is a matter of optimization.
3. `forEach` means introducing a closure, `oldMap.forEach((k, v) { newMap[k] = v; })`.
4. `map.addAll` will do something generic as well (the default implementation uses 1.)
5. An `addAllTo` method on maps would require passing a map to the method, and exposing the map that is currently being built in an unfinished state.
6. Leaving it unspecified.

Leaving it unspecified makes the language fragile. A class that implements some `Map` functionality, but not all, might work on one platform and fail on another. The only user recommendation would be to only ever use platform maps with spreads. I wouldn't want that, so 6. is out.

I considered iterating entries as the best of the other alternatives, with `forEach` as the next in line.

The `forEach` operation is really the correct approach: let the map iterate itself, then do something for each key/value pair. The only concern is that it requires creating a closure.

On the VM, the `entries` version does seem to be ~20% slower than `forEach`, which is slightly faster than `keys`/`addAll`.

With dart2js/d8, everything is dirt slow, but `entries` is ~half the speed of `forEach`, which is again a little faster than `keys`/`addAll`.

So, from the benchmarks alone, we should be using `forEach`.

I'm not particularly happy about making language design decisions based on the current speed of implementations. We will be stuck with the specification no matter what we otherwise optimize.

If we get tuples in the language, then we might want to change the specification to use a tuple-iterator instead of `entries`. That might (should) be more efficient, but that's speculation.

Still, I'm willing to change to `forEach`. The implementations might be able to do something efficient with the closure since they are in complete control of its body.

We'd have to make sure that a user `forEach` that leaks the incoming function cannot use it after the `forEach` call has completed (we don't want anyone changing our map after it's created).

So, `...e` becomes something like:

A quick benchmark (now polymorphic):
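A hedged sketch of the `forEach`-based rewrite of `...e` described above, including a guard for the leaked-closure concern (this is illustrative code of my own, not the snippet from the original comment):

```dart
// Sketch: spread `...e` into a map under construction via `forEach`,
// guarding against a map that leaks the supplied closure and calls it
// after the spread has completed.
Map<K, V> spreadViaForEach<K, V>(Map<K, V> e) {
  final result = <K, V>{};
  var active = true; // becomes false once the spread is done
  e.forEach((k, v) {
    if (!active) {
      throw StateError('spread closure called after the spread completed');
    }
    result[k] = v;
  });
  active = false;
  return result;
}

void main() {
  print(spreadViaForEach({'a': 1, 'b': 2})); // {a: 1, b: 2}
}
```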
Linked: List: use the Iterable protocol or direct iteration? #208

leafpetersen commented on Feb 7, 2019
Ok. I'm not deeply dug in on this, but I do think it's a wasted opportunity if we ship something in a form that's unnecessarily slow, and I don't see any user-facing reasons to choose one or the other. I'm certainly sympathetic to the difficulty of extrapolating future performance from current performance, but I'm not sure that justifies just throwing up our hands and flipping a coin.
For what it's worth, @rakudrama expressed some preference for using `.addAll` to allow for possible future double-dispatch implementation tricks.

eernstg commented on Feb 7, 2019
When we decide on the semantics of spreads, it would be nice to leave a tiny bit of wiggle room for improved performance. Granted, we must give developers specific guarantees about what's going on, but we could have situations where we want to do something in a slightly different way.

One example is when a map is created from an existing map, and then adjusted:
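A plausible shape of that situation (my example; identifiers match the `m1`/`m2` names used below):

```dart
void main() {
  final m1 = {'a': 1, 'b': 2};
  // Copy an existing map, then adjust it. While the elements of m1 are
  // being added to the fresh m2, each key from m1 is guaranteed not to
  // be present yet, so no overwrite lookup is needed during that phase.
  final m2 = {...m1, 'c': 3};
  print(m2); // {a: 1, b: 2, c: 3}
}
```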
This might be a quite common situation. The point is that during the phase where the map elements from `m1` are added to `m2`, we know more than we usually do: for each key from `m1`, it is guaranteed that it is not already present in `m2`, and hence we do not need to look it up for the purpose of overwriting rather than adding a new map element.

Different implementations of `Map` may make this a bigger or smaller optimization, but the point is that we'd just need to iterate over the map elements (`m1.forEach` would do that), and insertion could use some "secret" version of `[]=` for which it is guaranteed by the compiler that it's a fresh key for `m2`.

A similar situation arises for sets, as mentioned here: #208 (comment).
lrhn commented on Feb 7, 2019
@leafpetersen
There is the option of saying, for `...e`, executed to collect map entries for a `Map<K, V>` in a list, that:

This is basically the approach I ruled out, because it's impossible to predict which members of the user-supplied map are used to do the iteration. Even if we say that, we will use some operation (likely `forEach`) to do the operation, and then someone might write code that only implements `forEach` for some utility `Map` that is only intended to be spread. If we change the behavior later, then we break that code. If we are not ready and willing to make that break, then the implementation is the specification, and we might as well make the specification match what we do.

So, are we ready and willing to break partially implemented maps? (And I want that in writing, in public, so we can point to it if users complain.) If so, sure, let's be unspecified. Otherwise, we should specify one approach that users can depend on (and that seems to be `forEach`).

If we allow any way to iterate, the implementation can just always use `addAll`, because then anything `addAll` does will be correct.

@eernstg
The "fresh key" is a red herring. It is possible to have a map with two different keys that are equal according to `==`.

Example:

Unless you know the internal workings of the map you are spreading, you can't assume anything.
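Such a map can be sketched like this (my construction, standing in for the original example): an identity-based map can hold two distinct keys that are equal according to `==`, so copying it into a default `==`-based map cannot assume every key is fresh.

```dart
import 'dart:collection';

// All Key instances compare equal to each other, but remain distinct objects.
class Key {
  @override
  bool operator ==(Object other) => other is Key;
  @override
  int get hashCode => 0;
}

void main() {
  // An identity-based map distinguishes keys by `identical`, not `==`,
  // so it can hold two keys that are equal according to `==`.
  final source = LinkedHashMap<Key, int>.identity();
  source[Key()] = 1;
  source[Key()] = 2;
  print(source.length); // 2

  // Copying into a default, ==-based map shows the second key is not
  // "fresh": it collides with (and overwrites) the first.
  final copy = {...source};
  print(copy.length); // 1
}
```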
If you do know the internal workings, it's a platform map, and then we can do anything anyway, including adding an `_addTo<K, V>(LinkedHashMap<K, V> target)` that works directly on the underlying hash table.

eernstg commented on Feb 8, 2019
@lrhn wrote
Right, it is possible to have a set representation that contains two elements which are (currently) equal, and a map can have two equal keys. That's basically a bug in the realization of those two concepts, because they ought to maintain uniqueness, but we can't avoid it unless we introduce a strong notion of values (immutable entities, with deep equality).
Anyway, it still works with built-in types like integers and strings, and even in that case it might be nice to reserve enough wiggle room to be able to optimize whatever widespread special case we might encounter. (And, as discussed elsewhere, using `addAll` might actually be quite helpful for that!)

lrhn commented on Feb 8, 2019
As an opposing view, users are not likely to get their understanding of language features directly from the language specification, so maybe we will get a better understanding if we define the operations in terms of building blocks that users do know.
In this case, we could define list, set and map literals to be equivalent to creating an empty list, set or map, and then performing sequences of `add` and `addAll` operations. A single element calls `add` (or `[]=` for maps). A spread element calls `addAll`. The control flow operations recursively end up with elements or spreads.

If we do that, then we might be able to generalize literals to other kinds of collections, so you can write:
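The literal being described might look something like the hypothetical syntax in the comment below (the exact shape is my guess); its proposed meaning under the `add`/`addAll` desugaring can be spelled out with today's API:

```dart
import 'dart:collection';

void main() {
  final rest = [2, 3];

  // Hypothetical generalized collection literal (NOT valid Dart today):
  //   final q = Queue<int>{1, ...rest};
  //
  // Its meaning under the add/addAll desugaring, in today's API:
  final q = Queue<int>()
    ..add(1)        // single element -> add
    ..addAll(rest); // spread element -> addAll
  print(q.toList()); // [1, 2, 3]
}
```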
to initialize a new instance of `Queue`.

(Sadly, it won't work for `Uint8List`, since it has no `add` method.)

Or something.
In any case, we would just need to optimize the `addAll` of `GrowableList`, `LinkedHashSet` and `LinkedHashMap` to improve performance, and why wouldn't we do that anyway?

leafpetersen commented on Feb 9, 2019
Using `.addAll` is just a different way of either:

- making it implementation defined, or
- specifying whatever `.addAll` does now forever.

If we do either of those, I'd rather that we just either define it in terms of behavior (since otherwise we're stopping `.addAll` from changing how it iterates), or explicitly make it implementation defined.

My general take is that using the entries iterator provides no real user benefit, analytically is unlikely to ever provide us with better performance than other approaches, and empirically seems slower.
I'm ok with making it implementation defined, specifying only that we use iteration order, but this generally doesn't seem in line with how we do things in Dart, and de facto it will be a breaking change in the implementations to change how they do it.
I'm ok with using `.forEach`, but I'm slightly concerned about code size given that closure representations are not cheap on all platforms.

I think overall, my preference is to define this in terms of iteration over `.keys`. It's the code that I would generally write myself; it empirically (based on your measurements) seems better than using `.entries` and is close to as good as `.forEach`.

munificent commented on Feb 9, 2019
I wrote a little benchmark to try to get some numbers for what the various protocols that a spread could desugar to look like. So far, I've only run this on the VM, but I can try other platforms. Here are the results:
The "iterate entries into map" line is what the proposal mandates. That benchmark covers both iterating over the from map and inserting the result into the to map. Arguably, the latter isn't interesting because any proposal has to do that somehow. So the "just" lines do the work of iterating over the from map but then don't actually insert into another map. In theory, the differences between the "just" rows are more important since that shows the relative difference of each strategy ignoring the fixed overhead of inserting into the new map. The "overhead" lines show the difference, which should be the time just to insert the entries.
The "addEntries" one is a little special. It calls `to.addEntries(from)` directly, which means we can't split out the iterating from the inserting.

It does look like the overhead is about constant, which is a good sanity check. From the numbers it seems like `forEach()` is the big winner. Iterating over the keys is a little faster, but not much.
But... it depends on how you slice the data. This benchmark is deliberately polymorphic and spreads LinkedHashMaps, HashMaps, and SplayTreeMaps. If I split that out into three separate monomorphic benchmarks:
SplayTreeMap:
LinkedHashMap:
HashMap:
Then it looks like the three classes have very different performance profiles for their APIs. forEach() is really fast on SplayTreeMap compared to iterating over the keys or entries. But for the other classes, it doesn't make that much of a difference. This isn't too surprising if you look at the implementation of SplayTreeMap.
Note also that this benchmark uses a mixture of map sizes from 0 up to 100. If we leaned towards certain sizes, that may also skew the numbers.
It does seem like `.entries` has little going for it. If nothing else, I think `.keys` and `[]` is both a little faster and more idiomatic. `forEach()` isn't as much of a consistent win as it first appeared, especially given how rarely used SplayTreeMap is in practice.

One way to think about this is that we should decide if we think internal or external iteration is the right protocol. Internal iteration is "easier" for the iteratee. It's just given a closure and it has total control over how it walks itself and invokes the closure. It doesn't need to return control back to the thing doing the spread until all of the iteration is done. That means it can effectively use the top of the call stack as its own local state storage.
With any kind of external iteration, the iteratee has to return to the iterator. That means in order to pick up where it left off, that state has to be reified (and thus usually allocated, in Dart) somewhere. In return for this, the iterator gets more flexibility. It can choose to stop iterating, resume later, etc.
In the case of spread, we don't need any of that flexibility. Every spread always wants to iterate over the spreadee exactly once, to completion. So there's an argument that internal iteration is the principled choice.
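The internal/external distinction can be sketched as follows (illustrative code of my own, copying one map into another both ways):

```dart
// External iteration: the caller drives, so the collection's position must
// be reified in an Iterator object that survives between moveNext() calls.
Map<K, V> copyExternal<K, V>(Map<K, V> from) {
  final to = <K, V>{};
  final it = from.entries.iterator; // allocated iteration state
  while (it.moveNext()) {
    to[it.current.key] = it.current.value;
  }
  return to;
}

// Internal iteration: the collection drives, keeping its position in local
// variables on the call stack and invoking a closure once per element.
Map<K, V> copyInternal<K, V>(Map<K, V> from) {
  final to = <K, V>{};
  from.forEach((k, v) => to[k] = v);
  return to;
}

void main() {
  final m = {'a': 1, 'b': 2};
  print(copyExternal(m)); // {a: 1, b: 2}
  print(copyInternal(m)); // {a: 1, b: 2}
}
```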
That implies using `forEach()` for maps and probably also for Iterables being spread into lists and sets. On the other issue (#208), my benchmark shows that `forEach()` is faster for Iterables, except on the VM. Maybe there's some overhead for calling a closure. Note that unlike the other options on that issue, we could do this for all Iterables being spread, not just those that implement List.

Cases where the thing you are spreading is a List are interesting. In that special case, the only state you need for an external iterator is an integer, so it tends to be quite fast.

On the other hand, the `for-in` statement always uses external iteration, so making the `for` element do a different protocol is maybe unsymmetric. If we ever wanted to later add support for `break` in `for` elements, or anything else that would cause us to take control away from the spreadee, we will hate ourselves for choosing internal iteration.

Thoughts?
lrhn commented on Feb 11, 2019
I'm leaning towards internal iteration for spreads, for the reasons mentioned. The collection being spread is the only one that can truly know how best to iterate it. Anything we specify will be sub-optimal in some cases, unless we specify nothing (and then pick internal iteration as the implementation).
That said, we have iteration in the language with `sync*` and `for-in`. Any time we see a combination of those, we should be able to inline very eagerly and get zero-overhead, effectively-internal iteration.

The big problem with that is that nobody uses `sync*` for writing their collection's `iterator`, and we can't statically detect that it happens. That is sad.

An issue with internal iteration is that the only abstraction for executable code we have is closures, so we have to allocate a closure when using `forEach`. We also get a number of polymorphic closure invocations, and we leak this function to user code, which may choose to keep it alive, so we have to program defensively.

Maybe the platforms can do something cheaper than a full closure for these things. Maybe not.
So, in summary:

- `forEach` for maps, or leave it unspecified, because `forEach` might not always be the fastest.
- `addAll`, because then it might generalize to user collections too. That's speculative.

Can we live with breaking user code if we change an unspecified behavior?
If not, then we need to specify it in the smallest details (and likely cheat). Then I'd just use iteration, for consistency. If we say that a list must be iterated using index operations, we increase the code size for every iteration of an `Iterable` that might be a `List` (if we want to inline), or at least we introduce an extra type check. I don't like that.

In practice, all lists will be platform lists, so just silently special-casing those will likely be enough.
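The special-casing being described could be sketched roughly like this (my illustration; a real implementation would test for the platform list types specifically, not for any `List`):

```dart
// Sketch: spreading an iterable into a list, special-casing List so the
// only external iteration state needed is an integer index.
List<E> spreadIntoList<E>(Iterable<E> source) {
  final out = <E>[];
  if (source is List<E>) {
    // Index-based iteration: no Iterator object is allocated.
    for (var i = 0; i < source.length; i++) {
      out.add(source[i]);
    }
  } else {
    // Generic path: the ordinary Iterable protocol.
    for (final e in source) {
      out.add(e);
    }
  }
  return out;
}

void main() {
  print(spreadIntoList([1, 2, 3])); // [1, 2, 3]
  print(spreadIntoList({4, 5})); // [4, 5]
}
```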
munificent commented on Feb 11, 2019
Here are numbers for the other main platforms. dart2js is crushingly slow if you don't use `-O4`. I believe most deployed apps use that setting too, so doing that here:

dart2js -O4, all three map types
dart2js -O4, just LinkedHashMap
dart2js -O4, just SplayTreeMap
dart2js -O4, just HashMap
AoT
AoT, all three map types
AoT, just LinkedHashMap
AoT, just SplayTreeMap
AoT, just HashMap
I'm going to go through the list benchmark too and try to consolidate this into some single set of numbers that are hopefully actionable.
jonahwilliams commented on Feb 11, 2019

I'm not sure if Flutter web apps will be able to use `-O4`, due to our usage of `Type` and `runtimeType` objects. @yjbanov, have you experimented with any of the dart2js optimization levels?

yjbanov commented on Feb 12, 2019
@munificent Have you looked at the memory usage difference between the various methods? I'd be curious what objects end up allocated on the heap during iteration. In the `forEach` case we allocate at least a closure. Others seem to need at least an `Iterator`. However, if the compiler knows the exact type of the map, it may be able to iterate without allocating anything at all.

yjbanov commented on Feb 12, 2019
@jonahwilliams I want to say we use `-O4` in our prototype already, but I'll have to double-check.

munificent commented on Feb 13, 2019
I haven't, but I don't think memory usage should differ that much. In general, we expect most spreads will be fairly small. On large spreads, you may churn through a bunch of MapEntry objects, but modern GCs handle short-lived objects pretty well.
I did some more benchmarking and investigation and wrote up my thoughts. We discussed this more in the language meeting. The decision is to keep specifying this in terms of `entries`.

As always, if an implementation detects that the object being spread is a known built-in type (which is almost always the case for maps), it's free to do some lower-level access to the map entries, since it knows that there won't be any user-visible side effects.