Should spread of a Map use the Iterable protocol or direct iteration? #209
I'm less convinced that this is the best option than I am about spreading iterables. However, the alternatives are not amazing either:

Leaving it unspecified makes the language fragile. A class that implements some …

I considered iterating entries as the best of the other alternatives, with …

On the VM, the …

I'm not particularly happy about making language design decisions based on the current speed of implementations. We will be stuck with the specification no matter what we otherwise optimize. Still, I'm willing to change to:

```dart
// `e` is the spread expression; `targetMap` is the map literal being built,
// with type arguments K and V.
var sourceMap = e;
Map<K, V>? tmp = targetMap;
sourceMap.forEach((k, v) {
  tmp![k] = v;
});
tmp = null; // Prevent later calls from changing `targetMap`.
```

A quick benchmark (now polymorphic):

```dart
import "dart:collection";

// Strategy 1: iterate `entries` (allocates a MapEntry per element).
void copy1(Map<String, int> from, Map<String, int> to) {
  for (var entry in from.entries) {
    to[entry.key] = entry.value;
  }
}

// Strategy 2: iterate `keys` and look each value up again.
void copy2(Map<String, int> from, Map<String, int> to) {
  for (var key in from.keys) {
    to[key] = from[key]!;
  }
}

// Strategy 3: internal iteration with `forEach`; `tmp` is cleared afterwards.
void copy3(Map<String, int> from, Map<String, int> to) {
  Map<String, int>? tmp = to;
  from.forEach((key, value) {
    tmp![key] = value;
  });
  tmp = null;
}

// Strategy 4: let the target map do the copying.
void copy4(Map<String, int> from, Map<String, int> to) {
  to.addAll(from);
}

main() {
  for (int i = 0; i < 5; i++) {
    bench("entries", copy1);
    bench("keys", copy2);
    bench("forEach", copy3);
    bench("addAll", copy4);
  }
}

int id(int x) => x;

// 100 maps of 0 to 990 entries: mostly LinkedHashMap, every fourth map
// (n % 4 == 1) a SplayTreeMap and every fourth (n % 4 == 2) a HashMap.
var maps = List.generate(100, (n) {
  var map = Map<String, int>.fromIterable(Iterable.generate(n * 10), key: (n) => "#$n");
  if (n % 4 == 1) map = SplayTreeMap<String, int>.from(map);
  if (n % 4 == 2) map = HashMap<String, int>.from(map);
  return map;
});

void bench(String name, void Function(Map<String, int>, Map<String, int>) action) {
  var e = 0;
  var c = 0;
  var sw = Stopwatch()..start();
  do {
    for (var from in maps) {
      var to = <String, int>{};
      action(from, to);
      c += from.length;
    }
    e = sw.elapsedMilliseconds;
  } while (e < 2000);
  print("$name: ${c / e} entries/ms");
}
```
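The `tmp = null` line in the `forEach` desugaring above guards against a map that holds on to the callback and calls it again after the spread has finished. The following sketch shows what that line protects against. `LeakyMap` and `spreadViaForEach` are hypothetical names invented for illustration. The map is contrived, but nothing in the `Map` interface forbids it.

```dart
import 'dart:collection';

/// Hypothetical map (illustration only) whose forEach keeps the callback
/// so it can be invoked again after the spread has completed.
class LeakyMap<K, V> extends MapBase<K, V> {
  final Map<K, V> _inner;
  void Function(K, V)? leakedAction;

  LeakyMap(this._inner);

  @override
  V? operator [](Object? key) => _inner[key];
  @override
  void operator []=(K key, V value) => _inner[key] = value;
  @override
  void clear() => _inner.clear();
  @override
  Iterable<K> get keys => _inner.keys;
  @override
  V? remove(Object? key) => _inner.remove(key);

  @override
  void forEach(void Function(K key, V value) action) {
    leakedAction = action; // Keep the closure around for later.
    _inner.forEach(action);
  }
}

/// Sketch of the forEach-based desugaring of `{...source}` (hypothetical helper).
Map<K, V> spreadViaForEach<K, V>(Map<K, V> source) {
  final result = <K, V>{};
  Map<K, V>? tmp = result;
  source.forEach((k, v) {
    tmp![k] = v; // Throws once `tmp` has been cleared.
  });
  tmp = null; // A leaked closure can no longer reach `result`.
  return result;
}
```

Without the final assignment, calling `leakedAction` later would silently add entries to a map that user code already considers a finished literal.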
Ok. I'm not deeply dug in on this, but I do think it's a wasted opportunity if we ship something in a form that's unnecessarily slow, and I don't see any user-facing reasons to choose one or the other. I'm certainly sympathetic to the difficulty of extrapolating future performance from current performance, but I'm not sure that justifies just throwing up our hands and flipping a coin. For what it's worth, @rakudrama expressed some preference for using …
When we decide on the semantics of spreads, it would be nice to leave a tiny bit of wiggle room for improved performance. Granted, we must give developers specific guarantees about what's going on, but we could have situations where we want to do something in a slightly different way. One example is when a map is created from an existing map and then adjusted:

```dart
Map<K, V> m1 = e; // Implementation may not be known to compilers.
var m2 = {...m1, plus: some, more: stuff};
```

This might be quite a common situation. The point is that during the phase where the map elements from …

Different implementations of …

A similar situation arises for sets, as mentioned here: #208 (comment).
@leafpetersen
This is basically the approach I ruled out, because it's impossible to predict which members of the user-supplied map are used to do the iteration. Even if we say that, we will use some operation (likely …

So, are we ready and willing to break partially implemented maps? (And I want that in writing, in public, so we can point to it if users complain.) If so, sure, let's be unspecified. Otherwise, we should specify one approach that users can depend on (and that seems to be …

If we allow any way to iterate, the implementation can just always use …

@eernstg

```dart
Map<Duration, int> m1 = Map.identity()
  ..[Duration(seconds: 1)] = 1
  ..[Duration(seconds: 1)] = 2;
print(m1.length); // 2
var m2 = {...m1};
print(m2.length); // 1
print(m2[Duration(seconds: 1)]); // 2
```

Unless you know the internal workings of the map you are spreading, you can't assume anything.
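The "partially implemented maps" concern can be shown concretely. In the sketch below, a class implements only `entries` and `length` and routes every other `Map` member to `noSuchMethod`. An entries-based spread works on it, and a keys-based spread throws. `EntriesOnlyMap`, `spreadViaEntries` and `spreadViaKeys` are hypothetical names chosen for illustration.

```dart
/// Hypothetical "partially implemented" map: only `entries` and `length`
/// are real; every other Map member is forwarded to noSuchMethod.
class EntriesOnlyMap<K, V> implements Map<K, V> {
  final List<MapEntry<K, V>> _entries;
  EntriesOnlyMap(this._entries);

  @override
  Iterable<MapEntry<K, V>> get entries => _entries;

  @override
  int get length => _entries.length;

  @override
  dynamic noSuchMethod(Invocation invocation) =>
      throw UnsupportedError('not implemented: ${invocation.memberName}');
}

/// Spread via the entries protocol: touches only `entries`.
Map<K, V> spreadViaEntries<K, V>(Map<K, V> source) {
  final result = <K, V>{};
  for (final entry in source.entries) {
    result[entry.key] = entry.value;
  }
  return result;
}

/// Spread via keys: touches `keys` and `operator []`.
Map<K, V> spreadViaKeys<K, V>(Map<K, V> source) {
  final result = <K, V>{};
  for (final key in source.keys) {
    result[key] = source[key] as V;
  }
  return result;
}
```

A class like this keeps working only as long as the specified protocol never changes. Any later switch to a different protocol would break it.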
@lrhn wrote
Right, it is possible to have a set representation that contains two elements which are (currently) equal, and a map can have two equal keys. That's basically a bug in the realization of those two concepts, because they ought to maintain uniqueness, but we can't avoid it unless we introduce a strong notion of values (immutable entities with deep equality). Anyway, it still works with built-in types like integers and strings, and even in that case it might be nice to reserve enough wiggle room to optimize whatever widespread special case we might encounter. (And, as discussed elsewhere, using …
As an opposing view, users are not likely to get their understanding of language features directly from the language specification, so maybe we will get a better understanding if we define the operations in terms of building blocks that users do know. In this case, we could define list, set and map literals to be equivalent to creating an empty list, set or map, and then performing sequences of …

If we do that, then we might be able to generalize literals to other kinds of collections, so you can write:

```dart
// Proposed syntax, not valid Dart today.
var queue = Queue()..{1, 2, ... something, if (whatnot) 42};
```

to initialize a new instance of …

In any case, we would just need to optimize the …
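Here is a rough sketch of "literals as building blocks" in today's Dart. Each literal becomes an empty collection followed by ordinary method calls. The text above is cut off before naming the exact operations, so this sketch assumes `addAll` for a spread element and `add` / `[]=` for plain elements. The helper names `literalViaBuilders` and `queueViaBuilders` are hypothetical.

```dart
import 'dart:collection';

/// Sketch: `{...base, 'extra': 42}` written as "empty map, then known
/// operations". Assumes `addAll` models the spread element.
Map<String, int> literalViaBuilders(Map<String, int> base) =>
    <String, int>{}
      ..addAll(base) // the spread element
      ..['extra'] = 42; // an ordinary key: value entry

/// The same recipe aimed at a Queue, roughly what the proposed
/// `Queue()..{1, 2, ...something, if (whatnot) 42}` would mean.
Queue<int> queueViaBuilders(Iterable<int> something, bool whatnot) {
  final queue = Queue<int>()
    ..add(1)
    ..add(2)
    ..addAll(something);
  if (whatnot) queue.add(42);
  return queue;
}
```

Under this reading, the cost of a spread is whatever the target's `addAll` costs. That is why the comment suggests optimizing one building block rather than the literal syntax itself.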
Using …

If we do either of those, I'd rather that we just either define it in terms of behavior (since otherwise we're stopping …

My general take is that using the entries iterator provides no real user benefit, is analytically unlikely to ever provide us with better performance than other approaches, and empirically seems slower. I'm ok with making it implementation-defined, specifying only that we use iteration order, but this generally doesn't seem in line with how we do things in Dart, and de facto it will be a breaking change in the implementations to change how they do it. I'm ok with using …

I think overall, my preference is to define this in terms of iteration over …
I wrote a little benchmark to try to get some numbers for how the various protocols that a spread could desugar to perform. So far, I've only run this on the VM, but I can try other platforms. Here are the results:

The "iterate entries into map" line is what the proposal mandates. That benchmark covers both iterating over the from map and inserting the result into the to map. Arguably, the latter isn't interesting because any proposal has to do that somehow. So the "just" lines do the work of iterating over the from map but then don't actually insert into another map. In theory, the differences between the "just" rows are more important, since they show the relative difference between strategies while ignoring the fixed overhead of inserting into the new map. The "overhead" lines show the difference, which should be the time spent just inserting the entries. The "addEntries" one is a little special. It calls …

It does look like the overhead is about constant, which is a good sanity check. From the numbers, it seems like forEach() is the big winner. Iterating over the keys is a little faster, but not much.

But... it depends on how you slice the data. This benchmark is deliberately polymorphic and spreads LinkedHashMaps, HashMaps, and SplayTreeMaps. If I split that out into three separate monomorphic benchmarks:

SplayTreeMap:
LinkedHashMap:
HashMap:
Then it looks like the three classes have very different performance profiles for their APIs. forEach() is really fast on SplayTreeMap compared to iterating over the keys or entries. But for the other classes, it doesn't make that much of a difference. This isn't too surprising if you look at the implementation of SplayTreeMap. Note also that this benchmark uses a mixture of map sizes from 0 up to 100. If we leaned towards certain sizes, that might also skew the numbers. It does seem like …

One way to think about this is that we should decide whether internal or external iteration is the right protocol. Internal iteration is "easier" for the iteratee. It's just given a closure, and it has total control over how it walks itself and invokes the closure. It doesn't need to return control to the thing doing the spread until all of the iteration is done. That means it can effectively use the top of the call stack as its own local state storage.

With any kind of external iteration, the iteratee has to return to the iterator. That means in order to pick up where it left off, that state has to be reified (and thus usually allocated in Dart) somewhere. In return for this, the iterator gets more flexibility. It can choose to stop iterating, resume later, etc. In the case of spread, we don't need any of that flexibility. Every spread always wants to iterate over the spreadee exactly once, to completion. So there's an argument that internal iteration is the principled choice. That implies using …

Cases where the thing you are spreading is a List are interesting. In that special case, the only state you need for an external iterator is an integer, so it tends to be quite fast. On the other hand, the …

Thoughts?
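The internal/external distinction above is easiest to see on a tree, which is also why SplayTreeMap benefits so much from forEach(). Below is a minimal sketch using a plain binary search tree, not SplayTreeMap's real implementation. `Node`, `insert`, `forEachInOrder` and `InOrderIterator` are hypothetical names. The recursive walk keeps its position on the call stack. The iterator has to keep the same information in an explicit, heap-allocated stack so it can resume after each `moveNext()`.

```dart
/// Minimal binary search tree node (illustration only).
class Node {
  final int key;
  Node? left, right;
  Node(this.key);
}

Node insert(Node? root, int key) {
  if (root == null) return Node(key);
  if (key < root.key) {
    root.left = insert(root.left, key);
  } else {
    root.right = insert(root.right, key);
  }
  return root;
}

/// Internal iteration: the tree drives the walk, and the call stack
/// holds the "where am I" state between callbacks.
void forEachInOrder(Node? node, void Function(int) action) {
  if (node == null) return;
  forEachInOrder(node.left, action);
  action(node.key);
  forEachInOrder(node.right, action);
}

/// External iteration: control returns to the caller after each element,
/// so the position is reified as an explicit stack of pending nodes.
class InOrderIterator implements Iterator<int> {
  final List<Node> _stack = [];
  int? _current;

  InOrderIterator(Node? root) {
    _pushLeft(root);
  }

  void _pushLeft(Node? node) {
    while (node != null) {
      _stack.add(node);
      node = node.left;
    }
  }

  @override
  int get current => _current!;

  @override
  bool moveNext() {
    if (_stack.isEmpty) return false;
    final node = _stack.removeLast();
    _current = node.key;
    _pushLeft(node.right);
    return true;
  }
}
```

Both produce the same in-order sequence. The difference is who owns the traversal state and where it lives.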
I'm leaning towards internal iteration for spreads, for the reasons mentioned. The collection being spread is the only one who can truly know how best to iterate it. Anything we specify will be sub-optimal in some cases, unless we specify nothing (and then pick internal iteration as the implementation). That said, we have iteration in the language with …

An issue with internal iteration is that the only abstraction for executable code we have is closures, so we have to allocate a closure when using …

So, in summary:

Can we live with breaking user code if we change an unspecified behavior? In practice, all lists will be platform lists, so just silently special-casing those will likely be enough.
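Special-casing lists might look roughly like the sketch below for an iterable spread: an indexed loop whose only state is an `int`, with a fallback to the Iterator protocol for everything else. This is a hypothetical helper (`spreadIntoList`), not any compiler's actual lowering. The suggestion above is to special-case *platform* lists only. User code cannot test for that directly, so `is List<T>` stands in for it here. A real implementation would need the narrower check so that user-defined List classes still see the specified protocol.

```dart
/// Sketch of lowering `[...source]` (hypothetical helper): lists use an
/// indexed loop whose only state is an int; other iterables go through
/// the Iterator protocol.
List<T> spreadIntoList<T>(Iterable<T> source) {
  final result = <T>[];
  if (source is List<T>) {
    // Stand-in for "is a platform list"; see the lead-in.
    final length = source.length;
    for (var i = 0; i < length; i++) {
      result.add(source[i]);
    }
  } else {
    final iterator = source.iterator;
    while (iterator.moveNext()) {
      result.add(iterator.current);
    }
  }
  return result;
}
```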
Here are numbers for the other main platforms. dart2js is crushingly slow if you don't do …

dart2js -O4, all three map types:
dart2js -O4, just LinkedHashMap:
dart2js -O4, just SplayTreeMap:
dart2js -O4, just HashMap:
AoT, all three map types:
AoT, just LinkedHashMap:
AoT, just SplayTreeMap:
AoT, just HashMap:

I'm going to go through the list benchmark too and try to consolidate this into a single set of numbers that are hopefully actionable.
I'm not sure if Flutter web apps will be able to use …
@munificent Have you looked at the memory usage differences between the various methods? I'd be curious what objects end up allocated on the heap during iteration. In …
@jonahwilliams I want to say we use …
I haven't, but I don't think memory usage should differ that much. In general, we expect most spreads to be fairly small. On large spreads, you may churn through a bunch of MapEntry objects, but modern GCs handle short-lived objects pretty well.

I did some more benchmarking and investigation and wrote up my thoughts. We discussed this more in the language meeting. The decision is to keep specifying this in terms of …

As always, if an implementation detects that the object being spread is a known built-in type (which is almost always the case for maps), it's free to do some lower-level access to the map entries, since it knows that there won't be any user-visible side effects.
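The "free to special-case built-in types" rule can be pictured as a type test in front of the specified protocol. The sketch below is a hypothetical helper (`spreadMapInto`), not how any Dart compiler lowers spreads. It iterates `entries` for unknown maps and takes a `forEach` shortcut for `LinkedHashMap` and `HashMap`. It assumes that in Dart 3 these two classes are `final`, so no user class can implement them, which would make the shortcut unobservable. A real implementation would use lower-level access than `forEach`.

```dart
import 'dart:collection';

/// Sketch (hypothetical helper): spec-style map spread over `entries`,
/// with a shortcut when the source is a known built-in map type.
void spreadMapInto<K, V>(Map<K, V> source, Map<K, V> target) {
  if (source is LinkedHashMap<K, V> || source is HashMap<K, V>) {
    // Assumed built-in: no user code observes which members are used.
    source.forEach((k, v) => target[k] = v);
  } else {
    // Unknown implementation: follow the protocol the specification names.
    for (final entry in source.entries) {
      target[entry.key] = entry.value;
    }
  }
}
```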
The current specification for #46 defines spread of a Map in terms of the Iterable protocol. This seems unnecessarily inefficient compared to just iterating over the keys (avoiding allocating the MapEntry objects). Even better would be to call an addAllTo method, passing the receiver map as an argument, thereby avoiding allocating the key iterator as well, but that would be a breaking change to the map API.

cc @lrhn @munificent @rakudrama
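The two ideas in the issue can be sketched side by side: iterate `keys` so that no MapEntry objects are created, and let a map that knows how to copy itself do the work through an `addAllTo`-style method. Adding `addAllTo` to `Map` itself would be the breaking change mentioned above. This sketch therefore uses a separate opt-in interface instead. `AddAllTo`, `CopyableMap` and `spreadViaKeysOrAddAllTo` are all hypothetical names.

```dart
import 'dart:collection';

/// Hypothetical opt-in interface (not part of dart:core).
abstract class AddAllTo<K, V> {
  void addAllTo(Map<K, V> target);
}

/// Example map that implements the opt-in interface by delegation.
class CopyableMap<K, V> extends MapBase<K, V> implements AddAllTo<K, V> {
  final Map<K, V> _inner = <K, V>{};
  int addAllToCalls = 0; // Lets callers see which path was taken.

  @override
  V? operator [](Object? key) => _inner[key];
  @override
  void operator []=(K key, V value) => _inner[key] = value;
  @override
  void clear() => _inner.clear();
  @override
  Iterable<K> get keys => _inner.keys;
  @override
  V? remove(Object? key) => _inner.remove(key);

  @override
  void addAllTo(Map<K, V> target) {
    addAllToCalls++;
    // Internal iteration: no key iterator or MapEntry exposed to the caller.
    _inner.forEach((k, v) => target[k] = v);
  }
}

/// Spread sketch: prefer `addAllTo` when available, else iterate `keys`.
void spreadViaKeysOrAddAllTo<K, V>(Map<K, V> source, Map<K, V> target) {
  if (source is AddAllTo<K, V>) {
    (source as AddAllTo<K, V>).addAllTo(target);
  } else {
    for (final key in source.keys) {
      target[key] = source[key] as V;
    }
  }
}
```

The keys path still allocates a key iterator and performs a second lookup per key. The `addAllTo` path avoids both, but only for maps that opt in.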