Skip to content

Should spread of a List use the Iterable protocol or direct iteration? #208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
leafpetersen opened this issue Feb 6, 2019 · 12 comments
Closed

Comments

@leafpetersen
Copy link
Member

leafpetersen commented Feb 6, 2019

The current proposed semantics given by the spread collections proposal specify spreading into a list using the Iterable protocol. So we have that

foo(List<int> x) => [...x];

de-sugars to (roughly):

foo(List<int> x) {
  var tmp = <int>[];
  for(var e in x) {
    tmp.add(e);
  }
  return tmp;
}

We could, in the case where we are statically spreading something of type List, use a direct loop as follows:

foo(List<int> x) {
  var len = x.length;
  var tmp = List(len);
  for(int i = 0; i < len; i++) {
     tmp[i] = x[i];
  }
  return tmp;
}

This is likely to be significantly more efficient, but is not a valid optimization that a compiler can do in the absence of extra information about the implementation type of the List.

cc @munificent @lrhn @ferhatb

@leafpetersen
Copy link
Member Author

cc @rakudrama

@rakudrama
Copy link
Member

I assume you mean [...x].

In the general case, you don't know the starting position, and the result needs to be growable, so the indexing version is incorrect in allocating a fixed length list and in assuming the length is easily known and the indexes in the source and target match. dart2js would need to figure out that x was a JavaScript Array to really benefit from indexing, otherwise the indexing would be less efficient that you are assuming.

I think desugaring to something like foo(List<int> x) => []..addAll(x) is probably best.
It is concise, and since it is a single operation, it could be easily recognized and replaced by another single operation that contains the streamlined loop shared by all JSArray-into-JSArray spreads.
It would be better to have one loop that is hot enough to get jitted rather than 1000 loops that never get jitted. Integrating the optimization with that of JSArray.addAll would help getting the loop hot by sharing code.

I don't see why on ES6 we can't compile this to, well, [...x], but that would be difficult to recognize if the operation is desugared to any kind of loop. Let the JS engine do that :-).

Lowering to a single operation would also aid in special case recognition, e.g. of splicing into the start of a list, which might be compiled to something like JavaScript's x.slice(0).

@lrhn
Copy link
Member

lrhn commented Feb 6, 2019

Native lists are not really the problem here. If you know that you have a native list, you also know the behavior of the iterator, and you can implement it any way you want that is indistinguishable from iterating (and most approaches are, because you know the list operations do not have side effects).

So, the only place where this matters is when the object is, at List that is not known to be a native list.
Should we then use index-based iteration instead of iterator-based iteration?
(And should it be based on static type or dynamic type?)

I was aware of the issue when writing the spec, but decided that it was best to keep things simple.
I see five options:

  1. Specify that you use the iterator in a specified way (what we did)
  2. Specify that if the static type of the object is a list, use index-iteration in a specified way.
  3. Specify that if the run-time type of the object is a list, use index-iteration in a specified way.
  4. Specify that if the run-time type of the object is a list, the implementation may use either index-iteration or iterator-iteration (both specified), on a per iteration basis.
  5. Do not specify how "the elements" of the iterable are accessed (except that it must be in iteration order).

The "iteration order" is generally defined as the order the iterator provides, but for Lists, it's also stated that iteration order must be index order.

When a specification doesn't say which user-declared members it accesses, we have unspecified behavior. Different implementations may act differently on the same code, and the best suggestion to avoid that is to not use the feature. For all we know, it might use length and elementAt. Or repeated skip(n).first. So, unspecified behavior is bad, and option 5 is definitely out.

Option 2 would require using length/[] on all objects that are statically known to be lists. That's efficiently detectable, and users can predict it. If they know their list has lousy index efficiency, they can cast it to Iterable and default to iterator-iteration.

Option 3 would require using length/[] on all lists, independent of static type. That adds an extra type check and larger run-time code, because it is based on the run-time code (we need both branches if the static type is Iterable, and heck, even if it's Set or Queue, because some class might implement that and List).
It may or may not be faster depending on the custom List implementation (again, if it's just native lists we optimize, the spec already allow that because it's indistinguishable from the specified behavior), and users cannot opt out.

Option 4 gives implementations the option of choosing either 1., 2. or 3. on a per location basis.
If this matters in any way, then users will program against the concrete implementation, and maybe get worse performance on a different platform.
And we might get locked into whatever implementation we choose, if some important customer would get a performance hit if we change it. Even if it is "unspecified", the implementation is still concrete.

The reason I decided to not do any of these is that it's consistent with for-in.
Our for-in specification uses iterator iteration for lists too.
You can write an index-based for-loop manually, but you can do that too for list literals.
A spread [...x] is really equivalent to [for (var v in x) v], and the index-based version of that would be:

[for (var i = 0; i < x.length; i++) x[i]]

(The only difference is that we allow spreads as constant expressions, but not the iteration that gives access to individual elements, but you shouldn't worry over the performance of constant expressions anyway).

So, spreads are specified in one simple way, and if you want something else, you do have an alterntaive option. I think that's sufficient.

@eernstg
Copy link
Member

eernstg commented Feb 6, 2019

Cf. #209 (comment), it might be good for the overall simplicity of the language if we could use forEach on the spread iterable, because forEach seems to be a good candidate approach for maps.

Cf. comment from Lasse, an argument against this could be that we use iteration as the semantics of for-in statements, so it's a matter of consistency among spreads vs. consistency among different occurrences of iterables.

Edit: @leafpetersen mentioned that there's strong evidence that forEach would be too costly, especially when the compiler cannot discern which concrete type of instance is the receiver. That would be a heavy argument, of course.

@munificent
Copy link
Member

I took a stab at writing a benchmark for a couple of different ways we could desugar spreads if we know the spread object is a List of some type.

On my Mac laptop, here are the numbers:

VM JIT
   0 iterate           13389.96 spreads/ms                
   0 List for          15849.30 spreads/ms   1.18x baseline
   0 resize and set    14079.84 spreads/ms   1.05x baseline
   0 addAll()          13562.42 spreads/ms   1.01x baseline
   0 forEach()         12967.78 spreads/ms   0.97x baseline

   1 iterate           12523.18 spreads/ms                
   1 List for          15012.68 spreads/ms   1.20x baseline
   1 resize and set    11265.20 spreads/ms   0.90x baseline
   1 addAll()          13005.04 spreads/ms   1.04x baseline
   1 forEach()         11296.44 spreads/ms   0.90x baseline

   2 iterate           12169.42 spreads/ms                
   2 List for          14370.12 spreads/ms   1.18x baseline
   2 resize and set    10352.88 spreads/ms   0.85x baseline
   2 addAll()          12269.00 spreads/ms   1.01x baseline
   2 forEach()         10241.02 spreads/ms   0.84x baseline

   5 iterate            8638.68 spreads/ms                
   5 List for          10240.64 spreads/ms   1.19x baseline
   5 resize and set     8791.70 spreads/ms   1.02x baseline
   5 addAll()           8946.10 spreads/ms   1.04x baseline
   5 forEach()          7005.48 spreads/ms   0.81x baseline

  10 iterate            7057.00 spreads/ms                
  10 List for           8595.82 spreads/ms   1.22x baseline
  10 resize and set     6346.50 spreads/ms   0.90x baseline
  10 addAll()           8551.22 spreads/ms   1.21x baseline
  10 forEach()          5179.32 spreads/ms   0.73x baseline

  20 iterate            4295.82 spreads/ms                
  20 List for           4298.04 spreads/ms   1.00x baseline
  20 resize and set     4141.14 spreads/ms   0.96x baseline
  20 addAll()           6194.86 spreads/ms   1.44x baseline
  20 forEach()          2970.70 spreads/ms   0.69x baseline

  50 iterate            2040.42 spreads/ms                
  50 List for           2579.76 spreads/ms   1.26x baseline
  50 resize and set     2297.06 spreads/ms   1.13x baseline
  50 addAll()           3561.36 spreads/ms   1.75x baseline
  50 forEach()          1337.58 spreads/ms   0.66x baseline

 100 iterate            1096.14 spreads/ms                
 100 List for           1429.78 spreads/ms   1.30x baseline
 100 resize and set     1323.30 spreads/ms   1.21x baseline
 100 addAll()           2151.06 spreads/ms   1.96x baseline
 100 forEach()           712.14 spreads/ms   0.65x baseline

1000 iterate             137.22 spreads/ms                
1000 List for            171.46 spreads/ms   1.25x baseline
1000 resize and set      149.74 spreads/ms   1.09x baseline
1000 addAll()            244.90 spreads/ms   1.78x baseline
1000 forEach()            82.66 spreads/ms   0.60x baseline

dart2js on Node
   0 iterate             704.66 spreads/ms                
   0 List for            761.38 spreads/ms   1.08x baseline
   0 resize and set      671.92 spreads/ms   0.95x baseline
   0 addAll()            659.94 spreads/ms   0.94x baseline
   0 forEach()           679.60 spreads/ms   0.96x baseline

   1 iterate             405.86 spreads/ms                
   1 List for            486.10 spreads/ms   1.20x baseline
   1 resize and set      485.68 spreads/ms   1.20x baseline
   1 addAll()            449.30 spreads/ms   1.11x baseline
   1 forEach()           436.88 spreads/ms   1.08x baseline

   2 iterate             253.70 spreads/ms                
   2 List for            394.66 spreads/ms   1.56x baseline
   2 resize and set      375.46 spreads/ms   1.48x baseline
   2 addAll()            371.50 spreads/ms   1.46x baseline
   2 forEach()           374.62 spreads/ms   1.48x baseline

   5 iterate             143.12 spreads/ms                
   5 List for            243.54 spreads/ms   1.70x baseline
   5 resize and set      231.36 spreads/ms   1.62x baseline
   5 addAll()            239.58 spreads/ms   1.67x baseline
   5 forEach()           230.06 spreads/ms   1.61x baseline

  10 iterate              83.72 spreads/ms                
  10 List for            147.12 spreads/ms   1.76x baseline
  10 resize and set      143.60 spreads/ms   1.72x baseline
  10 addAll()            146.92 spreads/ms   1.75x baseline
  10 forEach()           140.48 spreads/ms   1.68x baseline

  20 iterate              44.46 spreads/ms                
  20 List for             81.66 spreads/ms   1.84x baseline
  20 resize and set       81.88 spreads/ms   1.84x baseline
  20 addAll()             81.32 spreads/ms   1.83x baseline
  20 forEach()            79.30 spreads/ms   1.78x baseline

  50 iterate              18.52 spreads/ms                
  50 List for             34.52 spreads/ms   1.86x baseline
  50 resize and set       35.34 spreads/ms   1.91x baseline
  50 addAll()             35.56 spreads/ms   1.92x baseline
  50 forEach()            34.52 spreads/ms   1.86x baseline

 100 iterate               9.26 spreads/ms                
 100 List for             17.22 spreads/ms   1.86x baseline
 100 resize and set       17.72 spreads/ms   1.91x baseline
 100 addAll()             18.24 spreads/ms   1.97x baseline
 100 forEach()            17.60 spreads/ms   1.90x baseline

1000 iterate               0.95 spreads/ms                
1000 List for              1.80 spreads/ms   1.89x baseline
1000 resize and set        1.88 spreads/ms   1.98x baseline
1000 addAll()              1.86 spreads/ms   1.96x baseline
1000 forEach()             1.80 spreads/ms   1.89x baseline

AoT
   0 iterate           11671.52 spreads/ms                
   0 List for          13153.62 spreads/ms   1.13x baseline
   0 resize and set    11962.50 spreads/ms   1.02x baseline
   0 addAll()           9852.48 spreads/ms   0.84x baseline
   0 forEach()         11260.96 spreads/ms   0.96x baseline

   1 iterate            8810.60 spreads/ms                
   1 List for          11243.84 spreads/ms   1.28x baseline
   1 resize and set     9203.44 spreads/ms   1.04x baseline
   1 addAll()           7621.98 spreads/ms   0.87x baseline
   1 forEach()          9583.36 spreads/ms   1.09x baseline

   2 iterate            7711.44 spreads/ms                
   2 List for          10388.96 spreads/ms   1.35x baseline
   2 resize and set     8001.64 spreads/ms   1.04x baseline
   2 addAll()           7592.64 spreads/ms   0.98x baseline
   2 forEach()          8424.48 spreads/ms   1.09x baseline

   5 iterate            4318.00 spreads/ms                
   5 List for           6281.92 spreads/ms   1.45x baseline
   5 resize and set     6246.58 spreads/ms   1.45x baseline
   5 addAll()           4631.00 spreads/ms   1.07x baseline
   5 forEach()          5101.42 spreads/ms   1.18x baseline

  10 iterate            2768.04 spreads/ms                
  10 List for           4667.94 spreads/ms   1.69x baseline
  10 resize and set     4078.04 spreads/ms   1.47x baseline
  10 addAll()           3976.26 spreads/ms   1.44x baseline
  10 forEach()          3773.82 spreads/ms   1.36x baseline

  20 iterate            1533.74 spreads/ms                
  20 List for           2460.72 spreads/ms   1.60x baseline
  20 resize and set     2673.56 spreads/ms   1.74x baseline
  20 addAll()           2639.84 spreads/ms   1.72x baseline
  20 forEach()          1975.64 spreads/ms   1.29x baseline

  50 iterate             716.82 spreads/ms                
  50 List for           1167.02 spreads/ms   1.63x baseline
  50 resize and set     1240.78 spreads/ms   1.73x baseline
  50 addAll()           1272.58 spreads/ms   1.78x baseline
  50 forEach()           900.34 spreads/ms   1.26x baseline

 100 iterate             375.16 spreads/ms                
 100 List for            625.08 spreads/ms   1.67x baseline
 100 resize and set      746.50 spreads/ms   1.99x baseline
 100 addAll()            690.42 spreads/ms   1.84x baseline
 100 forEach()           511.24 spreads/ms   1.36x baseline

1000 iterate              42.58 spreads/ms                
1000 List for             72.28 spreads/ms   1.70x baseline
1000 resize and set       79.04 spreads/ms   1.86x baseline
1000 addAll()             73.46 spreads/ms   1.73x baseline
1000 forEach()            56.76 spreads/ms   1.33x baseline

The in-progress code for this is: #210

@eernstg
Copy link
Member

eernstg commented Feb 7, 2019

Apart from the general approach (iterate, List for, etc.), one thing that we might want to reserve the option to do is to build sets from existing sets with improved performance:

Set<T> s1 = e; // Non-trivial expression, implementation may not be known.
var s2 = {...s1, some, extra, stuff};

The point is that it may actually be a common situation that a set is created by taking an existing set and adding a few extras to it, and in this situation we can transfer elements without any checks for having each element already: s2 is empty when we start adding elements from s1, and for each element o from s1 we already know that o does not belong to s2 at that point, because s1 is also a set.

Developers would need to know that it's much better to write {...s1, some, extras} than {some, extras, ...s1}, but that could presumably be a lint.

The new set s2 will be a platform class, so we can do whatever we want. In particular, we would use a special "secret" method to add elements in a way where there is no check for whether that element is already in the set. For native-to-native on the VM, we might be able to use memcpy, for the general case (user_written-to-native) we would at least be able to use such a low-level add method (which is only safe because we know more than we otherwise do during add).

@lrhn
Copy link
Member

lrhn commented Feb 7, 2019

We could say that we use addAll, but the way that the specification is currently written, it tries not to depend on the implementation of the created map. You execute a member expression in order to introduce a number of elements to the created collection, but we don't say how those elements get put into the collection (we can use non-public API if that's easier).

Saying that we use LinkedHashSet.addAll or _GrowableList.addAll seems restrictive, because that actually does specify which operations are performed on the spreadee.

I'd consider "use addAll" as a specification as equivalent to allowing anything, because we can change the behavior of addAll at any time.

So, the most efficient solution is probably to allow any implementation which accesses elements in iteration order, and to use addAll as the implementation.

@eernstg
Copy link
Member

eernstg commented Feb 8, 2019

"use addAll" as a specification as equivalent to allowing anything

addAll is really attractive because its receiver is new the data structure that we're building, and that allows us to generate arbitrary code (nobody can know that this isn't already part of the implementation of the behavior of addAll on a native implementation).

But instead of considering this to 'allow anything' we should probably rely on the documentation of the method (e.g., https://api.dartlang.org/stable/2.1.0/dart-core/List/addAll.html), because that's already exactly what developers have today.

We might then want to specify the addAll method in the language specification starting off with those core lib dartdocs, if we want to ensure that the language specification is more self-contained.

@leafpetersen
Copy link
Member Author

My general take is that there is no real user value in requiring spreading an Iterable and a List to both use the same iteration protocol. There's a small specification simplicity benefit, and a small implementation benefit. There's also a small performance cost. On balance, I'd rather provide the user the performance benefit. If I was writing the code by hand, I'd use the methods from the List protocol for lists. I might use a for in loop since it's easy, but to me that's more an argument that for in should have been specified to use the List protocol if iterating over a list (except that of course that didn't work for Dart 1).

Using .addAll, as I commented in the Map issue, is really just another way of either specifying using a particular protocol (the one currently used by .addAll), or of saying that the behavior is undefined.

I think overall, I would prefer that we just use the obvious list iteration loop as the specification for how we spread a List. This code can then be put into a private method on our internal list representation if desired (_internalAddAll), or hoisted into a helper function, or emitted inline, or whatever.

@munificent
Copy link
Member

Note that using forEach() gets you most of the way to the performance of using [], and also works for all Iterables, not just Lists. Except on the VM where it's unfortunately slower than for-in. :-/

@munificent
Copy link
Member

Here are some updated numbers for dart2js using -O4, which disables runtime checks on primitives and implicit casts. That's unsound, but seems to be how most deployed dart2js apps are compiled:

   0 iterate           15958.42 spreads/ms                  
   0 List for          17137.86 spreads/ms   1.07x baseline 
   0 resize and set     6746.54 spreads/ms   0.42x baseline 
   0 addAll()          13356.88 spreads/ms   0.84x baseline 
   0 forEach()         14547.32 spreads/ms   0.91x baseline 

   1 iterate           10337.94 spreads/ms                  
   1 List for          11671.70 spreads/ms   1.13x baseline 
   1 resize and set     4968.06 spreads/ms   0.48x baseline 
   1 addAll()          10043.20 spreads/ms   0.97x baseline 
   1 forEach()         13332.14 spreads/ms   1.29x baseline 

   2 iterate            9025.48 spreads/ms                  
   2 List for          10053.64 spreads/ms   1.11x baseline 
   2 resize and set     4947.78 spreads/ms   0.55x baseline 
   2 addAll()           8785.94 spreads/ms   0.97x baseline 
   2 forEach()         12771.96 spreads/ms   1.42x baseline 

   5 iterate            6596.02 spreads/ms                  
   5 List for           7382.40 spreads/ms   1.12x baseline 
   5 resize and set     4601.36 spreads/ms   0.70x baseline 
   5 addAll()           6252.78 spreads/ms   0.95x baseline 
   5 forEach()         11482.30 spreads/ms   1.74x baseline 

  10 iterate            4359.76 spreads/ms                  
  10 List for           5005.16 spreads/ms   1.15x baseline 
  10 resize and set     4362.68 spreads/ms   1.00x baseline 
  10 addAll()           4151.52 spreads/ms   0.95x baseline 
  10 forEach()          9977.88 spreads/ms   2.29x baseline 

  20 iterate            2364.44 spreads/ms                  
  20 List for           2680.28 spreads/ms   1.13x baseline 
  20 resize and set     3028.66 spreads/ms   1.28x baseline 
  20 addAll()           2271.46 spreads/ms   0.96x baseline 
  20 forEach()          5965.52 spreads/ms   2.52x baseline 

  50 iterate            1080.76 spreads/ms                  
  50 List for           1293.62 spreads/ms   1.20x baseline 
  50 resize and set     2259.54 spreads/ms   2.09x baseline 
  50 addAll()           1055.38 spreads/ms   0.98x baseline 
  50 forEach()          3408.76 spreads/ms   3.15x baseline 

 100 iterate             575.62 spreads/ms                  
 100 List for            670.02 spreads/ms   1.16x baseline 
 100 resize and set     1482.80 spreads/ms   2.58x baseline 
 100 addAll()            531.80 spreads/ms   0.92x baseline 
 100 forEach()          1763.66 spreads/ms   3.06x baseline 

1000 iterate              60.38 spreads/ms                  
1000 List for             70.60 spreads/ms   1.17x baseline 
1000 resize and set      193.60 spreads/ms   3.21x baseline 
1000 addAll()             58.58 spreads/ms   0.97x baseline 
1000 forEach()           210.40 spreads/ms   3.48x baseline 

@munificent
Copy link
Member

I did some more benchmarking and investigation and wrote up my thoughts. We discussed this more in the language meeting. The decision is:

  • We will continue to specify it in terms of the Iterable protocol, but...

  • If the object being spread implements List, Queue, or Set, we state that an implementation may choose to call length if it wants.

  • If the object being spread implements List, we state that an implementation may use [] to access the elements instead of or in addition to the Iterable protocol.

We think this should give implementations enough freedom to optimize without adding much complexity to the spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants