Wanted: Strawman proposals for new collections architecture

A redesign of the standard library is on the roadmap for one of the next Scala versions (could be as early as 2.13). Some requirements for the redesign should be:
1. Normal usage of collections should stay the same. That is, the standard operations
   on collections should still be available under the same names. Some more exotic and advanced
   scenarios such as `breakOut` can be dropped if alternative ways to achieve the same functionality
   exist.
2. User-defined implementations of collections should port to the new library as far as is reasonable. We should allow some breakage here, if it is necessary to achieve other goals.
3. We should strive for simplicity in APIs and implementations.
4. We should strive to better separate interfaces from implementation and avoid the fragile base
   class problem, where too much gets inherited automatically.
5. We should try to simplify the inheritance graphs, in particular those of often-used collections
   such as lists.
6. We should improve the integration of strict and lazy collections by means of a better architecture for views. Views should avoid accidental forcing, and should omit transforms from collections that require such forcing. However, forcing is still needed to support aggregations. 
7. We should try to integrate Java8 streams and/or have our own high-performance implementations for
   parallel collections.
8. We should generally be at least on par with current collections in what concerns efficiency. In particular, we should still allow specializations of collection operations for particular implementations. These optimizations should still work if the static collection type is abstract. E.g. an optimized implementation of `++` on `ArrayBuffer`s should be called even if the static types of the operands of `++` are `Iterable`s. 
9. The design should be friendly to optimizations such as inlining and, possibly, more advanced
   whole program optimizations.

To gain experience it would be good to experiment with some prototypes that illustrate possible designs and can point to strengths and weaknesses. A full collections design is too heavy to fill this role. It takes too long to write and understand. As an alternative, I propose to implement just a subset of the required functionality. The subset should implementable in about 500 LOC or less and at the same time should be as representative as possible of the whole collections. The API should be the same for all strawman proposals. 

After some experimentation, I came up with the following proposal for the API to be implemented by strawman proposals. There's still some modest room to add things if people feel it is important.
## Base Traits

```
Iterator
Iterable
Seq
```

For now we leave out maps and sets (which does not say they are not important). We gloss over the mutabile/immutable distinction. All collections are in the same package (this is only for the strawman, to keep things simple, not for the full collections, of course).

`Iterable` and `Seq` are considered collection types, but `Iterator` is not.
## Collection Implementations

```
List                    
ListBuffer   
ArrayBuffer          
java.lang.String  
View                 
```

`List` is a simplified version of Scala lists. It's an immutable collection favoring linear access patterns.
`List` operations should be tail-recursive, hence the addition of `ListBuffer` to hold intermediate results. `ArrayBuffer` is a mutable collection that supports random access. `String` should demonstrate how the architecture integrates externally defined types. `String` should be seen by Scala as an immutable random access sequence of `Char`. Finally, views should be constructible over all other collections. They are immutable and lazy.

`List`, `ListBuffer` and `ArrayBuffer` should have companion objects that let one construct collections given their elements in the usual way:

```
List(1, 2, 3)
ListBuffer("a", "bc")
ArrayBuffer()
```
## Collection operations

The following operations should be supported by all collections.

```
foldLeft
foldRight
isEmpty
head
iterator
view
to
```

`foldLeft` and `foldRight` are the basis of all sorts of aggregations. `isEmpty` and `head` exemplify
tests and element projections. `iterator` and `view` construct iterators and views over a collection. `collect` is not part of the current Scala collections, but is useful to support views and Java 8 streams. It should construct a new collection of a given type from the elements of the receiver. An invocation pattern of `to` could be either of the following:

```
xs.to[List]
xs.to(List)
```

Strawman collections also should support the following transforms:

```
filter
map
flatMap
++
zip
partition
drop
```

`map` and `flatMap` together support all monadic operations. `++` and `zip` are examples of operations with multiple inputs. `drop` was chosen as a typical projection that yields a sub-collection. `partition` was chosen in addition `filter` because it exemplifies transforms with multiple outputs. 

Strawman `Seq`s (but not necessarily `Iterable`s or `View`s) should also support the following operations

```
length
apply
indexWhere
reverse
```

Sequences in a way are defined by their `length` and their indexing method `apply`. `indexWhere` is an example of a method that combines indices and sequences in an interesting way. `reverse` is an example of a transform that suggests an access pattern different from left-to-right. 

`ArrayBuffer` and `ListBuffer` should in addition support the following mutation operations

```
+=
++=
trimStart
```
## Required Optimizations

There are a number of possible specializations that the strawman architecture should support. Here are
two examples:
1. `xs.drop(n)` on a list `xs` should take time proportional to `n`, the retained part of the list should not be copied.
2. `xs ++ ys` on array buffers `xs` and `ys` should be implemented by efficient array copy operations. `s1 ++ s2` on strings should use native string concatenation.
3. `partition` should require only a single collection traversal if the underlying collection is strict (i.e., not a view).

These specializations should be performed independently of the static types of their arguments.
## Why No Arrays?

Collections definitely should support arrays as they do now. In particular, arrays should have the same representation as in Java and all collection operations should be applicable to them. Arrays are left out of the strawman requirements because of the bulk their implementation would add. Even though no explicit implementation is demanded, we should still check all designs for how they would support arrays. 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wanted: Strawman proposals for new collections architecture #818

Base Traits

Collection Implementations

Collection operations

Required Optimizations

Why No Arrays?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Wanted: Strawman proposals for new collections architecture #818

Description

Base Traits

Collection Implementations

Collection operations

Required Optimizations

Why No Arrays?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions