-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Wanted: Strawman proposals for new collections architecture #818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It's not clear if Sets are considered part of "collections" (since they're left out), but a number of operations are questionable for sets, e.g. |
@nilskp This is answered by requirement (1): normal usage of collections should stay the same. So, yes, |
Some minor bits, which might be interesting despite all the fundamental objections I have to this:
|
@soc Agree that
|
@soc The collection size type is a separate issue. Important to discuss this too, I agree. |
@odersky, but it should be trivial to "cast" set to a collection that supports |
@nilskp How do you "cast"? But I think we should take this discussion elsewhere it is out of scope for the strawman discussion. Or, more productively: How about writing a strawman implementation that includes sets? Then we can see how that would work out. |
Not realizing I was mapping a Set has tripped me on a number of occasions too. Everyone else does it too, though (Haskell, Python). Haskell at least has a special comment on I find Edit: But you are right that this is besides the point for now. Moving on. One point I'd like to add is that it would be nice if it was easy to create one's own abstract collection operations while retaining the type of the concrete collection. Last time I tried this it seemed relatively complicated. Though I do realise that it's not an easy issue to begin with. |
@machisuji I think the issue with Set will only be a problem, if we keep trying to pretend that every intermediate operation is a Set. Given that we know all the drawbacks of that, like:
|
I'd like to explore the idea that a collection is data, and then you apply transformations to it, and transformations can be composed separately from the data (think: transducers), macros could them be used to very efficiently inline these operations. |
@odersky, I meant something like
The fact that |
An AnyVal Option type would be preferred for collections as well. (think Map.get etc) |
We probably want mutable collections to not have the same API as the immutable ones. One large problem with the current mutable collections is that by inheriting all the immutable methods, it's neigh impossible to figure out where the "useful" mutable methods are. e.g. By that logic,
What's the advantage of Views over using Iterators? That's what I've been using when I want to fuse map/filter/flatmap/etc. operators into constant-memory and it seems to work. You don't accidentally force because the only forcing operations are relatively obvious (reduce, sum, fold, foreach, toXXX).
Should we add |
@lihaoyi Yes, having scala.collection.{Seq, Map, … } as a shared hierarchy between mutable and immutable is not beneficial IMO. Same goes for the deep embedding of Parallel Collections into the hierarchy. And The XOnce-things, if you need a XOnce you use an Iterator. |
@lihaoyi @viktorklang the current designs that we had a look into follow your proposals. The advantage is that most operations(map\flatmap\reduce\zip) do not care if underlying collection is mutable or not. |
@viktorklang @lihaoyi , what's your take on deep embedding of views into collections hierarchy? |
The way the old collections gloss over this is one of the paramount issues I would hope to see addressed in any new design. |
@DarkDimius With a transducer approach I don't see the need for views at all since the primary need for views is to avoid intermediate representations. |
@viktorklang In my understanding, |
@DarkDimius With a transducer approach there's (as far as I can tell) no need for Views—you'd apply the transducer of your choice to the collection of your choice at the time you need it. Did I misunderstand you? |
It's great to have the debates, but I think for the purpose of this issue, requirement (1) is non-negotiatable. If we really change fundamentally how users interact with collections, how do we even consider migration and backwards compatibility? It would be a completely separate discussion (which we might want to have). But on this issue I'd like to keep the discussion focussed. How about I open a thread on scala-debate and everybody chimes in there for stuff outside the requirements presented here? |
@odersky The way that virtually all modern platforms have done this in the past is to introduce a new library/package and include forwarders for the old library/package, is there a reason why this wouldn't be preferable in this case too? I'm assuming that there won't be enough time to arrive at a battle-proven new library unless there is a transitional period where one can fall back to the old library? |
That sounds great! In general, the high-level "interface" @odersky describes leaves out a lot of (what would be) crucial scaladoc. Sure, we have |
Here's the thread on scala-debate: https://groups.google.com/forum/#!topic/scala-debate/Z1YH_0Hgu5A All discussions that fall outside requirements 1-9 should go there. I would like to reserve this issue for concrete proposals that try to address the requirements as outlined. |
@SethTisue your comment seems at odds with goal 3. Particularly in terms of making methods total which simplifies things significantly. |
I think it may be possible to model non-emptiness in the types while still adhering to "Normal usage of collections should stay the same. That is, the standard operations on collections should still be available under the same names" by leaving the partial methods in place but with a deprecation warning. I won't be sure until I try, but I will try it once I get time to write some code. |
I think you are misunderstanding point 3. Point 3 seems to be mainly about keeping the API simple, and not about subjective use cases of simplicity from modifying the API for reasons of totality (or other reasons). Simple is not the same thing as correctness, or totality. They are orthogonal concepts
Not sure how this can work, as far as I am aware, you can't have methods with the exact same parameter signatures return different types (i.e. |
I think you meant you cannot have methods with the same parameter signatures but different return types. That's true you can't overload that way. You can have a differently named method though, for example we already have headOption in addition to head. I'd rather not write English about it but Scala, but the way I modeled "inhabitedness" (to give a more positive name for non-emptiness) is to make a subtype. The subtype has a head method, the supertype does not, but if this were ever to happen in the standard library, there would be a long deprecation period in which the supertype still had head, tail, and reduce. I did a proof-of-concept for just head on Set here: That's inside an instance that captures an Equality typeclass, which also won't work for the standard library, but the subtype technique might still work. I won't know until I try it. It may turn out to not work, and even if it does work, it may be not be appropriate for the standard library. Still I will try it and see where it leads. One thing I haven't tried, which I'm looking forward to trying, is making a List where the Cons cell subtype has a type alias inhabited.List, and that's how it works on List. So I think for List it might actually be pretty practical even in the standard library. But for everything else, you'd end up with an extra type for everything, and that's a lot of extra surface area. Maybe it could just be List that gets it in the standard library. Anyway, already too much English. Not enough Scala. |
Yup, just edited the post now, my bad! |
Actually you can, using implicits and dependant types. Imagine something like trait FailureStyle[T] {
type Result
def pass(t: T): Result
def fail: Result
}
def gimme[T](implicit f: FailureStyle[T]): f.Result = ???
// and an implicit like one of these in scope
implicit def throws[T] = new FailureStyle[T] { type Result = T; …}
implicit def option[T] = new FailureStyle[T] { type Result = Option[T]; …}
implicit def returnNull[T >: Null] = new FailureStyle[T] { type Result = T; …} |
Good point. Jon Pretty has something similar to the failure modes in your example in Rapture. Here's another example of that sort of technique: The result type of map depends on which |
The lack of correctness or totality makes things more complicated (or less simple). This is easy to demonstrate. Take I suggest you watch Ken Scambler's excellent video for a more detailed description of why partiality increases complexity https://www.youtube.com/watch?v=yS4Dbb1KfmA. Or Philip Wadler's Propositions as Types https://www.youtube.com/watch?v=IOiZatlZtGU on why totality matters for types. |
Your stating this as an axiom when it isn't one
Using your logic, List should return a disjunction of type Either, because its theoretically possible for it to fail due to memory constraints. So according to you, making all List operations have something of type You are arguing about correctness, not simplicity. There isn't a one to one correlation between them. You can make things so simple that they are very hard to get correct (assembly), and you can emphasis correctness so much that things are no longer simple (try writing a generic program in In any case, this is getting to be off topic. The scope for the SLIP has already been defined, and you are redefining simple as correct |
I don't think that's the kind of strawman meant in "Wanted: Strawman proposals". |
Of course not, but than you can argue the same thing about @jedws proposal. The point being made is that correctness is not the same thing as simple, and they are not directly comparable, which is addressing point 3, i.e.
|
No mate. I honestly can't find a way to interpret @jedws's points as strawman. It's a valid point and I disagree with this false dichotomy you're trying to draw. It's not true that correctness will always prevent simplicity. It doesn't have to be the case here either. |
This is a strawman, I was never making a false dichotomy. My point is that an excessive use of correctness can actually harm simplicity, not help improve it.
Don't disagree with this, but I was never making this point... |
No, within any logic there are some things that are taken as axiomatic. Within computing, appropriate resource availability and correctness of underlying hardware and environment is usually one of them (unless you're building OSs). Your argument mistakenly conflates non-deterministic runtime problems, with things that are deterministically provable and essentially says, "well, you can't prove all the things, so bugger it, don't bother proving anything". I am not averse to this argument, it is essentially the argument against type systems, but don't think it has much place within the design of strongly typed APIs. The central point remains, introducing non-determinism, or falsity, or incorrectness, does actually increase complexity. |
Please take this discussion to another forum such as scala-debate. |
As somebody who writes C# at work and Scala at home, there are some conveniences in the .NET collection API that I miss in Scala. I realize that most of the discussion here is related to the implementation details of the collections library, whereas my comments are more concerned with the user API of the collections library. Still, maybe these thoughts will be useful. Suppose I have a list of pairs (to be interpreted as key/value pairs). I'd like to group the values by their corresponding keys, with the understanding that there may be duplicate keys. In Scala, I would do that like this (unless there's a simpler way that I'm not seeing):
Or, to be more verbose but perhaps also more efficient, I might do this:
Whereas in C#, I could use an overload of GroupBy do this:
The .Net library realizes that simultaneous mapping while grouping is a common use case, so they provide a way to do that. The C# version feels far more ergonomic to me than either Scala snippet. Another nice feature of the .Net collection API is that all methods that iterate the collection with a callback have overloads whose callback function takes the current index. So while I would need to do this in Scala:
I could do this in C#
It's easy enough for Select, SelectMany, and friends to also track the index while iterating, so they provide overloads exposing that functionality. The .Net collection API includes many such conveniences. It's clear that the designers put a lot of thought into it. And though having methods take multiple function parameters seems like a questionable idea, in practice, it works out surprisingly well. I realize that I'm arguing for convenience, which goes somewhat against goal #3, above. I also realize that the sort of convenience methods that I'm talking about could be implemented by a third-party library. But I believe that the Scala collection library would be better for having these conveniences built-in. |
It sounds interesting (and I've run into the first problem myself), but that's also more in scope for the scala-debate discussion. (Also, I think in Scala you'd want to have different method names — IIRC if you overloaded |
As a reference or Joke: Collection Development Philosophy (web.library.yale.edu/policy/collection-development-statements ) |
I really expect LazyList or Stream add these powerful and convenient methods : |
I can't figure out where this discussion ended up on transducers (mentioned first above by @viktorklang ), but it seems like it wouldn't be hard to compose |
Closing since this discussion has gone stale. Discussions related to the main collection strawman proposal should go in https://github.com/scala/collection-strawman. More general discussions should probably move to https://contributors.scala-lang.org . |
A redesign of the standard library is on the roadmap for one of the next Scala versions (could be as early as 2.13). Some requirements for the redesign should be:
on collections should still be available under the same names. Some more exotic and advanced
scenarios such as
breakOut
can be dropped if alternative ways to achieve the same functionalityexist.
class problem, where too much gets inherited automatically.
such as lists.
parallel collections.
++
onArrayBuffer
s should be called even if the static types of the operands of++
areIterable
s.whole program optimizations.
To gain experience it would be good to experiment with some prototypes that illustrate possible designs and can point to strengths and weaknesses. A full collections design is too heavy to fill this role. It takes too long to write and understand. As an alternative, I propose to implement just a subset of the required functionality. The subset should implementable in about 500 LOC or less and at the same time should be as representative as possible of the whole collections. The API should be the same for all strawman proposals.
After some experimentation, I came up with the following proposal for the API to be implemented by strawman proposals. There's still some modest room to add things if people feel it is important.
Base Traits
For now we leave out maps and sets (which does not say they are not important). We gloss over the mutabile/immutable distinction. All collections are in the same package (this is only for the strawman, to keep things simple, not for the full collections, of course).
Iterable
andSeq
are considered collection types, butIterator
is not.Collection Implementations
List
is a simplified version of Scala lists. It's an immutable collection favoring linear access patterns.List
operations should be tail-recursive, hence the addition ofListBuffer
to hold intermediate results.ArrayBuffer
is a mutable collection that supports random access.String
should demonstrate how the architecture integrates externally defined types.String
should be seen by Scala as an immutable random access sequence ofChar
. Finally, views should be constructible over all other collections. They are immutable and lazy.List
,ListBuffer
andArrayBuffer
should have companion objects that let one construct collections given their elements in the usual way:Collection operations
The following operations should be supported by all collections.
foldLeft
andfoldRight
are the basis of all sorts of aggregations.isEmpty
andhead
exemplifytests and element projections.
iterator
andview
construct iterators and views over a collection.collect
is not part of the current Scala collections, but is useful to support views and Java 8 streams. It should construct a new collection of a given type from the elements of the receiver. An invocation pattern ofto
could be either of the following:Strawman collections also should support the following transforms:
map
andflatMap
together support all monadic operations.++
andzip
are examples of operations with multiple inputs.drop
was chosen as a typical projection that yields a sub-collection.partition
was chosen in additionfilter
because it exemplifies transforms with multiple outputs.Strawman
Seq
s (but not necessarilyIterable
s orView
s) should also support the following operationsSequences in a way are defined by their
length
and their indexing methodapply
.indexWhere
is an example of a method that combines indices and sequences in an interesting way.reverse
is an example of a transform that suggests an access pattern different from left-to-right.ArrayBuffer
andListBuffer
should in addition support the following mutation operationsRequired Optimizations
There are a number of possible specializations that the strawman architecture should support. Here are
two examples:
xs.drop(n)
on a listxs
should take time proportional ton
, the retained part of the list should not be copied.xs ++ ys
on array buffersxs
andys
should be implemented by efficient array copy operations.s1 ++ s2
on strings should use native string concatenation.partition
should require only a single collection traversal if the underlying collection is strict (i.e., not a view).These specializations should be performed independently of the static types of their arguments.
Why No Arrays?
Collections definitely should support arrays as they do now. In particular, arrays should have the same representation as in Java and all collection operations should be applicable to them. Arrays are left out of the strawman requirements because of the bulk their implementation would add. Even though no explicit implementation is demanded, we should still check all designs for how they would support arrays.
The text was updated successfully, but these errors were encountered: