Skip to content

Records: Destructure superinterfaces could be subtype related #1278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
eernstg opened this issue Oct 30, 2020 · 17 comments
Closed

Records: Destructure superinterfaces could be subtype related #1278

eernstg opened this issue Oct 30, 2020 · 17 comments
Labels
patterns Issues related to pattern matching.

Comments

@eernstg
Copy link
Member

eernstg commented Oct 30, 2020

Cf. https://github.com/dart-lang/language/blob/master/working/0546-patterns/records-feature-specification.md.

In the proposal, the type of a record with N positional fields (N <= 15) is a subtype of DestructureN<T1, ..., TN> for some types T1 ... TN. This introduces a special kind of width subtyping (records do not otherwise support width subtyping). So it's worth considering the price.

The use of types like Destructure2<int, int> will force the records to be heap allocated, but other types like Object could be used and have a similar effect already. However, it is probably possible to use a memory layout where access to the statically known number of fields can use fixed offsets, which means that the performance can be good except for the cost of the boxing operation itself.

However, it would probably not cause any additional performance issues to enable access based on any smaller number of positional fields than the actual number. This could be supported by declaring these classes as follows:

abstract class Destructure1<T0> {
  T0 get field0;
}

abstract class Destructure2<T0, T1> implements Destructure1<T0> {
  T1 get field1;
}

abstract class Destructure3<T0, T1, T2> implements Destructure2<T0, T1> {
  T2 get field2;
}

...

abstract class Destructure16<T0, T1, T2, ..., T15>
    implements Destructure15<T0, T1, T2, ..., T14> {
  T15 get field15;
}
@munificent
Copy link
Member

munificent commented Oct 30, 2020

TL;DR: Yes, I agree we could do this, in exactly the manner you propose. I'm currently leaning towards not.

In the proposal, the type of a record with N positional fields (N <= 15) is a subtype of DestructureN<T1, ..., TN> for some types T1 ... TN. This introduces a special kind of width subtyping (records do not otherwise support width subtyping). So it's worth considering the price.

I wouldn't consider that width subtyping because a record type only implements the one DestructureN interface whose arity matches its own.

The use of types like Destructure2<int, int> will force the records to be heap allocated, but other types like Object could be used and have a similar effect already.

Yes, I considered moving records outside of the object hierarchy, but I think unboxed values (of any type) should be an orthogonal feature that we may or may not do. It felt weird to me to have records not be subtypes of Object when numbers are, after all. If we're going to have unboxed non-Object values, I think we should consider that for all of the built-in types and do it as a separate feature.

However, it is probably possible to use a memory layout where access to the statically known number of fields can use fixed offsets, which means that the performance can be good except for the cost of the boxing operation itself.

That is a goal I was hoping to enable, so it's good to hear you confirm this. :)

However, it would probably not cause any additional performance issues to enable access based on any smaller number of positional fields than the actual number. This could be supported by declaring these classes as follows:

This is what I would consider to be width subtyping. I considered this, and it's not off the table, but I'm leaning against it. My intent is that the DestructureN interfaces is what we'll use to statically check pattern matching and destructuring. So, if you want to do, say:

var (a, b, c) = obj;

Then obj must implement Destructure3 for some set of type arguments. If Destructure3 is a subtype of Destructure4, then this would let you do:

var four = (1, 2, 3, 4);
var (a, b, c) = four;

I'm not deeply morally opposed to allowing destructuring to silently discard unmatched positional fields, but I'm leaning away from that. I think most of the time doing so would be an error. I also like having some amount of symmetry with positional function arguments/parameters where we consider it an error to pass more arguments than a function has positional parameters.

So, one way I look at the DestructureN interfaces (especially when a user-defined class deliberately declares it implements one) is that each one is a way to explicitly choose how many fields you can and must destructure from the object. Of course, a class could implement multiple DestructureN interfaces of different arities and that would let an instance of the class be destructured with any of those arities.

But for built-in records, and most classes, I think users will generally want to match all the fields and have the type checker tell them if they didn't. When they want to opt out of that and deliberately only match a subset of the positional fields, I'm hoping we can have some kind of ... syntax to allow that.

Also, I only consider this to be the case for positional fields. For named fields, I don't think users have an expectation of "completeness" when it comes to matching and for those I intend it to be more ad hoc and structural. You can match the named fields you want and ignore any others the value may expose, whether that value is a record or an instance of a user-defined class. I think that aligns with user intuition where positional parameters default to required in Dart, but named parameters default to optional.

@lrhn
Copy link
Member

lrhn commented Oct 30, 2020

A class can't implement a structural type (unless we special-case it and say that you can, obviously).

If the Destructure3 interface is what pattern matching will eventually be desugared to, then it means that a user class can implement Destructure3<int,int,int> and be destructurable as (int x, int y, int z) = instanceOfThatClass;.

It feels slightly inconsistent to me that you can do that for positional tuples, and ignore the named entries, but you can't ignore further positional entries. I don't particularly want a four-tuple (Destructure4) to implement three-tuple (Destructure3). That is, I disagree with this issue's request, and I think that (int, int, {int x}) being a subtype of Destructure2<int,int> is just as weird as Destructure3<int,int,int> being it.

I'm not deeply morally opposed to allowing destructuring to silently discard unmatched positional fields,

I am. Maybe not deeply, but I'd still prefer not to go there. It's too easy to hide a mistake if you start ignoring type-mismatches (it's exactly the same as the "overapplying function invocations" request).

Now, we could start introducing a type like (int, int, [int]) for a two-or-three-tuple. Or rather, let's not.

@munificent
Copy link
Member

munificent commented Oct 31, 2020

Could you explain the motivation behind Destructure interfaces? If the class wants to implement, say, Destructure3, woun't it be easier to just say:

Great question. My goal is to enable compilers to efficiently compile records. That means passing their fields in registers when possible instead of heap-allocating, and inlining field access. That is a lot harder when record types are a polymorphic interface that any class can implement. This is the same reason you can't implement String, num, etc.

So there are two levels of types in the proposal. Record types are the concrete types only inhabited by actual record objects. The Destructure types are the abstract interfaces that unify records and any user-defined class that wants to participate in positional pattern matching and destructuring. The Destructure types are directly analogous to (and inspired by) the numbered Product traits in Scala.

A class can't implement a structural type (unless we special-case it and say that you can, obviously).

cough cough call() methods cough cough. :)

@lrhn
Copy link
Member

lrhn commented Nov 1, 2020

A class can't implement a structural type (unless we special-case it and say that you can, obviously).

cough cough call() methods cough cough. :)

There was a reason we stopped saying that such classes implemented the function type. Now they're just implicitly convertible to a function type.
(There were several reasons, being able to implement a contravariant type was particularly irksome, but I wanted to separate functions and class instances independently of that).

@rrousselGit
Copy link

There was a reason we stopped saying that such classes implemented the function type. Now they're just implicitly convertible to a function type.
(There were several reasons, being able to implement a contravariant type was particularly irksome, but I wanted to separate functions and class instances independently of that).

Kind of off-topic but: I find it quite saddening that this change was made.
It allowed using is on functions:

class MyClass {
  void call() {}
}

void main() {
  void Function() example = MyClass();

  print(example is MyClass); // now always false, used to be true
}

That allowed APIs to provides both a simple syntax using closures and an advanced syntax using delegates, like:

object.addListener((value) => print(value));

object.addListener(
  Listener(
    onchange: (v) =>  print(v),
    onDispose: () => print('disposed'),
  ),
);

@lrhn
Copy link
Member

lrhn commented Nov 2, 2020

You could write that code, but you couldn't type it. The parameter type of addListener would still just be the function type, and then internally you'd do an is MyMagicInterface to do special casing of some "functions".
I think that is bad API design. Just have an addMyMagicInterfaceListener method too (with overloading, it would make more sense to accept both for addListener, but we don't have overloading, so you need a different method name). It's the same reason I have generally opposed a CancelableFuture which you'd pass through a Future parameter and maybe special-case at the end. It's too easy to get wrong since there is no type support.

@eernstg
Copy link
Member Author

eernstg commented Nov 2, 2020

@munificent wrote:

I wouldn't consider that width subtyping because a record
type only implements the one DestructureN interface whose
arity matches its own.

The reason why I called that width subtyping is that it would presumably cover arbitrary sets of named fields as well, cf. this section:

If a record type has positional fields, then it is a subtype of ... Destructure

There was the issue about dropping some positional fields by accident, and I agree that this would be a thing to look out for:

If Destructure3 is a subtype of Destructure4, then this would let you do:

var four = (1, 2, 3, 4);
var (a, b, c) = four;

I think users will generally want to match all the fields and have
the type checker tell them if they didn't
...
For named fields, I don't think users have an expectation of
"completeness" when it comes to matching and for those
I intend it to be more ad hoc and structural.

I like the fact that this creates some connections between records and actual argument lists, which would be helpful if we will support the use of records as argument lists (say, if var (a, b) = foo(); and then bar(a, b) could be expressed as something like bar(... foo())).

From this perspective it makes sense to flag situations where not all positional arguments are used. But to be consistent we should then also flag situations where not all named arguments are used — it is an error to pass a named argument which is not expected!

The other perspective (which is arguably more likely to match actual usage) is that a record is used as a value, that is, as an immutable object. From that perspective it is just a matter of normal subtyping that we are able to drop some positional (and named) fields: Destructure3<T1, T2, T3> <: Destructure2<T1, T2> is not that different from ColorPoint <: Point. So why wouldn't we flag upcasts whenever they involve width subtyping?

However, we can still ensure that a pattern declaration like var (a, b, c) doesn't admit dropping fields by considering the declared variable to have a concrete record type.(int, int, String) is a proper subtype of Destructure3<int, int, String>, so it is a downcast to initialize it with an expression whose static type is Destructure3<int, int, String>, which also implies that we can't initialize it from (2, 3, 'x', name: true): that's an upcast followed by a downcast, which means that we're initializing with an unrelated type, which is an error.

With this approach, a pattern can be used to destructure a concrete record type, and we won't drop positional fields. We can freely make the choice of allowing or disallowing named fields to be dropped.

This means that a certain amount of flexibility is lost, but we do get the effect that it is an error to initialize a pattern declaration with a record of a different shape.

We could choose an intermediate level of strictness where downcasts from Destructure3<T1, T2, T3> to (T1, T2, T3) are allowed for the initializing expression of a pattern of the form (_, _, _). This would still make it an error to upcast-then-downcast from Destructure4<T1, T2, T3, T4> or from (T1, T2, T3, T4, {T5 name}). But it would allow the use of a pattern to destructure the value of an expression whose type is DestructureN<...> into N components (no more, no less). This would allow patterns to destructure concrete record types plus DestructureN<...> types with a matching N.

I think that might work quite well.

@lrhn
Copy link
Member

lrhn commented Nov 2, 2020

The subtyping is arbitrary (at least slightly). You suggest that Destructure3<T1, T2, T3> <: Destructure2<T1, T2>, but that's just one possible projection. We could equally well say that Destructure3<T1, T2, T3> <: Destructure2<T2, T3>, picking the last elements instead of the first. (And yes, one feels more natural, most likely because we're used to read left-to-right ... and because it works well if the memory layout is also in declaration order, the you don't need to know the length to find the first two elements).

However, since Destructure-classes are supertypes of tuple types, we'd also get:
Destructure3<T1, T2, T3> <: Destructure2<T1, T2> <: (T1, T2, {T3 x}).
That feels iffy to me. If anything, I'd at least say that only tuples with no named entries can implement a Destructure interface.

@eernstg
Copy link
Member Author

eernstg commented Nov 2, 2020

@lrhn wrote:

picking the last elements instead of the first

I don't think that would be particularly natural. So the arbitrariness is very slight. ;-)

But the other example goes in the other direction, (T1, T2, {T3 x}) <: Destructure2<T1, T2>, so that wouldn't be an anomaly relative to Destructure3<T1, T2, T3> <: Destructure2<T1, T2>. (Also, it would be unsound to promise that there's a T3 x when we are looking at, say, a (T1, T2, T3), so the direction can't be reversed.)

@lrhn
Copy link
Member

lrhn commented Nov 2, 2020

True, got the subtyping order wrong. Any tuple with exactly two positional elements implements Destructure2.
And you're arguing that it could be "Any tuple with at least two ...." instead.
(And I'm arguing to drop the interfaces entirely. I prefer the C# approach to pattern matching general classes better than having them implement an interface with getter names field1 etc. Those names do not belong in a class API, they should be named after what they mean, and in a class, they are not just positional entries in a generic structural type).

@eernstg
Copy link
Member Author

eernstg commented Nov 2, 2020

you're arguing that it could be "Any tuple with at least two ...."

Right, I'm just exploring the options.

With the strict proposed model, Destructure2 abstracts over all record types with exactly 2 positional fields and any set of named fields. It would be completely useless if it wouldn't allow us to drop the named fields (then it's the concrete record type again), so we wouldn't want to make it even more strict.

But if we are allowed to drop the named fields using the Destructure... types then we might want to be able to drop any suffix of positional fields as well, especially if it has no extra performance implications, and it doesn't inherently introduce error-prone subtyping relationships. It looks like that could work, but we'd want to think about the practical value of doing this.

For the pattern matching declaration, we can of course have syntax that allows dropping a suffix of the list of positional fields and any subset of the named fields (so we don't actually need the Destructure... types in order to be able to specify how a pattern matching declaration can drop record fields):

var (a, b, c) = (1, 2, 3, 4, name: 5, name2: 6); // Error.
var (a, b, c, name2: d, ...) = (1, 2, 3, 4, name: 5, name2: 6); // OK.

@ds84182
Copy link

ds84182 commented Nov 2, 2020

Have you considered a single generic Destructure interface instead?

abstract class Destructure<T extends Record> {
  T destructure();
}

Then an example:

class Vector3 extends Destructure<(double, double, double)> {
  final (double, double, double) _data;
  Vector3.zero() : _data = (0, 0, 0);
  (double, double, double) destructure() => _data;
}

var (x, y, z) = Vector3.zero();

@ds84182
Copy link

ds84182 commented Nov 2, 2020

@tatumizer I didn't see your edit, really wish GitHub would send edits through email.

I don't think this feature would be associated as "records" in the minds of developers, so toRecord doesn't fit (to me). While it is true that its turning some instance data into a record, this is used by the destructuring syntax. If you look at this as "multiple return values" and not "bundling data together", it makes sense.

I don't understand what is gained by the field0, field1, ... getters. Yes, they can be lazily evaluated, but does that really provide us any advantages? And the handling of named members is a lot more straightforward, as its part of the type now.

@munificent
Copy link
Member

Have you considered the option of NOT allowing classes to implement records in any way? After all, the class can always add a method toRecord() and return any record it wants with no limitations at all.

I have, yes. Another way we could define the Destructure interfaces is like:

abstract class Destructure1<T0> {
  T0 toRecord();
}

abstract class Destructure2<T0, T1> {
  (T0, T1) toRecord();
}

abstract class Destructure3<T0, T1, T2> {
  (T0, T1, T2) toRecord();
}

// ...

That adds some complexity to the semantics of destructuring. With this, there is a multi-step process for destructuring an instance of a user-defined class:

  1. Call toRecord() on it. This may end up heap-allocating a new record object and copying all of the fields over.
  2. Access fields on the resulting record using some built-in primitive semantics. We need some magic here because otherwise we have an infinite regress where you keep calling toRecord() to get a thing you can destructure but the only way you know to destructure the result is to call toRecord() on it.
  3. Discard the temporarily created record.

In theory, compilers may be able to optimize most of that away, but I don't like relying on "sufficiently smart" compilers for performance.

Also, this mechanism would only apply to positional fields. For destructuring named fields, I explicitly do want to call getters on the original instance because that lets you immediately used named field destructuring on user-defined classes without having to add any explicit destructuring support. If it's got getters, you can destructure 'em.

It felt weird to me to use direct getter calls for the named fields and a separate toRecord() copying step for the positional ones. This is especially true given that a single destructuring pattern may have both positional and named fields.

I prefer the C# approach to pattern matching general classes better than having them implement an interface with getter names field1 etc. Those names do not belong in a class API, they should be named after what they mean, and in a class, they are not just positional entries in a generic structural type).

I admit that having to declare field0, field1, etc. getters right on your class's public API is a little strange. If we had something like C#'s "explicit interface implementation", I would use that. But I do like the simplicity and uniformity of saying that all destructuring is simply getter calls.

We can't do C#'s approach directly because Dart doesn't have output parameters, which is the primitive functionality C# uses to have a single function call emit multiple values without needing to allocate some other primitive aggregate type. (You could do something clever by having the destructuring pass a closure to the toRecord() method, which invokes the closure with parameters for each destructured field, but that is just too weird.)

In practice, I think most classes won't expose positional destructuring. They will either use named destructuring or extractors (out of scope here, but basically a named conversion step between the original value and the destructured result). The relatively small number of classes that do directly support this—think classes like MapEntry—will clearly be data-like and oriented towards this use so seeing a couple of fieldn getters probably isn't too painful.

@lrhn
Copy link
Member

lrhn commented Nov 3, 2020

I was talking about the C# 8.0 recursive pattern object matching:

if (obj is Foo{Bar: var bar, Baz: var baz) foo) { 
  // foo := obj as Foo
  // bar := foo.Bar
  // baz := foo.Baz
}

There are no out-parameters here, just getter calls. With that pattern matching, you don't need to convert to an intermediate tuple in order to pattern match against an object. A class should almost never have a toTuple, because that operation does not correspond to anything in the domain (it's a completely abstract data operation, not a domain operation). It could have an (int, int) toPoint(); operation if it can be represented by a point, but a "tuple" is a physical representation detail, not a logical one. Or it can have a toTuple like it can have a toJson method if it thinks that might be useful for serialization (but I'd actually recommend the toJson over toTuple in that situation because JSON structures can be inspected).

An example which can be run directly:

using System;
					
public class Program
{
	public static void Main()
	{
		Object o = new Example();
		if (o is Example{Foo: var v} e) {
			System.Console.WriteLine(e.Foo);
			System.Console.WriteLine(v);
		}
	}
}

class Example {
	public int Foo { 
		get { return 42; }
	}
}

A "Destructor"/"Destructurer" is not needed for objects. It's a projection function, and the appropriate projections of an object is its getters.

@munificent
Copy link
Member

I was talking about the C# 8.0 recursive pattern object matching:

if (obj is Foo{Bar: var bar, Baz: var baz) foo) { 
  // foo := obj as Foo
  // bar := foo.Bar
  // baz := foo.Baz
}

Ah, interesting, I wasn't familiar with that feature.

There are no out-parameters here, just getter calls. With that pattern matching, you don't need to convert to an intermediate tuple in order to pattern match against an object.

Right, this is how I generally expect most user-defined classes to be used in pattern matching. I think named fields will be the common case and the pattern match syntax for those desugars to calling getters on the instance. In other words, I want this to just work:

class Point {
  final num x, y;
}

test(Point p) {
  var (x: x, y: y) = p;
  print("$x $y");
}

But, for some classes—likely a relatively small minority—positional destructuring is obvious enough and the brevity useful enough to be worth having the class opt in. For example, I'd really like MapEntry to be positionally destructurable so you can do:

test(Map map) {
  for (var (key, value) in map.entries) { ... }
}

@munificent
Copy link
Member

For now at least, I have removed the Destructure_n_ interfaces from the proposal. This means that user-defined classes can't define a protocol for positional field destructuring. In return, we get a simpler feature and get to eliminate a uniformly unliked corner of the proposal. We can always add user-defined positional destructuring interfaces in a future extension of the feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patterns Issues related to pattern matching.
Projects
None yet
Development

No branches or pull requests

5 participants