Skip to content

Add section on records being constant. #2413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 23, 2022
Merged

Conversation

lrhn
Copy link
Member

@lrhn lrhn commented Aug 18, 2022

A proposal for constant specifying records being constant.
Hope I covered all the corners.

Basically, only two paragraphs are normative, the rest is elaboration.

Fixes #2337

@mit-mit

My proposal for constant behavior.
@lrhn lrhn requested review from munificent and eernstg August 18, 2022 13:37
Copy link
Member

@eernstg eernstg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I've just commented on the situation in this review, but looking at #2337, I think the matter could benefit from some further discussion in that issue (or in some other issue, if needed).

_This means that a record expression can be used in the initializer list
of a constant non-redirecting generative constructor._

Constant *object* instantiations create deeply immutable and canonicalied objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Constant *object* instantiations create deeply immutable and canonicalied objects.
Constant *object* instantiations create deeply immutable and canonicalized objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole paragraph would be 'rationale', so the typical formatting would be to use an italic font, just like commentary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the rest of the document, it doesn't seem to be as strongly separating rationale from normative text.
I tried making everything except paragraph 1 and 3 italic, and it looked weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's OK. It can look weird in this context, but that's just because we ought to change the rest of this document similarly, and in the meantime there's nothing lost by doing it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to use italics for non-normative text when it's in the middle of otherwise clearly normative sections, mostly as a signal to implementers that they don't have to treat it as gospel. But in introductory sections that are higher level or motivational, almost the whole thing is non-normative, so I don't bother highlighting it.

I admit I could probably be clearer and more disciplined about this. :)

if all its record field expressions are potentially constant or constant expressions.

_This means that a record expression can be used in the initializer list
of a constant non-redirecting generative constructor._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a subtle and probably substantial extension of the expressive power of (potentially) constant expressions.

I can see that the next paragraphs contain considerations about the questions that came to my mind immediately; I've responded to those considerations below, and there is a certain amount of overlap, but I still think it makes sense to have the following considerations in one place:

It means that we would allow declarations like const C(int i): this.myRecord = (1, i);.

However, it is not a given that we will also allow const D(int i): this.myList = const [1, i];, or similar declarations using sets or maps (not to mention <constObjectExpression>s).

The obvious difference is that it is "not true" that [1, i] is a constant list, even though it will become a constant list after actual argument substitution in all those cases where the constructor is invoked with a constant expression as the actual argument passed to i. So we could allow const [1, i] in the initializer list, and then make it an error at each constructor invocation if the substitution of the actual argument for i doesn't produce a constant expression.

Do we intend to do the same thing for records, or would we accept C(DateTime.now().millisecondsSinceEpoch);? Is the treatment of identical() on records sufficiently supportive of the current notion of canonicalization to say that constant records are canonicalized? Or perhaps we just don't say that?

How about using an explicit const modifier at the front of a record expression in order to stipulate that this particular record is constant, thus giving records the same treatments as other composite entities like lists and maps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that we would allow declarations like const C(int i): this.myRecord = (1, i);.

That's actually a consequence of the "potentially constant" paragraph below, not this one. Its true that we'd allow it.

We should definitely not allow const [1, i] (with or without the leading const). We use different implementation classes with different behavior for constant and non-constant lists.
A C(1, i) (class constant constructor invocation with constant/potentially constant arguments) is more likely, because that uses the same class for constant and non-constant invocations, but because constructors can execute (some) code, we can get into problems with recursive constructor calls. And it differs wrt. canonicalization, which records don't (at least you can't tell if they do when a single record can be non-identical to itself).

Records are special. They may or may not correspond to an object at all, and you can't tell if they do.
A record like (c1, c2) really is just the juxtaposition of its values. It's just two values, nothing more or less. If each value can occur, so can both.

A list has structure on top of its contents. We use different classes for const lists and non-const lists. Creating a list is an object creation expression.

Not so for records. Example:

class C {
  final int a, b;
  final (int, int) p1;
  const C(int a, int b, int c, int d) : a = a, b = b, p = (c, d);
  (int, int) get p2 => (a, b);
}

There is no fundamental difference between the initializers for a, b, p1.$0 and p1.$1, and I don't want there to be one. The only difference is the grouping of the values, where p.$0 and p.$1 can be accessed together as p1, just as a and b can be accessed through p2. The p1 variable still just (potentially) two adjacent storage locations in the class.

We can add const in front of a record, but it feels spurious. It doesn't cause canonicalization, because records don't have a consistent identity anyway. It ensures that the field values are constants, but either that doesn't matter, it's enforced anyway, or you can put const on each field expression if necessary, like ({x = (const C(), const [])}).
(I'd be fine with allowing const expression for any expression, which then puts that expression into a constant context. The const (e1, e2) would just be shorthand for (const e1, const e2), and const 4 is valid and constant.

I don't know if we intend to do the same thing for structs. If we can, I'd probably be happy with it. A StructName(arg1,...,argn) would be constant (aka. is valid where a constant is required) if it invokes the primary constructor, and the argument expressions are constant.
However, since a struct can have non-primary constructors too, and those would presumably require const to invoke, it would be better for consistency, with other struct constructors and other const constructors in general, to require a const in front. So, "probably not" to doing the same for structs, or other constructs with constructors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Records are special. They may or may not correspond to an object at all, and you can't tell if they do.

I can't see any way to support a reference of type Object whose value is assigned/initialized to a record, other than boxing (or using an existing boxed representation of) the record, unless that whole reference can be removed during optimization. The type could also be any top type, or it could be a type variable whose value is Object or a top type.

This means that records can be known to have a heap entity representation at some point. Any usage of that representation at the record type might unbox it, but the Object typed reference is not a weak reference and the fact that it is a reference to a heap entity makes it more object-like than the wording

It's just two values, nothing more or less

implies.

We can add const in front of a record, but it feels spurious. It doesn't cause canonicalization

It could cause canonicalization.

allowing const expression for any expression

Looks like that would be a thin layer of syntactic sugar: Putting const freshVariableName = expression; at the top level and then replacing const expression by freshVariableName.

Records are always unmodifiable, and if their field values are deeply immutable,
like constants values, the records are also deeply immutable.
It's meaningless to consider whether record constants are canonicalized,
since records do not have a persistent identity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of having explicitly constant records and specifying canonicalization would be that (1) this clarifies the intent, and makes it an error if some components of a const record literal are not constant expressions, and (2) this serves as a request for physical canonicalization. Granted, if there is a location in code where this kind of record is unboxed then we would not promise to find the canonical object if and when it is re-boxed. Still, I'd expect the explicit request for canonicalization to be helpful, in order to make identical work as a fast path for testing equality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you plan to depend on canonicalization, you're setting yourself up for failure.

I'd expect a significant percentage of record-typed instance variables to be inlined in the object structure. You lose canonicalization for those immediately at creation time.

Asking for canonicalization is asking for something you should not expect getting, which is why I actually prefer not giving you the option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you plan to depend on canonicalization, you're setting yourself up for failure.

That is true!

But it still clarifies the intent and allows for compile-time checking that we have a constant.

Of course, a top level const freshVariableName = ... would do that, too. So the question is just whether or not we want this syntactic sugar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think canonicalization is a non-starter for records. The language has no formal notion of what it means to assign a value or bind it to a variable. So even if you have a canonicalized constant record, there's no guarantee that when you mention it that doing so doesn't copy it to a thing with a new identity. Even binding it to the parameter to the identical() function itself could cause the canonicalized record to be copied to a new stack-allocated storage and lose its original identity.

to be a constant, like there is for object creation expressions.
A record expression with field values that are constant-created values,
will be indistinguishable from a similar expression created in a constant
context, since identity cannot be used as a distinguishing trait.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identity cannot be used as a negatively distinguishing trait, that is, we can't conclude non-sameness from identical(r1, r2) == false.

However, this doesn't imply that identical(r1, r2) == true is useless, and we might be able to use explicitly constant records to ensure that this occurs more frequently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really, really hope people won't start trying to force canonicalization and depend on the identity of records for anything.
It's an optimization. If your code can't run well if identical returns false on all records, you should fix it. Or stop using records for something where identity matters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really, really hope people won't start trying to force canonicalization and
depend on the identity of records for anything.

Right, we might very well want a lint against statically known records as arguments to identical, except for the use of exactly this approach in the implicitly induced implementation of == on record types.

But the point is that an explicit request for canonicalization could ensure that == runs faster, more frequently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly opposed to giving people the idea that they can affect canonicalization of records, and that it matters in any way.

You really cannot know that canonicalization will make anything run faster. It might. It's definitely true that more canonicalized objects gives more chances of a positive identical check.

But you also cannot know that requesting canonicalization actually will cause more canonicalization. It's perfectly reasonable for a compiler to canonicalize all the constant records expressions it can see anyway, and if the compiler chooses not to, perhaps it's for a good reason. And it still can choose not to, even if you request canonicalization.

If I was a compiler writer implementing records, I'd completely ignore any user request for canonicalization of records, because I can, and do whatever I think is best for the compilation.

So why give the user a button to press, if it's not connected to anything anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the compiler does canonicalize every single record, as soon as you mention it, the compiler could silently copy it to a new unboxed storage and you've lost the fact that it once referred to some canonical instance.

Copy link
Member

@munificent munificent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this, better than the naive approach I took in #2422. If you want to merge it under me, I'm fine with it. I'll rebase my PR to handle it.

Records are always unmodifiable, and if their field values are deeply immutable,
like constants values, the records are also deeply immutable.
It's meaningless to consider whether record constants are canonicalized,
since records do not have a persistent identity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think canonicalization is a non-starter for records. The language has no formal notion of what it means to assign a value or bind it to a variable. So even if you have a canonicalized constant record, there's no guarantee that when you mention it that doing so doesn't copy it to a thing with a new identity. Even binding it to the parameter to the identical() function itself could cause the canonicalized record to be copied to a new stack-allocated storage and lose its original identity.

to be a constant, like there is for object creation expressions.
A record expression with field values that are constant-created values,
will be indistinguishable from a similar expression created in a constant
context, since identity cannot be used as a distinguishing trait.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the compiler does canonicalize every single record, as soon as you mention it, the compiler could silently copy it to a new unboxed storage and you've lost the fact that it once referred to some canonical instance.

_This means that a record expression can be used in the initializer list
of a constant non-redirecting generative constructor._

Constant *object* instantiations create deeply immutable and canonicalied objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to use italics for non-normative text when it's in the middle of otherwise clearly normative sections, mostly as a signal to implementers that they don't have to treat it as gospel. But in introductory sections that are higher level or motivational, almost the whole thing is non-normative, so I don't bother highlighting it.

I admit I could probably be clearer and more disciplined about this. :)

since records do not have a persistent identity.

Because of that, there is no need for a `const (1, 2)` syntax to force a record
to be a constant, like there is for object creation expressions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good insight! I like the idea of records being automagically const. They're immutable and have no persistent identity, so there's literally nothing preventing a compiler from treating any record expression with compile-time-known field values as a constant. And if that's the case, I see no compelling reason to require users to write const before it.

On the other hand, auto-const for records might lead users to want other immutable classes to be auto-constable too, and then get annoyed that they aren't. Perhaps something to think about with structs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.
My perspective is that where const Foo(..) creates a new Foo object, (a, b) does not create anything. It just places a next to b. It's just two values instead of one value.
It's not that you create a constant pair, it's that a pair of constant values are ... well, two constant values. They can be used wherever two constant values are needed.
You never need "a constant pair", you need a pair of constant values.

We can allow an initializer list of the form:

  const C((int, int) pair) : (this.x, this.y) = pair;

(We should!)

We could even allow

  (final x, const y) = (1, 2);

(We probably shouldn't.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is anything particularly tricky about

class C {
  final int x, y;
  const C((int, int) pair) : (x, y) = pair;
}

All we need is the ability to desugar the initializer list to x = pair.$0, y = pair.$1. But the opposite direction is more tricky:

class D {
  final (int, int) pair;
  const D(int x, int y) : pair = (x, y);
}

And even more:

class E {
  final Object o;
  const E(int x, int y, bool b) : o = b ? (x, y) : x + y;
}

Canonicalization is currently based on a compile-time version of identical applied to the values of each instance variable, but identical applied to a pair of records isn't guaranteed to ever return true, so we wouldn't have any guarantees about canonicalization of instances of E, unless we specify that this compile-time-identical behaves differently.

Perhaps it is enough to say that identical at compile-time will always perform a complete traversal in case it is invoked on two record values (of course, bailing out early if there is a non-match, but traversing the entire structure in order to obtain the value true).

Currently a non-record is canonicalized based on the `identical` comparison of its fields against previously created constant values of the same type. 
Since a record field does not have a predictably or useful `identical` comparison result, we instead define a comparison structurally on the record, and use that instead.

Also formatting tweaks.
@lrhn lrhn merged commit fd0d6ff into master Aug 23, 2022
@lrhn
Copy link
Member Author

lrhn commented Aug 23, 2022

Landed. Your merge :)

@lrhn lrhn deleted the lrhn-record-constant-spec branch August 23, 2022 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can records be const?
3 participants