-
Notifications
You must be signed in to change notification settings - Fork 213
Add section on records being constant. #2413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
My proposal for constant behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I've just commented on the situation in this review, but looking at #2337, I think the matter could benefit from some further discussion in that issue (or in some other issue, if needed).
_This means that a record expression can be used in the initializer list | ||
of a constant non-redirecting generative constructor._ | ||
|
||
Constant *object* instantiations create deeply immutable and canonicalied objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant *object* instantiations create deeply immutable and canonicalied objects. | |
Constant *object* instantiations create deeply immutable and canonicalized objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole paragraph would be 'rationale', so the typical formatting would be to use an italic font, just like commentary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the rest of the document, it doesn't seem to be as strongly separating rationale from normative text.
I tried making everything except paragraph 1 and 3 italic, and it looked weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's OK. It can look weird in this context, but that's just because we ought to change the rest of this document similarly, and in the meantime there's nothing lost by doing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to use italics for non-normative text when it's in the middle of otherwise clearly normative sections, mostly as a signal to implementers that they don't have to treat it as gospel. But in introductory sections that are higher level or motivational, almost the whole thing is non-normative, so I don't bother highlighting it.
I admit I could probably be clearer and more disciplined about this. :)
if all its record field expressions are potentially constant or constant expressions. | ||
|
||
_This means that a record expression can be used in the initializer list | ||
of a constant non-redirecting generative constructor._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a subtle and probably substantial extension of the expressive power of (potentially) constant expressions.
I can see that the next paragraphs contain considerations about the questions that came to my mind immediately; I've responded to those considerations below, and there is a certain amount of overlap, but I still think it makes sense to have the following considerations in one place:
It means that we would allow declarations like const C(int i): this.myRecord = (1, i);
.
However, it is not a given that we will also allow const D(int i): this.myList = const [1, i];
, or similar declarations using sets or maps (not to mention <constObjectExpression>
s).
The obvious difference is that it is "not true" that [1, i]
is a constant list, even though it will become a constant list after actual argument substitution in all those cases where the constructor is invoked with a constant expression as the actual argument passed to i
. So we could allow const [1, i]
in the initializer list, and then make it an error at each constructor invocation if the substitution of the actual argument for i
doesn't produce a constant expression.
Do we intend to do the same thing for records, or would we accept C(DateTime.now().millisecondsSinceEpoch);
? Is the treatment of identical()
on records sufficiently supportive of the current notion of canonicalization to say that constant records are canonicalized? Or perhaps we just don't say that?
How about using an explicit const
modifier at the front of a record expression in order to stipulate that this particular record is constant, thus giving records the same treatments as other composite entities like lists and maps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that we would allow declarations like
const C(int i): this.myRecord = (1, i);
.
That's actually a consequence of the "potentially constant" paragraph below, not this one. Its true that we'd allow it.
We should definitely not allow const [1, i]
(with or without the leading const
). We use different implementation classes with different behavior for constant and non-constant lists.
A C(1, i)
(class constant constructor invocation with constant/potentially constant arguments) is more likely, because that uses the same class for constant and non-constant invocations, but because constructors can execute (some) code, we can get into problems with recursive constructor calls. And it differs wrt. canonicalization, which records don't (at least you can't tell if they do when a single record can be non-identical to itself).
Records are special. They may or may not correspond to an object at all, and you can't tell if they do.
A record like (c1, c2)
really is just the juxtaposition of its values. It's just two values, nothing more or less. If each value can occur, so can both.
A list has structure on top of its contents. We use different classes for const lists and non-const lists. Creating a list is an object creation expression.
Not so for records. Example:
class C {
final int a, b;
final (int, int) p1;
const C(int a, int b, int c, int d) : a = a, b = b, p = (c, d);
(int, int) get p2 => (a, b);
}
There is no fundamental difference between the initializers for a
, b
, p1.$0
and p1.$1
, and I don't want there to be one. The only difference is the grouping of the values, where p.$0
and p.$1
can be accessed together as p1
, just as a
and b
can be accessed through p2
. The p1
variable still just (potentially) two adjacent storage locations in the class.
We can add const
in front of a record, but it feels spurious. It doesn't cause canonicalization, because records don't have a consistent identity anyway. It ensures that the field values are constants, but either that doesn't matter, it's enforced anyway, or you can put const
on each field expression if necessary, like ({x = (const C(), const [])})
.
(I'd be fine with allowing const expression
for any expression, which then puts that expression into a constant context. The const (e1, e2)
would just be shorthand for (const e1, const e2)
, and const 4
is valid and constant.
I don't know if we intend to do the same thing for structs. If we can, I'd probably be happy with it. A StructName(arg1,...,argn)
would be constant (aka. is valid where a constant is required) if it invokes the primary constructor, and the argument expressions are constant.
However, since a struct can have non-primary constructors too, and those would presumably require const
to invoke, it would be better for consistency, with other struct constructors and other const constructors in general, to require a const
in front. So, "probably not" to doing the same for structs, or other constructs with constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Records are special. They may or may not correspond to an object at all, and you can't tell if they do.
I can't see any way to support a reference of type Object
whose value is assigned/initialized to a record, other than boxing (or using an existing boxed representation of) the record, unless that whole reference can be removed during optimization. The type could also be any top type, or it could be a type variable whose value is Object
or a top type.
This means that records can be known to have a heap entity representation at some point. Any usage of that representation at the record type might unbox it, but the Object
typed reference is not a weak reference and the fact that it is a reference to a heap entity makes it more object-like than the wording
It's just two values, nothing more or less
implies.
We can add const in front of a record, but it feels spurious. It doesn't cause canonicalization
It could cause canonicalization.
allowing
const expression
for any expression
Looks like that would be a thin layer of syntactic sugar: Putting const freshVariableName = expression;
at the top level and then replacing const expression
by freshVariableName
.
Records are always unmodifiable, and if their field values are deeply immutable, | ||
like constants values, the records are also deeply immutable. | ||
It's meaningless to consider whether record constants are canonicalized, | ||
since records do not have a persistent identity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of having explicitly constant records and specifying canonicalization would be that (1) this clarifies the intent, and makes it an error if some components of a const
record literal are not constant expressions, and (2) this serves as a request for physical canonicalization. Granted, if there is a location in code where this kind of record is unboxed then we would not promise to find the canonical object if and when it is re-boxed. Still, I'd expect the explicit request for canonicalization to be helpful, in order to make identical
work as a fast path for testing equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you plan to depend on canonicalization, you're setting yourself up for failure.
I'd expect a significant percentage of record-typed instance variables to be inlined in the object structure. You lose canonicalization for those immediately at creation time.
Asking for canonicalization is asking for something you should not expect getting, which is why I actually prefer not giving you the option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you plan to depend on canonicalization, you're setting yourself up for failure.
That is true!
But it still clarifies the intent and allows for compile-time checking that we have a constant.
Of course, a top level const freshVariableName = ...
would do that, too. So the question is just whether or not we want this syntactic sugar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think canonicalization is a non-starter for records. The language has no formal notion of what it means to assign a value or bind it to a variable. So even if you have a canonicalized constant record, there's no guarantee that when you mention it that doing so doesn't copy it to a thing with a new identity. Even binding it to the parameter to the identical()
function itself could cause the canonicalized record to be copied to a new stack-allocated storage and lose its original identity.
to be a constant, like there is for object creation expressions. | ||
A record expression with field values that are constant-created values, | ||
will be indistinguishable from a similar expression created in a constant | ||
context, since identity cannot be used as a distinguishing trait. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identity cannot be used as a negatively distinguishing trait, that is, we can't conclude non-sameness from identical(r1, r2) == false
.
However, this doesn't imply that identical(r1, r2) == true
is useless, and we might be able to use explicitly constant records to ensure that this occurs more frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really, really hope people won't start trying to force canonicalization and depend on the identity of records for anything.
It's an optimization. If your code can't run well if identical
returns false
on all records, you should fix it. Or stop using records for something where identity matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really, really hope people won't start trying to force canonicalization and
depend on the identity of records for anything.
Right, we might very well want a lint against statically known records as arguments to identical
, except for the use of exactly this approach in the implicitly induced implementation of ==
on record types.
But the point is that an explicit request for canonicalization could ensure that ==
runs faster, more frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly opposed to giving people the idea that they can affect canonicalization of records, and that it matters in any way.
You really cannot know that canonicalization will make anything run faster. It might. It's definitely true that more canonicalized objects gives more chances of a positive identical
check.
But you also cannot know that requesting canonicalization actually will cause more canonicalization. It's perfectly reasonable for a compiler to canonicalize all the constant records expressions it can see anyway, and if the compiler chooses not to, perhaps it's for a good reason. And it still can choose not to, even if you request canonicalization.
If I was a compiler writer implementing records, I'd completely ignore any user request for canonicalization of records, because I can, and do whatever I think is best for the compilation.
So why give the user a button to press, if it's not connected to anything anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the compiler does canonicalize every single record, as soon as you mention it, the compiler could silently copy it to a new unboxed storage and you've lost the fact that it once referred to some canonical instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this, better than the naive approach I took in #2422. If you want to merge it under me, I'm fine with it. I'll rebase my PR to handle it.
Records are always unmodifiable, and if their field values are deeply immutable, | ||
like constants values, the records are also deeply immutable. | ||
It's meaningless to consider whether record constants are canonicalized, | ||
since records do not have a persistent identity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think canonicalization is a non-starter for records. The language has no formal notion of what it means to assign a value or bind it to a variable. So even if you have a canonicalized constant record, there's no guarantee that when you mention it that doing so doesn't copy it to a thing with a new identity. Even binding it to the parameter to the identical()
function itself could cause the canonicalized record to be copied to a new stack-allocated storage and lose its original identity.
to be a constant, like there is for object creation expressions. | ||
A record expression with field values that are constant-created values, | ||
will be indistinguishable from a similar expression created in a constant | ||
context, since identity cannot be used as a distinguishing trait. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the compiler does canonicalize every single record, as soon as you mention it, the compiler could silently copy it to a new unboxed storage and you've lost the fact that it once referred to some canonical instance.
_This means that a record expression can be used in the initializer list | ||
of a constant non-redirecting generative constructor._ | ||
|
||
Constant *object* instantiations create deeply immutable and canonicalied objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to use italics for non-normative text when it's in the middle of otherwise clearly normative sections, mostly as a signal to implementers that they don't have to treat it as gospel. But in introductory sections that are higher level or motivational, almost the whole thing is non-normative, so I don't bother highlighting it.
I admit I could probably be clearer and more disciplined about this. :)
since records do not have a persistent identity. | ||
|
||
Because of that, there is no need for a `const (1, 2)` syntax to force a record | ||
to be a constant, like there is for object creation expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good insight! I like the idea of records being automagically const. They're immutable and have no persistent identity, so there's literally nothing preventing a compiler from treating any record expression with compile-time-known field values as a constant. And if that's the case, I see no compelling reason to require users to write const
before it.
On the other hand, auto-const for records might lead users to want other immutable classes to be auto-constable too, and then get annoyed that they aren't. Perhaps something to think about with structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
My perspective is that where const Foo(..)
creates a new Foo
object, (a, b)
does not create anything. It just places a
next to b
. It's just two values instead of one value.
It's not that you create a constant pair, it's that a pair of constant values are ... well, two constant values. They can be used wherever two constant values are needed.
You never need "a constant pair", you need a pair of constant values.
We can allow an initializer list of the form:
const C((int, int) pair) : (this.x, this.y) = pair;
(We should!)
We could even allow
(final x, const y) = (1, 2);
(We probably shouldn't.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is anything particularly tricky about
class C {
final int x, y;
const C((int, int) pair) : (x, y) = pair;
}
All we need is the ability to desugar the initializer list to x = pair.$0, y = pair.$1
. But the opposite direction is more tricky:
class D {
final (int, int) pair;
const D(int x, int y) : pair = (x, y);
}
And even more:
class E {
final Object o;
const E(int x, int y, bool b) : o = b ? (x, y) : x + y;
}
Canonicalization is currently based on a compile-time version of identical
applied to the values of each instance variable, but identical
applied to a pair of records isn't guaranteed to ever return true, so we wouldn't have any guarantees about canonicalization of instances of E
, unless we specify that this compile-time-identical
behaves differently.
Perhaps it is enough to say that identical
at compile-time will always perform a complete traversal in case it is invoked on two record values (of course, bailing out early if there is a non-match, but traversing the entire structure in order to obtain the value true
).
Currently a non-record is canonicalized based on the `identical` comparison of its fields against previously created constant values of the same type. Since a record field does not have a predictably or useful `identical` comparison result, we instead define a comparison structurally on the record, and use that instead. Also formatting tweaks.
Landed. Your merge :) |
A proposal for constant specifying records being constant.
Hope I covered all the corners.
Basically, only two paragraphs are normative, the rest is elaboration.
Fixes #2337
@mit-mit