Allow to statically analyze messages to detect undefined local variables #403

stasm · 2023-06-26T18:33:49Z

This is a follow-up to a conversation I had with @mihnita about variable shadowing. See #310 and #402 for context. I think we've been focusing too much on whether local variables should be allowed to shadow other local variables, and that we treated the question of local variables' shadowing of the message argument as derivative. I'm here to argue that we should reverse this and first discuss whether local variables are allowed to shadow message arguments.

Mihai observed that in the current design in which local variables share the namespace with message arguments it's impossible to statically analyze references to local variables. In the following message we can't with certainty detect that $fooo is a typo, because it could be a message argument (i.e. a value provided by the developer of the app at the callsite of the MF API).

let $foo = {1}
{Hello, {$fooo} and {$user}.}

Forbidding such redefinitions is difficult pre-runtime, because the list of message arguments is unknown upfront.

One idea could be to provide a way for message authors to define message argument names in the message itself, for example:

input $user
let $foo = {1}
{Hello, {$fooo} and {$user}.}

Another solution would be to redefine the syntax for local variables such that it prevents name conflicts with message arguments. This could be done through a new sigil (@mihnita suggested #foo) or through an extra requirement to the name, e.g. a dash in the middle of the name: $my-foo, similar to how HTML allows custom element names.

Do others also feel that not being able to do extensive static analysis on variable names is a problem that needs to be fixed?

The text was updated successfully, but these errors were encountered:

aphillips · 2023-06-26T21:19:58Z

Thanks @stasm and @mihnita for working on this.

I agree that separating arguments from local variables has a number of benefits. I think the best approach to doing this would be to change the sigil for local variables. My next nearest preference would be "special" (I don't mean non-ASCII here) characters just after the $ sigil. I think that external arguments are the most common use case and we should bias towards them. Local variables can be less pretty.

I don't agree with the idea of adding a "declaration requirement" for arguments. This would make simple expressions more complex than they need to be:

{Today is {$date}}

suddenly has to be:

input $date {Today is {$date}}

I think it is important to be as forgiving and simple-to-write as we can in our syntax. We are not a programming, scripting, or even templating language. Note that allowing callers to pass more arguments than the message currently consumes is a feature.

I think my proposal would be to change the sigil for local variables. I would be "okay" (not overjoyed, but satisfied) with making local variables immutable if we did this (notice that this would make external arguments immutable too). I'm okay with keeping let or switching to const. I also note that we can eliminate the keyword altogether if we change sigils. Consider:

#foo = {$foo :transformMe}
{Today is {$foo} and I like to {#foo}}   // valid and `$foo` and `#foo` have potentially different values

Or perhaps:

$!foo = {$foo :something}
{Today is {$foo} and something make it {$!foo}}

stasm · 2023-06-26T21:35:49Z

I think my proposal would be to change the sigil for local variables. I would be "okay" (not overjoyed, but satisfied) with making local variables immutable if we did this (notice that this would make external arguments immutable too).

I think we can have these two discussions separately. I propose that we first fix message arguments vs. local variables conflicts; I see it as a more severe issue that I hope we can all rally around.

(I still think we may want to allow redeclarations of local variables.)

stasm · 2023-06-26T21:41:45Z

@mihnita's original suggestion was to use #foo for local variables. I think I'd prefer to first look at "compound prefixes", i.e. composed of two characters, than to add a new prefix to the syntax. However, my opinion is not strong.

Also, wrt. the # prefix in particular, it may be more suitable for open-close syntax; see #398 (comment).

macchiati · 2023-06-26T23:19:12Z

How about $$foo for locals? Pretty visible, related but different, easy to type, and doesn't use up another character.

macchiati · 2023-06-26T23:20:28Z

BTW I agree with "I agree that separating arguments from local variables"

mihnita · 2023-06-28T17:06:00Z

I think that "declaring" the arguments does not help much (input $user).

Without declaration:

{Hello {$user}}

It user is not passed as argument we get a runtime error (missing argument or something like that).

With declaration the above would be reported as a problem at lint time (static), so I add a declaration as proposed:

input $usert
{Hello {$user}}

Now the lint is happy.
But if user is still not passed as argument we get the same runtime error as before.

So the declaration does not solve much. It only tells us that user is used, and expected to be an argument.
But we can get the same information by looking at the message proper.
And if we have different sigils (or naming) for local variables (consts?) that also tells us that user is an argument.

For separation: I'm fine with $$, $!, #
I would rather not use - because that is also allowed in the variable proper.
So it looks more like a naming convention, not a sigil. But not a blocker, I can live with it.

Also fine to drop the declaration keyword (let or const), as proposed by Addison.
Although a const means one can read the message and understand what's going on (with the shadowing :-) without reading the spec.

Without reading the spec this looks legal, but it is not:

$$foo = {13}
$$foo = {42}

Without reading the spec this looks illegal, and it is:

const $$foo = {13}
const $$foo = {42}

I still think that we should consider this together with the local variable shadowing.
Yes, they are different issues, and it is good to separate them.
But looking at both would make the solution more coherent.

eemeli · 2023-06-30T01:49:05Z

I'm not convinced that the core assertion about the current state of affairs holds true:

In the following message we can't with certainty detect that $fooo is a typo, because it could be a message argument (i.e. a value provided by the developer of the app at the callsite of the MF API).
let $foo = {1}
{Hello, {$fooo} and {$user}.}
Forbidding such redefinitions is difficult pre-runtime, because the list of message arguments is unknown upfront.

Generalising a bit, there are two stages at which a message like this is being edited:

By a developer, when first creating it.
By a translator, when translating it.

In neither of these situations does the message exist in a vacuum. In the first case, the developer is also working with code that formats it; in the second, the source message is also available. Using this contextual information it's possible for static analysis to:

Warn the developer that their formatting call is missing a fooo value, or
Warn the translator that they're using a fooo variable that the source message didn't use.

Given this, I would posit that the added value of being able reach a similar conclusion by static analysis without requiring such external context has relatively low value.

The preceding discussion here appears to have jumped directly to figuring out solutions for the stated problem, which I would assert isn't significant enough to need fixing.

stasm · 2023-06-30T09:25:48Z

How about $$foo for locals? Pretty visible, related but different, easy to type, and doesn't use up another character.

It was indeed one of the options that we discussed with @mihnita. I think it makes sense overall, but I have one concern that I'd like to get feedback from others about. Namely, I think some users may read $$foo as $($foo), that is: take the value of $foo and use this value to look up another variable in the scope, by name. This isn't something that's even possible in MF2, so this is likely too contrived?

stasm · 2023-06-30T09:31:47Z

I'm okay with keeping let or switching to const. I also note that we can eliminate the keyword altogether if we change sigils.

We could technically eliminate let even today, when local variables still use the $ prefix. However, I continue to think that they improve the readability of our syntax and fit well with our use of other keywords: match and when.

stasm · 2023-07-03T10:43:39Z

Warn the developer that their formatting call is missing a fooo value, or

Such warnings require a much more advanced, cross-language static analysis. I expect that not all authoring tools will be able to provide this level of authoring experience.

Given how simple our syntax is, I'd like us to strive to make it trivial to analyze and detect as many kinds of errors as possible before resorting to more complex static analysis methods. (Which I also hope will be possible in the future.)

Warn the translator that they're using a fooo variable that the source message didn't use.

This is a good point to keep in mind, thanks. We can indeed expect most of the translator-oriented use-cases to have access to the original source message. Here again, however, I feel like there's an opportunity to leverage the simplicty of our syntax to allow some use-cases to not even need the source message. For example, linting, pretty-formatting, and certain correctness checks could be performed on translated messages only, without access to the source—simplifying the logic and infrastructure required to run it.

eemeli · 2023-07-03T11:49:36Z

Given how simple our syntax is, I'd like us to strive to make it trivial to analyze and detect as many kinds of errors as possible before resorting to more complex static analysis methods. (Which I also hope will be possible in the future.)

I agree with this approach in principle, but I find the cost in terms of complexity -- adding a second "variable" prefix -- to be significantly greater than the benefits.

stasm · 2023-07-25T16:48:27Z

My thinking evolves as we discuss other topics in parallel. In #310 (comment) I said:

In the teleconference yesterday I said namespacing and shadowing were orthogonal and I suggested to make a decision about namespecing first. However, I now see that they're likely tightly-coupled, unfortunately. That's beause namespacing without a discussion about shadowing directly contradicts the first requirement above: Be able to re-annotate input variables under the same name without having to rename them in the message body. We need to pair with something like the modify keyword solution.

I'd like to expand on this and I choose to do it in this issue because I'm going to argue with myself from a month ago :)

The goal of this request is to allow detecting references to undefined local variables. Specifically, under the current syntax, in the following message we have to allow the possibility that $fooo is a valid input variable rather than a mistyped reference to $foo.

let $foo = {1}
{Hello, {$fooo} and {$user}.}

The proposal presented in this issue was to introduce namespacing:

let $$foo = {1}
{Hello, {$$fooo} and {$user}.}

This does indeed allow tooling to detect $$fooo as a mistyped reference to $$foo — but this is only half of the problem, as many have pointed out in #310 and elsewhere.

The real problem is that it's easy to forget to use the local $$foo in the first place.

One of the potential solutions discussed in #310 proposes to introduce new syntax for re-annotating variables under the same name, i.e. without introducing a new name into the scope:

modify {$count :number maxFractionDigits=0}

This has the benefit of helping ensure that the same annotated $count is used throughout the message body: both for selection and for formatting. You can't accidentally use the original $count for selection and the annotated $$count for formatting.

It also sidesteps the need to create new local variables in what I think will be the majority of use-cases that we imagined for them. I expect local variables would become rarely used because re-annotations would serve these use-cases well enough.

Local variables would still be needed to allow creating new names for new things, and I think we could then drop strict namespacing and rely on conventions:

let $_hostName = {$host :person firstName=long}
let $_guestName = {$guest :person firstName=long}
let $_guestsOther = {$guestCount :number offset=1}

To summarize:

namespacing solves only half of the problem of statically detecting mistakes in variable references,
namespacing is an expensive solution if we choose to do it with a new sigil, like $$ or #,
if we go in the direction of introducing new syntax for re-annotation, like modify or anything else, perhaps we don't actually need strict namespacing and can instead rely on naming conventions?

aphillips · 2023-12-05T15:12:13Z

So... much has happened since we last touched this issue, including the creation of .local and making variables immutable. However, I suspect that the correct resolution of this issue will be to reject it. The problem is that we allow undeclared variables to be used and local variables remain syntactically identical to input variables.

I think my question would be: why do we need static analysis of only local variables? What purpose does that serve?

Suggest that we close this. @stasm what do you think?

stasm mentioned this issue Jul 2, 2023

Change the syntax of the #open and /close function calls #398

Closed

catamorphism mentioned this issue Jul 5, 2023

Require variable names to be globally unique #402

Closed

stasm mentioned this issue Jul 25, 2023

Discuss variable priorities and shadowing #310

Closed

aphillips added the syntax Issues related with syntax or ABNF label Jul 29, 2023

stasm mentioned this issue Sep 9, 2023

Design document for variable mutability and namespacing #469

Merged

aphillips added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Dec 5, 2023

aphillips added the Stale Obsolete? label Jan 11, 2024

aphillips closed this as completed Jan 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow to statically analyze messages to detect undefined local variables #403

Allow to statically analyze messages to detect undefined local variables #403

stasm commented Jun 26, 2023

aphillips commented Jun 26, 2023

stasm commented Jun 26, 2023

stasm commented Jun 26, 2023

macchiati commented Jun 26, 2023

macchiati commented Jun 26, 2023

mihnita commented Jun 28, 2023

eemeli commented Jun 30, 2023

stasm commented Jun 30, 2023

stasm commented Jun 30, 2023

stasm commented Jul 3, 2023

eemeli commented Jul 3, 2023

stasm commented Jul 25, 2023

aphillips commented Dec 5, 2023

Allow to statically analyze messages to detect undefined local variables #403

Allow to statically analyze messages to detect undefined local variables #403

Comments

stasm commented Jun 26, 2023

aphillips commented Jun 26, 2023

stasm commented Jun 26, 2023

stasm commented Jun 26, 2023

macchiati commented Jun 26, 2023

macchiati commented Jun 26, 2023

mihnita commented Jun 28, 2023

eemeli commented Jun 30, 2023

stasm commented Jun 30, 2023

stasm commented Jun 30, 2023

stasm commented Jul 3, 2023

eemeli commented Jul 3, 2023

stasm commented Jul 25, 2023

aphillips commented Dec 5, 2023