Skip to content

Allow to statically analyze messages to detect undefined local variables #403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stasm opened this issue Jun 26, 2023 · 13 comments
Closed
Labels
resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. Stale Obsolete? syntax Issues related with syntax or ABNF

Comments

@stasm
Copy link
Collaborator

stasm commented Jun 26, 2023

This is a follow-up to a conversation I had with @mihnita about variable shadowing. See #310 and #402 for context. I think we've been focusing too much on whether local variables should be allowed to shadow other local variables, and that we treated the question of local variables' shadowing of the message argument as derivative. I'm here to argue that we should reverse this and first discuss whether local variables are allowed to shadow message arguments.

Mihai observed that in the current design in which local variables share the namespace with message arguments it's impossible to statically analyze references to local variables. In the following message we can't with certainty detect that $fooo is a typo, because it could be a message argument (i.e. a value provided by the developer of the app at the callsite of the MF API).

let $foo = {1}
{Hello, {$fooo} and {$user}.}

Forbidding such redefinitions is difficult pre-runtime, because the list of message arguments is unknown upfront.


One idea could be to provide a way for message authors to define message argument names in the message itself, for example:

input $user
let $foo = {1}
{Hello, {$fooo} and {$user}.}

Another solution would be to redefine the syntax for local variables such that it prevents name conflicts with message arguments. This could be done through a new sigil (@mihnita suggested #foo) or through an extra requirement to the name, e.g. a dash in the middle of the name: $my-foo, similar to how HTML allows custom element names.


Do others also feel that not being able to do extensive static analysis on variable names is a problem that needs to be fixed?

@aphillips
Copy link
Member

Thanks @stasm and @mihnita for working on this.

I agree that separating arguments from local variables has a number of benefits. I think the best approach to doing this would be to change the sigil for local variables. My next nearest preference would be "special" (I don't mean non-ASCII here) characters just after the $ sigil. I think that external arguments are the most common use case and we should bias towards them. Local variables can be less pretty.

I don't agree with the idea of adding a "declaration requirement" for arguments. This would make simple expressions more complex than they need to be:

{Today is {$date}}

suddenly has to be:

input $date {Today is {$date}}

I think it is important to be as forgiving and simple-to-write as we can in our syntax. We are not a programming, scripting, or even templating language. Note that allowing callers to pass more arguments than the message currently consumes is a feature.

I think my proposal would be to change the sigil for local variables. I would be "okay" (not overjoyed, but satisfied) with making local variables immutable if we did this (notice that this would make external arguments immutable too). I'm okay with keeping let or switching to const. I also note that we can eliminate the keyword altogether if we change sigils. Consider:

#foo = {$foo :transformMe}
{Today is {$foo} and I like to {#foo}}   // valid and `$foo` and `#foo` have potentially different values

Or perhaps:

$!foo = {$foo :something}
{Today is {$foo} and something make it {$!foo}}

@stasm
Copy link
Collaborator Author

stasm commented Jun 26, 2023

I think my proposal would be to change the sigil for local variables. I would be "okay" (not overjoyed, but satisfied) with making local variables immutable if we did this (notice that this would make external arguments immutable too).

I think we can have these two discussions separately. I propose that we first fix message arguments vs. local variables conflicts; I see it as a more severe issue that I hope we can all rally around.

(I still think we may want to allow redeclarations of local variables.)

@stasm
Copy link
Collaborator Author

stasm commented Jun 26, 2023

@mihnita's original suggestion was to use #foo for local variables. I think I'd prefer to first look at "compound prefixes", i.e. composed of two characters, than to add a new prefix to the syntax. However, my opinion is not strong.

Also, wrt. the # prefix in particular, it may be more suitable for open-close syntax; see #398 (comment).

@macchiati
Copy link
Member

How about $$foo for locals? Pretty visible, related but different, easy to type, and doesn't use up another character.

@macchiati
Copy link
Member

BTW I agree with "I agree that separating arguments from local variables"

@mihnita
Copy link
Collaborator

mihnita commented Jun 28, 2023

I think that "declaring" the arguments does not help much (input $user).

Without declaration:

{Hello {$user}}

It user is not passed as argument we get a runtime error (missing argument or something like that).

With declaration the above would be reported as a problem at lint time (static), so I add a declaration as proposed:

input $usert
{Hello {$user}}

Now the lint is happy.
But if user is still not passed as argument we get the same runtime error as before.

So the declaration does not solve much. It only tells us that user is used, and expected to be an argument.
But we can get the same information by looking at the message proper.
And if we have different sigils (or naming) for local variables (consts?) that also tells us that user is an argument.


For separation: I'm fine with $$, $!, #
I would rather not use - because that is also allowed in the variable proper.
So it looks more like a naming convention, not a sigil. But not a blocker, I can live with it.

Also fine to drop the declaration keyword (let or const), as proposed by Addison.
Although a const means one can read the message and understand what's going on (with the shadowing :-) without reading the spec.

Without reading the spec this looks legal, but it is not:

$$foo = {13}
$$foo = {42}

Without reading the spec this looks illegal, and it is:

const $$foo = {13}
const $$foo = {42}

I still think that we should consider this together with the local variable shadowing.
Yes, they are different issues, and it is good to separate them.
But looking at both would make the solution more coherent.

@eemeli
Copy link
Collaborator

eemeli commented Jun 30, 2023

I'm not convinced that the core assertion about the current state of affairs holds true:

In the following message we can't with certainty detect that $fooo is a typo, because it could be a message argument (i.e. a value provided by the developer of the app at the callsite of the MF API).

let $foo = {1}
{Hello, {$fooo} and {$user}.}

Forbidding such redefinitions is difficult pre-runtime, because the list of message arguments is unknown upfront.

Generalising a bit, there are two stages at which a message like this is being edited:

  1. By a developer, when first creating it.
  2. By a translator, when translating it.

In neither of these situations does the message exist in a vacuum. In the first case, the developer is also working with code that formats it; in the second, the source message is also available. Using this contextual information it's possible for static analysis to:

  1. Warn the developer that their formatting call is missing a fooo value, or
  2. Warn the translator that they're using a fooo variable that the source message didn't use.

Given this, I would posit that the added value of being able reach a similar conclusion by static analysis without requiring such external context has relatively low value.

The preceding discussion here appears to have jumped directly to figuring out solutions for the stated problem, which I would assert isn't significant enough to need fixing.

@stasm
Copy link
Collaborator Author

stasm commented Jun 30, 2023

How about $$foo for locals? Pretty visible, related but different, easy to type, and doesn't use up another character.

It was indeed one of the options that we discussed with @mihnita. I think it makes sense overall, but I have one concern that I'd like to get feedback from others about. Namely, I think some users may read $$foo as $($foo), that is: take the value of $foo and use this value to look up another variable in the scope, by name. This isn't something that's even possible in MF2, so this is likely too contrived?

@stasm
Copy link
Collaborator Author

stasm commented Jun 30, 2023

I'm okay with keeping let or switching to const. I also note that we can eliminate the keyword altogether if we change sigils.

We could technically eliminate let even today, when local variables still use the $ prefix. However, I continue to think that they improve the readability of our syntax and fit well with our use of other keywords: match and when.

@stasm
Copy link
Collaborator Author

stasm commented Jul 3, 2023

  • Warn the developer that their formatting call is missing a fooo value, or

Such warnings require a much more advanced, cross-language static analysis. I expect that not all authoring tools will be able to provide this level of authoring experience.

Given how simple our syntax is, I'd like us to strive to make it trivial to analyze and detect as many kinds of errors as possible before resorting to more complex static analysis methods. (Which I also hope will be possible in the future.)

  • Warn the translator that they're using a fooo variable that the source message didn't use.

This is a good point to keep in mind, thanks. We can indeed expect most of the translator-oriented use-cases to have access to the original source message. Here again, however, I feel like there's an opportunity to leverage the simplicty of our syntax to allow some use-cases to not even need the source message. For example, linting, pretty-formatting, and certain correctness checks could be performed on translated messages only, without access to the source—simplifying the logic and infrastructure required to run it.

@eemeli
Copy link
Collaborator

eemeli commented Jul 3, 2023

Given how simple our syntax is, I'd like us to strive to make it trivial to analyze and detect as many kinds of errors as possible before resorting to more complex static analysis methods. (Which I also hope will be possible in the future.)

I agree with this approach in principle, but I find the cost in terms of complexity -- adding a second "variable" prefix -- to be significantly greater than the benefits.

@stasm
Copy link
Collaborator Author

stasm commented Jul 25, 2023

My thinking evolves as we discuss other topics in parallel. In #310 (comment) I said:

In the teleconference yesterday I said namespacing and shadowing were orthogonal and I suggested to make a decision about namespecing first. However, I now see that they're likely tightly-coupled, unfortunately. That's beause namespacing without a discussion about shadowing directly contradicts the first requirement above: Be able to re-annotate input variables under the same name without having to rename them in the message body. We need to pair with something like the modify keyword solution.

I'd like to expand on this and I choose to do it in this issue because I'm going to argue with myself from a month ago :)

The goal of this request is to allow detecting references to undefined local variables. Specifically, under the current syntax, in the following message we have to allow the possibility that $fooo is a valid input variable rather than a mistyped reference to $foo.

let $foo = {1}
{Hello, {$fooo} and {$user}.}

The proposal presented in this issue was to introduce namespacing:

let $$foo = {1}
{Hello, {$$fooo} and {$user}.}

This does indeed allow tooling to detect $$fooo as a mistyped reference to $$foo — but this is only half of the problem, as many have pointed out in #310 and elsewhere.

The real problem is that it's easy to forget to use the local $$foo in the first place.

One of the potential solutions discussed in #310 proposes to introduce new syntax for re-annotating variables under the same name, i.e. without introducing a new name into the scope:

modify {$count :number maxFractionDigits=0}

This has the benefit of helping ensure that the same annotated $count is used throughout the message body: both for selection and for formatting. You can't accidentally use the original $count for selection and the annotated $$count for formatting.

It also sidesteps the need to create new local variables in what I think will be the majority of use-cases that we imagined for them. I expect local variables would become rarely used because re-annotations would serve these use-cases well enough.


Local variables would still be needed to allow creating new names for new things, and I think we could then drop strict namespacing and rely on conventions:

let $_hostName = {$host :person firstName=long}
let $_guestName = {$guest :person firstName=long}
let $_guestsOther = {$guestCount :number offset=1}

To summarize:

  • namespacing solves only half of the problem of statically detecting mistakes in variable references,
  • namespacing is an expensive solution if we choose to do it with a new sigil, like $$ or #,
  • if we go in the direction of introducing new syntax for re-annotation, like modify or anything else, perhaps we don't actually need strict namespacing and can instead rely on naming conventions?

@aphillips aphillips added the syntax Issues related with syntax or ABNF label Jul 29, 2023
@aphillips
Copy link
Member

So... much has happened since we last touched this issue, including the creation of .local and making variables immutable. However, I suspect that the correct resolution of this issue will be to reject it. The problem is that we allow undeclared variables to be used and local variables remain syntactically identical to input variables.

I think my question would be: why do we need static analysis of only local variables? What purpose does that serve?

Suggest that we close this. @stasm what do you think?

@aphillips aphillips added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Dec 5, 2023
@aphillips aphillips added the Stale Obsolete? label Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. Stale Obsolete? syntax Issues related with syntax or ABNF
Projects
None yet
Development

No branches or pull requests

5 participants