Unable to use some Hindi unicode characters in source code as identifiers #42830
Comments
spec:
The invalid characters are classed as (forgive the mangling of the characters)
|
Variable names need to have letters in them,
so, as was pointed out, this is working as expected. |
Wait, this is too early. I am drafting my reply. |
If it turns out this is a bug we'll re-open. We usually quickly close by default. |
Okay, let me explain. There is a misconception regarding the Hindi language here. Hindi language characters are divided into two categories: Swar(s) and Matra(s).

Here is the list of some Hindi literals which are Matra(s): ऀ ँ ं ः ऺ ऻ ़ ा ि ी ु ू ृ ॄ ॅ ॆ े ै ॉ ॊ ो ौ ् ॎ ॏ ॕ ॖ ॗ

Matra(s) do not make sense on their own in written form. But they do make sense when preceded by Swar(s). See the example:

ौ = Not a valid Hindi literal
जौ = A valid Hindi literal

Therefore, Matra(s) preceded by a Swar should be valid in identifiers. But the existing Go compiler does not allow that. It treats Matra(s) differently. So,

पोर्ट_नंबर := 5432

The above code is valid! There is a bug in the compiler. |
The compiler is correctly implementing the Go spec which depends on the Unicode spec. Even if it is a valid Hindi literal, it is not composed purely of Unicode Letters.
|
It doesn't matter if those characters combine with others to make up letters. From the compiler's point of view, every character in an identifier needs to be a letter. These characters are not letters, and even if theoretically they just "combine" with the previous letter, they're still there. We are aware of the fact that this prevents people from writing certain words in certain scripts. See for example: #194 (Allow Unicode combining characters in identifiers). We also have a FAQ on this:
So, as it was pointed out: the compiler is behaving according to the language specification. Combining characters are intentionally excluded. We are aware that this can cause some issues in certain scripts, but whether to change the rules (and how) is a different matter. The current behaviour is aligned with the language spec. |
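To make the rule described above concrete, here is a minimal sketch that approximates the spec's identifier rule (a letter or underscore first, then letters, underscores, or decimal digits) using the standard `unicode` package. The helper name `offendingRunes` is made up for illustration, and this is not how the compiler's scanner is actually written; it only mirrors the rule as stated in this thread.

```go
package main

import (
	"fmt"
	"unicode"
)

// offendingRunes is a hypothetical helper (not part of any Go tool).
// It returns the runes in ident that are neither a Unicode letter,
// an underscore, nor (after the first position) a decimal digit.
func offendingRunes(ident string) []rune {
	var bad []rune
	for i, r := range ident {
		if r == '_' || unicode.IsLetter(r) {
			continue
		}
		// Digits are allowed anywhere except the first position.
		if i > 0 && unicode.IsDigit(r) {
			continue
		}
		bad = append(bad, r)
	}
	return bad
}

func main() {
	// The identifier from the original report.
	for _, r := range offendingRunes("पोर्ट_नंबर") {
		// unicode.IsMark reports membership in category M (Mn, Mc, Me).
		fmt.Printf("%U rejected, combining mark: %t\n", r, unicode.IsMark(r))
	}
}
```

Run on the identifier from the report, this flags U+094B (ो), U+094D (्) and U+0902 (ं), all of which are in the Unicode mark category rather than the letter category. Those are exactly the characters missing from परट_नबर, the variant the reporter found to compile.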
Okay, that rule needs to be changed, seriously! This rule is explicitly breaking other languages. What is the appropriate place to raise this issue? |
Essentially you want to re-open #194 for consideration. This issue tracker is the right place to do it. For big changes, we generally prefer a well laid out proposal (see here: https://github.com/golang/proposal). Not breaking any existing Go code is extremely important, and it's likely that any proposal that treats this lightly will be rejected. Note that any proposal about changing which identifiers are allowed should probably at least consider solving the other big issue Go currently has:
For example if you can write |
I believe the relevant issue here is #20706. |
Ah, there it is. Thanks Ian, I didn't remember we already had an issue for this. |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
I don't know.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
As per the spec, I should be able to use Unicode characters in source code, but I am unable to do so, as seen here.
What did you expect to see?
I expected the program to compile as usual. But when I remove some specific Hindi characters from the identifier, changing पोर्ट_नंबर to परट_नबर, the code compiles to give output:
What did you see instead?