Description
I've brought up this idea several times before informally. I'm filing this issue now for the formal documentation trail.
Currently, Go permits octal, decimal, and hexadecimal integer literals. There's a pending proposal for binary integer literals (#19308) which has wide support.
Proposal:
This is a fully backward-compatible proposal for arbitrary-radix integer literals. We change the integer literal syntax to the following:
int_lit = decimal_lit | octal_lit | radix_lit .
decimal_lit = ( "1" … "9" ) { decimal_digit } .
octal_lit = "0" { octal_digit } .
radix_lit = radix ( "x" | "X" ) radix_digit { radix_digit } .
radix = decimal_lit .
with
radix_digit = "0" … "9" | "A" … "Z" | "a" … "z" .
representing the digit values 0 to 35 (for a maximum radix of 36). The radix must be a decimal literal between 0 and 36; the radix value 0 has the same meaning as 16, and the value 1 is invalid.
Examples:
0x10 // same as 16x10 or 16
2x1001 // binary integer literal, same as 9
3x010 // ternary integer literal, same as 3
8x066 // octal integer literal, same as octal 066 or 54
36xz // integer literal in base 36, value is 35
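To make the examples above concrete, here is a minimal sketch of how such literals could be evaluated. The helper name parseRadixLit is hypothetical and not part of any Go package; it assumes the rules stated above (radix 0 means 16, radix 1 is invalid, maximum radix 36) and leans on strconv.ParseUint, which already accepts bases 2 through 36.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRadixLit is a hypothetical helper for illustration only.
// It evaluates literals of the proposed form radix ('x'|'X') digits,
// as well as plain decimal and legacy leading-zero octal literals.
// This sketch does not reject a radix written with leading zeros.
func parseRadixLit(lit string) (uint64, error) {
	// The radix part consists of decimal digits only, so the first
	// 'x' or 'X' is always the separator, even in base 36.
	i := strings.IndexAny(lit, "xX")
	if i < 0 {
		if len(lit) > 1 && lit[0] == '0' {
			return strconv.ParseUint(lit[1:], 8, 64) // legacy octal
		}
		return strconv.ParseUint(lit, 10, 64) // decimal
	}
	radix, err := strconv.ParseUint(lit[:i], 10, 8)
	if err != nil {
		return 0, fmt.Errorf("invalid radix in %q: %v", lit, err)
	}
	switch {
	case radix == 0:
		radix = 16 // 0x... keeps its existing meaning
	case radix == 1 || radix > 36:
		return 0, fmt.Errorf("invalid radix %d in %q", radix, lit)
	}
	return strconv.ParseUint(lit[i+1:], int(radix), 64)
}

func main() {
	for _, lit := range []string{"0x10", "16x10", "2x1001", "3x010", "8x066", "36xz"} {
		v, err := parseRadixLit(lit)
		fmt.Println(lit, v, err)
	}
}
```

Running it on the examples yields 16, 16, 9, 3, 54, and 35, matching the values listed above.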
Discussion:
The beauty of this approach is that it permits arbitrary radix notation, thus removing any future need to expand this again; it removes the need for the extra notation for hexadecimal numbers because they are just part of this notation; and at the same time it's fully backward-compatible. The commonly accepted notation for binary integer literals and the respective notation here have the same length, and the proposed notation here seems just as intuitive (e.g., 0b1001100 == 2x1001100).
We could go a step further and remove octal literals from the language since they are also easily expressed with this notation, but that's a step that would not be backward-compatible. One way to make that happen w/o introducing bugs would be to disallow non-zero decimal numbers that start with a 0; octal numbers in existing code would then lead to a compiler error and could be fixed. It would also be trivial to have them fixed automatically with a simple tool. Finally, removing octals would eliminate another (albeit mostly academic) issue with them; see #28253. If octals were not supported anymore, one could condense the integer literal syntax to:
int_lit = decimal_digit { decimal_digit } [ ( "x" | "X" ) radix_digit { radix_digit } ] .
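As a rough illustration of the "simple tool" mentioned above, the following sketch uses the standard go/scanner package to find legacy leading-zero octal literals and rewrite them into the proposed 8x form. The function name rewriteOctals is hypothetical, and the sketch makes simplifying assumptions: it ignores scan errors and does not handle literals containing underscores.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// rewriteOctals is a hypothetical helper for illustration only.
// It rewrites legacy octal literals such as 0644 to 8x644 and
// leaves all other source text (strings, comments, 0x..., 0) untouched.
func rewriteOctals(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler: scan errors are ignored

	var out strings.Builder
	last := 0 // offset of the first byte not yet copied to out
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// A legacy octal literal is an INT that starts with '0'
		// followed directly by another digit.
		if tok == token.INT && len(lit) > 1 && lit[0] == '0' && lit[1] >= '0' && lit[1] <= '9' {
			off := file.Offset(pos)
			out.WriteString(src[last:off])
			out.WriteString("8x" + lit[1:])
			last = off + len(lit)
		}
	}
	out.WriteString(src[last:])
	return out.String()
}

func main() {
	fmt.Println(rewriteOctals("os.Chmod(name, 0644)"))
}
```

Because the rewrite works on tokens rather than raw text, digits inside string literals and comments are left alone.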
Implementation:
The implementation is straightforward. It would likely slightly simplify some of the scanning code for numeric literals because with this proposal all such literals simply start with a decimal_lit. If that value is zero, or between 2 and 36, a subsequent 'x' indicates the actual literal value in that radix. The respective number conversion routines are trivial and would need minimal adjustments.
Impact:
Hard to say. It may be sufficient to just add another notation for binary integer literals per #19308. Or we could do this and lay the issue to rest for good.
Activity
cespare commented on Oct 17, 2018
In Go, I have never wanted to write an integer literal with radix other than 2, 8, 10, or 16. I have also never read code that would have used such literals, had they existed. Therefore, the benefit seems extremely low.
The fact that the existing hexadecimal syntax doesn't fit directly into the proposed syntax but requires a special case of 0 ≡ 16 significantly detracts from the appeal.
dr2chase commented on Oct 17, 2018
I like the idea of removing the leading-zero octal notation.
That notation is a source of annoying errors, and removing it simplifies explaining the language to new users ("don't do this, you'll be surprised" vs not mentioning alternate base notation till it is needed).
griesemer commented on Oct 17, 2018
@cespare I would have formulated your 2nd paragraph slightly differently:
The fact that the existing hexadecimal syntax neatly fits directly into the proposed syntax significantly adds to the appeal.
:-)
beoran commented on Oct 18, 2018
While I see the appeal of having a consistent syntax, I fear this would become a very obscure feature. I never felt the need for anything but binary, octal, decimal and hexadecimal integer constants. Binary integer literals are useful in many cases involving bit twiddling, octal is useful for file permissions, hexadecimal is useful for compact notation of bytes. But ternary or base-21 literals seem to be useful for obfuscation only.
I do like the idea of changing the notation for octals; right now it's still the confusing C notation. And I do like the uniform notation you propose. I would just disallow anything other than base 2, 8, 10 and 16 to avoid such obfuscation.
Otherwise, could you please show us a few production open source code bases where the use of such arbitrary radix integer constants would have been beneficial?
griesemer commented on Oct 18, 2018
I'd be ok with the restriction to 2, 8, 10, and 16, but why? It would make things (a tiny bit) more complicated; the only reason I'd see is that it might perhaps eliminate errors (somebody might write 9x066 rather than 8x066 for a file permission).
I agree that most programmers may not care much about the flexibility here, they'll be just fine that they can write down numbers in all the commonly used radixes (2, 8, 10, 16) w/o extra cost (one extra char for octal) and use a single, uniform notation.
Personally, I think that not having arbitrary radix notation is what prevents us from thinking it might be useful. Now usefulness alone is not a criterion for adding something to the language, but in this case it would address the desire for a binary notation, simplify what we already have, and remove restrictions. Seems like a win-win to me. Keep in mind that there's really strong support for adding binary integer literals, so no matter what, we'd have to make changes in all the same places. The difference is just whether we add one more special case, or whether we simplify all the code in favor of a uniform notation.
Finally, there's also the educational aspect of Go: Having a simple, uniform mechanism here rather than an agglomeration of historical notations seems like a nice cleanup.
Btw., Smalltalk supports arbitrary radix notation, too, using the same syntax but with an 'r' instead of an 'x'. Using the 'x' lets the most common other base notation fit neatly into the system.
randall77 commented on Oct 18, 2018
Because that's 32 = 36-4 fewer bases you need to understand when reading code.
23xag56m? It gets very confusing very quickly. I think I'd rather see (((10*23+16)*23 + 5)*23 + 6)*23 + 22 or something (an exponent operator would help here).
Hexadecimal is certainly useful. Binary and octal seem marginally useful. Other bases just don't seem useful at all. Certainly their value isn't worth burdening the reader with them.
beoran commented on Oct 18, 2018
cespare commented on Oct 18, 2018
I don't think we should use this proposed syntax with such a restriction. I think that, if anything, we should just add the 0b syntax for binary literals and be done with it (then Go will have all of base 2, 8, 10, and 16 literals).
I don't agree that this proposal is uniform; it introduces more ways of writing the same integer literals: 0x2a and 16x2a.
griesemer commented on Oct 19, 2018
@beoran I don't know of a Smalltalk playground offhand (which doesn't require installation), but there is of course Squeak (https://en.wikipedia.org/wiki/Squeak). For documentation see the famous "Blue Book", http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf, literals with radixes are described on page 19. And the examples there are limited to radix 8 and 16.
Again, I have no strong feelings regarding restricting a radix to 2, 8, 10, 16, but I also don't think it matters much - people won't use crazy radixes for no good reason. (I suspect it's the small radixes that are interesting. For instance, I can see how I'd use a small-n (3, 5, etc.) radix to encode multiple values of n states in a single int, e.g. for some state on a game board.)
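A minimal sketch of the small-radix idea, assuming a hypothetical 3x3 game board where each cell has one of three states (0, 1, or 2). The helper names packCells and cellAt are made up for illustration. The packed integer is the value that a base-3 literal with the same digits would denote.

```go
package main

import (
	"fmt"
	"strconv"
)

// packCells packs cells (each 0, 1, or 2) into a single integer,
// treating the first cell as the most significant base-3 digit.
func packCells(cells []uint8) uint64 {
	var v uint64
	for _, c := range cells {
		v = v*3 + uint64(c)
	}
	return v
}

// cellAt returns the i-th base-3 digit of v, counted from the
// least significant end (i = 0 is the last cell that was packed).
func cellAt(v uint64, i int) uint8 {
	for ; i > 0; i-- {
		v /= 3
	}
	return uint8(v % 3)
}

func main() {
	board := []uint8{0, 1, 2, 0, 1, 0, 2, 0, 0}
	v := packCells(board)
	// FormatUint prints the base-3 digits without leading zeros.
	fmt.Println(v, strconv.FormatUint(v, 3), cellAt(v, 2))
}
```

With the proposed notation, the same board could be written directly as the literal 3x012010200.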
In summary, it really doesn't matter all that much; what people seem to want is binary integer literals, and there's a specific proposal for that. It happens to do what all other languages do (which is good) but it also happens to introduce yet another notation. I've submitted this proposal because I think it's a viable alternative. Especially if we're considering removing/improving the octal notation (which would be a Go 2 item) we'd have to have some replacement. This proposal would resolve all those issues in one fell swoop. Personally, I think this is a more elegant approach for the whole problem of different radix integers, but I'm biased, of course.
I think the decisions that need to be made are:
I think the decision for 2) should take into account:
griesemer commented on Oct 19, 2018
@cespare Not to be facetious, but with the 0b notation there will also forever be two ways of writing a "hex" number: 0x2a and 0b00101010. I'd see that as a much bigger problem - there will be plenty of people arguing that one is better than the other. Realistically, with the radix notation, people will stick to the shorter 0x notation rather than 16x (but either way, the actual hex number looks the same).
What you are saying really was one of the reasons for not including 0b from day one: There's already a suitable notation, namely 0x.
josharian commented on Oct 19, 2018
There is also the suggestion to support intN for all N from @jimmyfrasche:
And several real world uses immediately occurred to me:
I can see game states similarly benefitting from intN.
In contrast, I can't think of any real world use cases for arbitrary radix constants. Just another data point.
beoran commented on Oct 19, 2018
To answer your questions: 1. Yes, I think we need binary constants because they are useful for bit masks and other bit twiddling. 3. Dropping C-style octals and replacing them is a good idea, because C-style octals are a source of beginner bugs. I would probably go for the 0o765 notation, although given the Smalltalk precedent 08x765 would also be ok.
As for 2. Actually I don't care too much either way about the notation, as long as we limit it to bases 2, 8, 16 and maybe 10.