proposal: arbitrary-radix integer literals #28256

Closed
@griesemer

Description

@griesemer

I've brought up this idea several times before informally. I'm filing this issue now for the formal documentation trail.

Currently, Go permits octal, decimal, and hexadecimal integer literals. There's a pending proposal for binary integer literals (#19308) which has wide support.

Proposal:

This is a fully backward-compatible proposal for arbitrary-radix integer literals. We change the integer literal syntax to the following:

int_lit = decimal_lit | octal_lit | radix_lit .
decimal_lit = ( "1" … "9" ) { decimal_digit } .
octal_lit = "0" { octal_digit } .
radix_lit = radix ( "x" | "X" ) radix_digit { radix_digit } .
radix = decimal_lit .

with

radix_digit = "0" … "9" | "A" … "Z" | "a" … "z" .

representing the digit values 0 to 35 (for a maximum radix of 36). The radix must be a decimal value between 0 and 36; the radix value 0 has the same meaning as 16, and the value 1 is invalid.

Examples:

0x10   // same as 16x10 or 16
2x1001 // binary integer literal, same as 9
3x010  // ternary integer literal, same as 3
8x066  // octal integer literal, same as octal 066 or 54
36xz   // integer literal in base 36, value is 35
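
The values in these examples can be checked in today's Go. The proposed literals are not valid Go, but strconv.ParseInt already accepts any base from 2 to 36 with the same digit alphabet, so it can stand in for them. This is only a sketch; the helper name value is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// value interprets digits in the given radix (2 to 36) using the
// 0-9, a-z digit alphabet. Hypothetical helper for illustration only.
func value(radix int, digits string) int64 {
	v, err := strconv.ParseInt(digits, radix, 64)
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	fmt.Println(value(16, "10"))  // proposed 0x10 or 16x10
	fmt.Println(value(2, "1001")) // proposed 2x1001
	fmt.Println(value(3, "010"))  // proposed 3x010
	fmt.Println(value(8, "066"))  // proposed 8x066
	fmt.Println(value(36, "z"))   // proposed 36xz
}
```

This prints 16, 9, 3, 54, and 35, matching the comments in the examples.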

Discussion:

The beauty of this approach is that it permits arbitrary radix notation, thus removing any future need to expand this again; it removes the need for a separate notation for hexadecimal numbers because they are just part of this notation; and at the same time it's fully backward-compatible. The commonly accepted notation for binary integer literals and the respective notation here have the same length, and the proposed notation seems just as intuitive (e.g., 0b1001100 == 2x1001100).

We could go a step further and remove octal literals from the language since they are also easily expressed with this notation, but that's a step that would not be backward-compatible. One way to make that happen w/o introducing bugs would be to disallow non-zero decimal numbers that start with a 0; octal numbers in existing code would then lead to a compiler error and could be fixed. It would also be trivial to have them fixed automatically with a simple tool. Finally, removing octals would eliminate another (albeit mostly academic) issue with them; see #28253. If octals were not supported anymore, one could condense the integer literal syntax to:

int_lit = decimal_digit { decimal_digit } [ ( "x" | "X" ) radix_digit { radix_digit } ] .
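
As a rough illustration of the "simple tool" for fixing existing octal literals automatically, the standard go/scanner package can locate integer tokens so that strings and comments are left alone. This is a sketch under assumptions: the name rewriteOctals is hypothetical, and the 8x output form follows this proposal, so the rewritten source is not valid Go today.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// rewriteOctals returns src with every legacy octal literal (a 0 followed
// by octal digits) rewritten to the proposed 8x form.
// Hypothetical helper; the 8x notation is the proposal's, not current Go.
func rewriteOctals(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil handler: scan errors are ignored

	var out strings.Builder
	last := 0 // offset in src up to which text has been copied
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// A legacy octal literal is an INT token that starts with 0, has
		// more than one character, and consists only of octal digits.
		isOctal := tok == token.INT && len(lit) > 1 && lit[0] == '0' &&
			strings.Trim(lit, "01234567") == ""
		if isOctal {
			off := file.Offset(pos)
			out.WriteString(src[last:off])
			out.WriteString("8x" + lit[1:])
			last = off + len(lit)
		}
	}
	out.WriteString(src[last:])
	return out.String()
}

func main() {
	fmt.Println(rewriteOctals(`os.Chmod("f", 0644) // not 0755`))
}
```

The literal 0644 becomes 8x644, while the 0755 inside the comment is untouched because the scanner skips comments in its default mode.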

Implementation:

The implementation is straightforward. It would likely slightly simplify some of the scanning code for numeric literals because with this proposal all such literals simply start with a decimal_lit. If that value is zero, or between 2 and 36, a subsequent 'x' indicates that the actual literal value follows in that radix. The respective number conversion routines are trivial and would need minimal adjustments.
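
To make the scanning rule concrete, here is a minimal sketch of how such a literal could be evaluated: read the leading decimal radix, expect an 'x', then convert the remaining digits in that radix. The function name parseRadixLit is hypothetical, and this is not the compiler's scanner; it only handles the radix form, not plain decimal or legacy octal literals.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseRadixLit evaluates a literal in the proposed radix notation,
// e.g. "2x1001" or "0x10". Hypothetical helper for illustration only.
func parseRadixLit(lit string) (uint64, error) {
	// Scan the leading decimal radix.
	i, radix := 0, 0
	for i < len(lit) && lit[i] >= '0' && lit[i] <= '9' {
		radix = radix*10 + int(lit[i]-'0')
		if radix > 36 {
			return 0, fmt.Errorf("%q: radix exceeds 36", lit)
		}
		i++
	}
	if i == 0 || i == len(lit) || (lit[i] != 'x' && lit[i] != 'X') {
		return 0, fmt.Errorf("%q: expected decimal radix followed by 'x'", lit)
	}
	switch radix {
	case 0:
		radix = 16 // radix 0 has the same meaning as 16
	case 1:
		return 0, fmt.Errorf("%q: radix 1 is invalid", lit)
	}
	// ParseUint rejects empty input, signs, and digits >= radix.
	return strconv.ParseUint(lit[i+1:], radix, 64)
}

func main() {
	for _, lit := range []string{"0x10", "2x1001", "36xz", "1x0"} {
		v, err := parseRadixLit(lit)
		fmt.Println(lit, v, err)
	}
}
```

Because the radix prefix consists of decimal digits only, the first 'x' is always the separator, even for radixes above 33 where 'x' is itself a digit.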

Impact:

Hard to say. It may be sufficient to just add another notation for binary integer literals per #19308. Or we could do this and lay the issue to rest for good.

Activity

added this to the Proposal milestone on Oct 17, 2018
cespare

cespare commented on Oct 17, 2018

@cespare
Contributor

In Go, I have never wanted to write an integer literal with radix other than 2, 8, 10, or 16. I have also never read code that would have used such literals, had they existed. Therefore, the benefit seems extremely low.

The fact that the existing hexadecimal syntax doesn't fit directly into the proposed syntax but requires a special case of 0 ≡ 16 significantly detracts from the appeal.

dr2chase

dr2chase commented on Oct 17, 2018

@dr2chase
Contributor

I like the idea of removing the leading-zero octal notation.
That notation is a source of annoying errors, and removing it simplifies explaining the language to new users ("don't do this, you'll be surprised" vs not mentioning alternate base notation till it is needed).

griesemer

griesemer commented on Oct 17, 2018

@griesemer
Contributor, Author

@cespare I would have formulated your 2nd paragraph slightly differently:

The fact that the existing hexadecimal syntax neatly fits directly into the proposed syntax significantly adds to the appeal.

:-)

beoran

beoran commented on Oct 18, 2018

@beoran

While I see the appeal of having a consistent syntax, I fear this would become a very obscure feature. I never felt the need for anything else but binary, octal, decimal and hexadecimal integer constants. Binary integer literals are useful in many cases involving bit twiddling, octal is useful for file permissions, hexadecimal is useful for compact notation of bytes. But trinary or twentyone-ary literals seem to be useful for obfuscation only.

I do like the idea of changing the notation for octals; right now it's still the confusing C notation. And I do like the uniform notation you propose. I would just disallow anything other than base 2, 8, 10 and 16 to avoid such obfuscation.

Otherwise, could you please show us a few production open source code bases where the use of such arbitrary radix integer constants would have been beneficial?

griesemer

griesemer commented on Oct 18, 2018

@griesemer
Contributor, Author

I'd be ok with the restriction to 2, 8, 10, and 16, but why? It would make things (a tiny bit) more complicated; the only reason I'd see is that it might perhaps eliminate errors (somebody might write 9x066 rather than 8x066 for a file permission).

I agree that most programmers may not care much about the flexibility here; they'll be perfectly happy to write down numbers in all the commonly used radixes (2, 8, 10, 16) w/o extra cost (one extra char for octal) using a single, uniform notation.

Personally, I think that not having arbitrary radix notation is what prevents us from thinking it might be useful. Now usefulness alone is not a criterion for adding something to the language, but in this case it would address the desire for a binary notation, simplify what we already have, and remove restrictions. Seems like a win-win to me. Keep in mind that there's really strong support for adding binary integer literals, so no matter what, we'd have to make changes in all the same places. The difference is just whether we add one more special case, or whether we simplify all the code in favor of a uniform notation.

Finally, there's also the educational aspect of Go: Having a simple, uniform mechanism here rather than an agglomeration of historical notations seems like a nice cleanup.

Btw., Smalltalk supports arbitrary radix notation, too, using the same syntax but with an 'r' instead of an 'x'. Using the 'x' lets the most common other base notation (hexadecimal) fit neatly into the system.

randall77

randall77 commented on Oct 18, 2018

@randall77
Contributor

> I'd be ok with the restriction to 2, 8, 10, and 16, but why?

Because that's 32 = 36-4 fewer bases you need to understand when reading code.

23xag56m? It gets very confusing very quickly. I think I'd rather see (((10*23+16)*23 + 5)*23 + 6)*23 + 22 or something (an exponent operator would help here).

Hexadecimal is certainly useful. Binary and octal seem marginally useful. Other bases just don't seem useful at all. Certainly their value isn't worth burdening the reader with them.

beoran

beoran commented on Oct 18, 2018

@beoran
cespare

cespare commented on Oct 18, 2018

@cespare
Contributor

> I'd be ok with the restriction to 2, 8, 10, and 16, but why?

I don't think we should use this proposed syntax with such a restriction. I think that, if anything, we should just add the 0b syntax for binary literals and be done with it (then Go will have all of base 2, 8, 10, and 16 literals).

> a single, uniform notation

I don't agree that this proposal is uniform; it introduces more ways of writing the same integer literals:

  • As you mention in your proposal, the existing octal syntax doesn't match, so there will be two different ways of writing octal integers unless we take the further, backward-incompatible step of removing the current octal syntax.
  • The current way of writing hex integers doesn't exactly fit into the scheme, so the proposal includes a special case for 0 to have the same meaning as 16. There will forever be two ways of writing hex integers: 0x2a and 16x2a.
griesemer

griesemer commented on Oct 19, 2018

@griesemer
Contributor, Author

@beoran I don't know of a Smalltalk playground offhand (which doesn't require installation), but there is of course Squeak (https://en.wikipedia.org/wiki/Squeak). For documentation see the famous "Blue Book", http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf, literals with radixes are described on page 19. And the examples there are limited to radix 8 and 16.

Again, I have no strong feelings regarding restricting a radix to 2, 8, 10, 16, but I also don't think it matters much - people won't use crazy radixes for no good reason. (I suspect it's the small radixes that are interesting. For instance, I can see how I'd use a small-n (3, 5, etc.) radix to encode multiple values of n states in a single int, e.g. for some state on a game board.)

In summary, it really doesn't matter all that much; what people seem to want is binary integer literals, and there's a specific proposal for that. It happens to do what all other languages do (which is good) but it also happens to introduce yet another notation. I've submitted this proposal because I think it's a viable alternative. Especially if we're considering removing/improving the octal notation (which would be a Go 2 item) we'd have to have some replacement. This proposal would resolve all those issues in one fell swoop. Personally, I think this is a more elegant approach for the whole problem of different radix integers, but I'm biased, of course.

I think the decisions that need to be made are:

  1. Do we want a binary integer literal notation? If no, both proposals are moot.
  2. If we have a yes for 1): Do we want to just add the 0b... notation, or alternatively adopt this proposal (with restrictions to 2, 8, 10, 16; or even just 2, 8, 16)?

I think the decision for 2) should take into account:

  1. Do we want to do anything about octals? If no, both proposals are roughly equivalent. If yes, I believe this proposal is stronger as it will address octals uniformly.
griesemer

griesemer commented on Oct 19, 2018

@griesemer
Contributor, Author

@cespare Not to be facetious, but with the 0b notation there will also forever be two ways of writing a "hex" number: 0x2a and 0b00101010. I'd see that as a much bigger problem - there will be plenty of people arguing that one is better than the other. Realistically, with the radix notation, people will stick to the shorter 0x notation rather than 16x (but either way, the actual hex number looks the same).

What you are saying really was one of the reasons for not including 0b from day one: There's already a suitable notation, namely 0x.

josharian

josharian commented on Oct 19, 2018

@josharian
Contributor

> For instance, I can see how I'd use a small-n (3, 5, etc.) radix to encode multiple values of n states in a single int, e.g. for some state on a game board.

There is also the suggestion to support intN for all N from @jimmyfrasche:

another way to handle this would be to create a new class of parameterized integer types. This is bad syntax, but, for discussion, let's say it's I%N where I is an integer type and N is an integer constant. All arithmetic with a value of this type is implicitly mod N.

And several real world uses immediately occurred to me:

When working on a RISC-V port, I wanted a uint12 type, since my instruction encoding components are 12 bits; that could have been uint % (1<<12). Lots of bit-manipulation, particularly protocols, could benefit from this.

I can see game states similarly benefitting from intN.

In contrast, I can't think of any real world use cases for arbitrary radix constants. Just another data point.

beoran

beoran commented on Oct 19, 2018

@beoran

To answer your questions, I think, 1. yes we need binary constants because they are useful for bit masks and other bit twiddling. And 3. Dropping C style octals and replacing them is a good idea, because C style octals are a source of beginner bugs. Though I would probably go for 0o765 notation, although seeing the Smalltalk precedent 08x765 would also be ok.
As for 2. Actually I don't care too much either way about the notation, as long as we limit it to bases 2, 8, 16 and maybe 10.

36 remaining items


        Participants

        @nathany, @haiitch, @josharian, @beoran, @cespare
