Proposal: Number literal separators #504
Comments
Pushed an initial implementation to a new branch here. |
A note: it may be useful to allow other common numeral separators, but only per project, not universally. For example, Indian numbers use commas: https://en.wikipedia.org/wiki/Indian_numbering_system
Trailing underscores could be allowed, for situations like:
which gives a hint about expected max range for a group of numbers. Or for alignment:
|
Not sure if there would be much benefit for customisable separators. These are only really for visual grouping and not so much for accurate numeric localization. The current implementation does allow trailing dashes as part of the literal. Those examples of yours should work. |
@tiehuis: a program with a lot of hardcoded numbers (e.g. ballistic tables) may have better readability and a higher chance of catching typos, thanks to the familiar style. But this is not a feature for everyone, and if ever implemented, it should allow ad-hoc project customisation. I imagine wild things like the ability to avoid repeated numbers:
I have a hope that Zig's metaprogramming will enable these "tricks".
Great. |
I think this has pros and cons. Reading |
Another thing to think about is that a bare number literal is a math construct. |
Here's my alternative proposal that uses status quo:

To compete with this Java:

long hexBytes = 0xFF_EC_DE_5E;
long hexWords = 0xCAFE_F00D;
long maxLong = 0x7fff_ffff_ffff_ffffL;
byte nybbles = 0b0010_0101;
long bytes = 0b11010010_01101001_10010100_10010010;

Here's the Zig:

//                  ++--++--
const hex_in_8s = 0xFFECDE5E;
//                   ++++----
const hex_in_16s = 0xCAFEF00D;
//                /--\/--\/--\/--\
const max_s64 = 0x7fffffffffffffff;
//                hhhhllll
const nybbles = 0b00100101;
//              33333333222222221111111100000000
const bytes = 0b11010010011010011001010010010010;

Takes up extra space, but the free-form nature of comments means you can write whatever you want there, which is arguably more powerful than only being able to group digits. I admit the I think my biggest objection to this proposal is that it introduces language complexity in the form of syntactic sugar without encouraging any different semantics. This proposal enables the subjective concept of making long literals easier to read by grouping digits in some way. Consider that there are even more ways to write a literal in Zig that allow even more expression of intent than this proposal. For example:

const max_s64 = (1 << 63) - 1;
const bytes =
(0b11010010 << 24) |
(0b01101001 << 16) |
(0b10010100 << 8) |
(0b10010010 << 0);
// see https://github.com/gcc-mirror/gcc/blob/61eae75c6230c7df9fa3e935b2efadda61667c5f/libiberty/crc32.c#L70
const crc32_table: [256]u32 = comptime generate_crc32_table(); |
I have yet to see people doing things like this outside IOCCC. OTOH they go to great lengths to create helpful visual artifacts in the code, like column alignment. Technical authors do the same with numeric tables or math-heavy texts. Why not make it a per-project option? If someone fears it, they can switch it off. |
Instead of allowing a separator everywhere, we could be more restrictive and only allow single separators between digits. This actually seems to be pretty normal in other languages: Ada, C++, Ruby and Julia use this method.
Regarding different region details: if there are implicit semantics behind the meaning of a number literal, separators may actually help convey to a reader that there are some implied extra details. Of course, if they are just separating something without any specific meaning, then that is a valid concern.
Valid alternative. The main draws I see over just comments are two. First, a standardized way of doing this is a bonus and means we don't get different competing styles to represent the same thing (only a minor point). Second, because literal separators are much easier to insert (one character vs. annotating an entire line), they are probably more likely to be used than a comment-based approach, helping code readability. It also allows one to leave comments for more important details, for example why the particular value may have been chosen. |
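To make the stricter rule concrete (a single separator, only between two digits), here is a minimal sketch of a check over the digit part of a literal. The name `hasValidSeparators` is hypothetical and this is not taken from any compiler source; it only looks at separator placement and assumes the caller has already stripped any `0x`/`0b`/`0o` prefix and validated the digits themselves.

```zig
const std = @import("std");

/// Hypothetical helper (not compiler code): returns true if `_` appears in
/// `digits` only as a single separator with a non-separator on both sides.
/// It does not check that the other characters are valid digits.
fn hasValidSeparators(digits: []const u8) bool {
    if (digits.len == 0) return false;
    // A separator may not lead or trail the digit run.
    if (digits[0] == '_' or digits[digits.len - 1] == '_') return false;
    var prev_was_sep = false;
    for (digits) |c| {
        if (c == '_') {
            // Two separators in a row are rejected.
            if (prev_was_sep) return false;
            prev_was_sep = true;
        } else {
            prev_was_sep = false;
        }
    }
    return true;
}

test "usage: single separators between digits" {
    try std.testing.expect(hasValidSeparators("FF_EC_DE_5E"));
    try std.testing.expect(!hasValidSeparators("1__000"));
}
```

Under the "separator anywhere" rule from the original proposal, this check would simply not be run.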
@tiehuis I really appreciate the writeup, and especially the fact that you went off and coded it. Your arguments are reasonable. But I'm going to have to go with keeping the language small and only 1 way to do things. |
If you use that reasoning, I'd personally drop the The only use case I've ever seen for octal is in unix file permissions which is something you could easily handle using constants. Anything you can represent in binary, you can just represent in hexadecimal. For example compare |
Just an FYI: this can be implemented with pretty tiny changes to the syntax and lexer. However, it would require an extra sentence to explain it in the documentation, and it does mean there are more ways to write equivalent integer literals than the already existing decimal, hex, octal, and binary literals. Personally, I think it would be worth it. The grammar changes from this:
to this:
And the lexer (or parser, depending on implementation) just needs to skip the underscores when evaluating the number. |
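The "skip the underscores when evaluating" step can be sketched roughly as follows. This is an illustration under assumptions, not the actual tokenizer: `evalDigits` is a hypothetical name, it takes the digits after any radix prefix, and it leans on the standard library's `std.fmt.charToDigit`, `std.math.mul` and `std.math.add` for digit conversion and overflow checks.

```zig
const std = @import("std");

/// Hypothetical sketch, not the compiler's tokenizer: folds the digits of an
/// integer literal in the given radix into a value, ignoring `_` separators.
fn evalDigits(digits: []const u8, radix: u8) !u128 {
    var value: u128 = 0;
    var seen_digit = false;
    for (digits) |c| {
        if (c == '_') continue; // separators carry no numeric meaning
        const d = try std.fmt.charToDigit(c, radix);
        value = try std.math.mul(u128, value, radix);
        value = try std.math.add(u128, value, d);
        seen_digit = true;
    }
    // A run made only of separators (or an empty run) is not a number.
    if (!seen_digit) return error.InvalidCharacter;
    return value;
}

test "usage: underscores are skipped during evaluation" {
    try std.testing.expect((try evalDigits("CAFE_F00D", 16)) == 0xCAFEF00D);
}
```

Because the separators are dropped before any arithmetic happens, `CAFE_F00D` and `CAFEF00D` produce the same value, which is the whole semantic content of the change.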
@andrewrk could you consider reopening this? Since this issue was closed in 2017, almost every mainstream language has come to support this feature. If Zig's goal is to replace C and become the new lingua franca, it makes sense to adopt the syntax that other languages are using. I've compiled an extensive list of languages that support
|
I'd just like to note that C isn't on that list. One of the things that differentiates C from most languages is, IMO, its simplicity. While this proposal is, itself, not complex, I find that languages aren't generally brought down by a few major changes, but by many minor ones. Imagine if a dozen similar changes to the grammar were made. Each one, on its own, is relatively benign; together, they remove everything that makes Zig what it is. If Zig were to adopt every minor change that "every mainstream" language supports, there wouldn't really be a point to Zig at all. |
@pixelherodev Also note that C doesn't have binary literals |
That's true, but it's also still possible to write out, say, 0xFF or 0x1FF, is it not? If you're writing out large binary strings manually, maybe you should switch to hex. Or, if numerical separators are important, here's an alternate proposal: a comptime function in zag (which, in case you haven't come across me using that term elsewhere, is what I've started calling the Zig standard library) which takes a string literal - like, say,
Usage:
const a = std.fmt.parseSeparatedInt("1f3a_3904_a9ca_299c", 16);
This leaves the grammar as is, provides most (albeit not all) of the advantages of implementing it as a language feature, and slightly reduces how large a Zig compiler needs to be to compete with the current stage1. |
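For illustration, the library-function alternative might look something like the sketch below. `parseSeparatedInt` here is a local stand-in written for this example; the comment above only proposes such a function, so nothing is assumed about it existing in `std.fmt`, and the extra result-type parameter is an addition of this sketch. Because the call sits in a container-level initializer, it is evaluated at compile time, and an invalid digit or an overflow would surface as a compile error.

```zig
const std = @import("std");

/// Local stand-in for the function proposed in the comment (hypothetical;
/// not assumed to be part of the standard library).
fn parseSeparatedInt(comptime T: type, text: []const u8, radix: u8) T {
    var value: T = 0;
    for (text) |c| {
        if (c == '_') continue; // grouping only, no numeric meaning
        const d: u8 = switch (c) {
            '0'...'9' => c - '0',
            'a'...'z' => c - 'a' + 10,
            'A'...'Z' => c - 'A' + 10,
            else => @panic("invalid character in literal"),
        };
        if (d >= radix) @panic("digit out of range for radix");
        value = value * radix + d;
    }
    return value;
}

// Container-level initializers are evaluated at compile time.
const a = parseSeparatedInt(u64, "1f3a_3904_a9ca_299c", 16);

test "usage: value is known at compile time" {
    try std.testing.expect(a == 0x1f3a3904a9ca299c);
}
```

The trade-off discussed in the thread still applies: the grammar stays untouched, but the reader has to know what the function does with the string.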
@pixelherodev I think encouraging parsing functions for something so elementary is the wrong way to go. Using something like Person A might do this:
Person B might do this:
However, as the reader of this code, how am I supposed to know what
|
The list of languages that @momumi compiled that support this feature is the most compelling argument I've seen. I don't think it's fair to support |
Had a go at implementing this in #4741. This implementation is similar to the JavaScript version where So these are valid:
These are invalid:
|
This is found in many other languages, aimed at making longer literals easier to read at a glance by grouping together logical units within numbers. This is especially useful for the longer 128-bit and beyond literals that are available in Zig.
I propose allowing a _ separator anywhere in a number literal, since that is the simplest rule to understand. Numeric literals are parsed into values as if the separators were not present. Examples:
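The original examples did not survive in this copy of the thread, so the following is an illustrative sketch rather than the author's own list. It assumes a Zig compiler that accepts `_` between digits (recent releases do) and only uses separator placements that are valid under both the "anywhere" rule and the stricter "between digits" rule.

```zig
const std = @import("std");

// Each literal is written with separators; the value is the same as the
// unseparated form.
const million = 1_000_000;
const hex_bytes = 0xFF_EC_DE_5E;
const nybbles = 0b0010_0101;
const max_u128: u128 = 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff;

test "usage: separators do not change the value" {
    try std.testing.expect(million == 1000000);
    try std.testing.expect(hex_bytes == 0xFFECDE5E);
}
```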
A more in-depth reference of other implementations can be found in the JavaScript proposal.