Skip to content

Allow underscores in numeric literals #2324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 23, 2019

Conversation

seriyps
Copy link
Contributor

@seriyps seriyps commented Jul 17, 2019

Just draft to get some feedback. Can update docs, erlang-mode.el and tests if we are ok to proceed.

This is to make long numeric literals more readable. The idea is borrowed from Rust language (rust spec)

Examples:

  • 123_456_789 vs 123456789
  • 123_456.789_123 vs 123456.789123
  • 16#123_ABC vs 16#123ABC
  • 2#1100_0011_0101 vs 2#110000110101

One minor problem with current implementation is that erl_scan:string("123_456", text) will return {integer,[{text,"123456"},<..>],123456} - so, text will be "123456" and not original "123_456", but this can be fixed, if it's important.

To make long integer literals more readable.

Examples:
* 123_456_789
* 123_456.789_123
* 16#123_ABC
@CLAassistant
Copy link

CLAassistant commented Jul 17, 2019

CLA assistant check
All committers have signed the CLA.

@mikpe
Copy link
Contributor

mikpe commented Jul 18, 2019

I welcome this in principle. The digit separator feature has been around since at least Ada83, and is currently present in about 7 or 8 additional languages according to a proposal to add them to "Go".

I would suggest to not allow $_ wherever a digit is expected, but only as a separator between digits (the Ada rule).

See https://go.googlesource.com/proposal/+/master/design/19308-number-literals.md section "Digit Separators" for some background.

@seriyps
Copy link
Contributor Author

seriyps commented Jul 18, 2019

@mikpe I also think it doesn't makes much sense to support this in base. In current implementation you can have 1_6#12F just because it was easier to implement.

I would suggest to not allow $_ wherever a digit is expected, but only as a separator between digits (the Ada rule).

So, only allow it if chars left and right from $_ are digits?

With current implementation all of this should work, but doesn't make much sense:

  • 123___456 (more than one underscore)
  • 123_.456
  • 123._456
  • 1_6#12F
  • 16#_12F
  • 123_
  • 123.456_
  • 16#12F_

Leading underscore _12345 will be interpreted as variable name.

@mikpe
Copy link
Contributor

mikpe commented Jul 18, 2019

Yes, leading, trailing, or repeated underscores should not be allowed. That's what's meant by being a separator between groups of digits.

In Erlang the base is simply a decimal numeral, and I would prefer to allow underscores there for consistency -- I don't think it causes any problems.

@seriyps
Copy link
Contributor Author

seriyps commented Jul 19, 2019

@mikpe I updated my implementation according to your comments (as a separate commit for now).

File that I was using to test it:

-module(mydigits).

-export([ok/0, fail/0]).

ok() ->
    [123_456,
     123_456_789,
     123_456.789,
     123.456_789,
     1.2_345e10,
     1.234e1_06,
     16#123_ABC_456_DEF_789,
     2#0011_0101_0011,
     1_6#123ABC].

fail() ->
    Results = lists:map(
      fun erl_scan:string/1,
      [
       "123_",
       "123_456_",
       "123__456",
       "123_.456",
       "123._456",
       "123.456_",
       "123._",
       "1._23e10",
       "1.23e_10",
       "1.23e10_",
       "16_#123ABC",
       "16#123_",
       "16#_123",
       "16#ABC_",
       "16#_ABC",
       "2#_0101",
       "2#0101_"
      ]),
    {Results,
     lists:filter(fun({ok, [{Tok, _, _}]}) ->
                       (Tok == integer) orelse (Tok == float);
		     (_) ->
		       false
		  end, Results)}.

In short: it's allowed as long as there is a digit from both sides of _ (digit can be just [01] for 2#... and up to [a-zA-Z] for 36#...).

@@ -940,13 +940,16 @@ escape_char(C) -> C.

scan_number([C|Cs], St, Line, Col, Toks, Ncs) when ?DIGIT(C) ->
scan_number(Cs, St, Line, Col, Toks, [C|Ncs]);
scan_number([$_,Next|Cs], St, Line, Col, Toks, [Prev|_]=Ncs) when
?DIGIT(Next) andalso ?DIGIT(Prev) ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be more efficient to rewrite (here and in other cases) as

scan_number([Pre, $_, Post | Cs], ...) when ?DIGIT(Pre), ?DIGIT(Post) ->
    scan_number(..., [Post, $_, Pre | Ncs]);

and put as 1st function clause.

Should I?

Copy link
Contributor

@mikpe mikpe Jul 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would pessimize the common case (no $_), while not really optimizing the $_ case. The current code looks fine to me.

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Jul 22, 2019
@uabboli uabboli self-assigned this Sep 26, 2019
@seriyps
Copy link
Contributor Author

seriyps commented Sep 26, 2019

I guess it makes sense to update documentation as well. At least here: http://erlang.org/doc/reference_manual/data_types.html#number
Any other suggestions?

Remove underscores from numbers only when necessary.
Avoid some memory allocation when scanning based integers.

(This has nothing to do with the new underscore-in-integers feature.)
@uabboli
Copy link
Contributor

uabboli commented Sep 30, 2019

We've had a Technical Board (TB) meeting discussing your PR, and the
bottom line is that we think it's a nice thing to have, and that it's
worth the effort to finish it up. We hope you are willing to put in
some more work. What follows is a summary of the conclusions of the
meeting.

Since it's a language extension, we think there should be an EEP (see
https://github.com/erlang/eep).

We agree with @mikpe: underscore is to be allowed between digits (as a
consequence repeated underscores are not allowed). They are to be
allowed in the base of integer numbers and the exponent of floats, if
that makes the implementation simpler.

The underscores are to be lost when representing abstract forms or
terms as strings (erl_pp, io_lib).

As you suggest, the Erlang Reference Manuals needs to be updated
(section 3.2 Number).

The Erlang mode for Emacs needs to be updated.

The Erlang Debugger uses wxWidgets, which in turn uses Scintilla. It
is not a requirement for this PR to update the lexer used in Scintilla.

Beside these general comments from the TB, I've looked at your code
and fixed a bug. When at it, I've also optimized the code a bit. I'll
see if I can update your branch with my changes (3 commits).

@seriyps
Copy link
Contributor Author

seriyps commented Sep 30, 2019

we think it's a nice thing to have

Great news, thanks!

Since it's a language extension, we think there should be an EEP (see https://github.com/erlang/eep).

Oh, I will try, yes.

Erlang Reference Manuals needs to be updated (section 3.2 Number)

Will do

The Erlang mode for Emacs needs to be updated.

The only place I found in erlang.el is:

otp/lib/tools/emacs/erlang.el

Lines 1227 to 1230 in a62aed8

(list (if erlang-regexp-modern-p
"\\_<[0-9]+#\\([0-9a-zA-Z]+\\)"
"\\<[0-9]+#\\([0-9a-zA-Z]+\\)")
1 nil t)

I guess it's responsible for highlighting 16#af_56 as numeric constant and not record? Will look at it, but please let me know if I'm missing something else.

Should I care about fun my_mod:my_fun/12_34? (Apparently, this syntax is also now supported with this PR).

and fixed a bug

a-ha! That's why I was not able to use test_string in tests. Now I see. Thanks!

@uabboli
Copy link
Contributor

uabboli commented Oct 1, 2019

I guess it's responsible for highlighting 16#af_56 as numeric constant
and not record? Will look at it, but please let me know if I'm missing
something else.

Should I care about fun my_mod:my_fun/12_34? (Apparently, this syntax
is also now supported with this PR).

I have to admit that I don't know much about the Emacs mode. If you
can fix the simple cases I think it's good enough for this PR. We at
OTP don't have much time to help with the Emacs mode, so if you run
into trouble, we'll have to rely on contributions from more informed
people (as we've done for quite some time now).

@seriyps
Copy link
Contributor Author

seriyps commented Oct 9, 2019

EEP was submitter to mailing list as well as in a form of PR to eep github repository.

@seriyps
Copy link
Contributor Author

seriyps commented Oct 17, 2019

I updated the docs and emacs erlang-mode. Is there anything I can do now?

@seriyps seriyps force-pushed the underscore-numbers branch from 5d5d043 to 9d545d4 Compare October 17, 2019 09:14
@uabboli
Copy link
Contributor

uabboli commented Oct 18, 2019

I think it looks good.

I cannot make the emacs mode work. In fact we (I and Dan) cannot see
any improvement. I think it's better to leave erlang.el without any changes,
unless you can show that the modifications actually make a difference.

The examples in the docs are not colored correctly. I've been
informed that the program we use can be found at
https://highlightjs.org/.

My take on this PR is that we can manage without updated highlightings
(Emacs, Debugger, docs). Later contributions are welcome, of course.

Thank you for your effort!

@seriyps
Copy link
Contributor Author

seriyps commented Oct 18, 2019

I cannot make the emacs mode work. In fact we (I and Dan) cannot see
any improvement. I think it's better to leave erlang.el without any changes,
unless you can show that the modifications actually make a difference.

The only difference for emacs mode is how it highlights 16#abcd_cafe_fefe: without this modification it will highlight _cafe_fefe as a record and with modification it won't apply highlighting.
before:
image

after:

image

So, if you think emacs part should be skipped, should I remove emacs-related commit or you will take care of it?

@seriyps
Copy link
Contributor Author

seriyps commented Oct 18, 2019

regarding highlightjs, yep, it would be relatively easy to fix in the upstream:

https://github.com/highlightjs/highlight.js/blob/2073d6b314aa63d44007e883194c27bcd0859015/src/languages/erlang.js#L23

https://github.com/highlightjs/highlight.js/blob/2073d6b314aa63d44007e883194c27bcd0859015/src/languages/erlang-repl.js#L25

but then the version of the lib that is bundled needs to be updated in Erlang as well like it was done here 5e30a83

but it will be a separate PR i guess

@uabboli
Copy link
Contributor

uabboli commented Oct 22, 2019

So, if you think emacs part should be skipped, should I remove
emacs-related commit or you will take care of it?

I'll take of it.

Again, thanks for your effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants