Skip to content

Simplify number parsing #386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 25, 2024
Merged

Conversation

chqrlie
Copy link
Collaborator

@chqrlie chqrlie commented Apr 16, 2024

  • use single test in js_strtod loop.
  • use more explicit ATOD_xxx flags
  • remove ATOD_TYPE_MASK, use ATOD_WANT_BIG_INT instead
  • remove unused arguments flags and pexponent in js_string_to_bigint
  • merge js_atof and js_atof2, remove slimb_t *pexponent argument
  • simplify and document js_atof parser, remove cumbersome labels,
  • simplify js_parseInt test for zero radix for ATOD_ACCEPT_HEX_PREFIX
  • simplify next_token number parsing, handle legacy octal in parser only
  • simplify JS_StringToBigInt, use flags only.
  • remove unused slimb_t exponent token field
  • add number syntax tests

Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a partial review. I ran out of time.

quickjs.c Outdated
Comment on lines 10160 to 10162
#define ATOD_TYPE_MASK (1 << 8)
#define ATOD_TYPE_FLOAT64 (0 << 8)
#define ATOD_TYPE_BIG_INT (1 << 8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're here... I never really liked this ATOD_TYPE_MASK thing because it's not really a mask, just a single bit discriminator to distinguish between doubles and bigints (and it's only used in one place.)

Related suggestion: replace int atod_type with BOOL want_bigint

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the API and now use a flag ATOD_WANT_BIG_INT

/* Return an exception in case of memory error.
Return JS_NAN if invalid syntax */
// TODO(chqrlie) need more precise error message
static JSValue js_atof(JSContext *ctx, const char *p, const char *end,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

js_atof mutates radix in a number of places. Shouldn't it do that only when radix != 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

radix was modified in different places for specific reasons, but only if 0 was passed as the argument value.
I changed the API to always pass the default radix in the range 2 to 36 and pass flags for optional prefix parsing for 0x, 0b and 0o. I also simplified legacy octal support in the parser only.

else
return js_int32(0);
}
if (*p == '+' || *p == '-') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of bounds reads from here on when p == end? Maybe it's better to get rid of end altogether and mandate that p points to a zero-terminated buffer? (Zero or some other sentinel.)

Only one call site (next_token) seems to pass a non-null end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I Agree the API is clunky. I shall try and simplify this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after careful review, passing the end of the array is required to distinguish embedded null bytes from the final null byte. I am adding an explanatory comment:

   - `p` points to a null terminated UTF-8 encoded char array
   - `end` points to the end of the array.
   There is a null byte at `*end`, but there might be embedded null bytes
   between `p` and `end` which must cause an exception if the
   `ATOD_NO_TRAILING_CHARS` flag is not present.

@chqrlie chqrlie force-pushed the simplify-js-atof branch 2 times, most recently from 9993d40 to 63ec7ee Compare May 7, 2024 19:00
@chqrlie chqrlie requested a review from bnoordhuis May 8, 2024 10:33
- use single test in `js_strtod` loop.

- use more explicit `ATOD_xxx` flags
- remove `ATOD_TYPE_MASK`, use `ATOD_WANT_BIG_INT` instead
- remove unused arguments `flags` and `pexponent` in `js_string_to_bigint`
- merge `js_atof` and `js_atof2`, remove `slimb_t *pexponent` argument
- simplify and document `js_atof` parser, remove cumbersome labels,
- simplify `js_parseInt` test for zero radix for `ATOD_ACCEPT_HEX_PREFIX`
- simplify `next_token` number parsing, handle legacy octal in parser only
- simplify `JS_StringToBigInt`, use flags only.
- remove unused `slimb_t exponent` token field
- add number syntax tests
@chqrlie chqrlie force-pushed the simplify-js-atof branch from 63ec7ee to 1e1635f Compare May 25, 2024 22:03
@chqrlie chqrlie merged commit 139b51f into quickjs-ng:master May 25, 2024
47 checks passed
@chqrlie chqrlie deleted the simplify-js-atof branch May 25, 2024 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants