
Commit 220be10

gh-114667: Support hexadecimal floating point literals
This adds hexadecimal floating point literals (IEEE 754-2008 §5.12.3) and supports construction of floats from hexadecimal strings. Note that the syntax is more permissive: it accepts everything that ``float.fromhex()`` currently accepts, but with a mandatory base specifier; it also allows grouping digits with underscores.

Examples:

```pycon
>>> 0x1.1p-1
0.53125
>>> float('0x1.1')
1.0625
>>> 0x1.1
1.0625
>>> 0x1.1_1_1
1.066650390625
```

Minor changes:

* The Py_ISDIGIT/Py_ISXDIGIT macros were converted to static inline functions.
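The literal syntax is new with this change. On an interpreter without it, you can still cross-check the example values with the existing `float.fromhex()` method. The sketch below assumes that underscores are pure digit grouping, so it strips them first, because `float.fromhex()` itself does not accept them:

```python
# Compute the values from the examples above with float.fromhex(), which
# predates this change. fromhex() rejects underscores, so they are removed
# before parsing (assuming they only group digits).
examples = {
    "0x1.1p-1": 0.53125,
    "0x1.1": 1.0625,
    "0x1.1_1_1": 1.066650390625,
}

for text, expected in examples.items():
    value = float.fromhex(text.replace("_", ""))
    assert value == expected, (text, value)
    print(f"{text:>10} -> {value!r}")
```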
1 parent a768e12 commit 220be10


14 files changed, +258 -89 lines changed


Doc/library/functions.rst

Lines changed: 11 additions & 3 deletions
@@ -656,7 +656,8 @@ are always available. They are listed here in alphabetical order.
 
 Return a floating point number constructed from a number or string *x*.
 
-If the argument is a string, it should contain a decimal number, optionally
+If the argument is a string, it should contain a decimal number
+or a hexadecimal number, optionally
 preceded by a sign, and optionally embedded in whitespace. The optional
 sign may be ``'+'`` or ``'-'``; a ``'+'`` sign has no effect on the value
 produced. The argument may also be a string representing a NaN
@@ -672,11 +673,15 @@ are always available. They are listed here in alphabetical order.
 digitpart: `digit` (["_"] `digit`)*
 number: [`digitpart`] "." `digitpart` | `digitpart` ["."]
 exponent: ("e" | "E") ["+" | "-"] `digitpart`
-floatnumber: number [`exponent`]
+hexfloatnumber: `~python-grammar:hexinteger` | `~python-grammar:hexfraction` | `~python-grammar:hexfloat`
+floatnumber: (`number` [`exponent`]) | `hexfloatnumber`
 floatvalue: [`sign`] (`floatnumber` | `infinity` | `nan`)
 
 Case is not significant, so, for example, "inf", "Inf", "INFINITY", and
-"iNfINity" are all acceptable spellings for positive infinity.
+"iNfINity" are all acceptable spellings for positive infinity. Note also
+that the exponent of a hexadecimal floating point number is written in
+decimal, and that it gives the power of 2 by which to multiply the
+coefficient.
 
 Otherwise, if the argument is an integer or a floating point number, a
 floating point number with the same value (within Python's floating point
@@ -713,6 +718,9 @@ are always available. They are listed here in alphabetical order.
 .. versionchanged:: 3.8
 Falls back to :meth:`~object.__index__` if :meth:`~object.__float__` is not defined.
 
+.. versionchanged:: 3.13
+Added support for hexadecimal floating-point numbers.
+
 
 .. index::
 single: __format__
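The semantics described above can be sketched in pure Python with exact rational arithmetic: a hexadecimal coefficient is multiplied by a power of 2, and that exponent is written in decimal. The helpers `hexfloat_parts` and `hexfloat_value` are hypothetical names, not part of this commit or of CPython. The sketch assumes underscores have already been validated and simply drops them. It does not handle `inf` or `nan`.

```python
import re
from fractions import Fraction

# Hypothetical sketch of the documented semantics, not CPython's parser.
# Underscores are assumed to be valid digit grouping and are dropped up front;
# "inf" and "nan" are not handled.
_HEXFLOAT_STRING = re.compile(
    r"""
    (?P<sign>[+-]?)
    0[xX]                            # the base specifier is mandatory
    (?P<int>[0-9a-fA-F]*)            # hex digits before the point
    (?:\.(?P<frac>[0-9a-fA-F]*))?    # optional point and hex fraction
    (?:[pP](?P<exp>[+-]?[0-9]+))?    # optional exponent, written in decimal
    """,
    re.VERBOSE,
)


def hexfloat_parts(text):
    """Split a hex float string into (sign, coefficient, exponent).

    The value is sign * coefficient * 2**exponent; the coefficient is an
    exact Fraction.
    """
    m = _HEXFLOAT_STRING.fullmatch(text.strip().replace("_", ""))
    if m is None or not (m["int"] or m["frac"]):
        raise ValueError(f"not a hexadecimal float: {text!r}")
    frac = m["frac"] or ""
    # Each hex digit after the point divides the coefficient by 16.
    coefficient = Fraction(int(m["int"] + frac, 16), 16 ** len(frac))
    exponent = int(m["exp"] or 0)
    sign = -1 if m["sign"] == "-" else 1
    return sign, coefficient, exponent


def hexfloat_value(text):
    """Evaluate a hex float string, rounding only once, in the final float()."""
    sign, coefficient, exponent = hexfloat_parts(text)
    return float(sign * coefficient * Fraction(2) ** exponent)
```

Keeping the coefficient as a `Fraction` means nothing is rounded until the final conversion. For example, `0x3.1` splits into the coefficient 0x31/16 = 49/16 with exponent 0, which gives 3.0625, the value the updated tests expect from `float(" 0x3.1 ")`.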

Doc/reference/lexical_analysis.rst

Lines changed: 13 additions & 2 deletions
@@ -951,25 +951,36 @@ Floating point literals
 Floating point literals are described by the following lexical definitions:
 
 .. productionlist:: python-grammar
-floatnumber: `pointfloat` | `exponentfloat`
+floatnumber: `pointfloat` | `exponentfloat` | `hexfloat`
 pointfloat: [`digitpart`] `fraction` | `digitpart` "."
 exponentfloat: (`digitpart` | `pointfloat`) `exponent`
+hexfloat: ("0x | "0X") ["_"] (`hexdigitpart` | `hexpointfloat`) [`binexponent`]
 digitpart: `digit` (["_"] `digit`)*
 fraction: "." `digitpart`
 exponent: ("e" | "E") ["+" | "-"] `digitpart`
+hexpointfloat: [`hexdigit`] `hexfraction` | `hexdigitpart` "."
+hexfraction: "." `hexdigitpart`
+hexdigitpart: `hexdigit` (["_"] `hexdigit`)*
+binexponent: ("p" | "P") ["+" | "-"] `digitpart`
 
-Note that the integer and exponent parts are always interpreted using radix 10.
+Note that the exponent parts are always interpreted using radix 10.
 For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
 allowed range of floating point literals is implementation-dependent. As in
 integer literals, underscores are supported for digit grouping.
 
+The exponent of a hexadecimal floating point literal is written in decimal, and
+it gives the power of 2 by which to multiply the coefficient.
+
 Some examples of floating point literals::
 
 3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93
 
 .. versionchanged:: 3.6
 Underscores are now allowed for grouping purposes in literals.
 
+.. versionchanged:: 3.13
+Added support for hexadecimal floating-point literals.
+
 
 .. index::
 single: j; in numeric literal
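The lexical rules above can be summarized as a single regular expression, together with the valid and invalid literals that this commit adds to Lib/test/test_grammar.py. This is a hypothetical sketch: `HEXFLOAT_LITERAL` and `is_hexfloat_literal` are names of our own, and this is not the pattern CPython's tokenizer uses. It requires either a point or a binary exponent, so plain hexadecimal integers such as `0xff` do not match:

```python
import re

# Hypothetical pattern assembled from the productions above and this commit's
# test cases; not the pattern used by CPython's tokenizer.
HEXDIGITPART = r"[0-9a-fA-F](?:_?[0-9a-fA-F])*"   # hexdigit (["_"] hexdigit)*
DIGITPART = r"[0-9](?:_?[0-9])*"                   # exponent digits are decimal
BINEXPONENT = rf"[pP][+-]?{DIGITPART}"
# A point with hex digits on at least one side: "1.8", ".8" or "1."
HEXPOINTFLOAT = rf"(?:(?:{HEXDIGITPART})?\.{HEXDIGITPART}|{HEXDIGITPART}\.)"

HEXFLOAT_LITERAL = re.compile(
    # Optional underscore after the prefix; then either a pointed form with an
    # optional exponent, or digits followed by a mandatory exponent.
    rf"0[xX]_?(?:{HEXPOINTFLOAT}(?:{BINEXPONENT})?|{HEXDIGITPART}{BINEXPONENT})"
)


def is_hexfloat_literal(token: str) -> bool:
    """Return True if the whole token is a hexadecimal floating point literal."""
    return HEXFLOAT_LITERAL.fullmatch(token) is not None
```

Because `e` and `E` are hexadecimal digits, a token such as `0x1.1e5` matches as a hexadecimal float with no exponent at all. That is one reason the binary exponent is introduced by `p` rather than `e`.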

Doc/tutorial/floatingpoint.rst

Lines changed: 1 addition & 1 deletion
@@ -210,7 +210,7 @@ the float value exactly:
 
 .. doctest::
 
->>> x == float.fromhex('0x1.921f9f01b866ep+1')
+>>> x == 0x1.921f9f01b866ep+1
 True
 
 Since the representation is exact, it is useful for reliably porting values
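The tutorial's point that the hexadecimal form is exact can be checked on any Python 3 with the long-standing `float.hex()` and `float.fromhex()` methods. The sample values below are our own choice. They include signed zero, the smallest subnormal and the largest finite float:

```python
import math
import sys

# float.hex() writes a float's exact binary value in hexadecimal, and
# float.fromhex() reads it back, so the round trip is lossless for finite values.
samples = [3.14159, 0.1, -0.0, 5e-324, sys.float_info.max, sys.float_info.min]

for x in samples:
    h = x.hex()
    y = float.fromhex(h)
    # Equal values and matching signs (the sign check tells 0.0 from -0.0).
    assert y == x and math.copysign(1.0, y) == math.copysign(1.0, x), (x, h)
    print(f"{x!r:>25} <-> {h}")
```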

Include/cpython/pyctype.h

Lines changed: 8 additions & 2 deletions
@@ -21,11 +21,17 @@ PyAPI_DATA(const unsigned int) _Py_ctype_table[256];
 #define Py_ISLOWER(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_LOWER)
 #define Py_ISUPPER(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_UPPER)
 #define Py_ISALPHA(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_ALPHA)
-#define Py_ISDIGIT(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_DIGIT)
-#define Py_ISXDIGIT(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_XDIGIT)
 #define Py_ISALNUM(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_ALNUM)
 #define Py_ISSPACE(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_SPACE)
 
+static inline int Py_ISDIGIT(char c) {
+return _Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_DIGIT;
+}
+
+static inline int Py_ISXDIGIT(char c) {
+return _Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_XDIGIT;
+}
+
 PyAPI_DATA(const unsigned char) _Py_ctype_tolower[256];
 PyAPI_DATA(const unsigned char) _Py_ctype_toupper[256];
 
Include/internal/pycore_floatobject.h

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ extern PyObject* _Py_string_to_number_with_underscores(
 
 extern double _Py_parse_inf_or_nan(const char *p, char **endptr);
 
+extern double _Py_dg_strtod_hex(const char *str, char **ptr);
 
 #ifdef __cplusplus
 }

Lib/test/test_float.py

Lines changed: 9 additions & 9 deletions
@@ -38,9 +38,9 @@ def test_float(self):
 self.assertEqual(float(3.14), 3.14)
 self.assertEqual(float(314), 314.0)
 self.assertEqual(float(" 3.14 "), 3.14)
-self.assertRaises(ValueError, float, " 0x3.1 ")
-self.assertRaises(ValueError, float, " -0x3.p-1 ")
-self.assertRaises(ValueError, float, " +0x3.p-1 ")
+self.assertEqual(float(" 0x3.1 "), 3.0625)
+self.assertEqual(float(" -0x3.p-1 "), -1.5)
+self.assertEqual(float(" +0x3.p-1 "), 1.5)
 self.assertRaises(ValueError, float, "++3.14")
 self.assertRaises(ValueError, float, "+-3.14")
 self.assertRaises(ValueError, float, "-+3.14")
@@ -70,13 +70,13 @@ def test_noargs(self):
 
 def test_underscores(self):
 for lit in VALID_UNDERSCORE_LITERALS:
-if not any(ch in lit for ch in 'jJxXoObB'):
+if not any(ch in lit for ch in 'jJoObB'):
 self.assertEqual(float(lit), eval(lit))
 self.assertEqual(float(lit), float(lit.replace('_', '')))
 for lit in INVALID_UNDERSCORE_LITERALS:
 if lit in ('0_7', '09_99'): # octals are not recognized here
 continue
-if not any(ch in lit for ch in 'jJxXoObB'):
+if not any(ch in lit for ch in 'jJoObB'):
 self.assertRaises(ValueError, float, lit)
 # Additional test cases; nan and inf are never valid as literals,
 # only in the float() constructor, but we don't allow underscores
@@ -173,9 +173,9 @@ def test_float_with_comma(self):
 self.assertRaises(ValueError, float, " 3,14 ")
 self.assertRaises(ValueError, float, " +3,14 ")
 self.assertRaises(ValueError, float, " -3,14 ")
-self.assertRaises(ValueError, float, " 0x3.1 ")
-self.assertRaises(ValueError, float, " -0x3.p-1 ")
-self.assertRaises(ValueError, float, " +0x3.p-1 ")
+self.assertEqual(float(" 0x3.1 "), 3.0625)
+self.assertEqual(float(" -0x3.p-1 "), -1.5)
+self.assertEqual(float(" +0x3.p-1 "), 1.5)
 self.assertEqual(float(" 25.e-1 "), 2.5)
 self.assertAlmostEqual(float(" .25e-1 "), .025)
 
@@ -1483,7 +1483,7 @@ def roundtrip(x):
 except OverflowError:
 pass
 else:
-self.identical(x, fromHex(toHex(x)))
+self.identical(x, roundtrip(x))
 
 def test_subclass(self):
 class F(float):

Lib/test/test_grammar.py

Lines changed: 57 additions & 5 deletions
@@ -19,8 +19,11 @@
 
 # These are shared with test_tokenize and other test modules.
 #
-# Note: since several test cases filter out floats by looking for "e" and ".",
-# don't add hexadecimal literals that contain "e" or "E".
+# Note:
+# 1) several test cases filter out floats by looking for "e" and ".":
+# don't add hexadecimal literals that contain "e" or "E".
+# 2) several tests also filter out binary integers by looking for "b" or "B":
+# so, don't add hexadecimal floating point literals with above digits.
 VALID_UNDERSCORE_LITERALS = [
 '0_0_0',
 '4_2',
@@ -43,6 +46,16 @@
 '.1_4j',
 '(1_2.5+3_3j)',
 '(.5_6j)',
+'0x_.1p1',
+'0X_.1p1',
+'0x1_1.p1',
+'0x_1_1.p1',
+'0x1.1_1p1',
+'0x1.p1_1',
+'0xa.p1',
+'0x.ap1',
+'0xa_c.p1',
+'0x.a_cp1',
 ]
 INVALID_UNDERSCORE_LITERALS = [
 # Trailing underscores:
@@ -54,6 +67,8 @@
 '0xf_',
 '0o5_',
 '0 if 1_Else 1',
+'0x1p1_',
+'0x1.1p1_',
 # Underscores in the base selector:
 '0_b0',
 '0_xf',
@@ -71,28 +86,41 @@
 '0o5__77',
 '1e1__0',
 '1e1__0j',
+'0x1__1.1p1',
 # Underscore right before a dot:
 '1_.4',
 '1_.4j',
+'0x1_.p1',
+'0xa_.p1',
 # Underscore right after a dot:
 '1._4',
 '1._4j',
 '._5',
 '._5j',
+'0x1._p1',
+'0xa._p1',
 # Underscore right after a sign:
 '1.0e+_1',
 '1.0e+_1j',
+'0x1.1p+_1',
 # Underscore right before j:
 '1.4_j',
 '1.4e5_j',
-# Underscore right before e:
+'0x1.1p1_j',
+# Underscore right before e or p:
 '1_e1',
 '1.4_e1',
 '1.4_e1j',
-# Underscore right after e:
+'0x1_p1',
+'0x1_P1',
+'0x1.1_p1',
+'0x1.1_P1',
+# Underscore right after e or p:
 '1e_1',
 '1.4e_1',
 '1.4e_1j',
+'0x1p_1',
+'0x1.1p_1',
 # Complex cases with parens:
 '(1+1.5_j_)',
 '(1+1.5_j)',
@@ -173,6 +201,23 @@ def test_floats(self):
 x = 3.e14
 x = .3e14
 x = 3.1e4
+x = 0x1.2p1
+x = 0x1.2p+1
+x = 0x1.p1
+x = 0x1.p-1
+x = 0x1p0
+x = 0x1ap1
+x = 0x1P1
+x = 0x1cp2
+x = 0x1.p1
+x = 0x1.P1
+x = 0x001.1p2
+x = 0X1p1
+x = 0x1.1_1p1
+x = 0x1.1p1_1
+x = 0x1.
+x = 0x1.1
+x = 0x.1
 
 def test_float_exponent_tokenization(self):
 # See issue 21642.
@@ -210,7 +255,14 @@ def test_bad_numerical_literals(self):
 "use an 0o prefix for octal integers")
 check("1.2_", "invalid decimal literal")
 check("1e2_", "invalid decimal literal")
-check("1e+", "invalid decimal literal")
+check("1e+", "invalid float literal")
+check("0x.p", "invalid float literal")
+check("0x_.p", "invalid float literal")
+check("0x1.1p", "invalid float literal")
+check("0x1.1_p", "invalid float literal")
+check("0x1.1p_", "invalid float literal")
+check("0xp", "invalid hexadecimal literal")
+check("0xP", "invalid hexadecimal literal")
 
 def test_end_of_numerical_literals(self):
 def check(test, error=False):

Lib/test/test_tokenize.py

Lines changed: 10 additions & 0 deletions
@@ -265,6 +265,16 @@ def test_float(self):
 NAME 'x' (1, 0) (1, 1)
 OP '=' (1, 2) (1, 3)
 NUMBER '3.14e159' (1, 4) (1, 12)
+""")
+self.check_tokenize("x = 0x1p1", """\
+NAME 'x' (1, 0) (1, 1)
+OP '=' (1, 2) (1, 3)
+NUMBER '0x1p1' (1, 4) (1, 9)
+""")
+self.check_tokenize("x = 0x.1p1", """\
+NAME 'x' (1, 0) (1, 1)
+OP '=' (1, 2) (1, 3)
+NUMBER '0x.1p1' (1, 4) (1, 10)
 """)
 
 def test_underscore_literals(self):

Lib/tokenize.py

Lines changed: 4 additions & 1 deletion
@@ -77,7 +77,10 @@ def maybe(*choices): return group(*choices) + '?'
 Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
 r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
 Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
-Floatnumber = group(Pointfloat, Expfloat)
+HexExponent = r'[pP][-+]?[0-9](?:_?[0-9])*'
+Hexfloat = group(r'0[xX]_?[0-9a-f](?:_?[0-9a-f])*\.(?:[0-9a-f](?:_?[0-9a-f])*)?',
+r'0[xX]_?\.[0-9a-f](?:_?[0-9a-f])*') + HexExponent
+Floatnumber = group(Pointfloat, Expfloat, Hexfloat)
 Imagnumber = group(r'[0-9](?:_?[0-9])*[jJ]', Floatnumber + r'[jJ]')
 Number = group(Imagnumber, Floatnumber, Intnumber)
 
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Add hexadecimal floating point literals (IEEE 754-2008 §5.12.3) and support
+construction of floats from hexadecimal strings. Patch by Sergey B
+Kirpichev.
