gh-113804: Support formatting floats in hexadecimal notation #113805

skirpichev · 2024-01-08T05:06:13Z

The x and X format types (in str.format(), f-strings and old-style string formatting) now also support formatting floats as a hexadecimal string, i.e. in the style [±][0x]h[.hhh]p±d (essentially base-16 scientific notation, but with a base-2 exponent written in decimal).
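To make the notation concrete, here is a minimal sketch that decodes such a string by hand and can be compared with float.fromhex(). It relies only on the existing float.hex()/float.fromhex() API, not on the format types proposed here; the helper name decode_hex_float is hypothetical and the sketch handles finite values only.

```python
import math

def decode_hex_float(s):
    """Decode '[±][0x]h[.hhh]p±d' by hand (finite values only)."""
    sign = -1.0 if s.startswith("-") else 1.0
    s = s.lstrip("+-").lower()
    if s.startswith("0x"):
        s = s[2:]
    mantissa, _, exponent = s.partition("p")
    int_part, _, frac_part = mantissa.partition(".")
    # Base-16 mantissa: all digits as one integer, scaled by 16**-len(fraction).
    m = int(int_part + frac_part, 16) / 16 ** len(frac_part)
    # Base-2 exponent, written in decimal.
    return sign * math.ldexp(m, int(exponent))

print(decode_hex_float("-0x1.999999999999ap-4"))  # -0.1
```

The mantissa digits are hexadecimal, but the exponent after "p" is a decimal power of two, which is why the last step scales by 2**exponent rather than 16**exponent.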

_Py_dg_dtoa_hex() helper is based on the float.hex() method with additional support for the precision setting and some format flags. Note (c.f. printf's 'a' format type of the C stdlib), that '#' option is used to control the prefix (off by default).

Trailing zeros (and the dot) in the fractional part are excluded. (This change doesn't affect the float.hex() method and float.fromhex()/float.hex() round-trip.)

Examples:

>>> f'{-0.1:#x}'
'-0x1.999999999999ap-4'
>>> (-0.1).hex()
'-0x1.999999999999ap-4'
>>> f'{3.14159:+#X}'
'+0X1.921F9F01B866EP+1'
>>> f'{3.14159:.3x}'
'1.922p+1'
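For readers who want to reproduce the prefix, sign, and trailing-zero behaviour without this patch, here is a rough emulation built on float.hex(). It is only a sketch: format_hex and its keyword arguments are hypothetical names, precision is not handled, and only finite values are supported.

```python
def format_hex(x, prefix=False, upper=False, sign=False):
    """Emulate the proposed 'x'/'X' output on top of float.hex() (no precision)."""
    s = x.hex()                       # e.g. '-0x1.999999999999ap-4'
    negative = s.startswith("-")
    # Strip the sign and the '0x' prefix, then split mantissa and exponent.
    mantissa, _, exponent = s.lstrip("-")[2:].partition("p")
    # Drop trailing zeros in the fraction, and the dot if nothing is left.
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")
    out = ("0x" if prefix else "") + mantissa + "p" + exponent
    if upper:
        out = out.upper()
    return ("-" if negative else "+" if sign else "") + out

print(format_hex(-0.1, prefix=True))   # -0x1.999999999999ap-4
print(format_hex(1.0))                 # 1p+0
```

Here prefix=True plays the role of the '#' option, upper=True the role of 'X', and sign=True the role of '+'.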

Minor changes:

added Py_hexdigits_upper constant
tests for RaisingNumber are moved (to test also bytes)

Notes

The implementation was changed to use the "x"/"X" format types, per suggestions both in the discussion thread and here. This (1) makes the output slightly more compact by default and (2) allows extending formatting support to old-style string formatting (using the letter "a" would conflict with the existing ascii() converter).

Personally, I would choose the "a" format type (like printf) instead and keep the prefix mandatory, to simplify interaction with other languages (e.g. copy-pasting output into C code as a hexadecimal literal) or within the CPython REPL itself, if #114668 is accepted. It is worth noting that both the gmpy2 and bigfloat packages use the "a" format type.

Open issues:

Should we also add an "a" format type, with C-like behaviour, for str.format()/f-strings, to be more compatible with existing libraries (e.g. gmpy2)?
~~Should we change at all old-style string formatting?~~
Should we support binary notation (using e.g. "b"/"B" format types as the MPFR does)?
Should we adjust tests to use f"{v:#x}" instead of v.hex()?
Should we deprecate (at least soft deprecate) float.hex() method?
Related PR: gh-114667: Support hexadecimal floating-point literals #114668 (requires a PEP)

Issue: Support formatting floats in hexadecimal (and binary?) notation #113804

📚 Documentation preview 📚: https://cpython-previews--113805.org.readthedocs.build/

vstinner

I would prefer to not add 0x prefix by default, and only add it if the alternative form (#x) is used.

Doc/library/stdtypes.rst

Lib/test/test_float.py

Python/pystrtod.c

skirpichev · 2024-01-11T00:54:28Z

I would prefer to not add 0x prefix by default, and only add it if the alternative form (#x) is used.

I have no strong preference for the current syntax, but here is its rationale in brief:

It was inspired by C and Go. In fact, we could add the "a" type alias for new-style format syntax.
The mandatory prefix seems to be standard for hexadecimal literals in other languages. (The discussion thread also has a proposal for hexadecimal literals in Python, but that lacks a sponsor.) In that sense, the defaults were optimized for copy-pasting.

skirpichev · 2024-01-12T01:55:19Z

Ok, I think it's ready for review. I've mentioned some open issues in the pr description.

BTW, the 'x' semantics were changed per @vstinner's suggestion (to control the hexadecimal prefix instead). It's possible to use an 'a'-like meaning (C stdlib), but the implementation would be more complex.

hugovk · 2024-01-12T07:02:49Z

Please could you check these warnings?

Python/pystrtod.c:475:24: warning: result of comparison of constant 16 with boolean expression is always true [-Wtautological-constant-out-of-range-compare]
    assert(0 <= (int)m < 16);
           ~~~~~~~~~~~ ^ ~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:99:25: note: expanded from macro 'assert'
    (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __ASSERT_FILE_NAME, __LINE__, #e) : (void)0)
                        ^
Python/pystrtod.c:483:28: warning: result of comparison of constant 16 with boolean expression is always true [-Wtautological-constant-out-of-range-compare]
        assert(0 <= (int)m < 16);
               ~~~~~~~~~~~ ^ ~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:99:25: note: expanded from macro 'assert'
    (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __ASSERT_FILE_NAME, __LINE__, #e) : (void)0)
                        ^
2 warnings generated.
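The warning arises because C parses `0 <= (int)m < 16` as `(0 <= (int)m) < 16`: the first comparison yields 0 or 1, which is always less than 16. Python chains comparisons, so the sketch below spells out C's evaluation order explicitly; both function names are hypothetical.

```python
def c_style_chained(m):
    """What C computes for `0 <= m < 16`: (0 <= m) gives 0 or 1, then `< 16`."""
    return int(0 <= m) < 16

def intended_check(m):
    """The range check the assert meant to express (`0 <= m && m < 16` in C)."""
    return 0 <= m and m < 16

for m in (-5, 7, 99):
    print(m, c_style_chained(m), intended_check(m))
```

The C-style version accepts every value, including -5 and 99, so the original assert could never fire.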

skirpichev · 2024-01-12T07:55:00Z

@hugovk, thanks, fixed.

skirpichev · 2024-05-31T15:10:17Z

I'm not sure whether you want to put spaces around simple operators like that.

@picnixz, PEP 7 doesn't require that in general. I think the placement of spaces is mostly consistent with the rest of the file.

skirpichev · 2024-05-31T16:16:41Z

@vstinner, thanks for the review. I think everything was addressed, except for a few cases (not marked as resolved).

mdickinson · 2024-06-01T08:10:00Z

Include/internal/pycore_floatobject.h

@@ -55,6 +55,8 @@ extern PyObject* _Py_string_to_number_with_underscores(

 extern double _Py_parse_inf_or_nan(const char *p, char **endptr);

+extern char * _Py_dg_dtoa_hex(double x, int precision, int always_add_sign,


I suggest calling this something clearer like _Py_float_to_hex. The dg_dtoa stuff doesn't make much sense here: "dg" is a reference to David Gay, since he wrote the original version of the dtoa.c code that Python's dtoa.c is based on; this code has nothing to do with Gay's code.

Thanks, that makes sense.

Initially that helper was a static function in pystrtod.c; just like _Py_dg_dtoa(), it, well, "converts a double to an ASCII string"... It was exported and moved to floatobject.c to (1) slightly reduce the diff and (2) support a workaround for hex().

I think goal (1) wasn't very successful. So maybe this helper belongs in pystrtod.c? If so, does _Py_dtoa_hex() sound OK?

mdickinson · 2024-06-01T08:14:02Z

This doesn't look right - the second result below should be showing a nonzero value:

>>> format(1e-320, "x")
'0.00000000007e8p-1022'
>>> format(1e-320, ".3x")
'0.000p-1022'

I'd suggest dropping precision support altogether. The use case for hex formatting is to be able to convey the precise value of the float being converted. Rounding to a given number of digits is something that makes sense for decimal output, but I doubt there's any use for it for the hex output.
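To see why truncating the subnormal form to three digits leaves nothing, it helps to compare the two ways of writing the same value: with the fixed exponent -1022 and leading zeros, or normalized with a leading 1. The sketch below uses only math.frexp() and float.hex(); normalized_hex is a hypothetical helper for finite, nonzero inputs.

```python
import math

def normalized_hex(x):
    """Hex form with a leading 1 digit, even for subnormals (finite, nonzero x)."""
    m, e = math.frexp(x)                        # x == m * 2**e, 0.5 <= |m| < 1
    mantissa = (m * 2).hex().partition("p")[0]  # '0x1.hhhhhhhhhhhhh'
    return f"{mantissa}p{e - 1:+d}"

x = 1e-320
print(x.hex())            # fixed exponent -1022, leading zeros in the fraction
print(normalized_hex(x))  # leading digit 1, smaller exponent
```

In the normalized form the significant digits sit right after the point, so a small precision keeps them; in the fixed-exponent form they are the last digits and are the first to be rounded away.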

skirpichev · 2024-06-01T08:25:35Z

This doesn't look right - the second result below should be showing a nonzero value

Hmm, are you sure? It looks like gcc and clang have a different opinion:

sk@note:~ $ clang -std=c11 a.c -lm && ./a.out
0x0.00000000007e8p-1022
0x0.000p-1022
sk@note:~ $ gcc -std=c11 a.c -lm && ./a.out
0x0.00000000007e8p-1022
0x0.000p-1022

a.c source

#include <stdio.h>
int
main()
{
    double x = 1e-320;
    printf("%a\n", x);
    printf("%.3a\n", x);
    return 0;
}

mdickinson · 2024-06-01T08:41:16Z

Round-ties-to-even is also not working correctly with the precision support:

>>> format(1.03125, "x")
'1.08p+0'
>>> format(1.03125, ".1x")
'1.1p+0'

That second result should be 1.0p+0.
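As a reference point for the expected result, round-ties-to-even on hex digits can be sketched with exact rational arithmetic. This is not the PR's C code: round_hex_digits is a hypothetical helper, it assumes a finite input and at most 13 digits, and it relies on round() of a Fraction rounding halves to even.

```python
import math
from fractions import Fraction

def round_hex_digits(x, ndigits):
    """Round finite x to `ndigits` hex digits after the point, ties to even."""
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    m, e = m * 2, e - 1           # renormalize so that 1 <= |m| < 2
    scaled = round(Fraction(m) * 16**ndigits)   # Fraction rounds half to even
    return math.ldexp(scaled / 16**ndigits, e)

print(round_hex_digits(1.03125, 1).hex())  # 0x1.08 is a tie between 0x1.0 and 0x1.1
print(round_hex_digits(1.09375, 1).hex())  # 0x1.18 is a tie between 0x1.1 and 0x1.2
```

The first tie resolves downward to 0x1.0 and the second upward to 0x1.2, because in both cases the even last digit wins.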

mdickinson · 2024-06-01T08:46:23Z

Hmm, are you sure? It looks like gcc and clang have a different opinion:

Interesting; I get a different result on macOS / Intel (clang 15.0.0):

mdickinson@lovelace cpython % cat test.c
#include <stdio.h>

int main(void) {
    double x = 1e-320;
    printf("x = %.3a\n", x);
    return 0;
}
mdickinson@lovelace cpython % gcc -Wall test.c && ./a.out
x = 0x1.fa0p-1064
mdickinson@lovelace cpython % gcc --version
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: x86_64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

skirpichev · 2024-06-01T09:02:06Z

Hmm, JFR:

sk@note:~ $ gcc --version|head -1
gcc (Debian 12.2.0-14) 12.2.0
sk@note:~ $ clang --version|head -1
Debian clang version 14.0.6

But

sk@note:~ $ clang-15 -std=c11 a.c -lm && ./a.out
0x0.00000000007e8p-1022
0x0.000p-1022
sk@note:~ $ clang-15 --version|head -1
Debian clang version 15.0.6

format(1.03125, ".1x")

Yes, that's a bug, thanks. Code should respect sys.float_info settings.

mdickinson · 2024-06-01T09:05:55Z

Code should respect sys.float_info settings.

I don't think sys.float_info is relevant here; we should simply do round-ties-to-even, just like we do for decimal formatting.

But again, I'd suggest dropping precision support altogether - I don't think it's useful.

The only possible use-case that occurred to me would be rounding to single precision, but even that's not possible, since you'd need exactly 23 bits (5 and 3/4 hex digits) after the point, and there's no way to express that.
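The 23-bit point can be checked with the struct module, which rounds to single precision when packing with the "f" format. The helper name to_float32 is hypothetical; the sketch only inspects the hex digits of the rounded value.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

h = to_float32(0.1).hex()
print(h)   # 0x1.99999a0000000p-4
```

Five full hex digits carry 20 bits, and the sixth digit carries only the remaining 3 bits, so its lowest bit is always zero. A precision of 5 digits is too coarse and 6 digits is one bit too fine.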

skirpichev · 2024-06-02T04:34:11Z

I don't think sys.float_info is relevant here

I meant in C this is determined by current rounding mode (and configurable by fesetround(), but that's another story).

we should simply do round-ties-to-even

I hope I did this correctly.

just like we do for decimal formatting.

Just curious, where is it documented? E.g. I don't see where it's mentioned in https://docs.python.org/3/library/string.html#format-specification-mini-language

But again, I'd suggest dropping precision support altogether - I don't think it's useful.

That ruins the whole proposal.(

mdickinson · 2024-06-02T07:29:35Z

That ruins the whole proposal.(

Then perhaps it's the wrong proposal?

It certainly doesn't ruin it for me: the convenience of being able to format floats with appropriate padding and alignment is a win.

But adding precision support significantly complicates the code (not least because we have to deal with rounding correctly: how did you implement rounding for subnormals, for example? Did you deal with the cases close to sys.float_info.max where we need to round to infinity?), takes effort to implement and review, and I'm still not seeing the use-case. The functionality itself is also a little odd: we get the option to round to 1 significant bit, 5 significant bits, 9 bits, 13 bits, ... of precision, but have no way to round to 10 bits, 15 bits, or any other number of bits that isn't congruent to 1 modulo 4.

What use-case do you see for adding precision support?

mdickinson · 2024-06-02T07:30:36Z

Just curious, where is it documented?

I don't think it is, and in any case it's not a language requirement; just something that CPython itself tries to do well.

mdickinson · 2024-06-02T07:45:58Z

I meant in C this is determined by current rounding mode [...]

Hmm. I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces?

skirpichev · 2024-06-02T09:40:45Z

I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces?

This was rather an empirical observation:)

an example

#include <stdio.h>
#include <fenv.h>
int
main()
{
    double x = 0x1.1115p0;
    fesetround(FE_DOWNWARD);
    printf("%.3a\n", x);
    fesetround(FE_UPWARD);
    printf("%.3a\n", x);
    return 0;
}

sk@note:~ $ gcc-9 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ gcc-10 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ gcc-12 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ clang-14 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ clang-15 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
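Python's standard library has no counterpart to fesetround(), but the two directed results above can be emulated with exact rational arithmetic. This is only a sketch for a value in [1, 2) and three hex digits; the variable names are illustrative.

```python
import math
from fractions import Fraction

x = float.fromhex("0x1.1115p0")
scaled = Fraction(x) * 16**3         # keep 3 hex digits after the point

down = math.floor(scaled) / 16**3    # like FE_DOWNWARD
up = math.ceil(scaled) / 16**3       # like FE_UPWARD
nearest = round(scaled) / 16**3      # round-half-even, independent of any mode

print(down.hex(), up.hex(), nearest.hex())
```

Here the nearest result coincides with the downward one, since 0x1.1115 lies closer to 0x1.111 than to 0x1.112.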

I'll look into it. Section 7.21.6 (C11) is rather vague in this respect...

skirpichev · 2024-06-02T13:10:40Z

I'm not sure that's true for the current code. I think I can reduce the diff wrt the old float_hex_impl() by removing added comments etc.: the precision-related changes aren't big so far. Let me know if this would help the review and is worth the effort.

how did you implement rounding for subnormals, for example?

Hmm, I think that's done like in C stdlib, e.g.

#include <stdio.h>
int main()
{
    double x = 3e-320;
    printf("%a\n", x);     // 0x0.00000000017b8p-1022
    printf("%.9a\n", x);   // 0x0.000000000p-1022
    printf("%.10a\n", x);  // 0x0.0000000001p-1022
    printf("%.11a\n", x);  // 0x0.00000000018p-1022
    return 0;
}
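The digits in the comments of the C snippet above can be reproduced in Python with exact arithmetic. This is a sketch under stated assumptions: IEEE 754 binary64, a positive subnormal input, the fixed exponent -1022, and round-half-even; subnormal_hex is a hypothetical helper.

```python
from fractions import Fraction

def subnormal_hex(x, ndigits):
    """Format a positive subnormal x with `ndigits` hex digits, exponent -1022."""
    # x / 2**-1022 is a fraction below 1; scale it to an integer number of digits.
    scaled = round(Fraction(x) * 2**1022 * 16**ndigits)  # ties to even
    return f"0x0.{scaled:0{ndigits}x}p-1022"

x = 3e-320
for ndigits in (9, 10, 11, 13):
    print(ndigits, subnormal_hex(x, ndigits))
```

With 13 digits nothing is rounded and the result matches x.hex(); with 9 digits every nonzero digit is rounded away.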

Did you deal with the cases close to sys.float_info.max where we need to round to infinity?

Again, as above. Another example:

#include <stdio.h>
int main()
{
    double x = 1.5e+308;
    printf("%a\n", x);    // 0x1.ab36d48e1acfp+1023
    printf("%.0a\n", x);  // 0x2p+1023 (vs 1p+1024 in python)
    printf("%a\n", 0x2p+1023);  // a warning & prints inf
    return 0;
}
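The two spellings in the comment above, 0x2p+1023 and 1p+1024, denote the same value, 2**1024, which is just beyond the largest finite double. A short sketch with float.fromhex() shows that neither can be read back; overflows is a hypothetical helper.

```python
import sys

def overflows(s):
    """Return True if float.fromhex() rejects s as too large for a float."""
    try:
        float.fromhex(s)
    except OverflowError:
        return True
    return False

assert 2 * 2**1023 == 1 * 2**1024        # same exact value, different spelling
print(sys.float_info.max.hex())          # largest finite double
print(overflows("0x2p+1023"), overflows("0x1p+1024"))
```

So rounding a value near sys.float_info.max to few digits produces a string that no longer round-trips through float.fromhex(), whichever normalization is used.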

I'll double-check and compare with other implementations. But so far I think it should be correct.

What use-case do you see for adding precision support?

That's probably better discussed in the issue thread. But I doubt that so many languages would implement a useless feature.

skirpichev · 2024-06-17T02:49:50Z

I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces?

Well, I find "relevant pieces" rather vague, but...

On p281 (C99 Draft): "For a and A conversions, if FLT_RADIX is a power of 2, the value is correctly rounded to a hexadecimal floating number with the given precision."
and then next, in "Recommended practice": "For a and A conversions, if FLT_RADIX is not a power of 2 and the result is not exactly representable in the given precision, the result should be one of the two adjacent numbers in hexadecimal floating style with the given precision, with the extra stipulation that the error should have a correct sign for the current rounding direction."
The following remark about other conversions for floating types also mentions "the current rounding direction".

I doubt that "the current rounding direction" could be interpreted as anything other than the one controlled by fegetround() / fesetround()... I think it's natural that, by default, the rounding mode depends on the floating-point environment settings. Where it's different, that's mentioned as an exception. For example: "The trunc functions use IEC 60559 rounding toward zero (regardless of the current rounding direction)."

skirpichev · 2024-06-19T07:07:10Z

Ok, @mdickinson, I've checked the rounding code (~20 lines) and I think it's correct and matches the implementation in e.g. glibc. The difference is that the C stdlib's code sometimes doesn't do 1-normalization.

After all, that shouldn't be much more complex than binary rounding, which we could use instead (i.e. the precision setting would be the number of bits after the point rather than the number of hexadecimal digits) with almost the same code. This is one option to allow rounding to any number of bits. (Unfortunately, the rounding direction can't be specified, but that's another story.)
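The bit-based variant could look roughly like the following. It is a sketch rather than the PR's implementation: round_bits is a hypothetical name, the input is assumed finite, and the comparison with single precision only holds for values in float32's normal range.

```python
import math
import struct
from fractions import Fraction

def round_bits(x, nbits):
    """Round finite x to `nbits` bits after the binary point, ties to even."""
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    m, e = m * 2, e - 1           # renormalize so that 1 <= |m| < 2
    scaled = round(Fraction(m) * 2**nbits)   # Fraction rounds half to even
    return math.ldexp(scaled / 2**nbits, e)

f32 = struct.unpack("f", struct.pack("f", 0.1))[0]
print(round_bits(0.1, 23) == f32)   # True
```

With a bit count as the precision, rounding to 23 bits reproduces single-precision rounding, which a count of hex digits cannot express.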

Bulk tests

diff --git a/Lib/test/test_float.py b/Lib/test/test_float.py
index 8b56d76bb0..5e757ec5ad 100644
--- a/Lib/test/test_float.py
+++ b/Lib/test/test_float.py
@@ -750,6 +750,64 @@ def test_format(self):
         self.assertEqual(format(INF, 'f'), 'inf')
         self.assertEqual(format(INF, 'F'), 'INF')
 
+    def test_format_x_prec_bulk(self):
+        import ctypes, math, random, _testcapi, sys
+
+        libc = ctypes.CDLL("libc.so.6")
+
+        def float_print(d, i):
+            fmt = "%." + str(i) + "a\n"
+            a = ctypes.create_string_buffer(256)
+            libc.sprintf(a, bytes(fmt, 'utf-8'), ctypes.c_double(d))
+            return a.value.decode('utf-8').split("\n")[0]
+
+        for n in range(1_000_000):
+            d = random.random()
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                if s1 != s2:
+                    e1 = int(s1.split("p")[1])
+                    e2 = int(s2.split("p")[1])
+                    l1 = int(s1.replace("0x", "")[0])
+                    l2 = int(s2.replace("0x", "")[0])
+                    self.assertEqual(e1, e2+1)
+                    self.assertEqual(l1, l2//2)
+                    self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+
+        DMAX = sys.float_info.max
+
+        for n in range(1_000_000):
+            d = DMAX-random.random()*10*math.ulp(DMAX)
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                try:
+                    self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+                except OverflowError:
+                    if s1 != s2:
+                        e1 = int(s1.split("p")[1])
+                        e2 = int(s2.split("p")[1])
+                        n1 = float.fromhex(s1.split("p")[0]+"p0")
+                        n2 = float.fromhex(s2.split("p")[0]+"p0")
+                        self.assertEqual(e1, e2+1)
+                        self.assertEqual(n1, n2/2)
+
+        DMIN = 2**sys.float_info.min_exp
+
+        for n in range(1_000_000):
+            d = random.random()*2**-30*DMIN
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+
     @support.requires_IEEE_754
     def test_format_testfile(self):
         with open(format_testfile, encoding="utf-8") as testfile:

vstinner · 2024-07-02T06:48:27Z

It was decided to not pursue this approach, since float.hex() exists and it's enough for most usages, and we don't want to make str.format() more complicated.

bedevere-app bot mentioned this pull request Jan 8, 2024

Support formatting floats in hexadecimal (and binary?) notation #113804

Closed

skirpichev force-pushed the hex-format branch 3 times, most recently from 3561899 to 54d7892 Compare January 10, 2024 11:11

vstinner reviewed Jan 10, 2024

skirpichev force-pushed the hex-format branch 4 times, most recently from ec3f7e3 to 6d721cd Compare January 11, 2024 05:50

skirpichev force-pushed the hex-format branch from 3450b30 to 64ab5ed Compare January 12, 2024 01:09

skirpichev marked this pull request as ready for review January 12, 2024 01:50

skirpichev requested review from rhettinger, ericsnowcurrently and brandtbucher as code owners January 12, 2024 01:50

bedevere-app bot added the awaiting review label Jan 12, 2024

Address review: fix asserts

024b02d

rhettinger removed their request for review January 12, 2024 16:07

skirpichev requested a review from vstinner January 13, 2024 23:56

skirpichev added 2 commits January 16, 2024 03:21

Merge branch 'main' into hex-format

005577c

Merge branch 'main' into hex-format

97134b9

skirpichev changed the title ~~gh-113804: Support "x" and "X" format types for floats~~ gh-113804: Support "x/X" format types for floats Feb 20, 2024

skirpichev changed the title ~~gh-113804: Support "x/X" format types for floats~~ gh-113804: Support formatting floats in hexadecimal notation Feb 23, 2024

skirpichev mentioned this pull request Feb 25, 2024

Support binary/hexadecimal string output mpmath/mpmath#711

Merged

Merge branch 'main' into hex-format

e05059f

skirpichev added 2 commits May 31, 2024 18:31

fix: catch PyMem_Malloc() error and document returned value

d1ff85e

fix: warnings

089afe4

skirpichev requested a review from vstinner May 31, 2024 16:16

mdickinson reviewed Jun 1, 2024

skirpichev added 2 commits June 1, 2024 12:50

address review: _Py_dg_dtoa_hex -> _Py_float_to_hex

be84c10

address review: round-to-even

c98b092

Merge branch 'master' into hex-format

e089ff5

skirpichev added 4 commits June 17, 2024 05:58

+

795d1da

Merge branch 'master' into hex-format-test

58e84fd

cleanup

728671a

+1

4eaa4f8

vstinner closed this Jul 2, 2024

skirpichev deleted the hex-format branch August 2, 2024 14:12
