
gh-113804: Support formatting floats in hexadecimal notation #113805


Closed
wants to merge 29 commits into from


@skirpichev (Member) commented Jan 8, 2024

The x and X format types (in str.format(), f-strings and old-style string formatting) now also support formatting floats as hexadecimal strings, in the style [±][0x]h[.hhh]p±d (essentially base-16 scientific notation, but with a base-2 exponent written in decimal).
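As an illustration of the notation (a sketch, not code from the PR): the hex digits before and after the point form one integer mantissa, each digit after the point contributes a factor of 2**-4, and the decimal number after `p` is a power of two. The hypothetical helper `decode_hex_float` below computes the exact value with `fractions.Fraction`; in practice `float.fromhex()` does this job.

```python
import re
from fractions import Fraction

# [sign] [0x] hexdigits [. hexdigits] p [sign] decimal-exponent
_HEX_FLOAT = re.compile(
    r"([+-]?)(?:0[xX])?([0-9a-fA-F]+)(?:\.([0-9a-fA-F]*))?[pP]([+-]?[0-9]+)"
)


def decode_hex_float(s: str) -> Fraction:
    """Return the exact value of a hex float string such as '1.922p+1' (hypothetical helper)."""
    m = _HEX_FLOAT.fullmatch(s)
    if m is None:
        raise ValueError(f"not a hexadecimal float: {s!r}")
    sign, int_digits, frac_digits, exp = m.groups()
    frac_digits = frac_digits or ""
    # All hex digits together form one integer mantissa...
    mantissa = int(int_digits + frac_digits, 16)
    # ...and each digit after the point shifts the binary point by 4 bits.
    value = Fraction(mantissa) * Fraction(2) ** (int(exp) - 4 * len(frac_digits))
    return -value if sign == "-" else value


print(decode_hex_float("1.922p+1"))  # 3217/1024, i.e. 0x1922 * 2**(1 - 12)
```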

The _Py_dg_dtoa_hex() helper is based on the float.hex() method, with additional support for the precision setting and some format flags. Note that, unlike the 'a' format type of C's printf (which always prints the prefix), the '#' option is used here to control the 0x prefix, which is off by default.

Trailing zeros in the fractional part are omitted, as is the point itself if no digits remain after it. (This change doesn't affect the float.hex() method or the float.fromhex()/float.hex() round-trip.)

Examples:

>>> f'{-0.1:#x}'
'-0x1.999999999999ap-4'
>>> (-0.1).hex()
'-0x1.999999999999ap-4'
>>> f'{3.14159:+#X}'
'+0X1.921F9F01B866EP+1'
>>> f'{3.14159:.3x}'
'1.922p+1'
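The precision-free examples above can be approximated in pure Python on top of `float.hex()`. This is a hypothetical sketch, not the PR's C helper: `format_hex` and its keyword arguments are made-up names, only the sign options, the `'#'` prefix and upper case are modelled, and infinities and NaNs are rejected rather than formatted.

```python
import math


def format_hex(x: float, *, alternate: bool = False, upper: bool = False,
               sign: str = "-") -> str:
    """Approximate the proposed f'{x:[sign][#]x}' / 'X' output (hypothetical helper)."""
    if not math.isfinite(x):
        raise ValueError("this sketch only handles finite floats")
    h = x.hex()                                # e.g. '-0x1.999999999999ap-4'
    negative = h.startswith("-")
    mant, exp = h.lstrip("-")[2:].split("p")   # drop sign and '0x'
    if "." in mant:
        # Trailing zeros are dropped, then a bare point if nothing is left after it.
        mant = mant.rstrip("0").rstrip(".")
    body = ("0x" if alternate else "") + mant + "p" + exp
    if upper:
        body = body.upper()                    # '0X', hex digits and 'P' in upper case
    if negative:
        prefix = "-"
    else:
        prefix = {"+": "+", " ": " "}.get(sign, "")
    return prefix + body


print(format_hex(-0.1, alternate=True))                           # -0x1.999999999999ap-4
print(format_hex(3.14159, alternate=True, upper=True, sign="+"))  # +0X1.921F9F01B866EP+1
print(format_hex(1.0))                                            # 1p+0
```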

Minor changes:

  • added a Py_hexdigits_upper constant
  • tests for RaisingNumber were moved (so that bytes are tested too)

Notes

The implementation was changed to use the "x"/"X" format types, per suggestions both in the discussion thread and here. This (1) makes the output slightly more compact by default and (2) allows extending formatting support to old-style string formatting (using the letter "a" would conflict with the existing %a conversion, which calls ascii()).
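The conflict mentioned in (2) is easy to see in current CPython: in printf-style formatting `%a` is already the `ascii()` conversion, while `%x` rejects floats, which is the gap this PR would fill. The helper name below is made up for the demonstration.

```python
def x_accepts_float() -> bool:
    """Return True if printf-style '%x' accepts a float (it doesn't without this PR)."""
    try:
        "%x" % 1.5
    except TypeError:          # '%x' currently requires an integer argument
        return False
    return True


print("%a" % "é")          # '\xe9'  (same as ascii("é"))
print("%x" % 255)          # ff
print(x_accepts_float())   # False on CPython releases without this change
```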

Personally, I would choose the "a" format type instead (like printf) and keep the prefix mandatory, to simplify interaction with other languages (e.g. copy-pasting output into C code as a hexadecimal literal) or within the CPython REPL itself, if #114668 is accepted. It's worth noting that both the gmpy2 and bigfloat packages use the "a" format type.

Open issues:

  • Should we also add an "a" format type with C-like behaviour for str.format()/f-strings, to be more compatible with existing libraries (e.g. gmpy2)?
  • Should we change old-style string formatting at all?
  • Should we support binary notation (using e.g. the "b"/"B" format types, as MPFR does)?
  • Should we adjust tests to use f"{v:#x}" instead of v.hex()?
  • Should we deprecate (or at least soft-deprecate) the float.hex() method?
  • Related PR: gh-114667: Support hexadecimal floating-point literals #114668 (requires a PEP)


📚 Documentation preview 📚: https://cpython-previews--113805.org.readthedocs.build/

@vstinner (Member) left a comment

I would prefer to not add 0x prefix by default, and only add it if the alternative form (#x) is used.
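For comparison, this is the existing integer convention the suggestion mirrors: the prefix is omitted by default and added only in the alternate form, where it also counts toward the field width.

```python
n = 255
print(format(n, "x"), format(n, "#x"))  # ff 0xff
print(format(n, "X"), format(n, "#X"))  # FF 0XFF
print(f"{n:#010x}")                     # 0x000000ff (zero padding goes after the prefix)
```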

@skirpichev (Member Author) commented Jan 11, 2024

I would prefer to not add 0x prefix by default, and only add it if the alternative form (#x) is used.

I have no strong preference for the current syntax, but here, briefly, is its rationale:

  1. It was inspired by C and Go. In fact, we could add "a" as an alias format type for the new-style format syntax.
  2. The mandatory prefix seems to be standard for hexadecimal literals in other languages. (The discussion thread also has a proposal for hexadecimal floating-point literals in Python, but it lacks a sponsor.) In that sense, the defaults were optimized for copy-pasting.
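Whichever default is chosen, nothing is lost for round-tripping within Python, because `float.fromhex()` accepts the string with or without the `0x` prefix; the prefix mainly matters when pasting into the source code of languages such as C or Go, whose hexadecimal floating-point literals require it.

```python
x = 3.14159
with_prefix = x.hex()             # '0x1.921f9f01b866ep+1'
without_prefix = with_prefix[2:]  # what a prefix-less default would look like

# float.fromhex() round-trips both spellings exactly.
print(float.fromhex(with_prefix) == x, float.fromhex(without_prefix) == x)  # True True
```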

@skirpichev force-pushed the hex-format branch 4 times, most recently from ec3f7e3 to 6d721cd on January 11, 2024 05:50
@skirpichev (Member Author)

OK, I think it's ready for review. I've mentioned some open issues in the PR description.

BTW, the 'x' semantics were changed per @vstinner's suggestion (the '#' option now controls the hexadecimal prefix). It's possible to use the 'a'-like meaning from the C stdlib instead, but the implementation would be more complex.

@hugovk (Member) commented Jan 12, 2024

Please could you check these warnings?

Python/pystrtod.c:475:24: warning: result of comparison of constant 16 with boolean expression is always true [-Wtautological-constant-out-of-range-compare]
    assert(0 <= (int)m < 16);
           ~~~~~~~~~~~ ^ ~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:99:25: note: expanded from macro 'assert'
    (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __ASSERT_FILE_NAME, __LINE__, #e) : (void)0)
                        ^
Python/pystrtod.c:483:28: warning: result of comparison of constant 16 with boolean expression is always true [-Wtautological-constant-out-of-range-compare]
        assert(0 <= (int)m < 16);
               ~~~~~~~~~~~ ^ ~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:99:25: note: expanded from macro 'assert'
    (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __ASSERT_FILE_NAME, __LINE__, #e) : (void)0)
                        ^
2 warnings generated.
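The warning is about how C parses the chained comparison rather than about the value of `m`: `0 <= (int)m < 16` means `(0 <= (int)m) < 16`, and the inner comparison yields 0 or 1, so the assertion can never fail. Python chains comparisons, so the difference is easy to show side by side. The function names are illustrative, and the C fix would presumably be `assert(0 <= (int)m && (int)m < 16)` (the actual fix isn't shown in this thread).

```python
def c_semantics(m: int) -> bool:
    # Mimics C's `(0 <= m) < 16`: the inner test is 0 or 1, which is always < 16.
    return int(0 <= m) < 16


def intended(m: int) -> bool:
    # Python reads this as `0 <= m and m < 16`, which is what the assert meant.
    return 0 <= m < 16


for m in (-1, 7, 42):
    print(m, c_semantics(m), intended(m))
# -1 True False
# 7 True True
# 42 True False
```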

@skirpichev (Member Author)

@hugovk, thanks, fixed.

@rhettinger rhettinger removed their request for review January 12, 2024 16:07
@skirpichev skirpichev requested a review from vstinner January 13, 2024 23:56

@skirpichev skirpichev changed the title gh-113804: Support "x" and "X" format types for floats gh-113804: Support "x/X" format types for floats Feb 20, 2024
@skirpichev skirpichev changed the title gh-113804: Support "x/X" format types for floats gh-113804: Support formatting floats in hexadecimal notation Feb 23, 2024
@skirpichev (Member Author)

I'm not sure whether you want to put spaces around simple operators like that.

@picnixz, PEP 7 doesn't require that in general. I think the placement of spaces is mostly consistent with the rest of the file.

@skirpichev (Member Author)

@vstinner, thanks for the review. I think everything was addressed, except for a few cases (not marked as resolved).

@skirpichev skirpichev requested a review from vstinner May 31, 2024 16:16
@@ -55,6 +55,8 @@ extern PyObject* _Py_string_to_number_with_underscores(

extern double _Py_parse_inf_or_nan(const char *p, char **endptr);

extern char * _Py_dg_dtoa_hex(double x, int precision, int always_add_sign,
Member


I suggest calling this something clearer like _Py_float_to_hex. The dg_dtoa stuff doesn't make much sense here: "dg" is a reference to David Gay, since he wrote the original version of the dtoa.c code that Python's dtoa.c is based on; this code has nothing to do with Gay's code.

Member Author


Thanks, that makes sense.

Initially, the helper was a static function in pystrtod.c; just like _Py_dg_dtoa(), it, well, "converts a double to an ASCII string"... It was exported and moved to floatobject.c to (1) slightly reduce the diff and (2) support a workaround for hex().

I don't think goal (1) was very successful. So maybe this helper belongs in pystrtod.c? If so, does _Py_dtoa_hex() sound OK?

@mdickinson (Member) commented Jun 1, 2024

This doesn't look right - the second result below should be showing a nonzero value:

>>> format(1e-320, "x")
'0.00000000007e8p-1022'
>>> format(1e-320, ".3x")
'0.000p-1022'

I'd suggest dropping precision support altogether. The use case for hex formatting is to be able to convey the precise value of the float being converted. Rounding to a given number of digits is something that makes sense for decimal output, but I doubt there's any use for it for the hex output.

@skirpichev (Member Author)

This doesn't look right - the second result below should be showing a nonzero value

Hmm, are you sure? It looks like gcc and clang have a different opinion:

sk@note:~ $ clang -std=c11 a.c -lm && ./a.out
0x0.00000000007e8p-1022
0x0.000p-1022
sk@note:~ $ gcc -std=c11 a.c -lm && ./a.out
0x0.00000000007e8p-1022
0x0.000p-1022
a.c source
#include <stdio.h>
int
main()
{
    double x = 1e-320;
    printf("%a\n", x);
    printf("%.3a\n", x);
    return 0;
}

@mdickinson (Member)

Round-ties-to-even is also not working correctly with the precision support:

>>> format(1.03125, "x")
'1.08p+0'
>>> format(1.03125, ".1x")
'1.1p+0'

That second result should be 1.0p+0.
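What "correct" means here can be pinned down with an exact reference computation. The sketch below (a hypothetical `round_hex` helper, not the PR's code) normalizes the significand to [1, 2) with `math.frexp`, scales it by 16**prec and rounds with `fractions.Fraction`, whose `round()` ties to even. It gives 0x1.0p+0 for the example above; because it always produces a leading 1, it shows 1e-320 as 0x1.fa0p-1064 rather than in the 0x0.hhh form used for subnormals earlier in the thread.

```python
import math
from fractions import Fraction


def round_hex(x: float, prec: int) -> str:
    """Format x with `prec` hex digits after the point, leading digit 1, ties to even."""
    if not math.isfinite(x):
        raise ValueError("finite floats only")
    sign = "-" if math.copysign(1.0, x) < 0 else ""
    if x == 0:
        return f"{sign}0x0{'.' + '0' * prec if prec else ''}p+0"
    fm, fe = math.frexp(abs(x))      # abs(x) == fm * 2**fe with 0.5 <= fm < 1
    m, e = Fraction(fm) * 2, fe - 1  # exact significand in [1, 2)
    scale = 16 ** prec
    q = round(m * scale)             # Fraction rounds half to even
    if q == 2 * scale:               # carry out of the leading digit: renormalize
        q, e = scale, e + 1
    lead, frac = divmod(q, scale)
    digits = f".{frac:0{prec}x}" if prec else ""
    return f"{sign}0x{lead}{digits}p{e:+d}"


print(round_hex(1.03125, 1))  # 0x1.0p+0  (tie: 0x1.08 goes to the even digit 0)
print(round_hex(3.14159, 3))  # 0x1.922p+1
print(round_hex(1e-320, 3))   # 0x1.fa0p-1064
```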

@mdickinson (Member)

Hmm, are you sure? It looks like gcc and clang have a different opinion:

Interesting; I get a different result on macOS / Intel (clang 15.0.0):

mdickinson@lovelace cpython % cat test.c
#include <stdio.h>

int main(void) {
    double x = 1e-320;
    printf("x = %.3a\n", x);
    return 0;
}
mdickinson@lovelace cpython % gcc -Wall test.c && ./a.out
x = 0x1.fa0p-1064
mdickinson@lovelace cpython % gcc --version
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: x86_64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

@skirpichev (Member Author)

Hmm, just for the record:

sk@note:~ $ gcc --version|head -1
gcc (Debian 12.2.0-14) 12.2.0
sk@note:~ $ clang --version|head -1
Debian clang version 14.0.6

But

sk@note:~ $ clang-15 -std=c11 a.c -lm && ./a.out
0x0.00000000007e8p-1022
0x0.000p-1022
sk@note:~ $ clang-15 --version|head -1
Debian clang version 15.0.6

format(1.03125, ".1x")

Yes, that's a bug, thanks. Code should respect sys.float_info settings.

@mdickinson (Member)

Code should respect sys.float_info settings.

I don't think sys.float_info is relevant here; we should simply do round-ties-to-even, just like we do for decimal formatting.

But again, I'd suggest dropping precision support altogether - I don't think it's useful.

The only possible use-case that occurred to me would be rounding to single precision, but even that's not possible, since you'd need exactly 23 bits (5 and 3/4 hex digits) after the point, and there's no way to express that.
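The 23-bit point can be seen by rounding a double to IEEE 754 single precision with the standard `struct` module (the `'f'` format packs to binary32). The result has 23 fraction bits, i.e. 5 full hex digits plus 3 bits of a sixth, so the sixth digit is always even; a precision of 5 hex digits keeps only 20 fraction bits and a precision of 6 keeps 24. The `to_float32` name is illustrative.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest binary32 value and widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


y = to_float32(0.1)
print(y.hex())                  # 0x1.99999a0000000p-4
print(int(y.hex()[9], 16) % 2)  # 0: the 6th fraction digit carries only 3 significant bits
```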

@skirpichev (Member Author)

I don't think sys.float_info is relevant here

I meant that in C this is determined by the current rounding mode (configurable via fesetround(), but that's another story).

we should simply do round-ties-to-even

I hope I did this correctly.

just like we do for decimal formatting.

Just curious, where is it documented? E.g., I don't see it mentioned in https://docs.python.org/3/library/string.html#format-specification-mini-language

But again, I'd suggest dropping precision support altogether - I don't think it's useful.

That ruins the whole proposal.(

@mdickinson (Member)

That ruins the whole proposal.(

Then perhaps it's the wrong proposal?

It certainly doesn't ruin it for me: the convenience of being able to format floats with appropriate padding and alignment is a win.

But adding precision support significantly complicates the code (not least because we have to deal with rounding correctly: how did you implement rounding for subnormals, for example? Did you deal with the cases close to sys.float_info.max where we need to round to infinity?), takes effort to implement and review, and I'm still not seeing the use-case. The functionality itself is also a little odd: we get the option to round to 1 significant bit, 5 significant bits, 9 bits, 13 bits, ... of precision, but have no way to round to 10 bits, 15 bits, or any other number of bits that isn't congruent to 1 modulo 4.

What use-case do you see for adding precision support?

@mdickinson (Member)

Just curious, where is it documented?

I don't think it is, and in any case it's not a language requirement; just something that CPython itself tries to do well.

@mdickinson (Member)

I meant that in C this is determined by the current rounding mode [...]

Hmm. I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces?

@skirpichev (Member Author)

I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces?

This was more of an empirical observation :)

an example
#include <stdio.h>
#include <fenv.h>
int
main()
{
    double x = 0x1.1115p0;
    fesetround(FE_DOWNWARD);
    printf("%.3a\n", x);
    fesetround(FE_UPWARD);
    printf("%.3a\n", x);
    return 0;
}
sk@note:~ $ gcc-9 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ gcc-10 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ gcc-12 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ clang-14 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0
sk@note:~ $ clang-15 -std=c11 a.c -lm && ./a.out
0x1.111p+0
0x1.112p+0

I'll look into it. Section 7.21.6 of C11 is rather vague in this respect...

@skirpichev (Member Author)

I'm not sure that's true for the current code. I think I can reduce the diff relative to the old float_hex_impl() by removing the added comments, etc.; the precision-related changes aren't big so far. Let me know if that would help the review and is worth the effort.

how did you implement rounding for subnormals, for example?

Hmm, I think that's done as in the C stdlib, e.g.:

#include <stdio.h>
int main()
{
    double x = 3e-320;
    printf("%a\n", x);     // 0x0.00000000017b8p-1022
    printf("%.9a\n", x);   // 0x0.000000000p-1022
    printf("%.10a\n", x);  // 0x0.0000000001p-1022
    printf("%.11a\n", x);  // 0x0.00000000018p-1022
    return 0;
}
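The glibc outputs in the comments above are what you get by rounding in the fixed subnormal form 0x0.hhh p-1022, where every digit position has a fixed weight, with ties to even. A minimal pure-Python sketch of that rule (the `round_hex_subnormal` name is hypothetical; positive subnormals only) reproduces them:

```python
from fractions import Fraction

SUBNORMAL_EXP = -1022  # exponent shown in C's '%a' output for binary64 subnormals


def round_hex_subnormal(x: float, prec: int) -> str:
    """Round a positive subnormal double in the fixed 0x0.hhh p-1022 form (ties to even)."""
    if not 0 < x < 2.0 ** SUBNORMAL_EXP:
        raise ValueError("expected a positive subnormal double")
    scale = 16 ** prec
    # Exact count of units of 16**-prec * 2**-1022, rounded half to even.
    q = round(Fraction(x) * 2 ** -SUBNORMAL_EXP * scale)
    lead, frac = divmod(q, scale)  # lead becomes 1 if rounding reaches 2**-1022
    digits = f".{frac:0{prec}x}" if prec else ""
    return f"0x{lead}{digits}p{SUBNORMAL_EXP}"


for prec in (9, 10, 11):
    print(round_hex_subnormal(3e-320, prec))
# 0x0.000000000p-1022
# 0x0.0000000001p-1022
# 0x0.00000000018p-1022
```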

Did you deal with the cases close to sys.float_info.max where we need to round to infinity?

Again, as above. Another example:

#include <stdio.h>
int main()
{
    double x = 1.5e+308;
    printf("%a\n", x);    // 0x1.ab36d48e1acfp+1023
    printf("%.0a\n", x);  // 0x2p+1023 (vs 1p+1024 in python)
    printf("%a\n", 0x2p+1023);  // a warning & prints inf
    return 0;
}
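On the Python side the same boundary shows up in `float.fromhex()`: rounding a value such as 1.5e308 (whose first fraction digit is a) or `sys.float_info.max` to zero hex digits carries to 2**1024, which no double can represent, so both the glibc-style spelling 0x2p+1023 and the normalized 0x1p+1024 raise OverflowError. The `hex_overflows` helper is illustrative.

```python
import sys


def hex_overflows(s: str) -> bool:
    """Return True if float.fromhex() rejects s as too large for a float."""
    try:
        float.fromhex(s)
    except OverflowError:
        return True
    return False


DBL_MAX = sys.float_info.max
print(DBL_MAX.hex())  # 0x1.fffffffffffffp+1023
for s in ("0x2p+1023", "0x1p+1024", DBL_MAX.hex()):
    print(s, hex_overflows(s))
# 0x2p+1023 True
# 0x1p+1024 True
# 0x1.fffffffffffffp+1023 False
```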

I'll double-check again and compare with other implementations. But so far I think it's correct.

What use-case do you see for adding precision support?

That's probably better discussed in the issue thread. But I doubt that many languages would implement a useless feature.

@skirpichev (Member Author)

I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces?

Well, I find the relevant pieces rather vague, but...

On p. 281 of the C99 draft: "For a and A conversions, if FLT_RADIX is a power of 2, the value is correctly rounded to a hexadecimal floating number with the given precision."
And then, under "Recommended practice": "For a and A conversions, if FLT_RADIX is not a power of 2 and the result is not exactly representable in the given precision, the result should be one of the two adjacent numbers in hexadecimal floating style with the given precision, with the extra stipulation that the error should have a correct sign for the current rounding direction."
The following remark about the other floating-point conversions also mentions "the current rounding direction".

I doubt that "the current rounding direction" could be interpreted as anything other than the one controlled by fegetround()/fesetround()... I think it's natural that, by default, the current rounding mode depends on the floating-point environment settings. Where that's not the case, it's mentioned as an exception. For example: "The trunc functions use IEC 60559 rounding toward zero (regardless of the current rounding direction)."

@skirpichev (Member Author)

OK, @mdickinson, I've checked the rounding code (~20 lines) and I think it's correct and matches the implementation in, e.g., glibc. The difference is that the stdlib's code sometimes doesn't normalize the leading digit to 1.

After all, that shouldn't be much more complex than binary rounding, which we could use instead with almost the same code (i.e. the precision setting would be the number of bits after the point rather than the number of hexadecimal digits). This is one way to allow rounding to any number of bits. (Unfortunately, the rounding direction can't be specified, but that's another story.)
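The binary alternative is simple to state: round the significand to a given number of bits after the binary point, ties to even. Below is a hypothetical pure-Python sketch (`round_to_bits` is not an existing API) that assumes the result stays in the normal range; with 23 bits it agrees with single-precision rounding for 0.1, and with 4*p bits it matches rounding to p hex digits.

```python
import math
from fractions import Fraction


def round_to_bits(x: float, bits: int) -> float:
    """Round x to `bits` bits after the binary point of its [1, 2) significand.

    Ties go to even. Assumes the result is a normal double (no special
    handling of subnormal results; math.ldexp raises OverflowError on overflow).
    """
    if x == 0 or not math.isfinite(x):
        return x
    fm, fe = math.frexp(abs(x))                # abs(x) == fm * 2**fe, 0.5 <= fm < 1
    q = round(Fraction(fm) * 2 ** (bits + 1))  # 1 + bits significant bits, ties to even
    return math.copysign(math.ldexp(q, fe - 1 - bits), x)


print(round_to_bits(0.1, 23).hex())  # 0x1.99999a0000000p-4
print(round_to_bits(1.03125, 4))     # 1.0  (tie between 1.0 and 1.0625 goes to even)
print(round_to_bits(1.03125, 5))     # 1.03125 (exactly representable)
```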

Bulk tests
diff --git a/Lib/test/test_float.py b/Lib/test/test_float.py
index 8b56d76bb0..5e757ec5ad 100644
--- a/Lib/test/test_float.py
+++ b/Lib/test/test_float.py
@@ -750,6 +750,64 @@ def test_format(self):
         self.assertEqual(format(INF, 'f'), 'inf')
         self.assertEqual(format(INF, 'F'), 'INF')
 
+    def test_format_x_prec_bulk(self):
+        import ctypes, math, random, sys
+
+        libc = ctypes.CDLL("libc.so.6")  # glibc-only: compare with C's %a
+
+        def float_print(d, i):
+            buf = ctypes.create_string_buffer(256)  # writable C char buffer
+            fmt = b"%%.%da" % i                     # e.g. b"%.3a"
+            libc.snprintf(buf, ctypes.c_size_t(len(buf)), fmt, ctypes.c_double(d))
+            return buf.value.decode('ascii')
+
+        for n in range(1_000_000):
+            d = random.random()
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                if s1 != s2:
+                    e1 = int(s1.split("p")[1])
+                    e2 = int(s2.split("p")[1])
+                    l1 = int(s1.replace("0x", "")[0])
+                    l2 = int(s2.replace("0x", "")[0])
+                    self.assertEqual(e1, e2+1)
+                    self.assertEqual(l1, l2//2)
+                    self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+
+        DMAX = sys.float_info.max
+
+        for n in range(1_000_000):
+            d = DMAX-random.random()*10*math.ulp(DMAX)
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                try:
+                    self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+                except OverflowError:
+                    if s1 != s2:
+                        e1 = int(s1.split("p")[1])
+                        e2 = int(s2.split("p")[1])
+                        n1 = float.fromhex(s1.split("p")[0]+"p0")
+                        n2 = float.fromhex(s2.split("p")[0]+"p0")
+                        self.assertEqual(e1, e2+1)
+                        self.assertEqual(n1, n2/2)
+
+        DMIN = 2**sys.float_info.min_exp
+
+        for n in range(1_000_000):
+            d = random.random()*2**-30*DMIN
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+
     @support.requires_IEEE_754
     def test_format_testfile(self):
         with open(format_testfile, encoding="utf-8") as testfile:

@vstinner (Member) commented Jul 2, 2024

It was decided not to pursue this approach: float.hex() exists and is enough for most uses, and we don't want to make str.format() more complicated.
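For the padding-and-alignment use case mentioned earlier in the thread, one workaround on current Python is to format the `float.hex()` string itself, since the string format spec supports fill, alignment and width (sign and `'#'` options don't apply to strings, though).

```python
for v in (0.5, -0.1, 3.14159):
    print(f"{v.hex():>24}|")  # right-align each hex string in a 24-character field
```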

@vstinner vstinner closed this Jul 2, 2024
@skirpichev skirpichev deleted the hex-format branch August 2, 2024 14:12

5 participants