gh-113804: Support formatting floats in hexadecimal notation #113805
I would prefer not to add the `0x` prefix by default, and to add it only if the alternative form (`#x`) is used.
I have no strong preference for the current syntax, but here is briefly its rationale:
_Py_dg_dtoa_hex() helper is based on float.hex() with additional support for the precision setting and some format flags. Note (cf. the ``'a'`` format type of the C stdlib) that the ``'#'`` option is used to control the prefix (off by default). Trailing zeros (and the dot) in the fractional part are excluded.

Examples:

```pycon
>>> f'{-0.1:#x}'
'-0x1.999999999999ap-4'
>>> (-0.1).hex()
'-0x1.999999999999ap-4'
>>> f'{3.14159:+#X}'
'+0X1.921F9F01B866EP+1'
>>> f'{3.14159:.3x}'
'1.922p+1'
```

Minor changes:

* added Py_hexdigits_upper constant
* tests for RaisingNumber are moved (to test also bytes)
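The default output described in this commit message (no prefix unless `#` is given, trailing zeros and the dot dropped) can be approximated in pure Python on top of `float.hex()`. This is only a sketch: `hex_format_sketch` and its parameters are hypothetical names, precision handling is left out, and it is not the PR's C implementation.

```python
import math


def hex_format_sketch(x: float, prefix: bool = False, upper: bool = False) -> str:
    """Approximate the proposed 'x' / '#x' / 'X' output for a finite float.

    Hypothetical helper for illustration; it only post-processes float.hex().
    """
    if not math.isfinite(x):
        raise ValueError("this sketch handles finite values only")
    s = x.hex()                                   # e.g. '-0x1.999999999999ap-4'
    sign = "-" if s.startswith("-") else ""
    # Drop the sign and the '0x' prefix, then split mantissa and exponent.
    mantissa, exponent = s.lstrip("-")[2:].split("p")
    if "." in mantissa:
        # Drop trailing zeros, then a dangling dot.
        mantissa = mantissa.rstrip("0").rstrip(".")
    result = sign + ("0x" if prefix else "") + mantissa + "p" + exponent
    return result.upper() if upper else result


print(hex_format_sketch(-0.1, prefix=True))   # same text as (-0.1).hex()
print(hex_format_sketch(1.0))                 # 1p+0
```

Because the sketch only removes characters that carry no information, `float.fromhex()` still recovers the original value from its output.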
OK, I think it's ready for review. I've mentioned some open issues in the PR description. BTW, the
Please could you check these warnings?
@hugovk, thanks, fixed.
This comment was marked as outdated.
@picnixz, PEP 7 doesn't require that in general. I think the placement of spaces is mostly consistent with the rest of the file.
@vstinner, thanks for the review. I think that, except for a few cases (not marked as resolved), everything was addressed.
@@ -55,6 +55,8 @@ extern PyObject* _Py_string_to_number_with_underscores(

extern double _Py_parse_inf_or_nan(const char *p, char **endptr);

extern char * _Py_dg_dtoa_hex(double x, int precision, int always_add_sign,
I suggest calling this something clearer like `_Py_float_to_hex`. The `dg_dtoa` stuff doesn't make much sense here: "dg" is a reference to David Gay, since he wrote the original version of the dtoa.c code that Python's dtoa.c is based on; this code has nothing to do with Gay's code.
Thanks, that makes sense.
Initially that helper was a static function in pystrtod.c; just like `_Py_dg_dtoa()` it, well, "converts a double to an ASCII string"... It was exported and moved to floatobject.c to (1) slightly reduce the diff and (2) support a workaround for hex().
I think goal (1) wasn't very successful. So maybe this helper belongs in pystrtod.c? If so, does `_Py_dtoa_hex()` sound OK?
This doesn't look right - the second result below should be showing a nonzero value:
I'd suggest dropping precision support altogether. The use case for hex formatting is to be able to convey the precise value of the float being converted. Rounding to a given number of digits is something that makes sense for decimal output, but I doubt there's any use for it for the hex output.
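The "precise value" point can be checked with the existing `float.hex()` / `float.fromhex()` pair, which round-trips exactly, whereas a decimal rendering with a fixed number of digits generally does not. A minimal sketch (the helper names are made up for this example):

```python
import random


def hex_roundtrips(x: float) -> bool:
    """True if x survives float -> hex string -> float unchanged."""
    return float.fromhex(x.hex()) == x


def short_decimal_roundtrips(x: float, digits: int = 3) -> bool:
    """True if x survives formatting with a fixed number of decimal digits."""
    return float(f"{x:.{digits}f}") == x


samples = [random.uniform(-1e6, 1e6) for _ in range(1000)]
print(all(hex_roundtrips(x) for x in samples))   # True
print((1 / 3).hex())                             # 0x1.5555555555555p-2
print(short_decimal_roundtrips(1 / 3))           # False: '0.333' is a different float
```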
Hmm, are you sure? It looks like gcc and clang have different opinions:
a.c source:
#include <stdio.h>

int
main()
{
    double x = 1e-320;
    printf("%a\n", x);
    printf("%.3a\n", x);
    return 0;
}
Round-ties-to-even is also not working correctly with the precision support:
That second result should be |
Interesting; I get a different result on macOS / Intel (clang 15.0.0):
Hmm, JFR:
But
Yes, that's a bug, thanks. Code should respect sys.float_info settings. |
I don't think But again, I'd suggest dropping precision support altogether - I don't think it's useful. The only possible use-case that occurred to me would be rounding to single precision, but even that's not possible, since you'd need exactly 23 bits (5 and 3/4 hex digits) after the point, and there's no way to express that. |
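The arithmetic behind "5 and 3/4 hex digits" can be seen by rounding a double to single precision and looking at its hex digits. The sketch below uses `struct` for the rounding; `to_single` is a name invented for this example.

```python
import struct


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


s = to_single(0.1).hex()
fraction = s.split(".")[1].split("p")[0]   # the 13 hex digits after the point
print(s)                                   # 0x1.99999a0000000p-4

# 23 fraction bits = 5 full hex digits + 3 bits of a sixth digit,
# so the sixth digit is always even and every digit after it is zero.
print(fraction[5], fraction[6:])
```

A precision of 5 hex digits keeps only 20 bits and a precision of 6 keeps 24, so neither corresponds to the 23-bit single-precision fraction.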
I meant in C this is determined by current rounding mode (and configurable by fesetround(), but that's another story).
I hope, I did this correctly.
Just curious, where is it documented? E.g. I don't see it mentioned in https://docs.python.org/3/library/string.html#format-specification-mini-language
That ruins the whole proposal. :(
Then perhaps it's the wrong proposal? It certainly doesn't ruin it for me: the convenience of being able to format floats with appropriate padding and alignment is a win. But adding precision support significantly complicates the code (not least because we have to deal with rounding correctly: how did you implement rounding for subnormals, for example? Did you deal with the cases close to What use-case do you see for adding precision support? |
I don't think it is, and in any case it's not a language requirement; just something that CPython itself tries to do well. |
Hmm. I'm not seeing anything in the C standard that would suggest that. Can you point me to the relevant pieces? |
This was rather an empirical observation :)
An example:
#include <stdio.h>
#include <fenv.h>

int
main()
{
    double x = 0x1.1115p0;
    fesetround(FE_DOWNWARD);
    printf("%.3a\n", x);
    fesetround(FE_UPWARD);
    printf("%.3a\n", x);
    return 0;
}
I'll look into it. Section 7.21.6 (C11) is rather vague in this respect...
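To make the rounding question concrete, here is a pure-Python sketch of what "round to N hex digits" means under different rounding directions, using exact rational arithmetic. `round_hex_sketch` and its `mode` names are hypothetical; the sketch is intended for positive normal values and says nothing about what a particular libc does.

```python
import math
from fractions import Fraction


def round_hex_sketch(x: float, ndigits: int, mode: str = "half-even") -> str:
    """Round a positive finite float to `ndigits` hex digits after the point.

    Hypothetical helper: `mode` is 'down', 'up' or 'half-even'.
    """
    if not (math.isfinite(x) and x > 0):
        raise ValueError("this sketch handles positive finite values only")
    _, e = math.frexp(x)               # x = m * 2**e with 0.5 <= m < 1
    exp = e - 1                        # so that 1 <= x / 2**exp < 2
    scaled = Fraction(x) / Fraction(2) ** exp * 16 ** ndigits   # exact
    q = math.floor(scaled)
    rem = scaled - q
    if mode == "up":
        q += rem > 0
    elif mode == "half-even":
        half = Fraction(1, 2)
        q += rem > half or (rem == half and q % 2 == 1)
    elif mode != "down":
        raise ValueError(f"unknown mode: {mode!r}")
    lead, frac = divmod(q, 16 ** ndigits)
    if lead == 2:                      # rounding carried into the leading digit
        lead, exp = 1, exp + 1         # frac is necessarily 0 here
    digits = f".{frac:0{ndigits}x}" if ndigits else ""
    return f"0x{lead:x}{digits}p{exp:+d}"


x = float.fromhex("0x1.1115p0")
print(round_hex_sketch(x, 3, "down"))   # 0x1.111p+0
print(round_hex_sketch(x, 3, "up"))     # 0x1.112p+0
```

The sketch renormalizes to a leading digit of 1 after a carry; an implementation could equally keep a leading 2 with the original exponent, and the two spellings denote the same value.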
I'm not sure that's true for the current code. I think I can reduce diff wrt old
Hmm, I think that's done like in the C stdlib, e.g.
#include <stdio.h>

int main()
{
    double x = 3e-320;
    printf("%a\n", x);    // 0x0.00000000017b8p-1022
    printf("%.9a\n", x);  // 0x0.000000000p-1022
    printf("%.10a\n", x); // 0x0.0000000001p-1022
    printf("%.11a\n", x); // 0x0.00000000018p-1022
    return 0;
}
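The unrounded subnormal output in the C example above matches what Python's existing `float.hex()` produces: subnormals keep a leading `0` and the fixed exponent `p-1022`. A short check, using only standard-library calls:

```python
import math
import sys

x = 3e-320
print(x < sys.float_info.min)   # True: below the smallest normal double
print(x.hex())                  # 0x0.00000000017b8p-1022

# A subnormal double is an integer multiple of 2**-1074; here the multiple is 0x17b8.
multiple = 0x17B8
print(x == math.ldexp(multiple, -1074))   # True
print(float.fromhex(x.hex()) == x)        # True
```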
Again, as above. Another example:
#include <stdio.h>

int main()
{
    double x = 1.5e+308;
    printf("%a\n", x);         // 0x1.ab36d48e1acfp+1023
    printf("%.0a\n", x);       // 0x2p+1023 (vs 1p+1024 in python)
    printf("%a\n", 0x2p+1023); // a warning & prints inf
    return 0;
}
I'll double-check again and compare with other implementations. But so far I think one should be correct.
Probably that's better discussed in the issue thread. But I doubt that many languages implement a useless feature.
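The overflow corner case in the example above can be reproduced from Python: once the significand of a value near the top of the double range is rounded up, the result is no longer a representable float under either spelling. A small sketch (`overflows` is a name invented here):

```python
def overflows(text: str) -> bool:
    """True if float.fromhex() rejects `text` as too large for a double."""
    try:
        float.fromhex(text)
    except OverflowError:
        return True
    return False


x = 1.5e308
print(x.hex())                  # leading digit 1, exponent p+1023
print(overflows("0x2p+1023"))   # True
print(overflows("0x1p+1024"))   # True: same value, different normalization
print(overflows("0x1.fffffffffffffp+1023"))   # False: the largest finite double
```

So any precision-limited output for such values yields a string that cannot be fed back to `float.fromhex()`, whichever normalization is chosen.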
Well, I find "relevant pieces" rather vague, but... On p. 281 (C99 draft): "For a and A conversions, if FLT_RADIX is a power of 2, the value is correctly rounded to a hexadecimal floating number with the given precision." I doubt this "the current rounding direction" could be interpreted as anything other than the one controllable by fegetround() / fesetround()... I think it's natural that, by default, the current rounding mode depends on the floating-point environment settings. Where it's different, that's mentioned as an exception. For example: "The trunc functions use IEC 60559 rounding toward zero (regardless of the current rounding direction)."
OK, @mdickinson, I've checked the rounding code (~20 lines) and I think it's correct and matches the implementation in, e.g., glibc. The difference is that the stdlib's code sometimes doesn't do 1-normalization. After all, that shouldn't be much more complex than binary rounding, which we could use instead (i.e. the precision setting would be the number of bits after the dot rather than the number of hexadecimal digits) with almost the same code. This is one variant that would allow rounding to any number of bits. (Unfortunately, the rounding direction can't be specified, but that's another story.)
Bulk tests:
diff --git a/Lib/test/test_float.py b/Lib/test/test_float.py
index 8b56d76bb0..5e757ec5ad 100644
--- a/Lib/test/test_float.py
+++ b/Lib/test/test_float.py
@@ -750,6 +750,64 @@ def test_format(self):
         self.assertEqual(format(INF, 'f'), 'inf')
         self.assertEqual(format(INF, 'F'), 'INF')

+    def test_format_x_prec_bulk(self):
+        import ctypes, math, random, _testcapi, sys
+
+        libc = ctypes.CDLL("libc.so.6")
+
+        def float_print(d, i):
+            fmt = "%." + str(i) + "a\n"
+            a = b"z"*256
+            libc.sprintf(a, bytes(fmt, 'utf-8'), ctypes.c_double(d));
+            return a.decode('utf-8').split("\n")[0]
+
+        for n in range(1_000_000):
+            d = random.random()
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                if s1 != s2:
+                    e1 = int(s1.split("p")[1])
+                    e2 = int(s2.split("p")[1])
+                    l1 = int(s1.replace("0x", "")[0])
+                    l2 = int(s2.replace("0x", "")[0])
+                    self.assertEqual(e1, e2+1)
+                    self.assertEqual(l1, l2//2)
+                    self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+
+        DMAX = sys.float_info.max
+
+        for n in range(1_000_000):
+            d = DMAX-random.random()*10*math.ulp(DMAX)
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                try:
+                    self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+                except OverflowError:
+                    if s1 != s2:
+                        e1 = int(s1.split("p")[1])
+                        e2 = int(s2.split("p")[1])
+                        n1 = float.fromhex(s1.split("p")[0]+"p0")
+                        n2 = float.fromhex(s2.split("p")[0]+"p0")
+                        self.assertEqual(e1, e2+1)
+                        self.assertEqual(n1, n2/2)
+
+        DMIN = 2**sys.float_info.min_exp
+
+        for n in range(1_000_000):
+            d = random.random()*2**-30*DMIN
+            i = random.randint(0,25)
+            with self.subTest(d=d, i=i):
+                f = "#." + str(i) + "x"
+                s1 = format(d, f)
+                s2 = float_print(d, i)
+                self.assertEqual(float.fromhex(s1), float.fromhex(s2))
+
     @support.requires_IEEE_754
     def test_format_testfile(self):
         with open(format_testfile, encoding="utf-8") as testfile:
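The bulk test above tolerates textual differences between the two implementations as long as they denote the same value, typically a leading digit of 2 with exponent `e` versus a leading digit of 1 with exponent `e+1`. The comparison idea in isolation, with made-up sample strings rather than real output of either implementation:

```python
def same_value(s1: str, s2: str) -> bool:
    """Compare two hex float strings by value rather than by spelling."""
    return float.fromhex(s1) == float.fromhex(s2)


def exponent(s: str) -> int:
    """Extract the decimal binary exponent after the 'p'."""
    return int(s.split("p")[1])


s1, s2 = "0x1.0p+4", "0x2.0p+3"   # illustrative strings only
print(s1 == s2)                            # False: different spelling
print(same_value(s1, s2))                  # True: both are 16.0
print(exponent(s1) == exponent(s2) + 1)    # True: exponents differ by one
```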
It was decided not to pursue this approach, since float.hex() exists and is enough for most uses, and we don't want to make str.format() more complicated.
The `x` and `X` format types (in `str.format()`, f-strings and old-style string formatting) now also support formatting floats as a hexadecimal string, i.e. in the style `[±][0x]h[.hhh]p±d` (which is essentially base-16 scientific notation, but with a base-2 exponent written in decimal).

The `_Py_dg_dtoa_hex()` helper is based on the `float.hex()` method with additional support for the precision setting and some format flags. Note (cf. the `'a'` format type of printf in the C stdlib) that the `'#'` option is used to control the prefix (off by default).

Trailing zeros (and the dot) in the fractional part are excluded. (This change doesn't affect the `float.hex()` method or the `float.fromhex()`/`float.hex()` round-trip.)

Examples:

Minor changes:

* added `Py_hexdigits_upper` constant

Notes

The implementation was changed to use the "x"/"X" format types per suggestions both in the discussion thread and here. This (1) makes the output slightly more compact by default and (2) allows extending formatting support to old-style string formatting (using the "a" letter would conflict with the existing `ascii()` converter).

Personally I would choose the "a" format type (like printf) instead and keep the prefix mandatory, to simplify interaction with other languages (e.g. copy-pasting output to C code as a hexadecimal literal) or within the CPython REPL itself, if #114668 is accepted. It is worth noting that both the gmpy2 and the bigfloat packages use the "a" format type.

Open issues:

* […] `str.format()`/f-strings to be more compatible with existing libraries (e.g. gmpy2)? Should we change old-style string formatting at all?
* […] `f"{v:#x}"` instead of `v.hex()`?
* […] `float.hex()` method?

📚 Documentation preview 📚: https://cpython-previews--113805.org.readthedocs.build/
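The conflict mentioned in the notes is easy to see in current Python: in old-style formatting, `%a` already applies `ascii()` to its argument, so it could not be reused for hexadecimal floats. A short illustration using only existing behaviour:

```python
value = 1.5

# In old-style formatting, %a means "apply ascii() to the argument".
print("%a" % value)    # 1.5
print("%a" % "é")      # '\xe9'

# float.hex() is the existing way to get hexadecimal notation.
print(value.hex())     # 0x1.8000000000000p+0
```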