Fix precision of float to string to max significant decimals #348
Conversation
A link to the JIRA issue: https://bugs.swift.org/browse/SR-106
I'm also wondering if I should implement tests for this specific behavior. Similar tests are in …
The issue is because of the hardcoded buffer size?
No, the issue is completely different. 32 bytes is the buffer size for the string. It was just a matter of the format string.
```diff
@@ -172,13 +172,13 @@ static uint64_t swift_floatingPointToString(char *Buffer, size_t BufferLength,
 extern "C" uint64_t swift_float32ToString(char *Buffer, size_t BufferLength,
                                           float Value) {
   return swift_floatingPointToString<float>(Buffer, BufferLength, Value,
-                                            "%0.*g");
+                                            "%0.9g");
```
By changing the format string, you are invoking undefined behavior when swift_snprintf_l is called within swift_floatingPointToString; the call now passes the wrong number of arguments*. The goal here is sensible, but instead of changing the format string, the more correct fix would be to replace numeric_limits::digits10 with numeric_limits::max_digits10. It's worth noting that that's a C++11-ism, but I believe that's fine (someone else please confirm).
*there should have been a compiler warning about this.
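For concreteness, here is a minimal sketch of the fix described above; the function name here is a hypothetical stand-in, since the real runtime helper also takes a locale and a format parameter.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>

// Hypothetical stand-in for the runtime's conversion helper.
template <typename T>
int floatingPointToString(char *Buffer, std::size_t BufferLength, T Value) {
  // max_digits10 (C++11) is enough digits to round-trip the value exactly;
  // digits10 only counts digits guaranteed to be correct.
  int Precision = std::numeric_limits<T>::max_digits10;
  // "%0.*g" consumes two arguments: the precision, then the value.
  return std::snprintf(Buffer, BufferLength, "%0.*g", Precision,
                       static_cast<double>(Value));
}
```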
I'm not a C++ guru, so I have no opinion about the latter part, but changing the format string is needed (too): %g defaults to 6 significant digits. The format string I suggested does not take two arguments.
C++ punts to C to define format strings. The C standard says the following:

> As noted above, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted.

So "%0.9g" expects a single value argument, but "%0.*g" expects two arguments: first an int specifying how many digits to print (Precision), and then the value to be printed (Value). This is why simply changing the definition of Precision, and not modifying the format string, suffices.

> The format string I suggested does not take two arguments.

Correct; that's the problem. The format string takes one argument, but you're passing two:

```cpp
swift_snprintf_l(Buffer, BufferLength, nullptr, Format, Precision, Value);
                                                        ~~~~~~~~~  ~~~~~
```
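A tiny standalone demonstration of the difference in argument counts, using plain snprintf rather than swift_snprintf_l:

```cpp
#include <cstdio>

int main() {
  char Buf[32];
  // "%0.*g" consumes two arguments: an int precision, then the value.
  std::snprintf(Buf, sizeof Buf, "%0.*g", 9, 1.00000012);
  std::puts(Buf);  // prints 1.00000012
  // "%0.9g" bakes the precision into the format string: one argument only.
  std::snprintf(Buf, sizeof Buf, "%0.9g", 1.00000012);
  std::puts(Buf);  // prints 1.00000012
}
```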
I get it. It was dumb. I will change it.
An easy mistake to make. Thanks for addressing it. @dabrahams once that's resolved, I'm fine with this change.
@stephentyrone thanks a lot. I fixed the issue you pointed out. I will squash the commits and change the commit message before merge.
@stephentyrone and what about tests?
Yes, tests would be great. It would be easy to add regression tests to ensure that a few values like 1.0000000000000002 round-trip Double -> String -> Double with exact equality. If you have time to do more than that, that's even better.
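A minimal sketch of such a regression check, written here as standalone C++ against the C library (the real tests would live in the Swift test suite):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>

int main() {
  const double Original = 1.0000000000000002;  // 1.0 plus one ulp
  char Buf[32];
  // Convert to a string with max_digits10 (17) significant digits...
  std::snprintf(Buf, sizeof Buf, "%0.*g",
                std::numeric_limits<double>::max_digits10, Original);
  // ...and parse it back; the round trip must preserve the value exactly.
  const double RoundTripped = std::strtod(Buf, nullptr);
  assert(RoundTripped == Original);  // exact equality, not approximate
  return 0;
}
```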
@stephentyrone So, as you might have expected, the change breaks some of the existing tests. I still maintain, however, that the change is needed, because getting more precise values out of the conversion matters. I'm wondering how to change the tests; I can see two possible solutions: …
Obviously, longer term we need a richer system for converting between floating-point numbers and strings. However, we don't have that yet, and that's a big feature to design[1]. The current behavior favors String -> FP -> String round trips; your change would favor FP -> String -> FP round trips instead. I think that the latter is more important, so this change makes sense, but it is a behavior change, and some people will be confused no matter what. If folks are on board for this change (I haven't seen any objections yet), then we should update the tests to conform to the new behavior, probably by simply specifying more digits in the source values so that the tests don't look insane.

[1] For simple print statements, the most intuitive thing would be to change the print behavior to print exactly as many digits as are needed to round-trip the number being printed (this would avoid the problem you're hitting here). However, the C standard library doesn't have that mode, so we'd need to write our own converters, which is a big project. Still, in the long term, this would be a nice thing to do.
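To make the trade-off concrete, here is a small illustration (assuming an IEEE 754 double) of how the two round-trip directions pull in opposite directions:

```cpp
#include <cstdio>

int main() {
  // String -> FP -> String: 15 digits reproduce the text the user wrote...
  std::printf("%.15g\n", 3.2);  // prints 3.2
  // ...but FP -> String -> FP needs 17 digits (max_digits10 for double),
  // which exposes the nearest representable value instead.
  std::printf("%.17g\n", 3.2);  // prints 3.2000000000000002
}
```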
@stephentyrone The current design was an explicit decision, IIRC. We thought it was more intuitive to omit decimal digits that are not guaranteed to be correct, and that if one wants to round-trip FP through serialization, they should do something different (e.g., use a different printing function, a hexadecimal representation, or just a binary representation). We have been trying to capture the distinction between "user-presentable" and "accurate serialization" in description and debugDescription.
@gribozavr I thought that might be the case. As I hinted, it's not at all cut-and-dried. To provide some context for everyone, the trade-off boils down to this: printing digits10 digits keeps output short and user-friendly, but distinct values can print identically; printing max_digits10 digits round-trips exactly, but turns values like 3.2 into long strings of digits.
Certainly debugDescription should use max_digits10. As for description, printing max_digits10 digits would turn 3.2 into "3.2000000000000002", which users may find surprising[1]. Long-term, we should write our own converters that print exactly as many digits as are needed to round-trip.

[1] On the other hand, it hints that maybe 3.2 isn't really "3.2", so maybe that's not so bad.
@stephentyrone I think I'm convinced! I want to know what @dabrahams thinks about this change. I'm concerned about taking this change in Swift 2.2 (this might be a significant breaking change for those apps that use this API to display floating-point numbers in UI), but it should be fine in Swift 3.0. Maybe we also need to provide a replacement API that allows one to specify the precision, so that one does not need to fall back to format strings?
Agreed on all points: I want to hear Dave's take, it makes sense to keep this out of 2.2, we need an easy-to-use "format nicely for display" as well as finer-grained control, and we need to document all of it better so confused people can figure out what to do. Post-2.2, a change along these lines seems like a good starting point.
But we can probably take a change in Swift 2.2 to debugDescription.
```diff
@@ -140,12 +140,21 @@ static int swift_snprintf_l(char *Str, size_t StrSize, locale_t Locale,
 #endif
 
+template <typename T>
+static int swift_floatingPointToStringPrecision(bool Debug) {
```
As far as I understand the convention, methods are defined before being called(?)
Yes, we try to avoid forward declarations if we can just reorder functions.
It's not totally obvious to me that this warrants a separate function. Do we expect this to gain more logic and become more complex in the future? If not, it seems clearer to me if this is just folded into the caller.
It looks to me like a good alternative to code like this:

```cpp
int Precision = std::numeric_limits<T>::digits10;
if (Debug) {
  Precision = std::numeric_limits<T>::max_digits10;
}
```

which adds complexity for the reader - it adds a 5th conditional to the function body.
Personally, I find that much clearer, but I definitely don't feel strongly enough to argue for it. If you're happy with it as is, great.
If you want to be extra clear:

```cpp
int Precision;
if (Debug) {
  Precision = std::numeric_limits<T>::max_digits10;
} else {
  Precision = std::numeric_limits<T>::digits10;
}
```
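Putting the pieces together, the helper under discussion might look roughly like this - a sketch following the names in the patch, not the exact committed code:

```cpp
#include <limits>

template <typename T>
static int swift_floatingPointToStringPrecision(bool Debug) {
  if (Debug) {
    // debugDescription: enough digits to round-trip the value exactly.
    return std::numeric_limits<T>::max_digits10;
  }
  // description: only digits guaranteed to be correct; shorter output.
  return std::numeric_limits<T>::digits10;
}
```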
@stephentyrone I made the change suggested in the conversation to speed up the process. In fact, the reason I started to work on a fix for this issue was a misleading value shown in the debugger. I'm not too opinionated about this topic, but from the user perspective …
```diff
 }
 }
 
```
Extra newline.
@wczekalski This patch would probably affect tests (did you run them? any fallout?). If it does not, then our tests are not good enough, and more need to be written :)
@gribozavr There are only two tests failing, and I will add some for the new behavior. One of the existing tests fails because the printed values now contain more digits. Since both …
The default precision used for converting floating-point numbers to strings leads to many confusing results. If we take a Float32 value 1.00000000 and a value 1.00000012 of the same type, these two are obviously not equal. However, if we log them, the same value is displayed for both. A much more helpful display using 9 decimal digits is thus: [1.00000000 != 1.00000012], showing that the two values are in fact different. (Example taken from: http://www.boost.org/doc/libs/1_59_0/libs/test/doc/html/boost_test/test_output/log_floating_points.html)

I'm by no means a floating-point expert; however, having investigated this issue, I found numerous sources saying that the "magic" numbers 9 and 17, for 32-bit and 64-bit values respectively, are the correct precisions. The numbers 9 and 17 represent the number of significant decimal digits needed to round-trip a value exactly. This also means that the numbers 0.100000000000000005 and 0.1000000000000000 are the same as far as their floating-point representations are concerned.

Please let me know if you want me to elaborate more on this subject.
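The example from the commit message can be reproduced directly with the C library. A standalone sketch; note that 1.00000012f is the next representable float above 1.0:

```cpp
#include <cstdio>

int main() {
  const float A = 1.00000000f;
  const float B = 1.00000012f;  // next representable float above 1.0
  // With the default 6 significant digits, the two print identically:
  std::printf("%.6g vs %.6g\n", A, B);  // prints: 1 vs 1
  // With 9 significant digits (max_digits10 for float), they differ:
  std::printf("%.9g vs %.9g\n", A, B);  // prints: 1 vs 1.00000012
}
```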
debugDescription prints numbers with greater precision
debugDescription of all floating-point numbers now shows the number with greater precision, so the tests had to be changed. They previously did not fail if expected2 was nil (nil was the default value) even when the actual value was different from expected1.
@gribozavr I changed the formatting, fixed the tests, and added some. I also cleaned up the history.
The tests pass on Linux.
@wczekalski The change to debugDescription looks good. @dabrahams We are still interested to know what you think about making the same change for description.
Fix precision of float to string to max significant decimals
Thanks for following through on this, @wczekalski!
Hi guys, thanks for the awesome work. Just wondering if this is going to fix this issue too. Thank you.
@zwang Playground display style is not controlled by the standard library, please file a radar.
@gribozavr The same issue exists for print() too. I just used Playground as an example to show the issue. I will file a radar for the Playground issue. Thank you.
@zwang I'm no numerical expert, but having worked on this one I think it works as expected, i.e. we print 17 digits, which is enough for a float -> text -> float round trip (google max_digits10).
@zwang The particular issue you are seeing is that …
Both …