Fix precision of float to string to max significant decimals #348

Merged
merged 6 commits into swiftlang:master from bugfix/float-string-precision
Dec 15, 2015

Conversation

@wokalski wokalski commented Dec 8, 2015

The default precision used for converting floating-point numbers to strings leads to many confusing results. If we take a Float32 value 1.00000000 and another value 1.00000012 of the same type, the two are obviously not equal. However, if we log them, we see the same value. A much more helpful display uses 9 decimal digits: [1.00000000 != 1.00000012], showing that the two values are in fact different.
(example taken from: http://www.boost.org/doc/libs/1_59_0/libs/test/doc/html/boost_test/test_output/log_floating_points.html)

I'm by no means a floating-point expert, but having investigated this issue I found numerous sources saying that the "magic" numbers 9 and 17, for 32-bit and 64-bit values respectively, are the correct precisions. 9 and 17 are the maximum numbers of decimal digits needed for a round trip. This means that, as far as their floating-point representations are concerned, 0.100000000000000005 and 0.1000000000000000 are the same number.


Please let me know if you want me to elaborate more on this subject.

wokalski commented Dec 8, 2015

A link to the JIRA issue. https://bugs.swift.org/browse/SR-106

wokalski commented Dec 8, 2015

I'm also wondering if I should implement tests for this specific behavior. Similar tests are in /test/1_stdlib/Print.swift, specifically test_FloatingPointPrinting().

@lattner lattner assigned gribozavr and dabrahams and unassigned gribozavr Dec 8, 2015
getaaron commented Dec 8, 2015

The issue is because of the hardcoded 32 here, right? (A Double is 64 bits.) Should/can we replace that number with the correct number based on the type? If not, this solution seems acceptable.

wokalski commented Dec 8, 2015

No, the issue is completely different: 32 bytes is the buffer size for the string. It was just a matter of the format string.

@@ -172,13 +172,13 @@ static uint64_t swift_floatingPointToString(char *Buffer, size_t BufferLength,
extern "C" uint64_t swift_float32ToString(char *Buffer, size_t BufferLength,
float Value) {
return swift_floatingPointToString<float>(Buffer, BufferLength, Value,
"%0.*g");
"%0.9g");
Contributor
By changing the format string, you are invoking undefined behavior when swift_snprintf_l is called within swift_floatingPointToString; the call now passes the wrong number of arguments*. The goal here is broadly sensible, but instead of changing the format string, the more correct fix is to replace numeric_limits::digits10 with numeric_limits::max_digits10. It's worth noting that that's a C++11-ism, but I believe that's fine (someone else please confirm).

*there should have been a compiler warning about this.

Contributor Author

I'm not a C++ guru, so I have no opinion about the latter part, but changing the format string is needed (too): %f defaults to 6 decimal places. The format string I suggested does not take two arguments.

Contributor

C++ punts to C to define format strings. The C standard says the following:

As noted above, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted.

So "%0.9g" expects a single value argument, but "%0.*g" expects two: first an int specifying how many digits to print (Precision), and then the value to be printed (Value). This is why simply changing the definition of Precision, without modifying the format string, suffices.

The format string I suggested does not take two arguments.

Correct; that's the problem. The format string takes one argument, but you're passing two:

swift_snprintf_l(Buffer, BufferLength, nullptr, Format, Precision, Value);
                                                        ~~~~~~~~~  ~~~~~                              

Contributor Author

I get it; it was a dumb mistake. I will change it.

Contributor

An easy mistake to make. Thanks for addressing it. @dabrahams once that's resolved, I'm fine with this change.

wokalski commented Dec 8, 2015

@stephentyrone thanks a lot. I fixed the issue you pointed out. I will squash the commits and change the commit message before merging.

wokalski commented Dec 8, 2015

@stephentyrone and what about tests?

@stephentyrone
Yes, tests would be great. It would be easy to add regression tests to ensure that a few values like 1.0000000000000002 round-trip Double -> String -> Double with exact equality. If you have time to do more than that, that's even better.

wokalski commented Dec 9, 2015

@stephentyrone So, as you might have expected, the change breaks stdlib tests defined in Print.swift. The test becomes very unintuitive when you see things like printedIs(asFloat32(1.00001), "1.00001001"), which is true but a bit confusing.

I still maintain, however, that the change is needed, because getting more precise values from description is very valuable, especially when debugging.

I'm wondering how to change the tests. I can see two possible solutions:

  1. Change the numbers in tests so that the test is more intuitive
  2. Replace the existing tests, which assert on how values are printed, with tests verifying that String -> Float -> String and Float -> String -> Float round trips behave correctly for precisions of digits10 and max_digits10 respectively.

@stephentyrone
Obviously longer term we need a richer system for converting between floating-point numbers and strings. However, we don't have that yet, and that's a big feature to design[1].

The current behavior favors String -> FP -> String round trips; your change would favor FP -> String -> FP round trips instead. I think that the latter is more important, so this change makes sense, but it is a behavior change, and some people will be confused no matter what. If folks are onboard for this change (I haven't seen any objections yet), then we should update the tests to conform to the new behavior, probably by simply specifying more digits in the source values so that the tests don't look insane.

[1] for simple print statements, the most intuitive thing would be to change the print behavior to print exactly as many digits as are needed to round-trip the number being printed (this would avoid the problem you're hitting here). However, the C standard library doesn't have that mode, so we'd need to write our own converters, which is a big project. Still, in the long term, this would be a nice thing to do.

@gribozavr
@stephentyrone The current design was an explicit decision, IIRC. We thought it was more intuitive to omit decimal digits that are not guaranteed to be correct, and that if one wants to round-trip FP through serialization, they should do something different (e.g., use a different printing function, a hexadecimal representation, or just the binary representation).

We have been trying to capture the distinction between "user-presentable" and "accurate serialization" in description and debugDescription. Would it make sense to you to keep description as is, but change debugDescription to use max_digits10? Or is always using max_digits10 better in your opinion?

@stephentyrone
@gribozavr I thought that might be the case. As I hinted, it's not at all cut-and-dried. To provide some context for everyone, the trade-off boils down to:

  • If we use max_digits10, then all floating-point values round trip T -> String -> T exactly, but values like 3.2 get printed as "3.2000000000000002", which is annoying and potentially confusing.[1]
  • If we instead use digits10, then all strings of up to digits10 decimal digits round-trip String -> T -> String exactly (which as a corollary means that "3.2" prints as "3.2"), but (a) different floating-point values print as the same decimal string and (b) many floating-point values do not round trip T -> String -> T correctly.

Certainly debugDescription should use max_digits10. That much is clear: debug descriptions should accurately reflect the data (personally, as a numerical programmer, I would prefer they use the hexadecimal floating-point format, which is exact, but I recognize that most people won't know what to do with it).

As for description, I favor max_digits10 there too, because the risk (programmer / user confusion) is less than the alternative (data loss). (There is also the risk of data loss with the first option, if someone decided to try to move string data around as doubles, but that's such a comically bad idea that I'm mostly willing to discount it[2].) It's also easier to fix programmer / user confusion via education about floating-point. I can see the argument the other way too, however.

Long-term, we should make description use as many digits as are necessary to round-trip, and no more (so 3.2 prints as "3.2"), but that's a much more invasive code change, outside the scope of this pull request. We could fake it by making multiple calls to snprintf(), but that seems non-ideal too.

[1] On the other hand, it hints that maybe 3.2 isn't really "3.2", so maybe that's not so bad.
[2] insert aside about people storing numeric string data as doubles in JavaScript here.

@gribozavr
@stephentyrone I think I'm convinced! I want to know what @dabrahams thinks about this change.

I'm concerned about taking this change in Swift 2.2 (this might be a significant breaking change for those apps that use this API to display floating point numbers in UI), but it should be fine in Swift 3.0. Maybe we also need to provide a replacement API that allows one to specify the precision, so that one does not need to fall back to format strings?

@stephentyrone
Agreed on all points: I want to hear Dave's take, it makes sense to keep out of 2.2, we need an easy-to-use "format nicely for display", as well as finer-grained control, and we need to document all of it better so confused people can figure out what to do.

Post-2.2, a change along these lines seems like a good starting point.

@gribozavr
But we can probably take a change in Swift 2.2 to debugDescription only.

@@ -140,12 +140,21 @@ static int swift_snprintf_l(char *Str, size_t StrSize, locale_t Locale,
#endif

template <typename T>
static int swift_floatingPointToStringPrecision(bool Debug) {
Contributor Author

As far as I understand the convention, functions are defined before being called(?)

Contributor

Yes, we try to avoid forward declarations if we can just reorder functions.

Contributor

It's not totally obvious to me that this warrants a separate function. Do we expect this to gain more logic and become more complex in the future? If not, it seems clearer to me to just fold it into the caller.

Contributor Author

It looks to me like a good alternative to code like this:

  int Precision = std::numeric_limits<T>::digits10;
  if (Debug) {
    Precision = std::numeric_limits<T>::max_digits10;
  }

which adds complexity for the reader; it would add a fifth conditional to the function body.

Contributor

Personally, I find that much clearer, but I definitely don't feel strongly enough to argue for it. If you're happy with it as is, great.

Contributor

If you want to be extra clear:

  int Precision;
  if (Debug) {
    Precision = std::numeric_limits<T>::max_digits10;
  } else {
    Precision = std::numeric_limits<T>::digits10;
  }

@wokalski
@stephentyrone I made the change suggested in the conversation to speed up the process.

In fact, I started working on a fix for this issue because of a misleading value shown in the debugger. I'm not too opinionated on this topic, but from the user's perspective, digits10 for description and max_digits10 for debugDescription are better than the current behavior, IMO.

}
}


Contributor

Extra newline.

@gribozavr
@wczekalski This patch would probably affect tests (did you run them? any fallout?). If it does not, then our tests are not good enough, and more need to be written :)

@wokalski
@gribozavr There are only two tests failing, and I will add some for debugDescription to the Print.swift tests.

One of the existing tests fails because Array.description calls debugDescription on its contents. Is that desired behavior? I'd expect Array.description to call the respective method on its contents, and the same for Array.debugDescription.

Since both Array.debugDescription and Array.description call the same method, Array._makeDescription, which takes a Bool parameter, the behavior should not be too hard to change.

@wokalski wokalski force-pushed the bugfix/float-string-precision branch 2 times, most recently from 6d41d93 to 65408c8 on December 14, 2015 18:21
debugDescription prints numbers with greater precision
debugDescription of all floating-point numbers shows the number with greater precision, so the tests had to be changed.
They didn't fail if expected2 was nil (nil was the default value) but the actual value was different from expected1.
@wokalski wokalski force-pushed the bugfix/float-string-precision branch from 65408c8 to 575300e on December 14, 2015 18:25
@wokalski
Copy link
Contributor Author

@gribozavr I changed the formatting, fixed the tests, and added some. I also cleaned up the history.

@gribozavr
Copy link
Contributor

The tests pass on Linux.

@gribozavr
Copy link
Contributor

@wczekalski The change to debugDescription LGTM, thanks!

@dabrahams We are still interested to know what you think about making the same change for description.

gribozavr added a commit that referenced this pull request Dec 15, 2015
Fix precision of float to string to max significant decimals
@gribozavr gribozavr merged commit 10a8f4e into swiftlang:master Dec 15, 2015
@stephentyrone
Copy link
Contributor

Thanks for following through on this, @wczekalski !

frootloops added a commit to frootloops/swift that referenced this pull request Dec 24, 2015
zwang commented Feb 5, 2016

Hi guys, thanks for the awesome work. Just wondering if this is going to fix this issue too. Thank you.

// Tested in Xcode 7.2 Swift 2.1 Playground
// Tested in Xcode 7.2 Swift 2.1 Playground
let v: Double = 2.6090288509851067  // 16 significant digits; print(v) only shows up to 15
let dictionary = ["test": v]   // ["test": 2.609028850985107]  <-- Playground shows only 15 digits, rounded

@gribozavr
Copy link
Contributor

@zwang Playground display style is not controlled by the standard library, please file a radar.

zwang commented Feb 6, 2016

@gribozavr The same issue exists for print() too; I just used the Playground as an example to show it.

I will file a radar for playground issue.

Thank you.

wokalski commented Feb 6, 2016

@zwang I'm no numerical expert, but having worked on this one I think it works as expected, i.e. we print 17 digits, which is enough for a float -> text -> float round trip (google max_digits10 for more info).
Also read this comment (it's in this PR). I think the issue you encountered will be addressed in the future.

wokalski commented Feb 6, 2016

@zwang The particular issue you are seeing is that print() invokes description on the argument, not debugDescription. This issue is also addressed in the comments above. Sorry for the confusion!

tbkka commented May 9, 2017

Both description and debugDescription should be accurate (in the sense explained in Steele & White's classic 1990 paper). Having only debugDescription be accurate is a very strange state of affairs.

@wokalski wokalski deleted the bugfix/float-string-precision branch May 9, 2017 20:45
freak4pc pushed a commit to freak4pc/swift that referenced this pull request Sep 28, 2022
Re-enable SwifterSwift-watchOS since it's no longer failing