math: math on 387 does not match float64 due to extended exponents

In runtime/asm_386.s's asminit, we try to set the floating point control word so that all 387 floating-point math happens at 64-bit precision, with rounding to 64-bit after every operation. 

The Intel manual says:

> The precision-control (PC) field (bits 8 and 9 of the x87 FPU control word) determines the precision
> (64, 53, or 24 bits) of floating-point calculations made by the x87 FPU (see Table 8-2). The
> default precision is double extended precision, which uses the full 64-bit significand available
> with the double extended-precision floating-point format of the x87 FPU data registers. This
> setting is best suited for most applications, because it allows applications to take full advantage
> of the maximum precision available with the x87 FPU data registers.
> 
> The double precision and single precision settings reduce the size of the significand to 53 bits
> and 24 bits, respectively. These settings are provided to support IEEE Standard 754 and to
> provide compatibility with the specifications of certain existing programming languages. Using
> these settings nullifies the advantages of the double extended-precision floating-point format's
> 64-bit significand length. When reduced precision is specified, the rounding of the significand
> value clears the unused bits on the right to zeros.
> 
> The precision-control bits only affect the results of the following floating-point instructions:
> FADD, FADDP, FIADD, FSUB, FSUBP, FISUB, FSUBR, FSUBRP, FISUBR, FMUL,
> FMULP, FIMUL, FDIV, FDIVP, FIDIV, FDIVR, FDIVRP, FIDIVR, and FSQRT.

Suppose x=1, y=100, z=1e308 and we compute x/(y*z). 

The value y*z is 1e310, which is too big for a float64, so it should compute as +Inf. Then 1/+Inf is 0. So we expect to get zero from x/(y*z), and we do in x86 with SSE and on non-x86 systems. But on x86 using 387 instructions, even with the FPU precision set to 64-bit, we get 1e-310. Clearly the FPU is storing the intermediate y*z result in something more than a float64.

Here is a program, suppose its in a directory called x:

```
$ cat x.go
package main

import (
    "fmt"
    "runtime"
)

func fpuControlWord() uint16

func main() {
    runtime.LockOSThread()

    fmt.Printf("Control Word: %#x\n", fpuControlWord())
    x, y, z := vals()
    fmt.Printf("x=%v y=%v z=%v y*z=%v x/(y*z)=%v\n", x, y, z, y*z, x/(y*z))
    fmt.Printf("g(x, y, z)=%v\n", g(x, y, z))
    fmt.Printf("Control Word: %#x\n", fpuControlWord())
}

//go:noinline
func g(x, y, z float64) float64 {
    return x / (y * z)
}

//go:noinline
func vals() (float64, float64, float64) {
    return 1, 100, 1e308
}
$ cat fld.s
#include "go_asm.h"

TEXT ·fpuControlWord(SB),$0-0
    FSTCW ret+0(FP)
    RET
$ GOARCH=386 GO386=387 go build
$ ./x
Control Word: 0x27f
x=1 y=100 z=1e+308 y*z=+Inf x/(y*z)=1e-310
g(x, y, z)=1e-310
Control Word: 0x27f
$ 
```

The control word is set correctly.

The function g is here to make it easier to see the computation instructions. Here they are from compile -S:

```
"".g t=1 size=52 args=0x20 locals=0x0
    0x0000 00000 (x.go:20)  TEXT    "".g(SB), $0-32
    0x0000 00000 (x.go:20)  MOVL    (TLS), CX
    0x0007 00007 (x.go:20)  CMPL    SP, 8(CX)
    0x000a 00010 (x.go:20)  JLS 45
    0x000c 00012 (x.go:20)  FUNCDATA    $0, gclocals·895d0569a38a56443b84805daa09d838(SB)
    0x000c 00012 (x.go:20)  FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x000c 00012 (x.go:21)  FMOVD   "".y+12(FP), F0
    0x0010 00016 (x.go:21)  FMOVD   "".z+20(FP), F0
    0x0014 00020 (x.go:21)  FMOVD   F0, F0
    0x0016 00022 (x.go:21)  FMULDP  F0, F2
    0x0018 00024 (x.go:21)  FMOVD   "".x+4(FP), F0
    0x001c 00028 (x.go:21)  FMOVDP  F0, F1
    0x001e 00030 (x.go:21)  FMOVD   F1, F0
    0x0020 00032 (x.go:21)  FDIVDP  F0, F1
    0x0022 00034 (x.go:21)  FMOVD   F0, F0
    0x0024 00036 (x.go:21)  FMOVDP  F0, "".~r3+28(FP)
    0x0028 00040 (x.go:21)  FMOVDP  F0, F0
    0x002a 00042 (x.go:21)  FMOVDP  F0, F0
    0x002c 00044 (x.go:21)  RET
    0x002d 00045 (x.go:21)  NOP
    0x002d 00045 (x.go:20)  PCDATA  $0, $-1
    0x002d 00045 (x.go:20)  CALL    runtime.morestack_noctxt(SB)
    0x0032 00050 (x.go:20)  JMP 0
```

And here they are from lldb, just to confirm:

```
(lldb) disas -n main.g
x`main.g:
x[0x2370] <+0>:  movl   %gs:0x468, %ecx
x[0x2377] <+7>:  cmpl   0x8(%ecx), %esp
x[0x237a] <+10>: jbe    0x239d                    ; <+45> at x.go:20
x[0x237c] <+12>: fldl   0xc(%esp)
x[0x2380] <+16>: fldl   0x14(%esp)
x[0x2384] <+20>: fld    %st(0)
x[0x2386] <+22>: fmulp  %st(2)
x[0x2388] <+24>: fldl   0x4(%esp)
x[0x238c] <+28>: fstp   %st(1)
x[0x238e] <+30>: fld    %st(1)
x[0x2390] <+32>: fdivrp %st(1)
x[0x2392] <+34>: fld    %st(0)
x[0x2394] <+36>: fstpl  0x1c(%esp)
x[0x2398] <+40>: fstp   %st(0)
x[0x239a] <+42>: fstp   %st(0)
x[0x239c] <+44>: retl   
x[0x239d] <+45>: calll  0x41f40                   ; runtime.morestack_noctxt at asm_386.s:395
x[0x23a2] <+50>: jmp    0x2370                    ; <+0> at x.go:20
```

I think this shows:
- The multiplication result (product) is stored in a 387 register and then used in the division.
- The control word is set such that the product should be rounded to float64.
- If the product were rounded, you'd get +Inf.
- From the division result, it's clear that the product is stored as 1e310.

As I finish writing this, I realize the problem: the product mantissa is being rounded to double precision, but the exponent is being left alone: the truncated registers still have extended exponents until they are converted to float64 by transiting memory. Hence the discrepancy.

Lesson: even with the control word change the 387 does not behave exactly like standard float64 hardware.

I'm bothering to file this at all so that maybe I can find it the next time I get confused by this.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

math: math on 387 does not match float64 due to extended exponents #17357

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

math: math on 387 does not match float64 due to extended exponents #17357

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions