Skip to content

cmd/compile: -race instrumentation leads to different behaviour due to 16 to 32-bit expansion #21963

Closed
@martelletto

Description

@martelletto

What version of Go are you using (go version)?

go version devel +eca45997df Thu Sep 21 03:00:51 2017 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes; in fact it does not happen with earlier releases.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/pedro/go"
GORACE=""
GOROOT="/home/pedro/go-current"
GOTOOLDIR="/home/pedro/go-current/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build492614889=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Hi,

I have noticed a rather interesting phenomenon. Before expanding on it,
I would like to apologize in advance for my lack of familiarity with Go,
which is most likely at the root of the problem.

When compiled with Go >= 1.9 and -race, the behaviour of the function
rq.Mult changes. The problem cannot be reproduced with Go 1.7.6 or
1.8.3, with or without -race, or with Go 1.9 without -race. All the
information in this report relates to a x86_64 Arch Linux installation
running go version devel +eca45997df linux/amd64.

The issue can be reproduced by cloning the sntrup4591761 repository
and using the provided keygen.go tool to generate a public/private
key pair with a fixed randomness source:

$ git clone https://github.com/companyzero/sntrup4591761/
$ curl https://ambientworks.net/tmp/r > /tmp/r # 64kB worth of random bytes
$ go build examples/keygen/keygen.go
$ ./keygen /tmp/r /tmp/x0 /tmp/y0
$ sha256sum /tmp/[xy]0
a1f9db8f41d4a87b5464414dc6042e55a4803d9345c247317c959278a79c4e50 /tmp/x0
fba5a885bb31fd7845a893e48cce95e267c31d068a14de6886c1843d2480c4b5 /tmp/y0
$ go build -race examples/keygen/keygen.go
$ ./keygen /tmp/r /tmp/x1 /tmp/y1
$ sha256sum /tmp/[xy]1
699680a059bdf65ac98fff6a6dce62583297e2db56360a9d5d5bf8a730db25ea /tmp/x1
049758eb6ff0c63d0e4018787d17540ee56dbebe363d54c5218990ac01dd0e0c /tmp/y1

I tracked down the problem to the first multiplication in
modq.PlusProduct, which happens with A=0, B=-235, and C=-1. Without
-race, this is the prelude leading to the first multiplication in
rq.Mult:

Dump of assembler code for function github.com/companyzero/sntrup4591761/rq.Mult:
=> 0x48fa10 <+0>:       mov    %fs:0xfffffffffffffff8,%rcx
   0x48fa19 <+9>:       lea    -0xb70(%rsp),%rax
   0x48fa21 <+17>:      cmp    0x10(%rcx),%rax
   0x48fa25 <+21>:      jbe    0x48fcf9 <github.com/companyzero/sntrup4591761/rq.Mult+745>
   0x48fa2b <+27>:      sub    $0xbf0,%rsp
   0x48fa32 <+34>:      mov    %rbp,0xbe8(%rsp)
   0x48fa3a <+42>:      lea    0xbe8(%rsp),%rbp
   0x48fa42 <+50>:      movq   $0x0,0x6(%rsp)
   0x48fa4b <+59>:      lea    0x8(%rsp),%rdi
   0x48fa50 <+64>:      mov    $0x17c,%ecx
   0x48fa55 <+69>:      xor    %eax,%eax
   0x48fa57 <+71>:      rep stos %rax,%es:(%rdi)
   0x48fa5a <+74>:      mov    0xc00(%rsp),%rdx
   0x48fa62 <+82>:      mov    0xc08(%rsp),%rbx
   0x48fa6a <+90>:      xor    %eax,%eax
   0x48fa6c <+92>:      jmpq   0x48fb03 <github.com/companyzero/sntrup4591761/rq.Mult+243>
   0x48fa71 <+97>:      lea    0x1(%rcx),%r8
   0x48fa75 <+101>:     movzwl (%rdx,%rcx,2),%r9d
   0x48fa7a <+106>:     movswq %si,%r10
   0x48fa7e <+110>:     movswq %r9w,%r9
   0x48fa82 <+114>:     imul   %r9d,%eax

The first multiplication goes:

   0x48fa75 <+101>:     movzwl (%rdx,%rcx,2),%r9d
   0x48fa7a <+106>:     movswq %si,%r10
   0x48fa7e <+110>:     movswq %r9w,%r9
   0x48fa82 <+114>:     imul   %r9d,%eax

At 0x48fa82, r9d and eax contain:

(gdb) i r r9d
r9d            0xffffff15       -235
(gdb) i r eax
eax            0xffffffff       -1
(gdb) si
(gdb) i r eax
eax            0xeb     235

With -race, the corresponding rq.Mult text reads:

Dump of assembler code for function github.com/companyzero/sntrup4591761/rq.Mult:
=> 0x4dfa60 <+0>:       mov    %fs:0xfffffffffffffff8,%rcx
   0x4dfa69 <+9>:       lea    -0xbd8(%rsp),%rax
   0x4dfa71 <+17>:      cmp    0x10(%rcx),%rax
   0x4dfa75 <+21>:      jbe    0x4dff6a <github.com/companyzero/sntrup4591761/rq.Mult+1290>
   0x4dfa7b <+27>:      sub    $0xc58,%rsp
   0x4dfa82 <+34>:      mov    %rbp,0xc50(%rsp)
   0x4dfa8a <+42>:      lea    0xc50(%rsp),%rbp
   0x4dfa92 <+50>:      mov    0xc58(%rsp),%rax
   0x4dfa9a <+58>:      mov    %rax,(%rsp)
   0x4dfa9e <+62>:      callq  0x47c8d0 <runtime.racefuncenter>
   0x4dfaa3 <+67>:      movq   $0x0,0x4e(%rsp)
   0x4dfaac <+76>:      lea    0x50(%rsp),%rdi
   0x4dfab1 <+81>:      mov    $0x17c,%ecx
   0x4dfab6 <+86>:      xor    %eax,%eax
   0x4dfab8 <+88>:      rep stos %rax,%es:(%rdi)
   0x4dfabb <+91>:      xor    %eax,%eax
   0x4dfabd <+93>:      jmpq   0x4dfbce <github.com/companyzero/sntrup4591761/rq.Mult+366>
   0x4dfac2 <+98>:      mov    %cx,0x12(%rsp)
   0x4dfac7 <+103>:     mov    0xc68(%rsp),%rax
   0x4dfacf <+111>:     lea    (%rax,%rdx,2),%rcx
   0x4dfad3 <+115>:     mov    %rcx,(%rsp)
   0x4dfad7 <+119>:     callq  0x47c770 <runtime.raceread>
   0x4dfadc <+124>:     mov    0xc68(%rsp),%rax
   0x4dfae4 <+132>:     test   %al,(%rax)
   0x4dfae6 <+134>:     mov    0x18(%rsp),%rcx
   0x4dfaeb <+139>:     lea    0x1(%rcx),%rdx
   0x4dfaef <+143>:     movzwl (%rax,%rcx,2),%ecx
   0x4dfaf3 <+147>:     movzwl 0x10(%rsp),%ebx
   0x4dfaf8 <+152>:     movswq %bx,%rbx
   0x4dfafc <+156>:     movswq %cx,%rcx
   0x4dfb00 <+160>:     movzwl 0x12(%rsp),%esi
   0x4dfb05 <+165>:     imul   %ecx,%esi

Following the execution flow from 0x4dfac2:

(gdb) display/i $pc
1: x/i $pc
=> 0x4dfac2 <github.com/companyzero/sntrup4591761/rq.Mult+98>:  mov    %cx,0x12(%rsp)
(gdb) i r cx
cx             0xffff   -1
(gdb) x/x $rsp+0x12
0xc4200abcaa:   0x00000000
(gdb) si
1: x/i $pc
=> 0x4dfac7 <github.com/companyzero/sntrup4591761/rq.Mult+103>: mov    0xc68(%rsp),%rax
(gdb) x/x $rsp+0x12
0xc4200abcaa:   0x0000ffff
(gdb) b *0x4dfb00
(gdb) c
1: x/i $pc
=> 0x4dfb00 <github.com/companyzero/sntrup4591761/rq.Mult+160>: movzwl 0x12(%rsp),%esi
(gdb) i r ecx
ecx            0xffffff15       -235
(gdb)  x/x $rsp+0x12
0xc4200abcaa:   0x0000ffff
(gdb) si
1: x/i $pc
=> 0x4dfb05 <github.com/companyzero/sntrup4591761/rq.Mult+165>: imul   %ecx,%esi
(gdb) i r ecx
ecx            0xffffff15       -235
(gdb) i r esi
esi            0xffff   65535
(gdb) si
1: x/i $pc
=> 0x4dfb08 <github.com/companyzero/sntrup4591761/rq.Mult+168>: lea    (%rsi,%rbx,1),%ecx
(gdb) i r esi
esi            0xff1500eb       -15400725

At a first moment, cx holds -1 and is placed on the stack as a 16-bit
value. After the call to runtime.raceread, cx's previous contents are
retrieved from the stack, expanded from 16 to 32 bits, and placed into
esi. This expansion does not preserve the value's signedness, and -1
becomes 65535, which causes the subsequent multiplication to result in
-15400725 instead of 235.

I gave Go's source code a quick look, and nailed down the generation of
that particular movzwl instruction to loadByType. If I adjust
loadByType with the diff below, then go -race emits movswl instead of
movzwl, and keygen compiled with -race produces the expected output.

diff --git src/cmd/compile/internal/amd64/ssa.go src/cmd/compile/internal/amd64/ssa.go
index 8c92f07320..3399de1c46 100644
--- src/cmd/compile/internal/amd64/ssa.go
+++ src/cmd/compile/internal/amd64/ssa.go
@@ -45,7 +45,11 @@ func loadByType(t *types.Type) obj.As {
                if t.Size() == 1 {
                        return x86.AMOVBLZX
                } else {
-                       return x86.AMOVWLZX
+                       if t.IsSigned() {
+                               return x86.AMOVWLSX
+                       } else {
+                               return x86.AMOVWLZX
+                       }
                }
        }
        // Otherwise, there's no difference between load and store opcodes.

Please note that I am not suggesting the diff above as a fix, but just
as an additional data point. I am not sure whether the problem is in my
code (could it be relying on undefined behaviour?), or elsewhere. Any
input on this would be very much appreciated. Thank you for your time
and attention.

-p.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions