Description
What version of Go are you using (go version)?
go version devel +eca45997df Thu Sep 21 03:00:51 2017 +0000 linux/amd64
Does this issue reproduce with the latest release?
Yes; in fact it does not happen with earlier releases.
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/pedro/go"
GORACE=""
GOROOT="/home/pedro/go-current"
GOTOOLDIR="/home/pedro/go-current/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build492614889=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
What did you do?
Hi,
I have noticed a rather interesting phenomenon. Before expanding on it,
I would like to apologize in advance for my lack of familiarity with Go,
which is most likely at the root of the problem.
When compiled with Go >= 1.9 and -race, the behaviour of the function
rq.Mult changes. The problem cannot be reproduced with Go 1.7.6 or
1.8.3, with or without -race, or with Go 1.9 without -race. All the
information in this report relates to an x86_64 Arch Linux installation
running go version devel +eca45997df linux/amd64.
The issue can be reproduced by cloning the sntrup4591761 repository
and using the provided keygen.go tool to generate a public/private
key pair with a fixed randomness source:
$ git clone https://github.com/companyzero/sntrup4591761/
$ curl https://ambientworks.net/tmp/r > /tmp/r # 64kB worth of random bytes
$ go build examples/keygen/keygen.go
$ ./keygen /tmp/r /tmp/x0 /tmp/y0
$ sha256sum /tmp/[xy]0
a1f9db8f41d4a87b5464414dc6042e55a4803d9345c247317c959278a79c4e50 /tmp/x0
fba5a885bb31fd7845a893e48cce95e267c31d068a14de6886c1843d2480c4b5 /tmp/y0
$ go build -race examples/keygen/keygen.go
$ ./keygen /tmp/r /tmp/x1 /tmp/y1
$ sha256sum /tmp/[xy]1
699680a059bdf65ac98fff6a6dce62583297e2db56360a9d5d5bf8a730db25ea /tmp/x1
049758eb6ff0c63d0e4018787d17540ee56dbebe363d54c5218990ac01dd0e0c /tmp/y1
I tracked down the problem to the first multiplication in
modq.PlusProduct, which happens with A=0, B=-235, and C=-1. Without
-race, this is the prelude leading to the first multiplication in
rq.Mult:
Dump of assembler code for function github.com/companyzero/sntrup4591761/rq.Mult:
=> 0x48fa10 <+0>: mov %fs:0xfffffffffffffff8,%rcx
0x48fa19 <+9>: lea -0xb70(%rsp),%rax
0x48fa21 <+17>: cmp 0x10(%rcx),%rax
0x48fa25 <+21>: jbe 0x48fcf9 <github.com/companyzero/sntrup4591761/rq.Mult+745>
0x48fa2b <+27>: sub $0xbf0,%rsp
0x48fa32 <+34>: mov %rbp,0xbe8(%rsp)
0x48fa3a <+42>: lea 0xbe8(%rsp),%rbp
0x48fa42 <+50>: movq $0x0,0x6(%rsp)
0x48fa4b <+59>: lea 0x8(%rsp),%rdi
0x48fa50 <+64>: mov $0x17c,%ecx
0x48fa55 <+69>: xor %eax,%eax
0x48fa57 <+71>: rep stos %rax,%es:(%rdi)
0x48fa5a <+74>: mov 0xc00(%rsp),%rdx
0x48fa62 <+82>: mov 0xc08(%rsp),%rbx
0x48fa6a <+90>: xor %eax,%eax
0x48fa6c <+92>: jmpq 0x48fb03 <github.com/companyzero/sntrup4591761/rq.Mult+243>
0x48fa71 <+97>: lea 0x1(%rcx),%r8
0x48fa75 <+101>: movzwl (%rdx,%rcx,2),%r9d
0x48fa7a <+106>: movswq %si,%r10
0x48fa7e <+110>: movswq %r9w,%r9
0x48fa82 <+114>: imul %r9d,%eax
The first multiplication goes:
0x48fa75 <+101>: movzwl (%rdx,%rcx,2),%r9d
0x48fa7a <+106>: movswq %si,%r10
0x48fa7e <+110>: movswq %r9w,%r9
0x48fa82 <+114>: imul %r9d,%eax
At 0x48fa82, r9d and eax contain:
(gdb) i r r9d
r9d 0xffffff15 -235
(gdb) i r eax
eax 0xffffffff -1
(gdb) si
(gdb) i r eax
eax 0xeb 235
With -race, the corresponding rq.Mult text reads:
Dump of assembler code for function github.com/companyzero/sntrup4591761/rq.Mult:
=> 0x4dfa60 <+0>: mov %fs:0xfffffffffffffff8,%rcx
0x4dfa69 <+9>: lea -0xbd8(%rsp),%rax
0x4dfa71 <+17>: cmp 0x10(%rcx),%rax
0x4dfa75 <+21>: jbe 0x4dff6a <github.com/companyzero/sntrup4591761/rq.Mult+1290>
0x4dfa7b <+27>: sub $0xc58,%rsp
0x4dfa82 <+34>: mov %rbp,0xc50(%rsp)
0x4dfa8a <+42>: lea 0xc50(%rsp),%rbp
0x4dfa92 <+50>: mov 0xc58(%rsp),%rax
0x4dfa9a <+58>: mov %rax,(%rsp)
0x4dfa9e <+62>: callq 0x47c8d0 <runtime.racefuncenter>
0x4dfaa3 <+67>: movq $0x0,0x4e(%rsp)
0x4dfaac <+76>: lea 0x50(%rsp),%rdi
0x4dfab1 <+81>: mov $0x17c,%ecx
0x4dfab6 <+86>: xor %eax,%eax
0x4dfab8 <+88>: rep stos %rax,%es:(%rdi)
0x4dfabb <+91>: xor %eax,%eax
0x4dfabd <+93>: jmpq 0x4dfbce <github.com/companyzero/sntrup4591761/rq.Mult+366>
0x4dfac2 <+98>: mov %cx,0x12(%rsp)
0x4dfac7 <+103>: mov 0xc68(%rsp),%rax
0x4dfacf <+111>: lea (%rax,%rdx,2),%rcx
0x4dfad3 <+115>: mov %rcx,(%rsp)
0x4dfad7 <+119>: callq 0x47c770 <runtime.raceread>
0x4dfadc <+124>: mov 0xc68(%rsp),%rax
0x4dfae4 <+132>: test %al,(%rax)
0x4dfae6 <+134>: mov 0x18(%rsp),%rcx
0x4dfaeb <+139>: lea 0x1(%rcx),%rdx
0x4dfaef <+143>: movzwl (%rax,%rcx,2),%ecx
0x4dfaf3 <+147>: movzwl 0x10(%rsp),%ebx
0x4dfaf8 <+152>: movswq %bx,%rbx
0x4dfafc <+156>: movswq %cx,%rcx
0x4dfb00 <+160>: movzwl 0x12(%rsp),%esi
0x4dfb05 <+165>: imul %ecx,%esi
Following the execution flow from 0x4dfac2:
(gdb) display/i $pc
1: x/i $pc
=> 0x4dfac2 <github.com/companyzero/sntrup4591761/rq.Mult+98>: mov %cx,0x12(%rsp)
(gdb) i r cx
cx 0xffff -1
(gdb) x/x $rsp+0x12
0xc4200abcaa: 0x00000000
(gdb) si
1: x/i $pc
=> 0x4dfac7 <github.com/companyzero/sntrup4591761/rq.Mult+103>: mov 0xc68(%rsp),%rax
(gdb) x/x $rsp+0x12
0xc4200abcaa: 0x0000ffff
(gdb) b *0x4dfb00
(gdb) c
1: x/i $pc
=> 0x4dfb00 <github.com/companyzero/sntrup4591761/rq.Mult+160>: movzwl 0x12(%rsp),%esi
(gdb) i r ecx
ecx 0xffffff15 -235
(gdb) x/x $rsp+0x12
0xc4200abcaa: 0x0000ffff
(gdb) si
1: x/i $pc
=> 0x4dfb05 <github.com/companyzero/sntrup4591761/rq.Mult+165>: imul %ecx,%esi
(gdb) i r ecx
ecx 0xffffff15 -235
(gdb) i r esi
esi 0xffff 65535
(gdb) si
1: x/i $pc
=> 0x4dfb08 <github.com/companyzero/sntrup4591761/rq.Mult+168>: lea (%rsi,%rbx,1),%ecx
(gdb) i r esi
esi 0xff1500eb -15400725
Initially, cx holds -1 and is stored on the stack as a 16-bit value.
After the call to runtime.raceread, that value is reloaded from the
stack, zero-extended from 16 to 32 bits by movzwl, and placed into esi.
Zero-extension does not preserve the value's sign, so -1 becomes 65535,
and the subsequent multiplication yields -15400725 instead of 235.
I took a quick look at Go's source code and traced the generation of
that particular movzwl instruction to loadByType. If I adjust
loadByType with the diff below, then go build -race emits movswl
instead of movzwl, and keygen compiled with -race produces the expected
output.
diff --git src/cmd/compile/internal/amd64/ssa.go src/cmd/compile/internal/amd64/ssa.go
index 8c92f07320..3399de1c46 100644
--- src/cmd/compile/internal/amd64/ssa.go
+++ src/cmd/compile/internal/amd64/ssa.go
@@ -45,7 +45,11 @@ func loadByType(t *types.Type) obj.As {
 		if t.Size() == 1 {
 			return x86.AMOVBLZX
 		} else {
-			return x86.AMOVWLZX
+			if t.IsSigned() {
+				return x86.AMOVWLSX
+			} else {
+				return x86.AMOVWLZX
+			}
 		}
 	}
 	// Otherwise, there's no difference between load and store opcodes.
Please note that I am not suggesting the diff above as a fix, but just
as an additional data point. I am not sure whether the problem is in my
code (could it be relying on undefined behaviour?), or elsewhere. Any
input on this would be very much appreciated. Thank you for your time
and attention.
-p.