crypto/rsa: linux/arm64 Go 1.9 performance is +10X slower than OpenSSL #22807

Open
@williamweixiao

Description

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.9.2 linux/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="arm64"
GOBIN=""
GOEXE=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/lib/go-1.6"
GOTOOLDIR="/usr/lib/go-1.6/pkg/tool/linux_arm64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

What did you do?

go test crypto/rsa -bench .

What did you expect to see?

Performance on par with OpenSSL (https://blog.cloudflare.com/content/images/2017/11/pub_key_1_core-2.png)

What did you see instead?

More than 10x slower than OpenSSL (https://blog.cloudflare.com/content/images/2017/11/go_pub_key_1_core.png)

Activity

added this to the Unplanned milestone on Nov 21, 2017

vielmetti commented on Jun 26, 2018

Go 1.11beta1 is substantially faster than Go 1.10.2 on this test, on Cavium ThunderX / Packet c1.large.arm ("Type 2A").

ed@ed-2a-bcc-llvm:~$ go version
go version go1.10.2 linux/arm64
ed@ed-2a-bcc-llvm:~$ go test crypto/rsa -bench .
goos: linux
goarch: arm64
pkg: crypto/rsa
BenchmarkRSA2048Decrypt-96                    20          74651551 ns/op
BenchmarkRSA2048Sign-96                       20          77650290 ns/op
Benchmark3PrimeRSA2048Decrypt-96              50          35958813 ns/op
PASS
ok      crypto/rsa      8.809s
ed@ed-2a-bcc-llvm:~$ ~/go/bin/go1.11beta1 test crypto/rsa -bench .
goos: linux
goarch: arm64
pkg: crypto/rsa
BenchmarkRSA2048Decrypt-96                   100          11466566 ns/op
BenchmarkRSA2048Sign-96                      100          11855513 ns/op
Benchmark3PrimeRSA2048Decrypt-96             200           7684199 ns/op
PASS
ok      crypto/rsa      6.584s
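
To put a number on "substantially faster", the ns/op figures from the two runs above can be divided directly. This is only a sketch: the `speedup` helper is a hypothetical name, and the inputs are copied by hand from the output above.

```go
package main

import "fmt"

// speedup reports how many times faster the new measurement is than the
// old one, given per-operation times in nanoseconds.
// (Hypothetical helper, not part of any Go tooling.)
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// ns/op values copied from the go1.10.2 and go1.11beta1 runs above.
	rows := []struct {
		name         string
		go110, go111 float64
	}{
		{"RSA2048Decrypt", 74651551, 11466566},
		{"RSA2048Sign", 77650290, 11855513},
		{"3PrimeRSA2048Decrypt", 35958813, 7684199},
	}
	for _, r := range rows {
		fmt.Printf("%-22s %.1fx faster\n", r.name, speedup(r.go110, r.go111))
	}
}
```

On these numbers the 2048-bit decrypt and sign benchmarks come out roughly 6.5x faster, and the 3-prime decrypt roughly 4.7x faster. Each figure comes from a single run, so treat the ratios as indicative only.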

bobby-stripe commented on Mar 6, 2023

Some updated numbers on a 3rd generation AWS Graviton (c7g) host:

$ go version
go version devel go1.21-b94dc384ca Sat Mar 4 00:00:01 2023 +0000 linux/arm64
$ go test crypto/rsa -bench .
goos: linux
goarch: arm64
pkg: crypto/rsa
BenchmarkDecryptPKCS1v15/2048-32         	     597	   2000184 ns/op
BenchmarkDecryptPKCS1v15/3072-32         	     200	   5976582 ns/op
BenchmarkDecryptPKCS1v15/4096-32         	      88	  13397414 ns/op
BenchmarkEncryptPKCS1v15/2048-32         	    6457	    185655 ns/op
BenchmarkDecryptOAEP/2048-32             	     603	   1990501 ns/op
BenchmarkEncryptOAEP/2048-32             	    6457	    185121 ns/op
BenchmarkSignPKCS1v15/2048-32            	     583	   2048907 ns/op
BenchmarkVerifyPKCS1v15/2048-32          	    6528	    183649 ns/op
BenchmarkSignPSS/2048-32                 	     583	   2052886 ns/op
BenchmarkVerifyPSS/2048-32               	    6442	    185743 ns/op
PASS
ok  	crypto/rsa	14.990s
$ cat /proc/cpuinfo | head -n 9
processor	: 0
BogoMIPS	: 2100.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0xd40
CPU revision	: 1

And on an M1 Max:

$ go version
go version devel go1.21-b94dc384ca Sat Mar 4 00:00:01 2023 +0000 darwin/arm64
$ go test crypto/rsa -bench . -cpu 1
goos: darwin
goarch: arm64
pkg: crypto/rsa
BenchmarkDecryptPKCS1v15/2048         	    1040	   1217645 ns/op
BenchmarkDecryptPKCS1v15/3072         	     303	   3562839 ns/op
BenchmarkDecryptPKCS1v15/4096         	     148	   8073468 ns/op
BenchmarkEncryptPKCS1v15/2048         	    8928	    130840 ns/op
BenchmarkDecryptOAEP/2048             	    1023	   1146886 ns/op
BenchmarkEncryptOAEP/2048             	    8979	    131854 ns/op
BenchmarkSignPKCS1v15/2048            	     994	   1194395 ns/op
BenchmarkVerifyPKCS1v15/2048          	    9250	    131157 ns/op
BenchmarkSignPSS/2048                 	     997	   1199584 ns/op
BenchmarkVerifyPSS/2048               	    9013	    131653 ns/op
PASS
ok  	crypto/rsa	15.288s

AWS c6i.8xlarge (Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, "old") compared to c7g.8xlarge ("new"):

name                     old time/op  new time/op  delta
DecryptPKCS1v15/2048-32  1.52ms ± 0%  2.00ms ± 0%  +31.41%  (p=0.008 n=5+5)
DecryptPKCS1v15/3072-32  4.56ms ± 1%  5.98ms ± 0%  +31.06%  (p=0.008 n=5+5)
DecryptPKCS1v15/4096-32  10.2ms ± 0%  13.4ms ± 0%  +31.67%  (p=0.008 n=5+5)
EncryptPKCS1v15/2048-32   180µs ± 0%   185µs ± 0%   +3.09%  (p=0.008 n=5+5)
DecryptOAEP/2048-32      1.54ms ± 0%  1.99ms ± 0%  +28.88%  (p=0.008 n=5+5)
EncryptOAEP/2048-32       183µs ± 1%   185µs ± 0%   +1.29%  (p=0.008 n=5+5)
SignPKCS1v15/2048-32     1.58ms ± 0%  2.05ms ± 0%  +29.66%  (p=0.008 n=5+5)
VerifyPKCS1v15/2048-32    179µs ± 1%   184µs ± 0%   +2.56%  (p=0.008 n=5+5)
SignPSS/2048-32          1.59ms ± 1%  2.05ms ± 0%  +29.24%  (p=0.008 n=5+5)
VerifyPSS/2048-32         182µs ± 1%   186µs ± 0%   +2.06%  (p=0.008 n=5+5)

This is actually slightly better than the gap seen with Ubuntu Focal's OpenSSL 1.1.1f, where Graviton is 37% slower than Intel on the same host types. However, 2048-bit RSA private-key operations are about 2x as fast in OpenSSL as in the Go benchmarks above, as reported by openssl speed rsa2048 on the c7g Graviton 3 hosts:

$ openssl speed rsa2048
Doing 2048 bits private rsa's for 10s: 10322 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 419431 2048 bits public RSA's in 9.98s
OpenSSL 1.1.1f  31 Mar 2020
built on: Mon Feb  6 17:57:17 2023 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-0kQqA1/openssl-1.1.1f=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000969s 0.000024s   1032.2  42027.2
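
OpenSSL reports operations per second while Go reports ns/op, so the comparison needs a unit conversion. The sketch below converts the sign/s and verify/s figures above into ns/op and divides them into the Go SignPKCS1v15/2048 and VerifyPKCS1v15/2048 results from the c7g run. The `nsPerOp` helper is a hypothetical name, and it assumes both tools are timing a single operation on one core.

```go
package main

import "fmt"

// nsPerOp converts an operations-per-second rate into nanoseconds per
// operation. (Hypothetical helper for this comparison.)
func nsPerOp(opsPerSec float64) float64 {
	return 1e9 / opsPerSec
}

func main() {
	// Rates copied from the `openssl speed rsa2048` output above.
	opensslSign := nsPerOp(1032.2)
	opensslVerify := nsPerOp(42027.2)

	// ns/op copied from the Go c7g benchmark run earlier in this comment.
	const goSign = 2048907.0   // BenchmarkSignPKCS1v15/2048-32
	const goVerify = 183649.0  // BenchmarkVerifyPKCS1v15/2048-32

	fmt.Printf("sign:   OpenSSL %.0f ns/op, Go %.0f ns/op, Go/OpenSSL = %.2f\n",
		opensslSign, goSign, goSign/opensslSign)
	fmt.Printf("verify: OpenSSL %.0f ns/op, Go %.0f ns/op, Go/OpenSSL = %.2f\n",
		opensslVerify, goVerify, goVerify/opensslVerify)
}
```

On these figures Go signing takes about 2.1x as long as OpenSSL, and the gap for verification is considerably larger, between 7x and 8x.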

Finally, Go ("old") vs. GOEXPERIMENT=boringcrypto ("new") on an AWS c7g (3rd generation Graviton):

name                     old time/op  new time/op  delta
DecryptPKCS1v15/2048-32  2.00ms ± 0%  0.91ms ± 0%  -54.59%  (p=0.008 n=5+5)
DecryptPKCS1v15/3072-32  5.98ms ± 0%  2.71ms ± 0%  -54.62%  (p=0.008 n=5+5)
DecryptPKCS1v15/4096-32  13.4ms ± 0%   6.1ms ± 0%  -54.84%  (p=0.008 n=5+5)
EncryptPKCS1v15/2048-32   185µs ± 0%     8µs ± 0%  -95.80%  (p=0.008 n=5+5)
DecryptOAEP/2048-32      1.99ms ± 0%  0.91ms ± 0%  -54.16%  (p=0.008 n=5+5)
EncryptOAEP/2048-32       185µs ± 0%    12µs ± 0%  -93.47%  (p=0.008 n=5+5)
SignPKCS1v15/2048-32     2.05ms ± 0%  0.91ms ± 0%  -55.72%  (p=0.008 n=5+5)
VerifyPKCS1v15/2048-32    184µs ± 0%     7µs ± 0%  -96.45%  (p=0.008 n=5+5)
SignPSS/2048-32          2.05ms ± 0%  0.91ms ± 0%  -55.67%  (p=0.008 n=5+5)
VerifyPSS/2048-32         186µs ± 0%     7µs ± 0%  -96.16%  (p=0.008 n=5+5)

The boringcrypto sign numbers (0.91ms) roughly match the rsa2048 signing time reported by OpenSSL above (0.969ms).


Assignees

No one assigned

    Labels

    NeedsFix (The path to resolution is known, but the work has not been done.), Performance, help wanted


        Participants

        @titanous @vielmetti @williamweixiao @bobby-stripe