
cmd/compile,unicode/utf8: utf8.EncodeRune has different performance from the equivalent string conversion #48684

Open
@bcmills

Description


What version of Go are you using (go version)?

$ go version
go version devel go1.18-435718edd Tue Sep 28 23:59:17 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/bcmills/.cache/go-build"
GOENV="/usr/local/google/home/bcmills/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/tmp/tmp.ysR1YLE79p/.gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/tmp/tmp.ysR1YLE79p/.gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/google/home/bcmills/sdk/gotip"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/bcmills/sdk/gotip/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.18-435718edd Tue Sep 28 23:59:17 2021 +0000"
GCCGO="/usr/local/google/home/bcmills/bin/gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="c++"
CGO_ENABLED="1"
GOMOD="/tmp/tmp.ysR1YLE79p/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build4264475613=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"runtime"
	"testing"
	"unicode/utf8"
)

func BenchmarkEncodeRune(b *testing.B) {
	b.ReportAllocs()

	const r = '💩'
	for n := b.N; n > 0; n-- {
		buf := [utf8.UTFMax]byte{}
		_ = utf8.EncodeRune(buf[:], r)
		runtime.KeepAlive(buf) // intended to keep buf (and the writes to it) from being eliminated
	}
}

func BenchmarkByteConvert(b *testing.B) {
	b.ReportAllocs()

	const r = '💩'
	for n := b.N; n > 0; n-- {
		buf := [utf8.UTFMax]byte{}
		_ = copy(buf[:], string(r)) // same bytes as EncodeRune, via a string conversion
		runtime.KeepAlive(buf)
	}
}

What did you expect to see?

I expected utf8.EncodeRune to perform exactly the same as the code using copy and a string conversion, since the two are (as far as I can tell) semantically equivalent (compare #3939).

I wrote a fuzz test to check that equivalence and found no counterexamples after 5 minutes of fuzzing:

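package main

import (
	"bytes"
	"testing"
	"unicode/utf8"
)

func FuzzEncodeRune(f *testing.F) {
	f.Fuzz(func(t *testing.T, r rune) {
		// Encode r with utf8.EncodeRune.
		b1 := [utf8.UTFMax]byte{}
		n1 := utf8.EncodeRune(b1[:], r)
		t.Logf("EncodeRune(_, %c) = %d; wrote %q", r, n1, b1[:n1])

		// Encode r with a string conversion and copy.
		b2 := [utf8.UTFMax]byte{}
		n2 := copy(b2[:], string(r))
		t.Logf("copy(_, string(%c)) = %d; wrote %q", r, n2, b2[:n2])

		// Both the byte counts and the full buffers must match.
		if n2 != n1 || !bytes.Equal(b2[:], b1[:]) {
			t.FailNow()
		}
	})
}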

What did you see instead?

utf8.EncodeRune takes roughly 9x longer than the string-and-copy equivalent (7.206 ns/op vs. 0.7992 ns/op):

$ go test -bench=. .
goos: linux
goarch: amd64
pkg: example
cpu: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
BenchmarkEncodeRune-12          157204260                7.206 ns/op           0 B/op          0 allocs/op
BenchmarkByteConvert-12         1000000000               0.7992 ns/op          0 B/op          0 allocs/op
PASS
ok      example 2.704s

Metadata


Assignees: No one assigned
Labels: NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one), Performance, compiler/runtime (issues related to the Go compiler and/or runtime)
Type: No type
Projects: Triage Backlog (status)
Relationships: None yet
Development: No branches or pull requests
