Description
What version of Go are you using (go version)?
$ go version
go version go1.14.4 darwin/amd64

Also tested on Go docker containers: 1.2, 1.10
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/adrian/Library/Caches/go-build"
GOENV="/Users/adrian/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/adrian/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/nix/store/6f0x9mycyrcc3gsfm36bci0b4ls1q8zc-go-1.14.4/share/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/nix/store/6f0x9mycyrcc3gsfm36bci0b4ls1q8zc-go-1.14.4/share/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/zz/yrqp3xx513sf6drry24cw8r40000gp/T/go-build152779906=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
https://play.golang.org/p/WYmtk1Gi-T_4
https://play.golang.org/p/XAdw3-cgx0z
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`a(.)c`)
	fmt.Println(re.ReplaceAllString("abc", "c$1a"))
	// expected "cba", got "c"
}
What did you expect to see?
I expected the first capture submatch, (.), to match the character b, so the result should have been cba, according to the docs at https://golang.org/pkg/regexp/#Regexp.ReplaceAllString:
ReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.
What did you see instead?
I got only the character c instead of the cba I expected, because the replacement had been interpreted differently from what I intended.
When I dug into it to raise this report, I found this in the documentation on the Expand function:
In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.
I didn't realise I was using the name form; I thought I was using the number form. This made it clear that the reference was being interpreted as ${1a}, not as $1 followed by a.
That means I would need to use "c${1}a" as my replacement, which is quite unusual: I haven't seen another programming language that works this way. For example, Node.js returns cba:
"abc".replace(/a(.)c/, "c$1a");
sed also works as I expected (although it uses \1 as the reference instead of $1):
echo "abc" | sed -E 's/a(.)c/c\1a/g'
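For comparison, here is a minimal sketch of the braced "c${1}a" workaround in Go, on the same input as the reproduction. It also shows that a named group, written with the (?P<name>...) syntax, needs the same braces when literal text follows it. swapAround and the group name mid are hypothetical, chosen only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// swapAround is a hypothetical helper: it applies a braced replacement to src,
// once with a numbered group and once with an equivalent named group.
func swapAround(src string) (numbered, named string) {
	// ${1} ends the reference at the closing brace, so "a" stays literal.
	byIndex := regexp.MustCompile(`a(.)c`)
	numbered = byIndex.ReplaceAllString(src, "c${1}a")

	// Named groups follow the same rule: "$mida" would look up a group
	// called "mida", so the braces are needed here too.
	byName := regexp.MustCompile(`a(?P<mid>.)c`)
	named = byName.ReplaceAllString(src, "c${mid}a")
	return numbered, named
}

func main() {
	numbered, named := swapAround("abc")
	fmt.Println(numbered, named) // cba cba
}
```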
It was surprising behaviour to me, and I spent a while debugging the program I was working on before I realised there was something else going on.
Ideally, the Go implementation would change to match other programming languages and text editors, i.e. after a dollar, if the first character is [0-9], then it's a numeric reference up until the next non-numeric character, e.g.
$1 = $1
$1a = $1, followed by a
${1a} = ${1a}
$10 = $10
$1_ = $1, followed by _
a$1c = a, $1, c
etc.
However, if that's not possible, then a documentation change might help. The $1 form is brittle: it only works if the reference to the capture group is the whole replacement ($1), is followed by a space or another character that can't be part of a name ($1 a), or is at the end of the replacement (a$1). So the documentation should use the braced ${1} form by default.