
regexp: Expand rules are confusing #40329

Open
@a-h


What version of Go are you using (go version)?

$ go version
go version go1.14.4 darwin/amd64

Also tested on Go docker containers: 1.2, 1.10

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/adrian/Library/Caches/go-build"
GOENV="/Users/adrian/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/adrian/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/nix/store/6f0x9mycyrcc3gsfm36bci0b4ls1q8zc-go-1.14.4/share/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/nix/store/6f0x9mycyrcc3gsfm36bci0b4ls1q8zc-go-1.14.4/share/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/zz/yrqp3xx513sf6drry24cw8r40000gp/T/go-build152779906=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

https://play.golang.org/p/WYmtk1Gi-T_4
https://play.golang.org/p/XAdw3-cgx0z

package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`a(.)c`)
	fmt.Println(re.ReplaceAllString("abc", "c$1a"))
	// expected "cba", got "c"
}

What did you expect to see?

I expected the first capture submatch (.) to yield the character b, so the result should have been cba, according to the docs at https://golang.org/pkg/regexp/#Regexp.ReplaceAllString

ReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.

What did you see instead?

I got just the character c instead of cba, because the replacement string was interpreted differently from what I expected.

When I dug into it to raise this report, I found this in the documentation on the Expand function:

In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.

I hadn't realised I was using the name form; I thought I was using the number form. This made it clear that the reference was being interpreted as ${1a}, not as $1 followed by a.

That means I would need to use "c${1}a" as my replacement. This is quite unusual: I haven't seen another programming language that works this way, e.g. Node.js will return cba.

"abc".replace(/a(.)c/, "c$1a");

sed also works as I expected (although it uses \1 as the reference instead of $1).

echo "abc" | sed -E 's/a(.)c/c\1a/g'
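For comparison, here is a minimal Go sketch of the braced form mentioned above, run against the same pattern and input as the report. The helper name rewrite is only illustrative and not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// re is the pattern from the report.
var re = regexp.MustCompile(`a(.)c`)

// rewrite (illustrative helper) applies repl to the input "abc" from the report.
func rewrite(repl string) string {
	return re.ReplaceAllString("abc", repl)
}

func main() {
	// $1a is read as a reference to a group named "1a", which does not
	// exist, so it expands to the empty string.
	fmt.Printf("%q\n", rewrite("c$1a")) // "c"

	// Braces end the reference after the 1.
	fmt.Printf("%q\n", rewrite("c${1}a")) // "cba"

	// A space cannot be part of a name, so $1 ends there.
	fmt.Printf("%q\n", rewrite("c$1 a")) // "cb a"
}
```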

It was surprising behaviour to me, and I spent a while debugging the program I was working on before I realised there was something else going on.

Ideally, the Go implementation would change to match other programming languages and text editors, i.e. after a dollar, if the first character is [0-9], then it's a numeric reference up until the next non-numeric character, e.g.

$1 = $1
$1a = $1, followed by a
${1a} = ${1a}
$10 = $10
$1_ = $1, followed by _
a$1c = a, $1, c
etc.

However, if that's not possible, then a documentation change might help. $1 is brittle: it only works when the character that follows it is not a letter, digit, or underscore, for example when the reference is the whole replacement ($1), is followed by a space character ($1 a), or comes at the end of the replacement (a$1). So the documentation should use the long form, ${1}, by default.


Labels: Documentation, NeedsFix, help wanted
