cmd/go: spuriously stale dependencies #46377

Closed

cmd/go: spuriously stale dependencies#46377

Labels

During compiler development, I periodically take snapshots of my git working tree, copy them to my workstation, and then run a bunch of tests. The script I run on the workstation currently looks like this:

cd $GOROOT/test
git checkout $TRYREV
go install -toolexec=toolstash cmd/compile
go build -a -v -gcflags=-d=inlfuncswithclosures=0 -toolexec='toolstash -cmp' std cmd
go install -v std cmd/cgo cmd/go cmd/link
go install -v -x cmd/cgo cmd/go cmd/link  # why??
go run run.go
go test -short std cmd

I've added the second go install -v -x cmd/cgo cmd/go cmd/link invocation because otherwise tests like cmd/link's TestDWARF fail because cmd/go says cmd/link is stale. I would expect the one go install invocation to be sufficient, but the go install -x output (e.g., https://gist.github.com/mdempsky/bdfc2247df2bba822b833b78309660e9 has the output from my most recent script run) confirms that cmd/cgo, cmd/go, and cmd/link are all being relinked and reinstalled.

The staleness issue bites me quite frequently during my workflow, but I haven't managed to isolate steps that reliably reproduce the failure.

/cc @bcmills

added

Contributor

This is my understanding.

Running go install cmd/link will update the installed link tool if the hash code has changed. The hash code of linking a program includes the hash code of cmd/link itself. So the first install of cmd/link gives you a link tool with a hash code that records the link tool used to build it. When you run go install again, the link tool has changed, so the hash code has changed, so the linker gets rebuilt and reinstalled. This time, it won't actually change, so another go install cmd/link will see that hash code matches and it won't do anything.

This is basically why make.bash builds the tools three times. See this comment from cmd/dist/build.go:

// Toolchain2 should be semantically equivalent to toolchain1,
// but it was built using the new compilers instead of the Go 1.4 compilers,
// so it should at the least run faster. Also, toolchain1 had no build IDs
// in the binaries, while toolchain2 does. In non-release builds, the
// toolchain's build IDs feed into constructing the build IDs of built targets,
// so in non-release builds, everything now looks out-of-date due to
// toolchain2 having build IDs - that is, due to the go command seeing
// that there are new compilers. In release builds, the toolchain's reported
// version is used in place of the build ID, and the go command does not
// see that change from toolchain1 to toolchain2, so in release builds,
// nothing looks out of date.
// To keep the behavior the same in both non-release and release builds,
// we force-install everything here.
//
//	toolchain3 = mk(new toolchain, toolchain2, go_bootstrap)
//

At least, that is my understanding.

thanm

Contributor

It is interesting to get some insight into this problem.

I run the cmd/link TestDWARF tests by hand a good deal (since I spend a fair amount of time making changes to the dwarf code), and it is frustrating to have the the test inform me that the linker is stale right after I just did "go install cmd/link".

Maybe we need a new "go installthreetimes cmd/link" or something like that.

mdempsky

ContributorAuthor

I see. I knew bootstrapping reinstalls multiple times, but I guess I assumed cmd/go would handle the details after that.

If it stays the user's responsibility to run go install to fixpoint and things are working as intended, then that's surprising, but okay.

rsc

Contributor

Re "the user's responsiblity", this is not something that affects ordinary users. It is something that affects only people who install the tools that the go command uses to build things, namely 'cmd'.

The go command should not go out of its way to add special cases for developing the toolchain itself. That is the job of cmd/dist.

Apologies for needing to run 'go install' multiple times to reach a fixed point, but this is a fundamental consequence of the interaction between (1) cmd/link essentially depending on itself and (2) content-hash-based staleness.

mvdan

Member

For this reason I use toolstash restore && go install cmd/xxx if I'm hacking on the toolchain itself. This speeds up the iteration by making sure that I always build with the master version (stored with toolstash save), so I don't need to rebuild half of std and cmd again.

If we agree that this is something we should simply document better, maybe we can add it to #30074.

mdempsky

ContributorAuthor

Re "the user's responsiblity", this is not something that affects ordinary users. It is something that affects only people who install the tools that the go command uses to build things, namely 'cmd'.

Ack, I agree this issue has limited scope of people affected. But as someone who works primarily on cmd/compile, I still consider myself a user of cmd/go. That's all I meant.

That is the job of cmd/dist.

I'm not very familiar with cmd/dist's full functionality. After I make changes to cmd/compile, is there a cmd/dist invocation to bring things back to a fixed point with the minimal amount of redundant work?

I generally avoid running the full make.bash/all.bash script because I like the flexibility of tweaking my test script to reorder tests to prioritize whatever particular corner case I'm actively dealing with. Waiting for the complete triple-bootstrap really slows down my test cycles when I'm trying to figure out why my change broke fixedbugs/issue12345.go.

For this reason I use toolstash restore && go install cmd/xxx if I'm hacking on the toolchain itself.

Yeah, I'm thinking I should use toolstash restore in my script instead of -toolexec=toolstash. The latter is just muscle memory from doing cmd/compile-only hacking.

But does that actually address the fixpoint issue?

mdempsky

ContributorAuthor

Edit: Nevermind, this was likely my fault. I forgot that I'd changed ~/pkg/toolstash/compile to a symlink so I could easily switch between a few stashed compilers. I guess that worked with toolstash save but not with toolstash restore.

~~My build script is currently:~~

toolstash restore
go tool compile -V

go install cmd/compile

go build -a -v -gcflags='-d=inlfuncswithclosures=0' -toolexec='toolstash -cmp' std cmd

# golang.org/issue/46377
go install std cmd
go install std cmd
go install std cmd

go run run.go
go test -short std cmd

But I keep running into a situation where the initial go install cmd/compile command seems to try recompiling cmd/compile with a broken compiler instead of the known-good version that I stashed and restored.

~~For example, right now I have this reliable behavior from cmd/go:~~

$ go install -a -toolexec=toolstash cmd/compile
[succeeds]

$ toolstash restore
$ go install -a cmd/compile
# math
math/trig_reduce.go:36:19: internal compiler error: math/fma.go:96:27: internal compiler error: math/expm1.go:170:14: internal compiler error: 'trigReduce': addr of canSSA expression: 
.   NAME-math.f esc(no) tc(1) Class:PAUTO Offset:0 InlFormal OnStack Used float64 # unsafe.go:23:18,trig_reduce.go:36:19

~~Why is cmd/go giving me different behavior depending on whether I use -toolexec=toolstash or toolstash restore immediately before?~~

(I'm not surprised about the compiler error: that's explained by a previous change I made. But I've reverted it in my source tree, and it wasn't in the stashed compiler, so I don't understand why the failure is persisting.)

rsc

Contributor

If you want a complete fixed-point where nothing at all in the tree appears to be stale after modifying the sources to cmd/compile or cmd/link, then the minimum is

go install cmd
go install cmd
go install cmd
go install std

Looking at the inputs and outputs of each (the input to all steps is the same modified source code, so I'm only looking at the toolchain inputs, namely which toolchain is being used):

Inputs = original toolchain.
Outputs = new toolchain as compiled by original.
Inputs = new toolchain as compiled by original.
Outputs = new toolchain as compiled by (new toolchain as compiled by original).
Inputs = new toolchain as compiled by (new toolchain as compiled by original).
Outputs = new toolchain as compiled by (new toolchain as compiled by (new toolchain as compiled by original)).

Because we are using content-based (as opposed to mtime-based) staleness, a particular build result is indexed by the content hashes of the input sources (not changing here) and the binaries being used.

The binaries used as inputs in steps 1 and 2 are certainly different, since the one in step 2 has the modifications compiled into it.

The binaries used as inputs in steps 2 and 3 are different as well, at least when the modifications affect generated code.
Consider the case where the modifications contain an optimization that applies to some part of the compiler.
The output of step 1 = input of step 2 is an unoptimized compilation of the modified compiler sources.
The output of step 2 = input of step 3 is the optimized compilation of the modified compiler sources.
In the absence of compiler bugs, these two toolchains, although different in binary form, should be semantically equivalent.

Assuming those two toolchains really are semantically equivalent, they should produce identical binaries as output, except for the fact that the input hashes recorded in the binaries are different. We can't just rewrite the hashes, because maybe the toolchains aren't semantically equivalent. That is, maybe there's a bug in the compiler (it happens!).

Assuming the two toolchains really behave identically, then step 3 is a semantic repeat of step 2, just using a toolchain with different binaries. So it should produce the same outputs. But these outputs will have been compiled with themselves as their own input hashes, proving that we really have reached a fixed point and that the toolchains really are semantically equivalent, at least as applied to their own source code.

If go list says any part of cmd is stale after step 3, then there is a bug in the toolchain that is causing it not to produce deterministic outputs. This is what the staleness check in cmd/dist looks for.

Finally we need to recompile the standard library with the fixed-point compilers, which is step 4.

All this only matters if you are trying to make the compilers themselves not be reported as stale. The go command has no problem invoking stale compilers, so when I'm working on the toolchain, I just do one go install cmd (or go install cmd/link etc, whatever I'm changing) and then immediately move on to go test -short std. If you trust that the existing compiler works well enough to compile your changes, then there's no problem not forcing the fixed point. You're just trusting instead of verifying. At that point the only tests that fail are the ones that are picky about staleness.