Closed
Description
Context
go mod download
is often used in build systems to "warm the cache" so subsequent operations don't need to fetch any more data. However, it can currently download more than expected. See #41431.
In #41431 (comment), @bcmills suggests using go list -test -deps ./...
instead. This would only fetch only the modules transitively imported by packages in the main module.
A similar suggestion of go list -test all
was made recently in the Gophers slack.
Proposal
This issue proposes changing the default selection for go mod download
to match go list all
. This would make go mod download
more intuitive for the cache-warming case which typically does not involve needing dependencies' test dependencies.
The current selection could be moved to a flag if necessary.
Metadata
Metadata
Assignees
Type
Projects
Relationships
Development
No branches or pull requests
Activity
bcmills commentedon Feb 19, 2021
CC @jayconrod @matloob
[-]proposal: change go mod download's default selection[/-][+]proposal: cmd/go: narrow 'mod download' default set[/+]rsc commentedon May 4, 2021
For clarity, right now
go mod download
downloads the modules listed bygo list -m all
.This proposal is to limit it to the modules supplying packages for
go list all
, which is a smaller set.Note that we would still need to download go.mod files for something like the whole original set,
and those should still be reported in
go mod download -json
, in entries that would simply omitthe full-source links if the full source was not downloaded.
rsc commentedon May 19, 2021
Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group
zikaeroh commentedon May 25, 2021
I think that this would break the cache warming case for when you only have
go.mod
andgo.sum
and none of the code, as I described in: #45551 (comment)This case is common when building docker containers as it allows you to populate a module cache before you copy the code in. If the module cache state is solely dependent on the code (which is what
go list all
effectively does), then the cache is populated after the code is added in a new layer, and any code change will invalidate the layer and cause the cache to be thrown away.It seemed like based on #45551 (comment) that this was a supported case and would actually be improving in the future, but
go list all
when I only havego.mod
andgo.sum
nets me no packages whilego list -m all
does appear to query them.cc @jayconrod
bcmills commentedon May 25, 2021
@zikaeroh, if you're setting up a fast Docker builder, you probably want to prime the build cache too — not just the module cache.
In that case, you would either want to
go build $PACKAGES
(where$PACKAGES
is an externally-computed list of the packages that you expect to be relevant to the build), or go ahead and upload a baseline copy of the code to be build and rungo test -c -o /dev/null ./...
or similar.Or, to put it another way: if you're ok with wasting some space on the build image (as you would be with
go mod download
today), then you're likely better off wasting that space on a redundant copy of the source code for your module instead of redundant copies of the source code for unused modules in your dependency graph.zikaeroh commentedon May 25, 2021
In my experience, getting the 2 GB of dependencies my project needs (or about 1 GB non-test) is much, much more time consuming than the actual build itself.
Unfortunately, docker can't work this way. The build context for docker is based on the current file system. I can't pre-copy a different version (which version?) of the code into the container when building because there's only one version; the current one. If I make any change to the source code, any layer that depended on it is invalidated, hence why I only copy in
go.mod
andgo.sum
, as they don't change that often and provide the info required to pre-populate the cache (even if overzealous).Note that I'm not "wasting space"; I build the binary, then copy it to another image (see: https://github.com/hortbot/hortbot/blob/master/Dockerfile). In this case, the only "waste" is the cache that'd on the builder (versus where the image runs), which is what I want to happen anyway. This is a common pattern when building docker images with compiled code; there's no reason to ship the source, the Go compiler, caches, etc.
seankhliao commentedon May 25, 2021
It can work that way: use an old version of your build stage as the base image for your newer builds. (Note newer
docker
versions can mount volumes as caches removing the need for layering gymnastics)The "wasting space" refers to your build stage, which you are obviously caching to be able to reuse it. In most cases an old copy of your code + accurate dependencies takes up less space than the extended set of dependencies. The wasted space also translates to longer build times if your CI is stateless as it now needs to restore/save larger caches from a remote
zikaeroh commentedon May 25, 2021
If you believe it can be done, then I would appreciate some sort of example to look at; I'm having a hard time conceptualizing how that would work without shifting the caching responsibility to me or extra scripts (managing which parent image I'm using when that's encoded in Dockerfile, removing old items, etc) or breaking statelessness (some at-build volume). It seems like a shame to have to be managing all of this when it was so simple to achieve before following only docker's best practices documentation.
rsc commentedon May 26, 2021
This is only about changing the default for
go mod download
with no arguments.@bcmills, would
go mod download all
still get the old behavior?Or maybe the default should be "what's listed in go.mod", which would work well with the new module pruning?
rsc commentedon May 26, 2021
This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group
28 remaining items