cmd/go: narrow 'mod download' default set #44435

Closed
@danp

Description

@danp

Context

go mod download is often used in build systems to "warm the cache" so subsequent operations don't need to fetch any more data. However, it can currently download more than expected. See #41431.

In #41431 (comment), @bcmills suggests using go list -test -deps ./... instead. This would fetch only the modules transitively imported by packages in the main module.

A similar suggestion of go list -test all was made recently in the Gophers Slack.

Proposal

This issue proposes changing the default selection for go mod download to match go list all. This would make go mod download more intuitive for the cache-warming case which typically does not involve needing dependencies' test dependencies.

The current selection could be moved to a flag if necessary.

Activity

added this to the Proposal milestone on Feb 19, 2021
bcmills

bcmills commented on Feb 19, 2021

@bcmills
Contributor
changed the title from "proposal: change go mod download's default selection" to "proposal: cmd/go: narrow 'mod download' default set" on May 4, 2021
rsc

rsc commented on May 4, 2021

@rsc
Contributor

For clarity, right now go mod download downloads the modules listed by go list -m all.
This proposal is to limit it to the modules supplying packages for go list all, which is a smaller set.

Note that we would still need to download go.mod files for something like the whole original set,
and those should still be reported in go mod download -json, in entries that would simply omit
the full-source links if the full source was not downloaded.

rsc

rsc commented on May 19, 2021

@rsc
Contributor

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

zikaeroh

zikaeroh commented on May 25, 2021

@zikaeroh
Contributor

I think that this would break the cache warming case for when you only have go.mod and go.sum and none of the code, as I described in: #45551 (comment)

This case is common when building docker containers as it allows you to populate a module cache before you copy the code in. If the module cache state is solely dependent on the code (which is what go list all effectively does), then the cache is populated after the code is added in a new layer, and any code change will invalidate the layer and cause the cache to be thrown away.

Based on #45551 (comment), it seemed that this was a supported case and would actually be improving in the future, but go list all nets me no packages when I only have go.mod and go.sum, while go list -m all does appear to query them.

cc @jayconrod

bcmills

bcmills commented on May 25, 2021

@bcmills
Contributor

@zikaeroh, if you're setting up a fast Docker builder, you probably want to prime the build cache too — not just the module cache.

In that case, you would either want to go build $PACKAGES (where $PACKAGES is an externally-computed list of the packages that you expect to be relevant to the build), or go ahead and upload a baseline copy of the code to be built and run go test -c -o /dev/null ./... or similar.

Or, to put it another way: if you're ok with wasting some space on the build image (as you would be with go mod download today), then you're likely better off wasting that space on a redundant copy of the source code for your module instead of redundant copies of the source code for unused modules in your dependency graph.

zikaeroh

zikaeroh commented on May 25, 2021

@zikaeroh
Contributor

if you're setting up a fast Docker builder, you probably want to prime the build cache too — not just the module cache.

In my experience, getting the 2 GB of dependencies my project needs (or about 1 GB non-test) is much, much more time consuming than the actual build itself.

In that case, you would either want to go build $PACKAGES (where $PACKAGES is an externally-computed list of the packages that you expect to be relevant to the build), or go ahead and upload a baseline copy of the code to be built and run go test -c -o /dev/null ./... or similar.

Or, to put it another way: if you're ok with wasting some space on the build image (as you would be with go mod download today), then you're likely better off wasting that space on a redundant copy of the source code for your module instead of redundant copies of the source code for unused modules in your dependency graph.

Unfortunately, docker can't work this way. The build context for docker is based on the current file system. I can't pre-copy a different version (which version?) of the code into the container when building because there's only one version; the current one. If I make any change to the source code, any layer that depended on it is invalidated, hence why I only copy in go.mod and go.sum, as they don't change that often and provide the info required to pre-populate the cache (even if overzealous).

Note that I'm not "wasting space"; I build the binary, then copy it to another image (see: https://github.com/hortbot/hortbot/blob/master/Dockerfile). In this case, the only "waste" is the cache that'd be on the builder (versus where the image runs), which is what I want to happen anyway. This is a common pattern when building docker images with compiled code; there's no reason to ship the source, the Go compiler, caches, etc.

seankhliao

seankhliao commented on May 25, 2021

@seankhliao
Member

It can work that way: use an old version of your build stage as the base image for your newer builds. (Note that newer docker versions can mount volumes as caches, removing the need for layering gymnastics.)

The "wasting space" refers to your build stage, which you are obviously caching to be able to reuse it. In most cases an old copy of your code + accurate dependencies takes up less space than the extended set of dependencies. The wasted space also translates to longer build times if your CI is stateless, as it now needs to restore/save larger caches from a remote.

zikaeroh

zikaeroh commented on May 25, 2021

@zikaeroh
Contributor

If you believe it can be done, then I would appreciate some sort of example to look at; I'm having a hard time conceptualizing how that would work without shifting the caching responsibility to me or extra scripts (managing which parent image I'm using when that's encoded in Dockerfile, removing old items, etc.) or breaking statelessness (some at-build volume). It seems like a shame to have to be managing all of this when it was so simple to achieve before by following only docker's best practices documentation.

rsc

rsc commented on May 26, 2021

@rsc
Contributor

This is only about changing the default for go mod download with no arguments.
@bcmills, would go mod download all still get the old behavior?
Or maybe the default should be "what's listed in go.mod", which would work well with the new module pruning?

rsc

rsc commented on May 26, 2021

@rsc
Contributor

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group
