Improve performance of fetching git dependencies by rev #10078

Closed
@dtolnay

Problem

GitHub recently enabled support for fetching individual commits by commit hash (uploadpack.allowReachableSHA1InWant on the server side).

$ GIT_TRACE_PACKET=1 git fetch 2>&1 | head -1
17:49:50.750968 pkt-line.c:80           packet:        fetch< 458d3459cfb0f923fbe968e6868a4893af34ba69 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want symref=HEAD:refs/heads/master filter object-format=sha1 agent=git/github-g6409641ef0c2

Notice the presence of allow-reachable-sha1-in-want in the advertised protocol capabilities.
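A client could check for this capability itself before deciding how to fetch. The following is a minimal sketch, assuming the older (v0/v1) advertisement format shown in the trace above, where the first ref line carries the capability list after a NUL byte. The function names are hypothetical and are not part of Cargo or libgit2.

```rust
/// Returns the capabilities advertised on the first ref line of a
/// protocol v0/v1 advertisement: "<oid> <refname>\0<cap> <cap> ...".
/// Returns an empty list when the line carries no capability section.
fn advertised_capabilities(first_ref_line: &str) -> Vec<&str> {
    match first_ref_line.split_once('\0') {
        Some((_ref_part, caps)) => caps.split_whitespace().collect(),
        None => Vec::new(),
    }
}

/// True if the server says it accepts "want" lines naming any reachable commit.
fn allows_reachable_sha1_in_want(first_ref_line: &str) -> bool {
    advertised_capabilities(first_ref_line)
        .iter()
        .any(|cap| *cap == "allow-reachable-sha1-in-want")
}

fn main() {
    // Abbreviated copy of the advertisement from the trace above.
    let line = "458d3459cfb0f923fbe968e6868a4893af34ba69 HEAD\0multi_ack thin-pack \
                allow-tip-sha1-in-want allow-reachable-sha1-in-want \
                symref=HEAD:refs/heads/master";
    println!("reachable sha1 in want: {}", allows_reachable_sha1_in_want(line));
}
```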

Example of using "reachable sha1 in want" on the CLI:

$ mkdir cargo

$ cd cargo

$ git init --quiet

$ git fetch --depth=1 https://github.com/rust-lang/cargo 88117505b8b691e0e7892630a71a85bb5e9945de
remote: Enumerating objects: 782, done.
remote: Counting objects: 100% (782/782), done.
remote: Compressing objects: 100% (685/685), done.
remote: Total 782 (delta 136), reused 316 (delta 65), pack-reused 0
Receiving objects: 100% (782/782), 2.08 MiB | 3.34 MiB/s, done.
Resolving deltas: 100% (136/136), done.
From https://github.com/rust-lang/cargo
 * branch                88117505b8b691e0e7892630a71a85bb5e9945de -> FETCH_HEAD

$ git log -1 88117505b8b691e0e7892630a71a85bb5e9945de
commit 88117505b8b691e0e7892630a71a85bb5e9945de (grafted)
Author: Eric Huss <[email protected]>
Date:   Fri Oct 22 07:53:17 2021 -0700

    Bump to 0.59.0

Notice in the above log that only 2.08 MiB total were downloaded. This is significantly less than the 50+ MiB of the whole Cargo repo. For larger repos the difference can be even more significant.
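For comparison, the same shallow fetch can be driven from Rust by shelling out to the git CLI. This is only a sketch of the command shown above, not how Cargo fetches (Cargo goes through libgit2). The helper names are hypothetical, and fetch_commit assumes repo_dir is already an initialized git repository.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds the arguments for `git fetch --depth=1 <url> <rev>`.
fn shallow_fetch_args(url: &str, rev: &str) -> Vec<String> {
    vec![
        "fetch".to_string(),
        "--depth=1".to_string(),
        url.to_string(),
        rev.to_string(),
    ]
}

/// Runs the fetch inside `repo_dir` and reports whether git exited successfully.
#[allow(dead_code)]
fn fetch_commit(repo_dir: &Path, url: &str, rev: &str) -> io::Result<bool> {
    let status = Command::new("git")
        .current_dir(repo_dir)
        .args(shallow_fetch_args(url, rev))
        .status()?;
    Ok(status.success())
}

fn main() {
    let args = shallow_fetch_args(
        "https://github.com/rust-lang/cargo",
        "88117505b8b691e0e7892630a71a85bb5e9945de",
    );
    // Print the command instead of running it, so the sketch needs no network.
    println!("git {}", args.join(" "));
}
```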

Unfortunately today Cargo doesn't make use of "reachable sha1 in want". Instead, when you specify a git dependency like cargo = { git = "https://github.com/rust-lang/cargo", rev = "88117505b8b691e0e7892630a71a85bb5e9945de" }, Cargo fetches all branches and tags and their entire history, hoping that the requested commit id is somewhere among that potentially enormous pile of commits:

// We don't know what the rev will point to. To handle this
// situation we fetch all branches and tags, and then we pray
// it's somewhere in there.
refspecs.push(String::from("+refs/heads/*:refs/remotes/origin/*"));
refspecs.push(String::from("+HEAD:refs/remotes/origin/HEAD"));
tags = true;
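One way the refspec selection could change is sketched below. It is an illustration under assumptions, not the code from the PR: the function names, the server_allows_reachable_sha1 flag, and the refs/commit/<rev> destination ref are all hypothetical. The idea is to request only the named commit when rev is a full 40-character hex hash, and to keep today's fetch-everything behavior otherwise.

```rust
/// True if `rev` looks like a full SHA-1 commit hash (40 hex digits).
fn is_full_commit_hash(rev: &str) -> bool {
    rev.len() == 40 && rev.chars().all(|c| c.is_ascii_hexdigit())
}

/// Returns the refspecs to fetch and whether tags should be fetched too.
fn refspecs_for_rev(rev: &str, server_allows_reachable_sha1: bool) -> (Vec<String>, bool) {
    if server_allows_reachable_sha1 && is_full_commit_hash(rev) {
        // Ask for exactly this commit; the destination ref name is an assumption.
        (vec![format!("+{0}:refs/commit/{0}", rev)], false)
    } else {
        // Fallback: the existing behavior of fetching all branches and tags.
        (
            vec![
                String::from("+refs/heads/*:refs/remotes/origin/*"),
                String::from("+HEAD:refs/remotes/origin/HEAD"),
            ],
            true,
        )
    }
}

fn main() {
    let (refspecs, tags) =
        refspecs_for_rev("88117505b8b691e0e7892630a71a85bb5e9945de", true);
    println!("refspecs = {:?}, tags = {}", refspecs, tags);
}
```

Abbreviated hashes and symbolic names fall through to the fallback branch, since the server can only be asked for an object by its full ID.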

Proposed Solution

I've opened a PR containing the Cargo side of the implementation:

However, libgit2, the C library wrapped by Cargo's git2 dependency, does not yet support "reachable sha1 in want" as far as I can tell. The git CLI is not based on libgit2 and does support it, which is why the git fetch above is able to use it.

Someone will need to send a PR to libgit2 implementing "reachable sha1 in want", then pull the changes into https://github.com/rust-lang/git2-rs, and finally land the Cargo change to use it.

Notes

As a side benefit, this change will let rev = "..." dependencies point at revs that are not in the history of any upstream branch or tag. But the performance and disk usage improvements will be the more noticeable benefit to most Cargo users.


Labels

A-git (Area: anything dealing with git)
C-feature-request (Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted`)
S-blocked-external (Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix)
