Skip to content

Update a Github dependency with --precise <short_id> fails #13188

@JEnoch

Description

@JEnoch

Problem

Updating a Github dependency with a short commit id fails with such log:

error: Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
  object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)

Steps

cargo new test-cargo-precise 
cd test-cargo-precise 
cargo add --git https://github.com/rust-lang/regex regex 
CARGO_LOG=trace cargo update -p regex --precise 837fd85

Resulting cargo update log:

   0.000038417s TRACE cargo: start="2023-12-21T07:57:53.908041000Z"
   0.000538542s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_ALIAS_UPDATE", parts: [("alias", 5), ("update", 11)] }
   0.000570375s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_TERM", parts: [("term", 5)] }
   0.000589250s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_NET", parts: [("net", 5)] }
   0.001122417s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_HTTP", parts: [("http", 5)] }
   0.001523083s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_BUILD", parts: [("build", 5)] }
   0.001558208s TRACE cargo::util::toml: read_manifest; path=/Users/julienenoch/tmp/test-cargo-precise/Cargo.toml; source-id=/Users/julienenoch/tmp/test-cargo-precise
   0.001857042s DEBUG cargo::core::workspace: find_members - only me as a member
   0.002212917s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_PATCH", parts: [("patch", 5)] }
   0.002268583s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRY_INDEX", parts: [("registry", 5), ("index", 14)] }
   0.002286292s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRIES_CRATES_IO_PROTOCOL", parts: [("registries", 5), ("crates-io", 16), ("protocol", 26)] }
   0.002298250s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRIES_CRATES_IO_PROTOCOL", parts: [("registries", 5), ("crates-io", 16), ("protocol", 26)] }
   0.002305875s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRY_INDEX", parts: [("registry", 5), ("index", 14)] }
   0.002316875s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_SOURCE", parts: [("source", 5)] }
   0.002328917s DEBUG cargo::core::registry: load/missing  https://github.com/rust-lang/regex#837fd85
   0.002347417s DEBUG cargo::core::registry: loading source https://github.com/rust-lang/regex#837fd85
   0.002353417s DEBUG cargo::sources::config: loading: https://github.com/rust-lang/regex#837fd85
   0.002360000s TRACE cargo::core::source_id: loading SourceId; https://github.com/rust-lang/regex#837fd85
    Updating git repository `https://github.com/rust-lang/regex`
   0.009446625s TRACE cargo::sources::git::source: updating git source `GitRemote { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/rust-lang/regex", query: None, fragment: None } }`
   0.010665208s DEBUG cargo::sources::git::utils: attempting GitHub fast path for https://github.com/api/repos/rust-lang/regex/commits/837fd85000000000000000000000000000000000
   0.225184458s DEBUG cargo::sources::git::utils: skipping gc as there's only 2 pack files
   0.225689625s DEBUG cargo::sources::git::utils: doing a fetch for https://github.com/rust-lang/regex
   0.226630292s DEBUG cargo::sources::git::utils: initiating fetch of ["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"] from https://github.com/rust-lang/regex
   0.716648583s DEBUG cargo::sources::git::utils: attempting GitHub fast path for https://github.com/api/repos/rust-lang/regex/commits/837fd85000000000000000000000000000000000
   0.739785417s DEBUG cargo::sources::git::utils: skipping gc as there's only 0 pack files
   0.739872500s DEBUG cargo::sources::git::utils: doing a fetch for https://github.com/rust-lang/regex
   0.740022625s DEBUG cargo::sources::git::utils: initiating fetch of ["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"] from https://github.com/rust-lang/regex
   2.332392833s DEBUG cargo: exit_with_error; err=CliError { error: Some(Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
    object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)), exit_code: 101 }
   2.332410958s DEBUG cargo: display_error; err=Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
    object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)
error: Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
  object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)

Possible Solution(s)

No response

Notes

This Github API URL used by cargo returns an error:
https://github.com/api/repos/rust-lang/regex/commits/837fd85000000000000000000000000000000000

While the same but with short commit id succeeds:
https://github.com/api/repos/rust-lang/regex/commits/837fd85

What's strange is that I successfully used such cargo update -p <dep> --precise <short_id> few weeks ago.
Could it be that Github changed its API ?

Version

cargo 1.74.1 (ecb9851af 2023-10-18)
release: 1.74.1
commit-hash: ecb9851afd3095e988daaa35a48bc7f3cb748e04
commit-date: 2023-10-18
host: aarch64-apple-darwin
libgit2: 1.7.1 (sys:0.18.0 vendored)
libcurl: 8.1.2 (sys:0.4.68+curl-8.4.0 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1u  30 May 2023
os: Mac OS 14.1.2 [64-bit]

--
Also same issue with:

cargo 1.71.0 (cfd3bbd8f 2023-06-08)
release: 1.71.0
commit-hash: cfd3bbd8fe4fd92074dfad04b7eb9a923646839f
commit-date: 2023-06-08
host: aarch64-apple-darwin
libgit2: 1.6.4 (sys:0.17.1 vendored)
libcurl: 8.1.2 (sys:0.4.61+curl-8.0.1 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1t  7 Feb 2023
os: Mac OS 14.1.2 [64-bit]

Activity

added
C-bugCategory: bug
S-triageStatus: This issue is waiting on initial triage.
on Dec 21, 2023
weihanglo

weihanglo commented on Dec 21, 2023

@weihanglo
Member

Got an impression that @Eh2406 just talked about this somewhere. @Eh2406 do you remember anything?

JEnoch

JEnoch commented on Dec 22, 2023

@JEnoch
Author

Digging into the code, I see this completion with 0s of short commit id comes from usage of git2::Oid::from_str(s) in SourceId::precise_git_oid().
It somehow ends up to be used in github_fast_path(), probably after conversion from git2::Oid to GitReference::Rev(String). Possibly this one ?

Could the regression come from #12849 ?
But that wouldn't explain why the issue also occurs with cargo 1.71.0 that was release before this PR.

weihanglo

weihanglo commented on Dec 22, 2023

@weihanglo
Member

Good finding! Thank you.

I've tested several toolchains. Starting from 1.46.0 Cargo contains this bug, which contains the PR #8363 that you observed the potential bad commit. The Cargo started using Oid::from_str to parse precise field since then:

Some(s) => Some(git2::Oid::from_str(s).with_context(|| {
format!("precise value for git is not a git revision: {}", s)
})?),

The zero-padding behavior is a documented behavior in libgit2 like from the beginning, so not a dependency issue.

It might or might not be GitHub has changed the API. I cannot tell, as Cargo had no such a test exercising --precise <SHA> until recently. And that doesn't even test against short ID.

Feel like SourceId::from_url has an assumption that precise for Git source is always a valid long-enough Git SHA (the format is opaque to cargo users btw). One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not. This is only a fix on the surface, if we find more commands have this kind of bug, we might want to push the fix down to lower-level.

added
A-gitArea: anything dealing with git
S-acceptedStatus: Issue or feature is accepted, and has a team member available to help mentor or review
and removed
S-triageStatus: This issue is waiting on initial triage.
on Dec 22, 2023
linyihai

linyihai commented on Dec 25, 2023

@linyihai
Contributor

One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not.

I will provide a PR to fix it based on this

This is only a fix on the surface, if we find more commands have this kind of bug, we might want to push the fix down to lower-level.

This is worth a look, I will check the relevant code by the way, once there is any feasible plan I will make a corresponding plan

@rustbot claim

JEnoch

JEnoch commented on Dec 27, 2023

@JEnoch
Author

One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not.

If possible I would like to still have the possibility to use --precise with short SHA.

As a use case in Zenoh: there are some plugins (shared libraries) for which the dependency to the zenoh crate must be the exact same commit than the hosting binary (to avoid ABI incompatibility). The short commit id can be retrieved via --version that return the result of git describe.
Using this short commit id in --precise would avoid the user to search for the long commit id.

I think a solution could be the following:

I tested such a patch here, and it seems to work.
What do you think?

weihanglo

weihanglo commented on Dec 27, 2023

@weihanglo
Member

Thanks for looking into the issue! The effort is much appreciated :)

However, there are probably two things we need to take care of:

  • The patch seems for GitHub only. I would like to see a general fix, as this issue is not GitHub-specific. GitLab or even a local git repository has the bug.
  • Changing locked_rev is not desired, as the new rev might not be the same as the old one, which we lose the meaning of "locked". Git short hashes can also be generated deliberately, and it doesn't seem too safe to trim and query over network.

That said, it doesn't mean we cannot support short hashes. We need to find a reliable way to pass down the value from --precise, and get it resolved to a full SHA. If the solution is not yet ready, for now we can give a nice message of "short hash not supported".

JEnoch

JEnoch commented on Dec 28, 2023

@JEnoch
Author

Understood, that makes sense.

I think the root of the issue is that the conversion from short SHA to git2::Oid leads to a 20 bytes SHA completed with 0s that cannot be found in the git2::Repository via call to revparse_single().

I made some tests with git2:
revparse_single() accepts short SHA and returns the full SHA. It detects short SHA collisions, returning such this error message: "ambiguous OID prefix - found multiple offsets for pack entry". So I think it's safe to use it to resolve a short SHA into a full SHA before storing it as locked_rev.
Actually, GitDatabase::resolve(&self, r: &GitReference) uses it and will return a git2::Oid containing the full SHA.

This call to GitDatabase::resolve() for full SHA retrieval could be done in GitSource::new() as a replacement to source_id.precise_git_oid() call. But this requires to create the GitDatabase here, while it's only created in GitSource::block_until_ready() so far. Not sure if it's a problem.

An alternative is to retrieve the full SHA in GitSource::block_until_ready(), just after GitDatabase creation. No need to replace self.locked_rev, but just to use the retrieved full SHA instead in the following match (self.locked_rev, db)

weihanglo

weihanglo commented on Jan 4, 2024

@weihanglo
Member

I think the root of the issue is that the conversion from short SHA to git2::Oid leads to a 20 bytes SHA completed with 0s that cannot be found in the git2::Repository via call to revparse_single().

Yeah. Totally agree on the root cause.

There seems to be an “implicit assumption” that a git SourceId has a precise field of a full SHA. cargo update is an exception that makes use of --precise to have a transient precise value presenting a arbitrary git revision. It would eventually become a full SHA when the revision is resolved.

The proposed solutions aren't wrong, though I would like to see locked_rev to be a type that represent a "revision", so we could have either full SHA (oid), or an arbitrary git revision. That would also make the behavior aligned to the doc:

--precise precise
When used with spec, allows you to specify a specific version number to set the package to. If the package comes from a git repository, this can be a git revision (such as a SHA hash or tag).

BTW, I accidentally have a working patch of this during code exploration. I may post a PR later but feel free to propose more better solutions :)

linyihai

linyihai commented on Jan 5, 2024

@linyihai
Contributor

One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not.

I will provide a PR to fix it based on this

This is only a fix on the surface, if we find more commands have this kind of bug, we might want to push the fix down to lower-level.

This is worth a look, I will check the relevant code by the way, once there is any feasible plan I will make a corresponding plan

@rustbot claim

Sorry, I haven't started this work yet. It's nice to see a better solution being proposed. 👍

removed their assignment
on Jan 5, 2024
weihanglo

weihanglo commented on Jan 5, 2024

@weihanglo
Member

Oops. I didn't realize this is claimed. Sorry for that, @linyihai 😔

Actually I was trying to provide guidance but ended up writing a fix

linyihai

linyihai commented on Jan 5, 2024

@linyihai
Contributor

Oops. I didn't realize this is claimed. Sorry for that, @linyihai 😔

Actually I was trying to provide guidance but ended up writing a fix

The content to be changed is already very different from what I have been exposed to before, So feel free to continue it. Your PR is also a good guidance for me.

JEnoch

JEnoch commented on Jan 5, 2024

@JEnoch
Author

BTW, I accidentally have a working patch of this during code exploration. I may post a PR later but feel free to propose more better solutions :)

You PR is more advanced than my proposal. I agree that having a Revision enum that can be a GitReference to be resolved when required is more accurately reflecting reality. Nice job! 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-gitArea: anything dealing with gitC-bugCategory: bugCommand-updateS-acceptedStatus: Issue or feature is accepted, and has a team member available to help mentor or review

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @JEnoch@weihanglo@linyihai

      Issue actions

        Update a Github dependency with `--precise <short_id>` fails · Issue #13188 · rust-lang/cargo