Description
Gitea Version
1.15.6
Git Version
Seen on different servers running 2.25.1 and 2.34.0
Operating System
Ubuntu 20.04.3
How are you running Gitea?
One server is manually deployed from a GitHub release and started with a systemd script. The other uses the apt-installed version of Gitea.
Database
PostgreSQL
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
I can gather a log if that would be helpful. I've included clear repro steps in the description.
Description
LFS files are failing to download from forks. The only reason I can't repro on try.gitea.io is because LFS doesn't seem to be enabled. This is my minimal repro (visible at https://git.pernicious.games/try-gitea-lfs/lfs-fork-test and the fork at https://git.pernicious.games/cpickett/lfs-fork-test/src/branch/fork-1).
Note that we are pretty sure this used to work on our Gitea installation months ago, so it may be caused by a relatively recent Gitea version (maybe 1.15+).
The repro steps are:
- Create new repo with an LFS file
- Fork that repo
- Commit a new rev of the LFS file in a branch on the fork
- Open a PR for the main repo from the fork's branch
Now clone the base repo somewhere else and try to either pull from the remote or the PR ref:
- Clone base repo, e.g.:
  git clone https://git.pernicious.games/try-gitea-lfs/lfs-fork-test.git
- Add the fork as a remote and fetch it, e.g.:
  git remote add parnic https://git.pernicious.games/cpickett/lfs-fork-test.git
  git fetch parnic
- Try switching to the branch that exists in the fork with the modified LFS file, e.g.:
  git switch fork-1
- Receive error, e.g.:
  Downloading test.lfs (9 B)
  Error downloading object: test.lfs (7ffc8bb): Smudge error: Error downloading test.lfs (7ffc8bb3b0c443a3d170b1f6aa402a66c6de86c49dd2d74e452d6d82b855939f): [7ffc8bb3b0c443a3d170b1f6aa402a66c6de86c49dd2d74e452d6d82b855939f] Not Found: [404] Not Found
- Restore modified files and try switching to the PR ref instead, e.g.:
  git restore .
  git fetch origin pull/1/head:test
  git switch test
- Receive same error as above
Note that the LFS file at the specified revision/hash exists in the fork, e.g.: https://git.pernicious.games/cpickett/lfs-fork-test.git/info/lfs/objects/7ffc8bb3b0c443a3d170b1f6aa402a66c6de86c49dd2d74e452d6d82b855939f/direct
but it does not exist in the base repo: https://git.pernicious.games/try-gitea-lfs/lfs-fork-test.git/info/lfs/objects/7ffc8bb3b0c443a3d170b1f6aa402a66c6de86c49dd2d74e452d6d82b855939f/direct
This is important because when running git lfs fetch --all we see that it is trying to pull from the base org, not the fork:
>git lfs fetch --all
fetch: 2 object(s) found, done.
fetch: Fetching all references...
[7ffc8bb3b0c443a3d170b1f6aa402a66c6de86c49dd2d74e452d6d82b855939f] Not Found: [404] Not Found
error: failed to fetch some objects from 'https://git.pernicious.games/try-gitea-lfs/lfs-fork-test.git/info/lfs'
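The failing request above goes to the base repo's LFS endpoint. As a minimal sketch for probing either repo by hand, the following builds (but does not send) an LFS batch "download" request for a single object. The helper name `build_batch_request` is hypothetical, the payload shape follows the public git-lfs batch API, and authentication is left out entirely, so treat it as a starting point rather than a working client.

```python
import json
import urllib.request

# Media type used by the git-lfs batch API.
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(repo_url: str, oid: str, size: int) -> urllib.request.Request:
    """Build (but do not send) an LFS batch 'download' request for one object.

    repo_url is the repository's .git URL; oid and size identify the LFS object.
    """
    payload = {
        "operation": "download",
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size}],
    }
    return urllib.request.Request(
        url=repo_url.rstrip("/") + "/info/lfs/objects/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE},
        method="POST",
    )
```

Building one request with the base repo URL and one with the fork URL, using the same oid and the 9-byte size from the error output, gives a direct way to compare how the two endpoints answer for the same object.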
Finally, note that the LFS file does exist on the disk under the generic LFS directory. For this example, that's
/var/lib/gitea/data/lfs/7f/fc/8bb3b0c443a3d170b1f6aa402a66c6de86c49dd2d74e452d6d82b855939f
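For anyone checking their own server, here is a small sketch that maps an LFS oid to its expected on-disk location. The two-level fan-out (first two hex characters, next two, then the rest) is inferred only from the example path above, not from Gitea's storage code, and `lfs_object_path` is a hypothetical helper name.

```python
import posixpath


def lfs_object_path(lfs_root: str, oid: str) -> str:
    """Return the expected path of an LFS object under lfs_root.

    Assumes the layout <root>/<oid[0:2]>/<oid[2:4]>/<oid[4:]>, as seen in
    the example path in this report.
    """
    if len(oid) != 64 or any(c not in "0123456789abcdef" for c in oid):
        raise ValueError("expected a full lowercase sha256 hex oid")
    return posixpath.join(lfs_root, oid[:2], oid[2:4], oid[4:])
```

With the root `/var/lib/gitea/data/lfs` and the oid from the error message, this reproduces the path quoted above.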
Screenshots
No response
Activity
wxiaoguang commented on Nov 19, 2021
Can @zeripath or @KN4CK3R or someone else help to confirm?
IIRC the LFS code got some changes between 1.14 and 1.15; I am not sure whether this is a permission problem or a database consistency problem.
KN4CK3R commented on Nov 19, 2021
Are you sure that worked before? After creating the PR the base repository contains the dangling pointer without the database reference:
(top = base, bottom = pr branch)
During 1.14 -> 1.15 I changed the LFS API code but this looks like we need to sync the LFS objects when creating a PR. And that functionality was never implemented I think.
KN4CK3R commented on Nov 19, 2021
Looks like I was wrong. I tested 1.12 and it works with that version (but is not correct, I think). After creating the PR, the blob list only shows a single pointer (is that wrong because of the new PR branch?). It seems the download request associated the pointer, so my changes in #16865 or #15523 may be the cause of the problem.
KN4CK3R commented on Nov 19, 2021
The old code created the database references when downloading files. I changed that in #15523 so that only upload requests are able to do that.
Reasons:
So the question is how we would like to handle this problem. Personally I don't like to bring back file association in the download request. Auto-sync when creating a PR?
lunny commented on Nov 19, 2021
I vote for the latter.
wxiaoguang commented on Nov 19, 2021
And just FYI, I remember this issue: #17207. Are they related? That issue also said that the behavior differs between Gitea releases.
KN4CK3R commented on Nov 19, 2021
Yes, I think they are related. Just putting the files in the directory does not associate the repo to them.
stu1811 commented on Nov 19, 2021
I agree they are probably related and I've run into this issue as well. As a workaround you can do a git lfs fetch <UPSTREAM> and then a git lfs checkout to get the files.
Associate LFS pointer on download
parnic commented on Nov 23, 2021
We are hosting a private Gitea server behind a VPN, so I've temporarily reintroduced the associate-LFS-pointers-on-download code for a hacked-up local build, in case anyone else needs the temporary workaround: https://github.com/go-gitea/gitea/compare/v1.15.6...parnic-sks:associate-lfs-obj-on-download?expand=1
stu1811 commented on Dec 8, 2021
I built 1.15.6 with the changes cherry-picked from parnic-sks@768aaab, and it did not fix issue #17207. Details below.
We got a new bundle with LFS updates. I extracted the new LFS files, fetched the branch, and tried to push to origin. It would not push due to:
hint: Your push was rejected due to missing or corrupt local objects
I also tried git lfs fetch --all and the push failed again.
parnic commented on Dec 8, 2021
That one commit is not enough, you need the entire branch diff. parnic-sks@3872374 was actually the correct fix, but it needs the commit after that to fix the nil user deref.
stu1811 commented on Dec 8, 2021
Sorry, I meant that I cherry-picked the whole branch.
54 remaining items
test-sha256 commented on Aug 13, 2024
While I agree that associate on download is not ideal, there should be an auto-sync that associates all the files in the PR. This would allow the commits and their LFS objects to be checked out in the target repository.
gamedevkirk commented on Sep 17, 2024
Hoping this gets some attention at some point. This is a brutally frustrating issue.
StanleySweet commented on Apr 15, 2025
I believe 0 A.D. has the same, or a similar issue when syncing forks via the web UI.
na-Itms commented on Jun 10, 2025
There are many situations in which our contributors for 0 A.D. (artists and programmers alike) run into the issue. We are forced to apply the workaround linked above.
In the specific situation of the "Sync Fork" button, a contributor updates their main branch using the "Sync Fork" button, which fast-forwards their main branch to the latest upstream. However, LFS files are not associated in the fork. Thus, the fork is not cloneable (without the workaround, it 404s on cloning).
I think that a proper fix for this specific situation would be to run the settings/lfs/pointers/associate endpoint every time the "Sync Fork" button is used.
gitea/routers/web/repo/setting/lfs.go, line 515 in e9f5105
I have tried to implement this and send you a PR, but I am not a Go developer and I don't understand how to properly call a method under routers/settings from a routers/branch context.
lunny commented on Jun 14, 2025
Add a trace for the last step.
Although I check out a branch from the remote fork, git-lfs downloads the LFS object from the base repository: https://git.pernicious.games/try-gitea-lfs/lfs-fork-test.git/info/lfs/objects/batch. Maybe this is a bug in git-lfs. It should get the URL of the remote fork and download the LFS object from there. For fork branches that have a PR in the base repository, the LFS pointers in that branch should be copied to the base repository. But for other branches in fork repositories, git-lfs should download LFS objects from the fork repository.
lunny commented on Jun 14, 2025
OK, I figured it out. When the git repository has multiple remotes, configure it as below.
git config lfs.remote.autodetect true
Then the problem will be resolved.
ref: git-lfs/git-lfs#5988 (comment)
ref: https://github.com/git-lfs/git-lfs/blob/0534b10d870acd31b13ea2c23d94a710b44ab98f/docs/man/git-lfs-config.adoc?plain=1#L59-L64
lunny commented on Jun 14, 2025
Since the LFSMetaObject table doesn't store the branch name, we cannot easily copy the pointers of that branch when creating the pull request. Maybe two columns, path and branch, need to be stored in this table or in a new table.
AdamMajer commented on Jun 14, 2025
This LFS association stuff is a general issue. To me, if you know the pointer of an LFS object, you should be able to download it. The pointer is already unique and if we have a collision, then there are bigger problems than Gitea.
Adding a branch here makes things much more complicated -- what happens if you have LFS objects on branch repo.branch1, then you merge them to repo.branch2, and then maybe you cherry-pick these commits to another branch and push them too? Do you then check every commit for all LFS objects and try to associate them with every branch? What happens if you just fetch a SHA object and don't care about a branch?
git fetch --depth=1 remote SHA
IFF Gitea is doing these operations, sure, it's then "easy" to copy stuff around. But git is decentralized and these workflows are mostly not done on a Gitea instance but on some client.
A better way would be to associate objects by inspecting LFS objects on every push... but... there's an authentication issue that remains. IFF you know the Hash+Size, should that be sufficient to fetch the source object??
In my thinking, there's a case for removing the RepositoryID association with LFS object altogether. IFF you know the Hash+Size, then you already have a Git checkout of the sources. So you have access to the sources. The LFS<->RepositoryID association is then mostly unnecessary overhead. To put it another way, consider the world without LFS. In that world, all objects are in Git. A tarball is then a Git blob you get during fetch. Done. Now, in an LFS world, the LFS object you fetch on checkout -- but if you know the HASH/size of the hashed object, it means you already have the Git sources -- so you were authenticated to fetch the sources (from somewhere). Therefore there should be no additional authentication needed to download the object as you already had access to fetch its "key".
LFS shouldn't be adding hurdles to fetch complete objects.
PS. I'm assuming that git-lfs oids are always only full sha256 oids, and not some weird short oids.
lunny commented on Jun 15, 2025
One benefit of including the branch and path in the table is that it allows displaying an lfs label in the file list, similar to how GitLab and Hugging Face do it.
AdamMajer commented on Jun 15, 2025
It would be "easier" to list files with the tag, yes. But then we'd have this information in two places -- the gitea database and the Git store. That's a recipe for getting out of sync.
It may be better to parse the .gitattributes and then add the [lfs] tag on the fly for files that are in the correct format. We could even warn, like LFS does on checkout, if files are supposed to be tracked according to .gitattributes but are not in LFS. For this, all that needs to be done is checking the first few bytes of the file. For example,
And that is just the first 128 bytes. So, reading the first 150 bytes or so of the blob is enough to read the entire LFS pointer and verify whether the object is supposed to be in LFS or not. Nothing beyond git cat-file is required.
Aside: For GitLab, someone reported behaviour similar to what I proposed earlier (no repository association) as a "vulnerability" (and other issues)
https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/197
I will discuss this with our security guys, but I'm heavily leaning towards this not being a vulnerability, as otherwise all of TLS is basically broken too by the same argument. The main reason why we want authentication for LFS uploads/fetches is to prevent the LFS server from being abused as a type of remote file store for non-git repositories, and to be able to track uploads.
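The pointer check described in this comment (read the start of a blob and decide whether it is an LFS pointer) can be sketched as follows. This is a minimal sketch under stated assumptions: it only accepts the plain three-line v1 pointer layout (version, oid sha256, size), it does not handle pointers with extension lines, and `parse_lfs_pointer` is a hypothetical helper, not a Gitea or git-lfs function. The bytes would come from something like git cat-file on the blob.

```python
import re
from typing import Optional, Tuple

# Plain v1 pointer: version line, sha256 oid line, size line, nothing else.
POINTER_RE = re.compile(
    rb"\Aversion https://git-lfs\.github\.com/spec/v1\n"
    rb"oid sha256:([0-9a-f]{64})\n"
    rb"size ([0-9]+)\n\Z"
)


def parse_lfs_pointer(blob_head: bytes) -> Optional[Tuple[str, int]]:
    """Return (oid, size) if blob_head is a plain LFS v1 pointer, else None.

    blob_head is the beginning of a git blob; a real pointer is small
    enough to fit entirely in the bytes read.
    """
    match = POINTER_RE.match(blob_head)
    if match is None:
        return None
    return match.group(1).decode("ascii"), int(match.group(2))
```

Because the pattern is anchored at both ends, a larger file that merely starts with pointer-like text is rejected once more than the pointer itself has been read.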
lunny commented on Jun 16, 2025
The git check-attr command is not efficient, especially when trying to identify LFS-tracked files within a specific subdirectory. It requires checking each file individually, which can be slow and cumbersome for large trees, as we must invoke git check-attr on every file to determine whether it matches the LFS attributes.
parnic commented on Jun 16, 2025
The fix is known, if you're okay with re-enabling the automatic LFS file linkage. I've been maintaining it in my fork for a while: parnic-sks@b718e44, which is essentially a revert of 2bb3200, if I'm remembering correctly.
lunny commented on Jun 16, 2025
I think the patch might work, but it could also introduce inconsistent or “dirty” LFS metadata that doesn’t belong to the target repository. Ideally, this should be handled by copying the relevant LFS metadata from the fork’s branch to the base repository’s branch at the time the pull request is created—similar to how Git handles regular data.
However, since the lfs_meta_object table doesn’t include a branch reference, it’s difficult to determine which LFS pointers should be copied. One possible workaround is to copy all LFS pointers from the fork repository when the pull request is created.
As for cleanup, we also need to consider how to safely remove these LFS pointers when the fork repository is deleted—especially those that are not referenced in the base repository.
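The "copy all LFS pointers from the fork" workaround can be illustrated with a toy in-memory model. This is only a sketch: associations are modelled as (repository id, LFS oid) pairs, which is an assumption about what lfs_meta_object effectively records, and `copy_fork_pointers` is a hypothetical name that does not correspond to any Gitea function.

```python
from typing import Set, Tuple

# One association per (repository id, LFS oid) pair.
Association = Tuple[int, str]


def copy_fork_pointers(
    assocs: Set[Association], fork_id: int, base_id: int
) -> Set[Association]:
    """Return assocs plus a base-repo association for every oid the fork has.

    Existing associations are kept; duplicates collapse because this is a set.
    """
    copied = {(base_id, oid) for repo_id, oid in assocs if repo_id == fork_id}
    return assocs | copied
```

The model also shows where the cleanup concern comes from: after the copy, nothing in a pair records that it originated from the fork, so deleting the fork later gives no hint about which base-repo pairs are safe to remove.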
AdamMajer commented on Jul 8, 2025
The problem is we need to download LFS files via target repo (eg. from a PR). The LFS objects have to be available via target repository even prior to being part of the main repository. Otherwise the PR cannot be resolved.
Another issue is that AGit + LFS do not work because of permission issues -- LFS push fails as there's no permissions to write to target repository.
I've asked our security contact regarding this repo+LFS-hash vs. only LFS-hash as method of accessing the file in question, and while it's not really likely for someone to know the hash without having the file, the hash could be leaked via logs, or partial dump and thus one could then de-hash it without authentication. While I could argue that this is still not a security issue that we need to defend against, maybe it's better not to open a can of worms and keep the LFS attribution to repo+LFS hash.
One of the questions is HOW to efficiently prune LFS/repo associations. The current database stores the OID of the LFS object itself. It would be helpful to also store the git blob OID of the LFS pointer, as it can be interrogated quickly via git rev-list and checked for reachability in non-removed branches. For example, in my test repo,
so if we have 0f93d97fcb604780cb28306b40fe137b5949cad3b579132aae76694f4cef1d59 stored in addition to the LFS OID, then these LFS objects can be quickly pruned from the database when the referencing LFS pointers are no longer part of the repository. It would also be simpler to copy associations in PRs, as we only need to copy the referenced LFS OIDs and not necessarily all of them.
Removing LFS objects can then safely be done when repo is removed - when referenced repo count is 0 after repository is removed.
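The pruning idea above can be sketched roughly as follows, assuming each association also stores the git blob OID of its pointer file. The record layout and both function names are hypothetical. The reachable set is taken from the text output of `git rev-list --objects --all`, which prints one object id per line, optionally followed by a path.

```python
from typing import Iterable, List, NamedTuple, Set


class LfsAssociation(NamedTuple):
    repo_id: int
    lfs_oid: str           # sha256 of the LFS object content
    pointer_blob_oid: str  # git blob OID of the pointer file (proposed extra column)


def parse_rev_list_objects(output: str) -> Set[str]:
    """Collect object ids from `git rev-list --objects --all` output."""
    return {line.split(" ", 1)[0] for line in output.splitlines() if line.strip()}


def prunable(
    assocs: Iterable[LfsAssociation], reachable: Set[str]
) -> List[LfsAssociation]:
    """Return associations whose pointer blob is no longer reachable."""
    return [a for a in assocs if a.pointer_blob_oid not in reachable]
```

The same lookup would serve the PR case: walking only the PR's commits gives the pointer blob OIDs, and thus the LFS OIDs, that need to be copied.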