Prebuilds fail with error "cannot initialize workspace: prebuild initializer: git fetch -p -P --tags -f failed" #9280

Closed
3 of 4 tasks
princerachit opened this issue Apr 13, 2022 · 27 comments
Labels
❓ clarification required team: webapp Issue belongs to the WebApp team type: bug Something isn't working

Comments

@princerachit
Contributor

princerachit commented Apr 13, 2022

Bug description

Incremental Prebuilds (Beta) fail for projects whose repository contains private submodules.

This happens when the submodules are updated in the repo. The subsequent prebuild fails while trying to update the submodules.

Error

The error looks similar to the following examples.

Example 1

When you start a workspace, it triggers a prebuild which results in the following error:

Oh, no! Something went wrong!
cannot initialize workspace: cannot initialize
workspace: prebuild initializer: git fetch -p -P
--tags -f failed (exit status 1): From
https://github.com/x/x-mean +
5096db13…b6a9d685 tests/y
-
origin/tests/y (forced update) Fetching
submodule a/b No user exists for uid
133332 fatal: Could not read from remote
repository. Please make sure you have the correct
access rights and the repository exists. No user
exists for id 133332 fatal: Could not read from
remote repository. Please make sure you have the
correct access rights and the repository exists.
Errors during submodule fetch: a/b
a/b

Example 2

No user exists for uid 133332
fatal: Could not read from remote repository.
Please make sure you have the correct access right and the repository exists.

Example 3

On the prebuild page of the project.

Prebuild failed for system reasons. Please contact support. cannot initialize workspace: cannot initialize workspace: prebuild initializer: git fetch -p -P --tags -f failed (exit status 1): From https://github.com/princerachit/pub 9383449..529e501 main -> origin/main * [new branch] newbranch1 - ...

Workaround

Since this problem is specific to Incremental Prebuilds, we suggest the following workaround until the issue is fixed:

  1. Disable Incremental Prebuilds permanently
  2. Trigger a new Prebuild on your branch

This seems to work in most cases, but we recently encountered a customer who disabled Incremental Prebuilds and could still see the issue: internal slack thread

Please let us know if this workaround does not help.

Steps to reproduce

  1. Create a repo (pub) which has a private repo git submodule (hidden)
  2. Create an ssh key from an account which has access to hidden
  3. Base64-encode the SSH key and store it in a project environment variable, e.g. cat id_rsa | base64
  4. Update your .gitpod.yml so that it uses that key to initialize pub's submodules in prebuild- ref
  5. Create a project from this pub repo and enable incremental prebuilds
  6. Add new commits to the submodule repo
  7. Now create a branch b1 from pub repo and push it to GH. Let the prebuild run and finish.
  8. Update a few files locally (other than .gitpod.yml) and then also update the submodule to the HEAD in branch b1 e.g. cd hidden && GIT_SSH_COMMAND="ssh -i ../idkey -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" git pull origin main
  9. Commit the change git add -u && git commit -m "changes"
  10. Push the change git push
  11. Navigate to the prebuild running on b1 branch. Prebuild should be failing.

e.g. Refer to the prebuild url here and logs (This is not accessible to public)

Workspace affected

Several prebuilds have been affected resulting in workspace failure.

To find the current number of failed prebuilds per repo, run the following query:

SELECT
    cloneURL, COUNT(*)
FROM
    d_b_prebuilt_workspace
WHERE
    error LIKE '%prebuild initializer: git fetch -p -P%'
    AND creationTime > '2022-02-13 08:56:44.752194'
GROUP BY cloneURL;

Expected behavior

Prebuilds should work without any issue.

Example repository

https://github.com/princerachit/pub

Anything else?

No response

Root cause

During an incremental prebuild, the snapshot is downloaded and then the local repo is updated.

The snapshot contains the hidden submodule, which was initialized during the last successful prebuild.

When the git fetch -p -P --tags command runs, git detects that the commit referenced for the hidden submodule has changed and tries to fetch the submodule. Because we preserve the uid/gid mapping when untarring the snapshot, the files are owned by uid/gid 133332, which does not exist in the init container, so the git command complains that the user does not exist.

Even after creating this user manually, the prebuild fails because the SSH key required to fetch this submodule does not exist in the content-init container.

Plan to Resolve

  • Try Reproducing the Error
  • Triage the code responsible for the issue
  • Figure out possible fixes - This issue will be fixed with the migration to PVC-based storage, where the git command will be run in the workspace container context. The PVC work is in progress and is tracked in the previous link. See internal thread.
  • Test fixes

@princerachit princerachit self-assigned this Apr 13, 2022
@princerachit princerachit added the team: workspace Issue belongs to the Workspace team label Apr 13, 2022
@princerachit
Contributor Author

princerachit commented Apr 13, 2022

I could not reproduce this error. I tried the following.

  1. Create several branches with prebuild enabled on all of them
  2. Create several tags
  3. Update all branches by pushing new commit
  4. Force push tags
  5. Force push new commits on few branches
  6. Push new commits to all branches

Also, when I query the DB, I see that the following repos hit this error in the past 2 months: jt (142 prebuilds), gitlab-org (73 prebuilds), et (56 prebuilds).

SELECT
    cloneURL, COUNT(*)
FROM
    d_b_prebuilt_workspace
WHERE
    error LIKE '%prebuild initializer: git fetch -p -P%'
    AND creationTime > '2022-02-13 08:56:44.752194'
GROUP BY cloneURL;

@kylos101 kylos101 moved this to In Progress in 🌌 Workspace Team Apr 13, 2022
@kylos101
Contributor

Hi @princerachit, have you tried setting up a repo with a submodule, and then pushing to the repo and to the repo backing the submodule at different times?

@princerachit
Contributor Author

> Hi @princerachit , have you tried setting up a repo with a submodule, and then push to the repo and repo backing the submodule at different times?

Yes, I tried that too, but it did not reproduce the issue.

@princerachit
Contributor Author

So far I have tried the following combinations and could not reproduce the issue. In all of these cases the submodule was a private repo:

  1. Push commit to branch b1 and then to branch b2
  2. force push commit to b1 and then fresh commit to b2
  3. Update submodule in b1 and then a new commit to b2
  4. Force push commit updating submodule in b1 and then a new commit to b2

I suspect that these combinations do not trigger the failure because my user has permission to access the private submodule and does not use an SSH key.

As a next step, I am going to create a private submodule in another account and then use my account to perform the tests above.

I will also make sure that the git submodule update --init is executed in a prebuild so that it serves as a base for subsequent prebuilds.

@princerachit princerachit moved this from In Progress to Scheduled in 🌌 Workspace Team Apr 19, 2022
@princerachit
Contributor Author

princerachit commented Apr 19, 2022

I have moved this to the Scheduled column because my prebuilds are stuck in the pending state. Team webapp is investigating the problem; I will pick this up once that issue is resolved.

@princerachit
Contributor Author

princerachit commented Apr 21, 2022

Reproducing the Error

I was able to reproduce this issue using incremental prebuild. Refer to the prebuild url here and logs (This is not accessible to public)

  1. Create a repo (pub) which has a private repo git submodule (hidden)
  2. Your github account SHOULD NOT have access to hidden repo
  3. Create an ssh key from an account which has access to hidden
  4. Encode the ssh key and set into your project environment variable
  5. Update your .gitpod.yml so that it uses that key to initialize pub's submodules in prebuild- ref
  6. Now create a branch b1
  7. Update the submodule commit (make sure there are new commits in the private repo) e.g. cd hidden && GIT_SSH_COMMAND="ssh -i myssshidkey -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" git pull origin main
  8. Commit the change
  9. Push the change
  10. The prebuild should fail

NOTE: If you update the init section of .gitpod.yml, then a new prebuild will not refer to the last prebuild from this branch. It looks like there is some hash matching being done in order to determine base prebuilds.

Why previous attempts failed

  1. My GitHub account used in Gitpod already had access to the private repo.
  2. I was not using git submodule update --init in my prebuild command.

@princerachit
Contributor Author

princerachit commented Apr 21, 2022

Cause

I am not sure exactly why this fails; I will add logs and try to reproduce it in a development environment.
My current hypothesis is that, since git fetch runs in ring2 and git submodule update --init runs in ring3, the submodule ends up with a different owner (or something along those lines), which causes the subsequent git fetch to fail during an incremental prebuild.

Maybe Related issue forum

Related Code

return c.Git(ctx, "fetch", "-p", "-P", "--tags", "-f")

@princerachit princerachit moved this from Scheduled to In Progress in 🌌 Workspace Team Apr 21, 2022
@princerachit
Contributor Author

I added logs to the code path where we perform the fetch. The file permissions look correct, with all files owned by user 133332:

40K
-rw-r--r-- 1 133332 133332    2 Apr 28 07:02 there
-rw-r--r-- 1 133332 133332   21 Apr 28 07:02 afile
-rw-r--r-- 1 133332 133332   14 Apr 28 07:02 README.md
-rw-r--r-- 1 133332 133332  446 Apr 28 07:02 .gitpod.yml
-rw-r--r-- 1 133332 133332   82 Apr 28 07:02 .gitmodules
-rw-r--r-- 1 133332 133332    5 Apr 28 07:02 .gitignore
drwxr-xr-x 2 133332 133332 4.0K Apr 28 07:02 hidden
drwxr-x--- 4 133332 133332 4.0K Apr 28 07:02 .
drwxr-xr-x 5 133332 133332 4.0K Apr 28 09:05 ..
drwxr-xr-x 9 133332 133332 4.0K Apr 28 09:05 .git

But we still see the error:

"cannot initialize workspace:
    github.com/gitpod-io/gitpod/content-service/pkg/initializer.InitializeWorkspace
        github.com/gitpod-io/gitpod/[email protected]/pkg/initializer/initializer.go:426
  - prebuild initializer:
    github.com/gitpod-io/gitpod/content-service/pkg/initializer.runGitInit
        github.com/gitpod-io/gitpod/[email protected]/pkg/initializer/prebuild.go:131
  - git fetch -p -P --tags -f failed (exit status 1): From https://github.com/princerachit/pub
   bb4a4d3..6ed10c5  main       -> origin/main
 * [new branch]      test1      -> origin/test1
 * [new branch]      test2      -> origin/test2
 * [new branch]      test3      -> origin/test3
Fetching submodule hidden
No user exists for uid 133332
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
No user exists for uid 133332
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Errors during submodule fetch:
	hidden
	hidden
"

@princerachit
Contributor Author

princerachit commented May 3, 2022

I am moving this issue into the blocked state. See internal slack thread.

Gist: The snapshot contains the hidden submodule. When we run the git fetch -p -P --tags command, git finds that the commit referenced for the hidden submodule has changed. It then tries to update the submodule. However, the SSH key required to fetch this submodule does not exist in the content-init container, so the fetch fails.

This will likely get resolved as we migrate to PVC based snapshots, where the git commands will be run in workspace context.

@sagor999
Contributor

sagor999 commented Jun 6, 2022

This seems to happen even when the git submodule references a public repo.
Slack thread: https://gitpod.slack.com/archives/C032GEUGV09/p1654277137835379

@sagor999
Contributor

sagor999 commented Jun 6, 2022

@csweichel
What if, before we run git fetch, we chown all files to uid/gid 33333 to ensure that user 133332 is not referenced anywhere?

@sagor999
Contributor

sagor999 commented Jun 6, 2022

And just to confirm, this happens on prebuilds that have incremental build disabled.

@csweichel
Contributor

> @csweichel what if before we run git fetch we would do chown with 33333 uid\gid on all files to ensure there is no 133332 user referenced anywhere:

That is one way indeed.

We could also add a special mode to the snapshot initialiser whereby we don't do the UID mapping. This way we'd save a (potentially rather expensive) chown.

@csweichel
Contributor

> And just to confirm, this happens on prebuilds that have incremental build disabled.

That's odd and warrants some more investigation. Where do those UIDs come from if not from a snapshot restoration?

@sagor999 sagor999 moved this from Blocked to Scheduled in 🌌 Workspace Team Jun 8, 2022
@sagor999
Contributor

sagor999 commented Jun 8, 2022

I am moving this issue into scheduled state from blocked, since it does affect non incremental prebuilds as well and we need to investigate why.

@sagor999
Contributor

I suspect we might be using incremental prebuild when we are not supposed to?
@gitpod-io/engineering-webapp could you guys double check logic here:

!(await this.isGoodBaseforIncrementalPrebuild(

I cannot find anything that would disable incremental prebuild based on project config setting, but maybe that is done somewhere else?
Here is the affected project:
https://gitpod.io/admin/projects/d2168919-f8ac-4c4c-8e6d-4fe9ac049911
It seems like its prebuilds are running based on previous prebuild, even though incremental prebuilds are disabled for that project.

@atduarte atduarte changed the title Prebuilds fail intermittently for users who enabled incremental prebuild Prebuilds fail intermittently for users Jun 28, 2022
@atduarte
Contributor

> since it does affect non incremental prebuilds as well and we need to investigate why.

Changed title given @sagor999's comment: "it does affect non incremental prebuilds as well and we need to investigate why"

@kylos101
Contributor

Added to webapp inbox and pinging @geropl because of @sagor999 's comment here.

Thanks, @sagor999 , for the find! 🙏

@kylos101 kylos101 removed the status in 🌌 Workspace Team Jun 28, 2022
@geropl
Member

geropl commented Jun 29, 2022

Thx @sagor999 , will schedule for someone to investigate. 👍

@geropl geropl moved this to Scheduled in 🍎 WebApp Team Jun 29, 2022
@atduarte atduarte added the type: bug Something isn't working label Jul 25, 2022
@kylos101
Contributor

Removing from Workspace project for now @geropl @sagor999 .

@geropl geropl removed the status in 🍎 WebApp Team Sep 19, 2022
@kylos101 kylos101 added team: webapp Issue belongs to the WebApp team and removed team: workspace Issue belongs to the Workspace team labels Sep 23, 2022
@jldec jldec moved this to Scheduled in 🍎 WebApp Team Sep 23, 2022
@jldec
Contributor

jldec commented Sep 23, 2022

This has come up again, with a matching error message, in a customer context here (internal).

@jldec jldec changed the title Prebuilds fail intermittently for users Prebuilds fail with error "cannot initialize workspace: prebuild initializer: git fetch -p -P --tags -f failed" Sep 23, 2022
@jldec jldec removed the status in 🍎 WebApp Team Sep 23, 2022
@AlexTugarev
Member

@sagor999, on #9280 (comment)

If incremental prebuilds are disabled in project settings, that code path won't be taken because context.commitHistory isn't populated, see

if (context.commitHistory && context.commitHistory.length > 0) {

and the call site:

if (this.shouldPrebuildIncrementally(context.repository.cloneUrl, project)) {
    const maxDepth = this.config.incrementalPrebuilds.commitHistory;
    const hostContext = this.hostContextProvider.get(context.repository.host);
    const repoProvider = hostContext?.services?.repositoryProvider;
    if (repoProvider) {
        prebuildContext.commitHistory = await repoProvider.getCommitHistory(user, context.repository.owner, context.repository.name, context.revision, maxDepth);
        if (context.additionalRepositoryCheckoutInfo && context.additionalRepositoryCheckoutInfo.length > 0) {
            const histories = context.additionalRepositoryCheckoutInfo.map(async info => {
                const commitHistory = await repoProvider.getCommitHistory(user, info.repository.owner, info.repository.name, info.revision, maxDepth);
                return {
                    cloneUrl: info.repository.cloneUrl,
                    commitHistory
                }
            });
            prebuildContext.additionalRepositoryCommitHistories = await Promise.all(histories);
        }
    }
}
const projectEnvVarsPromise = project ? this.projectService.getProjectEnvironmentVariables(project.id) : [];
const workspace = await this.workspaceFactory.createForContext({span}, user, prebuildContext, context.normalizedContextURL!);

@jpfeuffer

jpfeuffer commented Oct 14, 2022

Hi! What are workarounds for this problem? This is a huge blocker. We are not able to spawn any workspaces anymore at https://github.com/OpenMS/OpenMS

Apparently it (the prebuild?) cached some state of a submodule with a commit that does not exist anymore. Or something like that.
I cannot find Incremental Prebuild as an option anywhere (actually I cannot even find the "project settings").
Edit: I had to create a Team first to create a Project. This is a separate doc issue by the way. But even after registering a Team and the repo as project and making sure that gitpod has access to all repos (including the ones that are submodules), the error still persists. Incremental prebuilds seem to be deactivated by default anyway.

I don't get it. Prebuilds run fine! But when opening the workspace it always crashes.

@axonasif
Member

Possible (new) duplicate:

@kylos101
Contributor

Thank you for sharing, @axonasif!! This is interesting. cc: @aledbf @csweichel

@aledbf
Member

aledbf commented Oct 21, 2022

Closing. The fix is deployed now.

@aledbf aledbf closed this as completed Oct 21, 2022
@aledbf aledbf moved this to Awaiting Deployment in 🌌 Workspace Team Oct 21, 2022
@kylos101
Contributor

👋 Exactly, thanks @aledbf! 🚀 For reference, the fix is in #13956; it is deployed to SaaS in us72 and eu72. It will also be available in self-hosted as part of the October release.

@kylos101 kylos101 moved this from Awaiting Deployment to Done in 🌌 Workspace Team Oct 21, 2022