Skip to content

x/build/cmd/coordinator: watcher's git mirroring is slow/flaky #16388

Closed
@bradfitz

Description

@bradfitz

The "net" subrepo has stopped mirroring to github.

Note: https://github.com/golang/net/commits/master (stuck at Jul 7, 2016, f841c3)
Versus: https://go.googlesource.com/net (3 new commits since then, with e90d6d0 currently at top)

The watcher says:

http://farmer.golang.org/debug/watcher

 ! [remote rejected] ---40hexomitted--- -> ---40hexomitted--- (pre-receive hook declined)
error: failed to push some refs to 'https://gopherbot:---40hexomitted---@github.com/golang/net'
remote: error: GH002: Sorry, branch or tag names consisting of 40 hex characters are not allowed.        
remote: error: Invalid branch or tag name "---40hexomitted---"        

So Github doesn't accept some ref that Gerrit has?

/cc @spearce @adg @broady @quentinmit

Related: #11811

Activity

added
Buildersx/build issues (builders, bots, dashboards)
on Jul 15, 2016
added this to the Unreleased milestone on Jul 15, 2016
bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

In Gerrit for the "net" repo, I see only:

branches

... and no tags.

bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

But on the watcher's git cache, I see:

# cat /var/cache/watcher-git/net/refs/heads/f841c39de738b1d0df95b5a7187744f0e03d8112 
62685c2d7ca23c807425dca88b11a3e2323dab41

I don't know what created that head.

It seems to be the only screwed up one:

# find /var/cache/watcher-git -type f -path '*refs/heads*'
/var/cache/watcher-git/image/refs/heads/master
/var/cache/watcher-git/oauth2/refs/heads/master
/var/cache/watcher-git/gddo/refs/heads/master
/var/cache/watcher-git/exp/refs/heads/master
/var/cache/watcher-git/sync/refs/heads/master
/var/cache/watcher-git/sys/refs/heads/master
/var/cache/watcher-git/mobile/refs/heads/master
/var/cache/watcher-git/build/refs/heads/master
/var/cache/watcher-git/crypto/refs/heads/master
/var/cache/watcher-git/go/refs/heads/dev.ssa
/var/cache/watcher-git/go/refs/heads/master
/var/cache/watcher-git/go/refs/heads/release-branch.go1.7
/var/cache/watcher-git/tools/refs/heads/master
/var/cache/watcher-git/tools/refs/heads/release-branch.go1.7
/var/cache/watcher-git/talks/refs/heads/master
/var/cache/watcher-git/gofrontend/refs/heads/master
/var/cache/watcher-git/term/refs/heads/master
/var/cache/watcher-git/review/refs/heads/master
/var/cache/watcher-git/arch/refs/heads/master
/var/cache/watcher-git/proposal/refs/heads/master
/var/cache/watcher-git/net/refs/heads/master
/var/cache/watcher-git/net/refs/heads/f841c39de738b1d0df95b5a7187744f0e03d8112

Did somebody accidentally create that recently?

quentinmit

quentinmit commented on Jul 15, 2016

@quentinmit
Contributor

@aclements, @cherrymui and I have been looking at it for the past hour. We removed that branch about an hour ago; no idea when it was created but it pointed at a commit from you in February.

self-assigned this
on Jul 15, 2016
bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

I've nuked the watcher's "net" git cache and restarted the watcher, which re-cloned it, and github & the build dashboard are happy again.

quentinmit

quentinmit commented on Jul 15, 2016

@quentinmit
Contributor

The build dashboard certainly doesn't look happy to me; the latest commit on master is still from July 8th.

The git push of the go repo is taking for-freaking-ever, but I hope when it finally finishes that the watcher will notice the commits since July 8th.

bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

Going forward, we need monitoring on the watcher (that is #15760)

This bug can be about making sure the watcher's git cache stays in sync better, deleting refs when they're deleted from upstream gerrit.

bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

The build dashboard certainly doesn't look happy to me; the latest commit on master is still from July 8th.

It's still building, but it noticed new stuff. The red "fail" went away and was replaced by blank spots it knows it needs to build.

quentinmit

quentinmit commented on Jul 15, 2016

@quentinmit
Contributor

It's still building, but it noticed new stuff. The red "fail" went away and was replaced by blank spots it knows it needs to build.

That's true for the subrepos, but the go repo at the top does not show any blank spots.

builders

bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

Ah, I was only debugging the "net" repo. I hadn't noticed the main repo.

aclements

aclements commented on Jul 15, 2016

@aclements
Member

The git push of the go repo is taking for-freaking-ever, but I hope when it finally finishes that the watcher will notice the commits since July 8th.

I think I may know what's going on here. If you git clone --mirror https://go.googlesource.com/go (which I believe is what the watcher does), you'll get 40,370 refs, including every version of every CL on Gerrit. I suspect the send-pack run by push --mirror is taking forever because it's processing all of these refs.

aclements

aclements commented on Jul 15, 2016

@aclements
Member

Also, it appears these have all been pushed up to GitHub. They don't show up in the UI (presumably because they're not in refs/heads), but if you git clone --mirror https://github.com/golang/go, you'll see all 40k refs, as well.

bradfitz

bradfitz commented on Jul 15, 2016

@bradfitz
ContributorAuthor

@aclements that's what I suspected at first too, but even though we do run git clone --mirror, I only see 330 refs on the git cache:

# find /var/cache/watcher-git/go/refs -type f | head -10
/var/cache/watcher-git/go/refs/heads/dev.ssa
/var/cache/watcher-git/go/refs/heads/master
/var/cache/watcher-git/go/refs/heads/release-branch.go1.7
/var/cache/watcher-git/go/refs/tags/go1.7rc1
/var/cache/watcher-git/go/refs/notes/review
/var/cache/watcher-git/go/refs/cache-automerge/b4/a7806724b78f7c26326f6acee0dcca0110b7a8
/var/cache/watcher-git/go/refs/cache-automerge/ef/efd11725885c850be617389faa4d9f56137641
/var/cache/watcher-git/go/refs/changes/85/22685/meta
/var/cache/watcher-git/go/refs/changes/05/24705/meta
/var/cache/watcher-git/go/refs/changes/05/24705/1

# find /var/cache/watcher-git/go/refs -type f | wc -l
330
# find /var/cache/watcher-git/go/refs -type f | tail -20
/var/cache/watcher-git/go/refs/changes/15/24915/meta
/var/cache/watcher-git/go/refs/changes/15/24915/1
/var/cache/watcher-git/go/refs/changes/15/24915/2
/var/cache/watcher-git/go/refs/changes/15/24715/meta
/var/cache/watcher-git/go/refs/changes/15/24715/1
/var/cache/watcher-git/go/refs/changes/15/24715/2
/var/cache/watcher-git/go/refs/changes/15/24815/meta
/var/cache/watcher-git/go/refs/changes/15/24815/1
/var/cache/watcher-git/go/refs/changes/15/24815/2
/var/cache/watcher-git/go/refs/changes/35/18835/meta
/var/cache/watcher-git/go/refs/changes/61/24861/meta
/var/cache/watcher-git/go/refs/changes/61/24861/1
/var/cache/watcher-git/go/refs/changes/61/24861/2
/var/cache/watcher-git/go/refs/changes/81/24781/meta
/var/cache/watcher-git/go/refs/changes/81/24781/1
/var/cache/watcher-git/go/refs/changes/81/24781/2
/var/cache/watcher-git/go/refs/changes/36/24836/meta
/var/cache/watcher-git/go/refs/changes/36/24836/1
/var/cache/watcher-git/go/refs/changes/56/24856/meta
/var/cache/watcher-git/go/refs/changes/56/24856/1

farmer go # pwd
/var/cache/watcher-git/go

farmer go # du -h | tail -1
306M    .

farmer go # ls
FETCH_HEAD  HEAD  branches  config  description  hooks  info  objects  packed-refs  refs

Ooooh! That's where they're hiding. In packed-refs:

# wc -l packed-refs 
40070 packed-refs

Okay, mystery solved.

We should probably pusher smarter, and prioritize non-change refs, only pushing code review refs when we're otherwise idle.

7 remaining items

bradfitz

bradfitz commented on Jul 20, 2016

@bradfitz
ContributorAuthor

Ping @quentinmit. This appears to be slow again today and I need to decide how to fix it. You say modern git is still slow with many refs, but my tests seemed fine. What did you run?

In any case, I'll start with adding more visibility into what it's doing.

quentinmit

quentinmit commented on Jul 20, 2016

@quentinmit
Contributor

@bradfitz Here is my repro with git 2.8.0.rc3.226.g39d4020:

$ git clone https://go-review.googlesource.com/go
$ time git push --mirror -n -v http://github.com/golang/go
Pushing to http://github.com/golang/go
Username for 'https://github.com': quentinmit
Password for 'https://quentinmit@github.com':
^C

real    1m37.156s
user    0m0.013s
sys     0m0.013s  

(Obviously I killed it after 1.5 minutes; I don't know how long it would have taken.)

Note that I did not clone with --mirror, so the repo does have slightly different refs. Perhaps that's the difference.

bradfitz

bradfitz commented on Jul 20, 2016

@bradfitz
ContributorAuthor

I assume there was a "cd go" in between those lines?

In any case, such a git push is 2 seconds for me, with git 1.9.1, when my current directory is a bare git checkout (from git clone --mirror).

quentinmit

quentinmit commented on Jul 20, 2016

@quentinmit
Contributor

Yes, there was, I forgot to copy+paste it.

In fact, from a clone --mirror it takes almost no time, so that explains the difference. I think our time would be best spent upgrading git in go-watcher-world.

bradfitz

bradfitz commented on Jul 20, 2016

@bradfitz
ContributorAuthor

Good to hear.

I'll update go-watcher-world after I deploy a new version with more status page visibility into what's happening. I'd like to see the problem more clearly before fixing it.

bradfitz

bradfitz commented on Jul 20, 2016

@bradfitz
ContributorAuthor

I updated go-watcher-world to sid, which has git 2.8.0, and I still see it sucking with the new status pages:

watcher status for repo: "go"

2016-07-20T18:58:01Z   8m51s ago            running git push -f --mirror dest
2016-07-20T18:58:01Z   8m51s ago            added dest remote
2016-07-20T18:58:01Z   8m51s ago            adding dest remote
2016-07-20T18:57:35Z   9m16s ago            reusing git dir; running git fetch

But running my own watcher-world container on the coordinator with the /var/cache/watcher-git bind-mounted, I timed various operations:

root@77bd5d9082b9://var/cache/watcher-git/go# time (git show-ref > /dev/null)

real    0m0.118s
user    0m0.087s
sys     0m0.022s
root@77bd5d9082b9://var/cache/watcher-git/go# time (git ls-remote origin > /dev/null)

real    0m0.966s
user    0m0.205s
sys     0m0.054s

root@77bd5d9082b9://var/cache/watcher-git/go# time git fetch --dry-run origin
real    0m4.366s
user    0m3.339s
sys     0m0.439s

There's no reason it should be slow.

I'm inclined to do the syncing myself as in https://gist.github.com/bradfitz/b64ca2917fbc7a447141c580ec820da5

Which also has the advantage of prioritization and the ability to get more visibility into its state.

changed the title [-]build: mirroring of net repo is broken[/-] [+]x/build/cmd/coordinator: watcher's git mirroring is slow/flaky[/+] on Jul 20, 2016
bradfitz

bradfitz commented on Jul 21, 2016

@bradfitz
ContributorAuthor

After debugging with @aclements, I switched to using the ssh transport, and things now look good and very fast:

watcher status for repo: "go"
2016-07-21T00:27:38Z   1m24s ago            waiting
2016-07-21T00:27:38Z   1m24s ago            updated dashboard
2016-07-21T00:27:38Z   1m24s ago            found 13 remotes
2016-07-21T00:27:38Z   1m24s ago            found 13 remotes
2016-07-21T00:27:38Z   1m24s ago            updating dashboard
2016-07-21T00:27:38Z   1m24s ago            did sync to github
2016-07-21T00:27:38Z   1m24s ago            sync complete
2016-07-21T00:27:34Z   1m28s ago            2 refs to push; pushing batch
2016-07-21T00:27:34Z   1m28s ago            sync: got 40680 remote refs
2016-07-21T00:27:33Z   1m29s ago            sync: fetching remote refs
2016-07-21T00:27:33Z   1m29s ago            sync: got 40568 local refs
2016-07-21T00:27:33Z   1m29s ago            sync: fetching local refs
2016-07-21T00:27:33Z   1m29s ago            syncing to github
2016-07-21T00:27:33Z   1m29s ago            ran git fetch
2016-07-21T00:27:28Z   1m34s ago            running git fetch origin
2016-07-21T00:27:28Z   1m34s ago            poll timer fired
2016-07-21T00:22:28Z   6m34s ago            waiting
2016-07-21T00:22:28Z   6m34s ago            updated dashboard
2016-07-21T00:22:28Z   6m34s ago            found 13 remotes
2016-07-21T00:22:27Z   6m34s ago            found 13 remotes
2016-07-21T00:22:27Z   6m34s ago            updating dashboard
2016-07-21T00:22:27Z   6m34s ago            did sync to github
2016-07-21T00:22:27Z   6m34s ago            sync complete
2016-07-21T00:22:23Z   6m38s ago            1 refs to push; pushing batch
2016-07-21T00:22:23Z   6m38s ago            sync: got 40680 remote refs
2016-07-21T00:22:22Z   6m39s ago            sync: fetching remote refs
2016-07-21T00:22:22Z   6m39s ago            sync: got 40567 local refs
2016-07-21T00:22:22Z   6m39s ago            sync: fetching local refs
2016-07-21T00:22:22Z   6m39s ago            syncing to github
2016-07-21T00:22:22Z   6m39s ago            ran git fetch
2016-07-21T00:22:18Z   6m43s ago            running git fetch origin
2016-07-21T00:22:18Z   6m43s ago            poll timer fired
2016-07-21T00:17:18Z   11m43s ago           waiting
2016-07-21T00:17:18Z   11m43s ago           updated dashboard
2016-07-21T00:17:18Z   11m43s ago           found 13 remotes
2016-07-21T00:17:18Z   11m44s ago           found 13 remotes
2016-07-21T00:17:18Z   11m44s ago           updating dashboard
2016-07-21T00:17:18Z   11m44s ago           did sync to github
2016-07-21T00:17:18Z   11m44s ago           nothing to sync
2016-07-21T00:17:18Z   11m44s ago           sync: got 40680 remote refs
2016-07-21T00:17:17Z   11m45s ago           sync: fetching remote refs
2016-07-21T00:17:17Z   11m45s ago           sync: got 40567 local refs
2016-07-21T00:17:17Z   11m45s ago           sync: fetching local refs
2016-07-21T00:17:17Z   11m45s ago           syncing to github
2016-07-21T00:17:17Z   11m45s ago           ran git fetch
2016-07-21T00:17:13Z   11m48s ago           running git fetch origin
2016-07-21T00:17:13Z   11m48s ago           poll timer fired
2016-07-21T00:12:13Z   16m48s ago           waiting
2016-07-21T00:12:13Z   16m48s ago           updated dashboard

It seems the git-transport-https is just buggy or inefficient with 40k+ refs.

gopherbot

gopherbot commented on Jul 21, 2016

@gopherbot
Contributor

CL https://golang.org/cl/25110 mentions this issue.

locked and limited conversation to collaborators on Jul 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @bradfitz@josharian@quentinmit@aclements@gopherbot

        Issue actions

          x/build/cmd/coordinator: watcher's git mirroring is slow/flaky · Issue #16388 · golang/go