misc/cgo/testcshared: sometimes stalls on windows-amd64-longtest builder in non-sharded longtest mode #39665

Closed
dmitshur opened this issue Jun 17, 2020 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dmitshur
Member

dmitshur commented Jun 17, 2020

I've observed the following failure scenario while testing all.bash in non-sharded longtest mode via x/build/cmd/release. The build stalled on the ##### ../misc/cgo/testcshared test and did not make progress for several hours:

[...]
PASS
scatter = 0000000000EDD070
sqrt is: 0
hello from C
ok  	misc/cgo/test	7.344s

##### ../misc/cgo/testgodefs
PASS

##### ../misc/cgo/testso
ok  	misc/cgo/testso	1.977s

##### ../misc/cgo/testsovar
ok  	misc/cgo/testsovar	2.309s

##### ../misc/cgo/testcarchive
PASS

##### ../misc/cgo/testcshared

I've observed this just twice, on the latest commit of release-branch.go1.14, and on a recent commit of master (to be 1.15), using the windows-amd64-longtest builder:

$ release -target=windows-amd64-longtest -watch -version go1.14beta2 -rev=e98cafae04b78f1e994d52ea66d228451c8e6f81
$ release -target=windows-amd64-longtest -watch -version go1.15beta2 -rev=dea6d928f6c293631ce93bd3a3bb8b4020188954

The stall did not recur when I retried either build. I don't yet know how common it is.

This is the tracking issue to collect information and investigate.

/cc @cagedmantis @toothrot @andybons @ianlancetaylor @bcmills

@dmitshur dmitshur added the NeedsInvestigation label Jun 17, 2020
@dmitshur dmitshur added this to the Backlog milestone Jun 17, 2020
@dmitshur dmitshur changed the title from "misc/cgo/testcshared: sometimes stalls on windows-amd64-longtest builder in release testing mode" to "misc/cgo/testcshared: sometimes stalls on windows-amd64-longtest builder in non-sharded longtest mode" Jun 17, 2020
@dmitshur
Member Author

dmitshur commented Jun 24, 2020

There may have been an occurrence of this exact problem or a similar problem in the SlowBot run of CL 239738 on the windows-amd64-2016 builder just now:

[...]
scatter = 0000000000569860
sqrt is: 0
hello from C
ok  	misc/cgo/test	16.696s

##### ../misc/cgo/testgodefs
PASS

##### ../misc/cgo/testso
ok  	misc/cgo/testso	1.981s

##### ../misc/cgo/testsovar
ok  	misc/cgo/testsovar	2.256s

##### ../misc/cgo/testcarchive
PASS


Error: runTests: dist test failed: all buildlets had network errors or timeouts, yet tests remain

@bcmills
Contributor

bcmills commented Nov 18, 2020

Depending on how often this occurs, it might be a blocker for #42661. 😞

@bcmills
Contributor

bcmills commented Nov 18, 2020

Hmm, I wonder if this is related to #39349

@bcmills
Contributor

bcmills commented Jan 25, 2021

From CL 285720
(https://farmer.golang.org/temporarylogs?name=windows-amd64-2016&rev=26b0deca44e0cbf06661927dcd8b8546c5903aaf&st=0xc011c041a0):

  2021-01-25T14:15:33Z no_new_tests_remain 10.240.0.109:80
  2021-01-25T14:15:33Z closed_helper 10.240.0.109:80
  2021-01-25T14:16:03Z still_waiting_on_test testcshared
  2021-01-25T14:16:33Z still_waiting_on_test testcshared
  2021-01-25T14:17:03Z still_waiting_on_test testcshared
  2021-01-25T14:17:33Z still_waiting_on_test testcshared
  2021-01-25T14:18:03Z still_waiting_on_test testcshared
  2021-01-25T14:18:33Z still_waiting_on_test testcshared
  2021-01-25T14:19:03Z still_waiting_on_test testcshared
  2021-01-25T14:19:33Z still_waiting_on_test testcshared
  2021-01-25T14:20:03Z still_waiting_on_test testcshared
  2021-01-25T14:20:33Z still_waiting_on_test testcshared
  2021-01-25T14:21:03Z still_waiting_on_test testcshared
  2021-01-25T14:21:33Z still_waiting_on_test testcshared
  2021-01-25T14:22:03Z still_waiting_on_test testcshared
  2021-01-25T14:22:33Z still_waiting_on_test testcshared
  2021-01-25T14:23:03Z still_waiting_on_test testcshared
  2021-01-25T14:23:33Z still_waiting_on_test testcshared
  2021-01-25T14:24:03Z still_waiting_on_test testcshared
  2021-01-25T14:24:33Z still_waiting_on_test testcshared
  2021-01-25T14:25:03Z still_waiting_on_test testcshared
  2021-01-25T14:25:33Z still_waiting_on_test testcshared
  2021-01-25T14:26:03Z still_waiting_on_test testcshared
  2021-01-25T14:26:33Z still_waiting_on_test testcshared
  2021-01-25T14:27:03Z still_waiting_on_test testcshared
  2021-01-25T14:27:33Z still_waiting_on_test testcshared
  2021-01-25T14:28:03Z still_waiting_on_test testcshared
  2021-01-25T14:28:33Z still_waiting_on_test testcshared
  2021-01-25T14:29:03Z still_waiting_on_test testcshared
  2021-01-25T14:29:33Z still_waiting_on_test testcshared
  2021-01-25T14:30:03Z still_waiting_on_test testcshared
  2021-01-25T14:30:33Z still_waiting_on_test testcshared
  2021-01-25T14:31:03Z still_waiting_on_test testcshared
  2021-01-25T14:31:33Z still_waiting_on_test testcshared
  +12.0s (now)

@cuonglm
Member

cuonglm commented Oct 26, 2021

@aclements
Member

I have a theory about this. For most tests, the dist tool lets go test compile and run the test, but for testcshared, it separately compiles and runs the test binary. go test implements a backstop timeout in case the test binary wedges too hard to time out on its own, but dist does not implement this logic for test binaries it runs directly. Hence, if testcshared wedges, nothing kills it, and the builder itself eventually times out with no further output.

@bcmills
Contributor

bcmills commented Jan 11, 2022

Still just windows-amd64-2008; curiously no new failures since October. 🤔

That suggests a possible connection to CL 365994 / #49457 (CC @ianlancetaylor, @bufflig, @cherrymui) — if TestGo2C2Go was the one timing out, then the timeouts would have stopped due to skipping that test.

greplogs --dashboard -md -l -e '(?m)##### \.\./misc/cgo/testcshared.*\n\z' --since=2021-08-28

2021-10-26T14:24:17-283d8a3/windows-amd64-2008
2021-10-14T07:18:59-1349c6e/windows-amd64-2008
2021-09-27T18:14:10-ecac351/windows-amd64-2008
2021-09-20T22:14:47-6e81f78/windows-amd64-2008
2021-09-20T16:20:33-2d9b486/windows-amd64-2008
2021-09-08T16:19:36-409434d/windows-amd64-2008
2021-08-31T23:45:48-2d98a4b/windows-amd64-2008
2021-08-30T22:07:53-3342aa5/windows-amd64-2008
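For readers unfamiliar with the pattern in the greplogs command above, this small sketch shows what it selects, assuming greplogs interprets it with Go's regexp syntax: logs whose very last line is the testcshared header, meaning the build produced no output after starting that test. The helper name endsAtTestcshared and the sample log strings are made up for the demonstration.

```go
package main

import (
	"fmt"
	"regexp"
)

// stalledRE is the pattern from the greplogs invocation. The \n\z at the end
// anchors the match to the end of the whole log, and . does not match a
// newline, so the header must be the final line.
var stalledRE = regexp.MustCompile(`(?m)##### \.\./misc/cgo/testcshared.*\n\z`)

// endsAtTestcshared (hypothetical helper) reports whether log stops right
// after the testcshared header line.
func endsAtTestcshared(log string) bool {
	return stalledRE.MatchString(log)
}

func main() {
	// Made-up log fragments modeled on the output quoted in this issue.
	stalled := "##### ../misc/cgo/testcarchive\nPASS\n\n##### ../misc/cgo/testcshared\n"
	finished := stalled + "PASS\n"

	fmt.Println(endsAtTestcshared(stalled))  // true
	fmt.Println(endsAtTestcshared(finished)) // false
}
```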

@cherrymui
Member

It is possible that this is related to TestGo2C2Go. When I tried to reproduce #49457, I saw it sometimes hang (#49457 (comment)), and the underlying issue of #49457 can definitely cause a hang. As there are no new failures after that CL, I think the connection is very likely.

@bcmills
Contributor

bcmills commented Jan 28, 2022

Still no more failures since Oct. 26. I'm calling this fixed by CL 365994.

greplogs --dashboard -md -l -e '(?m)##### \.\./misc/cgo/testcshared.*\n\z' --since=2021-10-26

2021-10-26T14:24:17-283d8a3/windows-amd64-2008

@bcmills bcmills closed this as completed Jan 28, 2022
@golang golang locked and limited conversation to collaborators Jan 28, 2023