Closed
Description
From recent windows-amd64-2003kardianos builder log:
--- FAIL: TestTraceStress (8.30s)
trace_test.go:226: failed to parse trace: p 1 is not running g 0 while unpark (offset 3449241, time 930298786)
FAIL
FAIL runtime/pprof 22.016s
@dvyukov how would I debug that? I cannot reproduce the failure here.
Thank you.
Alex
Activity
bradfitz commented on Jun 22, 2015
/cc @aclements for triage
dvyukov commented on Jun 22, 2015
@alexbrainman how frequently does it happen? Please provide links to builder logs. Can't find them.
Do we have access to the builder?
I have only one idea: the traceTickDiv value is too large. I will send a change, but it is a blind change. We need to assess how it affects failure frequency.
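A minimal sketch of that concern, assuming trace timestamps are raw CPU tick counts divided by a constant. The divisor value 64 and the helper name `toTraceTime` are illustrative assumptions, not taken from the runtime source. The larger the divisor, the more likely two distinct tick readings collapse into the same trace timestamp:

```go
package main

import "fmt"

// tickDiv is a hypothetical divisor standing in for traceTickDiv.
const tickDiv = 64

// toTraceTime converts a raw tick count into a coarser trace timestamp
// by integer division, discarding the low-order ticks.
func toTraceTime(ticks int64) int64 {
	return ticks / tickDiv
}

func main() {
	a, b := int64(128), int64(130) // two distinct tick readings
	// Reports whether both readings map to the same trace timestamp.
	fmt.Println(toTraceTime(a) == toTraceTime(b))
}
```

Events that end up with equal timestamps have no order defined by time alone, so any later sorting step has to preserve their recorded order some other way.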
alexbrainman commented on Jun 22, 2015
Dmitry, the failure is on windows-amd64-2003kardianos builder (see builder in the middle of 3 windows builders). Recently it fails for nearly every build - failures are slightly different, but all about trace parsing. The builder belongs to @kardianos - perhaps he can help you with testing.
The test never fails on my PCs here.
Thank you for looking into it.
Alex
dvyukov commented on Jun 22, 2015
What is that builder? Don't see it here:
https://code.google.com/p/go-wiki/wiki/DashboardBuilders
Is it an AMD processor? Probably the processor or the OS is broken. We need to run the test program on the machine:
#9729 (comment)
It tests whether RDTSC behaves reasonably on the machine.
alexbrainman commented on Jun 22, 2015
I really don't know. All I know is that it is one of the oldest ones we have OS-wise. You won't be able to run try-bot on it either. It is running the old builder program. Let's wait for @kardianos - I am sure he will help you to test whatever you like there.
Alex
kardianos commented on Jun 22, 2015
Running it mainly to test the runtime and cgo interface for older versions
of windows that we still claim to support. It runs on VirtualBox right now,
the host machine is a 6-core AMD that is close to four or five years old
now.
We could disable this test in some manner, or I could send you the product
code and iso (and purge it on my end), or come up with some other
workaround.
Let me know how I can best help,
-Daniel
dvyukov commented on Jun 22, 2015
@kardianos thanks, I see
Ideally, we disable the test on this particular builder. @bradfitz , do we have such means now?
aclements commented on Jun 22, 2015
FWIW, this happened a few times back in April, but only started happening reliably a few days ago:
2015-04-21T20:50:23-e589e08/windows-amd64-2003kardianos
2015-04-22T02:50:48-87054c4/windows-amd64-2003kardianos
2015-04-22T10:35:44-5fa2d99/windows-amd64-2003kardianos
2015-06-18T22:16:16-0c247bf/windows-amd64-2003kardianos
2015-06-18T22:17:11-ccec934/windows-amd64-2003kardianos
2015-06-18T22:39:09-682ecea/windows-amd64-2003kardianos
2015-06-18T22:44:26-82020f8/windows-amd64-2003kardianos
2015-06-19T00:53:56-18d9a8d/windows-amd64-2003kardianos
2015-06-19T01:47:11-9d968cb/windows-amd64-2003kardianos
2015-06-19T06:14:38-75ce330/windows-amd64-2003kardianos
2015-06-19T19:05:01-183cc0c/windows-amd64-2003kardianos
2015-06-19T20:05:31-dc89350/windows-amd64-2003kardianos
2015-06-19T20:28:01-cc6554f/windows-amd64-2003kardianos
2015-06-20T00:52:38-79d4d6e/windows-amd64-2003kardianos
2015-06-20T10:35:38-13c44d2/windows-amd64-2003kardianos
2015-06-21T03:11:01-3cab476/windows-amd64-2003kardianos
2015-06-22T02:48:27-626188d/windows-amd64-2003kardianos
I see e72f5f6 (runtime: fix tracing of syscallexit) on the 18th. Could that have made us more susceptible to this failure, or did it just make us able to detect it?
kardianos commented on Jun 22, 2015
@dvyukov Last time we approached this we left it at this:
https://go-review.googlesource.com/#/c/8736/
I'll do some testing later to see if I can detect if time goes backwards reliably. Let me know if you don't think that is worthwhile.
dvyukov commented on Jun 22, 2015
What is your plan? Do you want to detect this in test, or in runtime?
kardianos commented on Jun 22, 2015
It looked like Brad wanted to try to put it in the runtime, though I'm not sure about that. If it can be detected cheaply then maybe the runtime too. I'll try to quantify the cost of detection and get back.
bradfitz commented on Jun 23, 2015
We don't have a way to disable tests per builder yet, nor is there anything in the environment to key off of. I filed #11346.
kardianos commented on Jun 23, 2015
I tried to detect "time goes backwards" with something like this:
Output:
But I was unable to detect time going backwards, regardless of how I got "time".
Is there a better way to do this? I'm sure I'm missing something.
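The program from the comment above is not preserved here. As a rough sketch only, and not the code that was actually run, one way to look for time going backwards is to record a sequence of clock readings and scan for a reading smaller than its predecessor. The helper name `firstRegression` and the sample count are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// firstRegression returns the index of the first sample that is smaller
// than its predecessor, or -1 if the samples never decrease.
func firstRegression(samples []int64) int {
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			return i
		}
	}
	return -1
}

func main() {
	// Collect clock readings in a tight loop on a single goroutine.
	samples := make([]int64, 1000000)
	for i := range samples {
		samples[i] = time.Now().UnixNano()
	}
	if i := firstRegression(samples); i >= 0 {
		fmt.Printf("time went backwards at sample %d: %d -> %d\n",
			i, samples[i-1], samples[i])
	} else {
		fmt.Println("no regression observed")
	}
}
```

A single-goroutine loop like this only compares readings taken one after another on whatever core the goroutine happens to run on, so it may miss skew between cores, and it reads the OS clock rather than the CPU tick counter.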
dvyukov commented on Jun 29, 2015
How was that proved?
dvyukov commented on Jun 29, 2015
Run the following program while the host is under load:
kardianos commented on Jun 29, 2015
I've had this running for a half hour and nothing printed or exited. I'll
keep it running for a while longer, but I don't think that is the cause.
dvyukov commented on Jun 30, 2015
@kardianos @alexbrainman OK, please dump a bad trace to a file and attach it here. I will look at what exactly is wrong there.
dvyukov commented on Jul 1, 2015
From the trace that @kardianos sent me offline:
The last two events happen at the same time.
Please check if https://go-review.googlesource.com/#/c/11834/ fixes the tests.
gopherbot commented on Jul 1, 2015
CL https://golang.org/cl/11834 mentions this issue.
internal/trace: stable sort events
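To illustrate why a stable sort matters when two events carry the same timestamp, here is a minimal sketch. It is not the code from CL 11834; the `event` type, the `sortByTime` helper, and the event names are hypothetical. A stable sort keeps equal-timestamp events in the order they were recorded, whereas an unstable sort is free to swap them, which could make a parser see an unpark before the event it depends on:

```go
package main

import (
	"fmt"
	"sort"
)

// event is a simplified stand-in for a trace event.
type event struct {
	ts   int64  // timestamp
	name string // illustrative event name
}

// sortByTime orders events by timestamp and keeps events with equal
// timestamps in their original relative order.
func sortByTime(evs []event) {
	sort.SliceStable(evs, func(i, j int) bool {
		return evs[i].ts < evs[j].ts
	})
}

func main() {
	evs := []event{
		{ts: 20, name: "go-start"},
		{ts: 20, name: "unpark"}, // same timestamp as go-start
		{ts: 10, name: "proc-start"},
	}
	sortByTime(evs)
	for _, e := range evs {
		fmt.Println(e.ts, e.name)
	}
}
```

With `sort.SliceStable`, "go-start" is guaranteed to stay ahead of "unpark" because that is how they appeared in the input. `sort.Slice` makes no such guarantee for equal elements.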
kardianos commented on Jul 2, 2015
CL 11834 fixes this issue on my VM.
alexbrainman commented on Jul 2, 2015
CL 11834 fixes this issue on my PC too. Thank you @dvyukov and @kardianos for seeing it through to the end.
Alex
Title changed from "runtime/debug: TestTraceStress fails while unpark" to "runtime/trace: TestTraceStress fails while unpark"