
runtime/trace: TestTraceStress fails while unpark #11320

Closed
@alexbrainman

Description

@alexbrainman

From recent windows-amd64-2003kardianos builder log:

--- FAIL: TestTraceStress (8.30s)
    trace_test.go:226: failed to parse trace: p 1 is not running g 0 while unpark (offset 3449241, time 930298786)
FAIL
FAIL    runtime/pprof   22.016s

@dvyukov how would I debug that? I cannot reproduce the failure here.

Thank you.

Alex

Activity

bradfitz

bradfitz commented on Jun 22, 2015

@bradfitz
Contributor

/cc @aclements for triage

added this to the Go1.5Maybe milestone on Jun 22, 2015
dvyukov

dvyukov commented on Jun 22, 2015

@dvyukov
Member

@alexbrainman how frequently does it happen? Please provide links to the builder logs; I can't find them.
Do we have access to the builder?
My only idea is that the traceTickDiv value is too large. I will send a change, but it is a blind change. We need to assess how it affects the failure frequency.

alexbrainman

alexbrainman commented on Jun 22, 2015

@alexbrainman
Member, Author

Dmitry, the failure is on the windows-amd64-2003kardianos builder (see the builder in the middle of the 3 windows builders). Recently it has failed on nearly every build. The failures differ slightly, but all are about trace parsing. The builder belongs to @kardianos - perhaps he can help you with testing.

The test never fails on my PCs here.

Thank you for looking into it.

Alex

dvyukov

dvyukov commented on Jun 22, 2015

@dvyukov
Member

What is that builder? I don't see it here:
https://code.google.com/p/go-wiki/wiki/DashboardBuilders
Is it an AMD processor? Probably the processor or the OS is broken. We need to run the test program on the machine:
#9729 (comment)
It tests whether RDTSC behaves reasonably on the machine.

alexbrainman

alexbrainman commented on Jun 22, 2015

@alexbrainman
Member, Author

I really don't know. All I know is that it is one of the oldest ones we have OS-wise. You won't be able to run a try-bot on it either. It is running the old builder program. Let's wait for @kardianos - I am sure he will help you to test whatever you like there.

Alex

kardianos

kardianos commented on Jun 22, 2015

@kardianos
Contributor

Running it mainly to test the runtime and cgo interface for older versions
of windows that we still claim to support. It runs on VirtualBox right now,
the host machine is a 6-core AMD that is close to four or five years old
now.

We could disable this test in some manner, or I could send you the product
code and iso (and purge it on my end), or come up with some other work
around.
Let me know how I can best help,
-Daniel


dvyukov

dvyukov commented on Jun 22, 2015

@dvyukov
Member

@kardianos thanks, I see
Ideally, we disable the test on this particular builder. @bradfitz , do we have such means now?

aclements

aclements commented on Jun 22, 2015

@aclements
Member

FWIW, this happened a few times back in April, but only started happening reliably a few days ago:

2015-04-21T20:50:23-e589e08/windows-amd64-2003kardianos
2015-04-22T02:50:48-87054c4/windows-amd64-2003kardianos
2015-04-22T10:35:44-5fa2d99/windows-amd64-2003kardianos
2015-06-18T22:16:16-0c247bf/windows-amd64-2003kardianos
2015-06-18T22:17:11-ccec934/windows-amd64-2003kardianos
2015-06-18T22:39:09-682ecea/windows-amd64-2003kardianos
2015-06-18T22:44:26-82020f8/windows-amd64-2003kardianos
2015-06-19T00:53:56-18d9a8d/windows-amd64-2003kardianos
2015-06-19T01:47:11-9d968cb/windows-amd64-2003kardianos
2015-06-19T06:14:38-75ce330/windows-amd64-2003kardianos
2015-06-19T19:05:01-183cc0c/windows-amd64-2003kardianos
2015-06-19T20:05:31-dc89350/windows-amd64-2003kardianos
2015-06-19T20:28:01-cc6554f/windows-amd64-2003kardianos
2015-06-20T00:52:38-79d4d6e/windows-amd64-2003kardianos
2015-06-20T10:35:38-13c44d2/windows-amd64-2003kardianos
2015-06-21T03:11:01-3cab476/windows-amd64-2003kardianos
2015-06-22T02:48:27-626188d/windows-amd64-2003kardianos

I see e72f5f6 (runtime: fix tracing of syscallexit) on the 18th. Could that have made us more susceptible to this failure, or did it just make us able to detect it?

kardianos

kardianos commented on Jun 22, 2015

@kardianos
Contributor

@dvyukov Last time we approached this we left it at this:
https://go-review.googlesource.com/#/c/8736/

I'll do some testing later to see if I can detect if time goes backwards reliably. Let me know if you don't think that is worthwhile.

dvyukov

dvyukov commented on Jun 22, 2015

@dvyukov
Member

> I'll do some testing later to see if I can detect if time goes backwards reliably. Let me know if you don't think that is worthwhile.

What is your plan? Do you want to detect this in test, or in runtime?

kardianos

kardianos commented on Jun 22, 2015

@kardianos
Contributor

It looked like Brad wanted to try to put it in the runtime, though I'm not sure about that. If it can be detected cheaply then maybe the runtime too. I'll try to quantify the cost of detection and get back.

bradfitz

bradfitz commented on Jun 23, 2015

@bradfitz
Contributor

We don't have a way to disable tests per builder yet, nor is there anything in the environment to key off of. I filed #11346.

kardianos

kardianos commented on Jun 23, 2015

@kardianos
Contributor

I tried to detect "time goes backwards" with something like this:

package main

import (
    "log"
    "runtime"
    "time"
)

const timeoutSecond = 100

func nanotime() int64 {
    // return time.Now().UnixNano()
    // return runtime.Unixnano()
    return runtime.Nanotime() // exported runtime·nanotime
}

func main() {
    runtime.GOMAXPROCS(8)

    log.Print("Start Check")
    max := nanotime()
    start := time.Now()

    timeout := time.After(time.Second * timeoutSecond)
    for {
        select {
        case <-timeout:
            log.Print("timeout, no backwards time detected")
            return
        default:
            next := nanotime()
            if next < max {
                log.Printf("backwards detected: duration %v", time.Now().Sub(start))
                return
            }
            max = next
        }
    }
}

Output:

2015/06/23 05:58:56 Start Check
2015/06/23 06:00:36 timeout, no backwards time detected
2015/06/23 06:00:36 Max Diff: 29296900

But I was unable to detect time going backwards, regardless of how I got "time".

Is there a better way to do this? I'm sure I'm missing something.
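One limitation of the loop above is that it compares readings taken by a single goroutine, so it may never compare a reading from one core against a reading from another. The following is a minimal sketch, not code from this thread, that shares the largest reading between several goroutines. The names detectBackwards and regression are hypothetical. The clock is passed in as a function, so a patched runtime·nanotime or cputicks could be substituted; main uses time.Since only because it is public API, and that clock is not the tick source the tracer reads.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// regression records one observation of the clock going backwards.
// (Hypothetical type, introduced for this sketch only.)
type regression struct {
	Prev, Cur int64
}

// detectBackwards starts `workers` goroutines that each read clock up to
// `iters` times and compare every reading against the largest reading
// published so far by any goroutine. It stops at the first regression.
func detectBackwards(clock func() int64, workers, iters int) (regression, bool) {
	var (
		max   int64 // largest reading published so far; accessed atomically
		found int32 // set to 1 once a regression has been recorded
		once  sync.Once
		res   regression
		wg    sync.WaitGroup
	)
	atomic.StoreInt64(&max, clock())
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters && atomic.LoadInt32(&found) == 0; i++ {
				// Load the published maximum BEFORE reading the clock, so
				// that prev is known to come from an earlier reading.
				prev := atomic.LoadInt64(&max)
				cur := clock()
				if cur < prev {
					once.Do(func() {
						res = regression{Prev: prev, Cur: cur}
						atomic.StoreInt32(&found, 1)
					})
					return
				}
				// Publish cur if it is a new maximum.
				for {
					old := atomic.LoadInt64(&max)
					if cur <= old || atomic.CompareAndSwapInt64(&max, old, cur) {
						break
					}
				}
			}
		}()
	}
	wg.Wait()
	return res, atomic.LoadInt32(&found) == 1
}

func main() {
	start := time.Now()
	// Assumption: time.Since stands in for the clock under test.
	clock := func() int64 { return int64(time.Since(start)) }
	if r, ok := detectBackwards(clock, runtime.NumCPU(), 1000000); ok {
		fmt.Printf("backwards detected: %d after %d\n", r.Cur, r.Prev)
	} else {
		fmt.Println("no backwards time detected")
	}
}
```

Loading the shared maximum before reading the clock matters: if the order were reversed, another goroutine could publish a later reading in between and produce a false positive.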

27 remaining items

dvyukov

dvyukov commented on Jun 29, 2015

@dvyukov
Member

it never appears to go backwards on our single core

How was that proved?

dvyukov

dvyukov commented on Jun 29, 2015

@dvyukov
Member

Run the following program when host is loaded:

package main

func cputicks() int64

func main() {
    for {
        t0, t1 := cputicks(), cputicks()
        if t1 - t0 <= 0 {
            println(t0, t1, t1 - t0)
            return
        }
    }
}

kardianos

kardianos commented on Jun 29, 2015

@kardianos
Contributor

I've had this running for a half hour and nothing printed or exited. I'll
keep it running for a while longer, but I don't think that is the cause.


dvyukov

dvyukov commented on Jun 30, 2015

@dvyukov
Member

@kardianos @alexbrainman OK, please dump a bad trace to a file and attach it here. I will take a look at what exactly is wrong there.

dvyukov

dvyukov commented on Jul 1, 2015

@dvyukov
Member

From the trace that @kardianos sent me offline:

179827100 GoBlock p=2 g=11 off=936312
179827123 GoUnblock p=2 g=0 off=936315 g=11
179827146 GoStart p=2 g=11 off=936319 g=11
179827169 GoBlock p=2 g=11 off=936322
179827192 GoUnblock p=2 g=0 off=936325 g=11
179827215 GoStart p=2 g=11 off=936329 g=11
179827238 GoBlock p=2 g=11 off=936332
179827261 GoUnblock p=2 g=0 off=936335 g=11
179827284 GoStart p=2 g=11 off=936339 g=11
179827307 GoBlock p=2 g=11 off=936342
179827330 GoUnblock p=2 g=0 off=936345 g=11
179827353 GoStart p=2 g=11 off=936349 g=11
179827376 GoBlock p=2 g=11 off=936352
179827399 GoStart p=2 g=11 off=936359 g=11 
179827399 GoUnblock p=2 g=0 off=936355 g=11

The last two events happen at the same time (179827399), so GoStart ends up ordered before the GoUnblock that should precede it.

Please check if https://go-review.googlesource.com/#/c/11834/ fixes tests.
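To illustrate why equal timestamps are a problem for a parser that orders events by time alone, here is a minimal sketch. It is not taken from CL 11834, whose contents are not shown in this thread. The event type and orderEvents function are hypothetical, and breaking ties by buffer offset is only an assumption that is plausible for events written by the same P, as in the excerpt above.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical, simplified trace record holding only the
// fields visible in the dump above.
type event struct {
	Ts   int64  // timestamp
	Type string // event name, e.g. "GoUnblock"
	P    int    // processor id
	Off  int    // offset of the event in the trace file
}

// orderEvents sorts events by timestamp. When two timestamps are equal it
// falls back to the file offset, so that events keep the order in which
// they were written. Assumption: offsets are comparable, which holds for
// events emitted by the same P into the same buffer.
func orderEvents(evs []event) {
	sort.SliceStable(evs, func(i, j int) bool {
		if evs[i].Ts != evs[j].Ts {
			return evs[i].Ts < evs[j].Ts
		}
		return evs[i].Off < evs[j].Off
	})
}

func main() {
	// The last three events from the dump, in the order they were listed.
	evs := []event{
		{Ts: 179827376, Type: "GoBlock", P: 2, Off: 936352},
		{Ts: 179827399, Type: "GoStart", P: 2, Off: 936359},
		{Ts: 179827399, Type: "GoUnblock", P: 2, Off: 936355},
	}
	orderEvents(evs)
	for _, e := range evs {
		fmt.Println(e.Ts, e.Type, "p =", e.P, "off =", e.Off)
	}
}
```

With a timestamp-only comparison, the relative order of the two events at 179827399 is left to the sort algorithm, which is how GoStart can be seen before the GoUnblock that makes the goroutine runnable.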

gopherbot

gopherbot commented on Jul 1, 2015

@gopherbot
Contributor

CL https://golang.org/cl/11834 mentions this issue.

added a commit that references this issue on Jul 2, 2015
kardianos

kardianos commented on Jul 2, 2015

@kardianos
Contributor

CL 11834 fixes this issue on my VM.

alexbrainman

alexbrainman commented on Jul 2, 2015

@alexbrainman
Member, Author

CL 11834 fixes this issue on my PC too. Thank you @dvyukov and @kardianos for seeing it through to the end.

Alex

modified the milestones: Go1.5, Go1.5Maybe on Jul 2, 2015
changed the title [-]runtime/debug: TestTraceStress fails while unpark[/-] [+]runtime/trace: TestTraceStress fails while unpark[/+] on Jul 31, 2015
locked and limited conversation to collaborators on Aug 5, 2016