Skip to content

Add debug message in time panic #37137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mrmiywj opened this issue Oct 13, 2016 · 14 comments
Closed

Add debug message in time panic #37137

mrmiywj opened this issue Oct 13, 2016 · 14 comments

Comments

@mrmiywj
Copy link
Contributor

mrmiywj commented Oct 13, 2016

In developing tikv, we met a strange panic here, https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/time.rs#L276

But this panic does not have enough debug info, we think we can add the earlier and current time debug message in the panic message.

@siddontang
Copy link

/cc @ alexcrichton

We meet this panic in linux randomly, the backtrace is:

thread 'raftstore-4' panicked 'other was less than the current instant' at "../src/libstd/sys/unix/time.rs:276"
stack backtrace:
   0:     0x7fdb0fd53c1e - backtrace::backtrace::libunwind::trace
                        at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/libunwind.rs:54
                         - backtrace::backtrace::trace<closure>
                        at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/mod.rs:70
   1:     0x7fdb0fd54333 - backtrace::capture::{{impl}}::new
                        at /tikv/target/release/build/backtrace-3bc1ff360ebb00cb/out/capture.rs:79
   2:     0x7fdb0fd53540 - tikv::util::panic_hook::set_exit_hook::{{closure}}
                        at /tikv/src/util/panic_hook.rs:84
   3:     0x7fdb102863f7 - std::panicking::rust_panic_with_hook::h105c3d42fcd2fb5e
   4:     0x7fdb1028628f - std::panicking::begin_panic::h5c54e97e5cda1cc4
   5:     0x7fdb1027e5e9 - std::time::Instant::elapsed::h032d2251de7f06d6

This is strange that monotonic time jumps back.

@alexcrichton
Copy link
Member

This panic doesn't necessarily mean monotonic time jumps backwards (that would definitely be bad!) It could also just be the accidental ordering of two times in a subtraction perhaps?

@siddontang
Copy link

yes, it is mainly for subtracting a larger Instant time, not only for monotonic time jumping back.

As you can see, we meet panic in Instant::elaspsed function, this is very strange, but we can only know we panic here, what's the current Instant, what's the earlier Instant, we don't know. So I think it may be more helpful to add these in panic message.

@alexcrichton
Copy link
Member

Oh interesting! If this is happening in elapsed then that's definitely a bug!

Does this happen on one particular platform? And if so, is there a way to reproduce it? (even sporadically)

@japaric
Copy link
Member

japaric commented Oct 16, 2016

If the panics are on a *nix server, this could be caused by NTP shenanigans as we are using CLOCK_MONOTONIC instead CLOCK_MONOTONIC_RAW. Supposedly, the later is NTP-immune but I don't know if it has any drawbacks over the non-raw version.

@sfackler
Copy link
Member

It's not present on all kernel versions we support, but it seems like one of those things we should detect support for at runtime.

@alexcrichton
Copy link
Member

In any case even if there's not kernel support we should never panic in Instant::elapsed. Presumably we should just attempt a subtraction and if it fails return a zero.

@siddontang
Copy link

Hi @alexcrichton
Now we meet this panic in CentOS 7 in the UCloud Cloud Server(/cc @wuzhy).

cat /etc/redhat-release
CentOS Linux release 7.0.1406 (Core)

@siddontang
Copy link

Hi @japaric

If the panics are on a *nix server, this could be caused by NTP shenanigans as we are using CLOCK_MONOTONIC instead CLOCK_MONOTONIC_RAW.

Seem that NTP can't let monotonic clock jump back, but only affects its rate.
I don't use CLOCK_MONOTONIC_RAW before, so can't know whether it is better or not :-)

@alexcrichton
Copy link
Member

Hm so I don't understand why CLOCK_MONOTONIC wouldn't be appropriate here. I can't find any documentation or warnings or cases about the clock going backwards, and my presumption is that's the whole purpose of the clock, it doesn't go backwards. My understanding is that while it's susceptible to NTP adjustments that just means it slows down or speeds up the frequency of the passage of time.

@siddontang an explicit condition where this function can panic is when an Instant is being manufactured. For example this will likely always panic:

(Instant::now() + Duration::new(100, 0)).elapsed();

Is that happening in your program? Or does your usage of elapsed() look like:

let start = Instant::now();
// ...
let dur = start.elapsed();

@siddontang
Copy link

Hi @alexcrichton

Our usage is:

let start = Instant::now();
let dur = start.elapsed();

We meet this panic rarely, maybe after running our program for many days.
I want to find some helpful system logs but find nothing.

@alexcrichton
Copy link
Member

Unusual! Definitely seems bad :(

@siddontang
Copy link

Update: we write a simple two program, the one uses CLOCK_MONOTONIC, the other uses CLOCK_MONOTONIC_RAW, both codes like:

for {
    t1 = now()
    t2 = now()
    assert(t2 >= t1)
}

We run these two programs with our program. When our program panics with clock jumping back, we find the program with CLOCK_MONOTONIC also panics too, but the program CLOCK_MONOTONIC_RAW still works.

I don't think this the issue for Rust but it is for the ucloud system (/cc @wuzhy) now.
I think both CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW can't jump back.

Can we close it now?

@alexcrichton
Copy link
Member

Ah thanks for the update @siddontang! Sounds reasonable to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants