Closed
Labels
A-time (Area: Time)
C-bug (Category: This is a bug.)
C-defective-hardware (Category: Issue that was filed as a possible software bug but is actually defective hardware)
T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
Activity
silence-coding commented on Jan 4, 2021
jyn514 commented on Jan 4, 2021
https://doc.rust-lang.org/std/time/struct.Instant.html#panics-1
What operating system are you on? How are you setting the instant? On some systems the clock time is not actually monotonic.
rust/library/std/src/time.rs
Lines 223 to 250 in 8018418
silence-coding commented on Jan 4, 2021
rust version: 1.45.2
operating system: EulerOS 2.0SP5 x86_64
https://developer.huaweicloud.com/en-us/euleros/lifecycle-management.html
silence-coding commented on Jan 4, 2021
@jyn514 Thank you for your reply. May I ask how setting the instant could trigger this situation?
jyn514 commented on Jan 4, 2021
@silence-coding like the comment says, if the system time is not monotonic then this panic can happen when you set it to a time that's earlier. I don't know any more than that.
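To illustrate the point above: when a later clock reading can come out earlier than a previous one, the subtraction can be done with the non-panicking methods on `Instant`. The following is only a minimal sketch; the helper names `elapsed_lenient` and `elapsed_checked` are made up for this example and are not part of the standard library.

```rust
use std::time::{Duration, Instant};

/// Time elapsed since `start`, clamped to zero if the clock appears to have
/// gone backwards (hypothetical helper, not a std API).
fn elapsed_lenient(start: Instant) -> Duration {
    Instant::now().saturating_duration_since(start)
}

/// Returns `None` instead of panicking when `later` is actually before
/// `earlier` (hypothetical helper, not a std API).
fn elapsed_checked(earlier: Instant, later: Instant) -> Option<Duration> {
    later.checked_duration_since(earlier)
}
```

Calling `elapsed_checked(a, b)` with `b` earlier than `a` yields `None`, which lets the caller log the anomaly rather than crash.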
silence-coding commented on Jan 5, 2021
Thank you
saturating_duration_since in elapsed instead of panicking #84344
the8472 commented on Apr 20, 2021
To get an idea of what might be causing this, the hardware environment matters too.
Can you provide the following info?
/proc/cpuinfo (if the cores are homogeneous then one core will do)
/sys/devices/system/clocksource/clocksource0/current_clocksource
silence-coding commented on Apr 21, 2021
The original environment has been lost, so I just looked for a similar one.
Linux version 3.10.0-862.14.1.0.h197.eulerosv2r7.x86_64
/proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
stepping : 2
microcode : 0x1
cpu MHz : 2593.992
cache size : 16384 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat
bogomips : 5187.98
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:
Hypervisor vendor: KVM
/sys/devices/system/clocksource/clocksource0/current_clocksource kvm-clock
the8472 commented on Apr 21, 2021
OK, on the assumption that this actually is identical to the hardware that triggered your initial report:
It's not actually using the tsc directly and instead relying on the hypervisor.
If I read this code correctly:
https://github.com/torvalds/linux/blob/8bb495e3f02401ee6f76d1b1d77f3ac9f079e376/arch/x86/kernel/kvmclock.c#L271-L272
and
https://github.com/torvalds/linux/blob/8bb495e3f02401ee6f76d1b1d77f3ac9f079e376/arch/x86/kernel/pvclock.c#L77-L100
then it'll only trust the KVM clock if the hypervisor explicitly promises that the host clock is stable and otherwise protects it with an atomic CAS. I only checked the mainline 3.10 source though. RHEL/CentOS are known to produce heavily patched frankenkernels, so their code might actually be doing something different there.
Anyway, it looks like the host explicitly promises that the clock is reliable and then breaks that promise.
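The "protect it with an atomic CAS" idea can be sketched in user space as follows. This is an illustration of the general technique only: it is neither the kernel's pvclock code nor what std does internally, and the names `LAST_NANOS` and `monotonize` are invented for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Last timestamp handed out, in nanoseconds, shared by all threads.
static LAST_NANOS: AtomicU64 = AtomicU64::new(0);

/// Returns `raw` if it is newer than every value returned so far,
/// otherwise returns the newest value seen, so results never decrease.
fn monotonize(raw: u64) -> u64 {
    let mut last = LAST_NANOS.load(Ordering::Relaxed);
    loop {
        if raw <= last {
            // The raw clock went backwards (or stood still): reuse the old value.
            return last;
        }
        // Try to publish `raw`; on failure another thread raced us, so retry
        // against the value it stored.
        match LAST_NANOS.compare_exchange_weak(last, raw, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => return raw,
            Err(current) => last = current,
        }
    }
}
```

The cost of this guard is one shared atomic that every reader touches, which is why a clock that is already trustworthy is normally read without it.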
So if this issue occurs again on a machine using kvm-clock, you could report that to your hosting provider (they may have to update their hypervisor to fix this), or you could switch to a different clock source, e.g. hpet. You can check which ones are available under /sys/devices/system/clocksource/clocksource0/available_clocksource
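For anyone who wants to check this from a program rather than a shell, here is a rough sketch that reads the current and available clock sources. It assumes a Linux system exposing the sysfs attributes `current_clocksource` and `available_clocksource` in the directory named above; the function names are made up for illustration.

```rust
use std::fs;
use std::path::Path;

/// sysfs directory discussed in this thread (Linux only).
const CLOCKSOURCE_DIR: &str = "/sys/devices/system/clocksource/clocksource0";

/// Splits the space-separated content of `available_clocksource` into names.
fn parse_clocksources(content: &str) -> Vec<String> {
    content.split_whitespace().map(str::to_owned).collect()
}

/// Reads one attribute file and trims it; `None` if it is missing or
/// unreadable (for example on a non-Linux system).
fn read_attr(dir: &Path, name: &str) -> Option<String> {
    fs::read_to_string(dir.join(name))
        .ok()
        .map(|s| s.trim().to_owned())
}

/// Returns `(current, available)` clock sources, if sysfs provides them.
fn clocksources() -> Option<(String, Vec<String>)> {
    let dir = Path::new(CLOCKSOURCE_DIR);
    let current = read_attr(dir, "current_clocksource")?;
    let available = parse_clocksources(&read_attr(dir, "available_clocksource")?);
    Some((current, available))
}
```

Switching the clock source itself requires root and is done by writing to `current_clocksource`, which this sketch deliberately does not attempt.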
On the Rust side we also have the option to change the actually_monotonic() check to return false on x86 Linux. That would be unfortunate, since this only affects broken hypervisors on old CPUs without nonstop_tsc: current kernels prefer tsc over kvm-clock if the CPU indicates constant_tsc + nonstop_tsc.
the8472 commented on Apr 21, 2021
That may not be reliable enough to report this issue to the linux kernel maintainers.
If you encounter the issue again can you gather the information (OS, cpuinfo, hypervisor, kernel version and clock source) for the particular failing system?
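As a possible aid for gathering that information, the sketch below extracts the CPU flags from `/proc/cpuinfo`-style text and checks for the two TSC flags discussed in this thread. The function names are hypothetical, and the parsing assumes the `flags : ...` line format shown in the dump earlier in this thread (which lists constant_tsc but not nonstop_tsc).

```rust
/// Returns the flags of the first processor entry in /proc/cpuinfo-style text,
/// or an empty list if no `flags` line is present.
fn cpu_flags(cpuinfo: &str) -> Vec<&str> {
    cpuinfo
        .lines()
        .find_map(|line| {
            let (key, value) = line.split_once(':')?;
            if key.trim() == "flags" {
                Some(value.split_whitespace().collect::<Vec<&str>>())
            } else {
                None
            }
        })
        .unwrap_or_default()
}

/// True if both `constant_tsc` and `nonstop_tsc` are advertised.
fn has_reliable_tsc_flags(cpuinfo: &str) -> bool {
    let flags = cpu_flags(cpuinfo);
    flags.contains(&"constant_tsc") && flags.contains(&"nonstop_tsc")
}
```

On a real system the input would come from `std::fs::read_to_string("/proc/cpuinfo")`; the result only reflects what the (possibly virtual) CPU advertises, not whether the clock actually behaves.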
silence-coding commented on Apr 25, 2021
@the8472 The next time I encounter it, I will gather as much information as I can.