Skip to content

Incorrect elapsed time reported on Docker #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
filipesilva opened this issue Aug 30, 2018 · 5 comments
Closed

Incorrect elapsed time reported on Docker #80

filipesilva opened this issue Aug 30, 2018 · 5 comments

Comments

@filipesilva
Copy link
Contributor

Heya, I've noticed some odd elapsed time reported on Docker images. I'm not too sure if it's docker or linux or generally, but my machine is Windows and I see the issue when using Docker. I also see it happening on CircleCI, and they use docker images for their CI machines.

Here's repro: https://github.com/filipesilva/pidusage-docker. It's just a test script that print stats three times on npm test:

var pidusage = require('pidusage')

// Compute statistics every second:
const interval = setInterval(function () {
  pidusage(process.pid, { maxage: 1000 }, function (err, stats) {
    console.log(stats)
    // => {
    //   cpu: 10.0,            // percentage (from 0 to 100*vcore)
    //   memory: 357306368,    // bytes
    //   ppid: 312,            // PPID
    //   pid: 727,             // PID
    //   ctime: 867000,        // ms user + system time
    //   elapsed: 6650000,     // ms since the start of the process
    //   timestamp: 864000000  // ms since epoch
    // }
  })
}, 1000);

setTimeout(() => clearInterval(interval), 3500);

On my Windows machine I see something like this:

{ cpu: 0,
  memory: 23441408,
  ppid: 4512,
  pid: 14228,
  ctime: 265,
  elapsed: 1554,
  timestamp: 1535623344931 }
{ cpu: 3.2,
  memory: 23777280,
  ppid: 4512,
  pid: 14228,
  ctime: 297,
  elapsed: 2388,
  timestamp: 1535623345765 }
{ cpu: 0,
  memory: 23781376,
  ppid: 4512,
  pid: 14228,
  ctime: 297,
  elapsed: 3659,
  timestamp: 1535623347036 }

Using one of CircleCI's docker images, for instance circleci/node:8.11-browsers, I see the following

$ docker run -v d:/sandbox:/sandbox -it circleci/node:8.11-browsers
$ cd /sandbox/pidusage-docker
$ npm test

> [email protected] test /sandbox/pidusage-docker
> node test

{ cpu: 0,
  memory: 30306304,
  ctime: 0.08,
  elapsed: 1535540503719,
  timestamp: 83190970,
  pid: 24,
  ppid: 23 }
{ cpu: 0,
  memory: 30547968,
  ctime: 0.08,
  elapsed: 1535540504713,
  timestamp: 83190970,
  pid: 24,
  ppid: 23 }
{ cpu: 1,
  memory: 30547968,
  ctime: 0.09,
  elapsed: 1535540505714,
  timestamp: 83190970,
  pid: 24,
  ppid: 23 }

(NB: d:/sandbox:/sandbox is just mounting my local folder so I don't have to git clone the repro)

The difference between each elapsed seems correct, but they are reporting something like 48 years of elapsed time.

I guess it's related with the extremely low timestamp though, since elapsed is calculated from it:

https://github.com/soyuka/pidusage/blob/master/lib/procfile.js#L115-L116

I tried adding some logging to the relevant computations for the timestamp:

      console.log('parseFloat(infos[19])', parseFloat(infos[19]))
      console.log('cpuInfo.clockTick', cpuInfo.clockTick)
      console.log('stat.start', stat.start)
parseFloat(infos[19]) 8421548
cpuInfo.clockTick 100
stat.start 84215.48
{ cpu: 0,
  memory: 30195712,
  ctime: 0.07,
  elapsed: 1535540503698,
  timestamp: 84215480,
  pid: 52,
  ppid: 51 }

parseFloat(infos[19]) 8421548
cpuInfo.clockTick 100
stat.start 84215.48
{ cpu: 0,
  memory: 30461952,
  ctime: 0.07,
  elapsed: 1535540504692,
  timestamp: 84215480,
  pid: 52,
  ppid: 51 }

parseFloat(infos[19]) 8421548
cpuInfo.clockTick 100
stat.start 84215.48
{ cpu: 0,
  memory: 30461952,
  ctime: 0.07,
  elapsed: 1535540505694,
  timestamp: 84215480,
  pid: 52,
  ppid: 51 }

Looking at the calculated numbers it doesn't seem like anything is going wrong, just that starttime from the procfile looks really small.

Happy to try and fix this, do you have any ideas on how to proceed?

@soyuka
Copy link
Owner

soyuka commented Aug 30, 2018

Hi! Really detailed bug report, thanks for that!

So, the value we call start here (in the case of linux, reading proc files) is in reality:

The time the process started after system boot. In kernels before Linux 2.6, this value was expressed in jiffies. Since Linux 2.6, the value is expressed in clock ticks (divide by sysconf(_SC_CLK_TCK)).

Source: man 5 proc

My guess is that we forgot to substract the system uptime to get a value representing "ms since the start of the process" (quoted above).
Does that make sense?

@filipesilva
Copy link
Contributor Author

Heya, thanks for getting back to me on this!

The system uptime is stored in cpuInfo.uptime right (and afterwards stats.uptime)?

I tried logging that just before the readProcFile calls the done callback, multiplied by 1000 to get it in ms:

      console.log('stat.start * 1000 = ', stat.start * 1000)
      console.log('stat.uptime * 1000 = ', stat.uptime * 1000)

      return done(null, {

and got

stat.start * 1000 =  167873780
stat.uptime * 1000 =  167874860
{ cpu: 0,
  memory: 30240768,
  ctime: 0.07,
  elapsed: 1535540503702,
  timestamp: 167873780,
  pid: 158,
  ppid: 157 }
stat.start * 1000 =  167873780
stat.uptime * 1000 =  167875860
{ cpu: 0,
  memory: 30351360,
  ctime: 0.07,
  elapsed: 1535540504698,
  timestamp: 167873780,
  pid: 158,
  ppid: 157 }
stat.start * 1000 =  167873780
stat.uptime * 1000 =  167876860
{ cpu: 0,
  memory: 30351360,
  ctime: 0.07,
  elapsed: 1535540505698,
  timestamp: 167873780,
  pid: 158,
  ppid: 157 }

The last pair has a difference of 3080 ms between them. That's kinda odd, because it's roughly how long the test lasts. But this docker machine has been up for longer than that.

Is stat.uptime actually the uptime of the process? If so, we could use that directly for the elapsed calculation instead instead of Date.now():

        elapsed: (stat.uptime * 1000) - (stat.start * 1000),
        timestamp: stat.start * 1000, // start date

This in turn gives me the numbers I expected for elapsed:

stat.start * 1000 =  168916300
stat.uptime * 1000 =  168917390
{ cpu: 0,
  memory: 30183424,
  ctime: 0.07,
  elapsed: 1090,
  timestamp: 168916300,
  pid: 409,
  ppid: 408 }
stat.start * 1000 =  168916300
stat.uptime * 1000 =  168918390
{ cpu: 0,
  memory: 30474240,
  ctime: 0.07,
  elapsed: 2090,
  timestamp: 168916300,
  pid: 409,
  ppid: 408 }
stat.start * 1000 =  168916300
stat.uptime * 1000 =  168919390
{ cpu: 1,
  memory: 30474240,
  ctime: 0.08,
  elapsed: 3090,
  timestamp: 168916300,
  pid: 409,
  ppid: 408 }

Timestamp is still very small though. Looking at the wmic.js file I see that there timestamp is actually Data.now().

If I change that now:

        elapsed: (stat.uptime * 1000) - (stat.start * 1000), // process uptime minus start time
        timestamp: date,

I get similar numbers on windows and linux:

// windows
{ cpu: 0,
  memory: 23195648,
  ppid: 8504,
  pid: 18560,
  ctime: 157,
  elapsed: 1410,
  timestamp: 1535709907746 }
{ cpu: 3.1,
  memory: 23564288,
  ppid: 8504,
  pid: 18560,
  ctime: 188,
  elapsed: 2277,
  timestamp: 1535709908613 }
{ cpu: 1.6,
  memory: 23564288,
  ppid: 8504,
  pid: 18560,
  ctime: 204,
  elapsed: 3286,
  timestamp: 1535709909622 }

// linux
{ cpu: 0,
  memory: 30138368,
  ctime: 0.06,
  elapsed: 1080,
  timestamp: 1535709900482,
  pid: 459,
  ppid: 458 }
{ cpu: 1,
  memory: 30326784,
  ctime: 0.07,
  elapsed: 2080,
  timestamp: 1535709901478,
  pid: 459,
  ppid: 458 }
{ cpu: 0,
  memory: 30326784,
  ctime: 0.07,
  elapsed: 3080,
  timestamp: 1535709902479,
  pid: 459,
  ppid: 458 }

What do you think? I'll make a PR with these changes.

@filipesilva
Copy link
Contributor Author

Here is the PR #83

@soyuka
Copy link
Owner

soyuka commented Aug 31, 2018

Is stat.uptime actually the uptime of the process? If so, we could use that directly for the elapsed calculation instead instead of Date.now():

Definitely!

About the timestamp I think that @simonepri had in mind "uptime of the process in ms since epoch". The changes you made for the elapsed time look definitely correct, we'll just wait for his insight on the timestamp thing before merging!

Tyvm for taking care of this!

@filipesilva
Copy link
Contributor Author

Regarding timestamp, using Date.now() (the value of date in the PR) is the ms since epoch for when that function was ran on node.

On Windows that's the number used for timestamp:
https://github.com/soyuka/pidusage/blob/master/lib/wmic.js#L116

If we could get the time since epoch of the procfile process proper that would be more precise. I can't see it listed in http://man7.org/linux/man-pages/man5/proc.5.html though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants