fix: fix staleness check #182

hugomrdias · 2018-11-21T17:13:34Z

For context, the problem isn't that stack trace; it's the staleness check (https://github.com/moxystudio/node-proper-lockfile#design).
The lockfile mtime is updated periodically (every 5s) to prevent staleness, and a lock is marked as stale once its mtime is more than 10s old.

What happens is that ipfs somehow blocks the event loop for so long that the mtime update, scheduled for 5s after the previous one, runs late. By the time it runs, more than 10s have passed since the previous update, so proper-lockfile marks the lock as compromised. Hope this makes sense lol :)

This fix just increases the stale threshold to 20s, which should be enough, but it's a naive fix. If we need to increase the threshold further, we should really dig into the underlying cause.

To give an example of what this threshold means: if a long-running ipfs daemon crashes hard, so no cleanup happens and the lock file is still there, we can't spawn another daemon for the next 20s without manually deleting the lock file. So any systemd-like supervisor that restarts the process right after a crash will not work!
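
To make the restart scenario concrete, here is a minimal sketch of the takeover rule as described above. It is a simplified model, not proper-lockfile's actual code; `canTakeOver` and `STALE_MS` are hypothetical names used only for illustration.

```javascript
// Simplified model of the stale check, NOT proper-lockfile's implementation.
// STALE_MS and canTakeOver are hypothetical names for illustration only.
const STALE_MS = 20000 // threshold proposed in this PR

// A leftover lock file can only be taken over once its mtime is older
// than the stale threshold.
function canTakeOver (lockMtimeMs, nowMs, staleMs = STALE_MS) {
  return nowMs - lockMtimeMs > staleMs
}

const lastUpdate = 0 // last mtime update before the daemon crashed

// A supervisor restarting the daemon 1s after the crash is still locked out.
console.log(canTakeOver(lastUpdate, 1000)) // false

// Only after the full threshold has elapsed can a new daemon take the lock.
console.log(canTakeOver(lastUpdate, 21000)) // true
```

Under this model, raising the threshold directly lengthens the window in which an automatic restart fails.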

jacobheun · 2018-11-21T19:22:20Z

So, we create the lock and set a timeout of 5 seconds to update the mtime on the lock. By the time that update actually occurs, it's been more than 10 seconds? Is that correct? Either the fs is insanely slow and/or something bad is happening in ipfs.

@alanshaw is the original error consistent or intermittent?

hugomrdias · 2018-11-21T19:26:43Z

Jacob, that's all done by proper-lockfile. I checked and all seems fine on that side.
The error is irregular: sometimes it doesn't happen at all, other times it happens in different tests.

hugomrdias · 2018-11-21T19:36:27Z

The flow is more like:

mtime updates
another update is scheduled to run 5s later
5s later the event loop is stuck
the scheduled function finally runs, checks the current time, and finds that more than 10s have passed since the previous mtime update
lock is marked as compromised
proper-lockfile throws an error

We can make it not throw but that effectively removes some protection.
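
The flow above can be sketched as a small simulation. This is a simplified model under stated assumptions, not proper-lockfile's real update loop; `onUpdateTick` and `runTick` are hypothetical helpers, and the blocked time is passed in as a number instead of actually stalling the event loop.

```javascript
// Simplified model of the update/stale interaction, NOT proper-lockfile's code.
// onUpdateTick and runTick are hypothetical helpers for illustration only.
const UPDATE_MS = 5000 // mtime refresh interval described in this thread

// Models one scheduled update: if too much time has passed since the last
// mtime refresh, the lock is reported as compromised instead of refreshed.
function onUpdateTick (lastMtimeMs, nowMs, staleMs, onCompromised) {
  if (nowMs - lastMtimeMs > staleMs) {
    onCompromised(new Error('lock compromised'))
    return lastMtimeMs
  }
  return nowMs // mtime refreshed
}

// Runs a single tick that is delayed by `loopBlockedMs` of event loop stall.
function runTick (loopBlockedMs, staleMs) {
  let compromised = false
  const lastMtime = 0
  const actualRunTime = lastMtime + UPDATE_MS + loopBlockedMs
  onUpdateTick(lastMtime, actualRunTime, staleMs, () => { compromised = true })
  return compromised
}

console.log(runTick(0, 10000))    // false: update ran on time
console.log(runTick(6000, 10000)) // true: 11s since the last update
console.log(runTick(6000, 20000)) // false: same stall, 20s threshold
```

A handler that swallows the error instead of throwing would keep the process alive in the second case, at the cost of continuing to run without a trustworthy lock.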

achingbrain · 2018-11-27T07:47:15Z

This will help until something locks up the CPU for 20 seconds. Which is bad, obviously, but the app then crashing afterwards with a seemingly unrelated error masks the problem.

I wonder if we couldn't just get IPFS to offer the option of overriding the lock on startup instead?

alanshaw · 2018-11-27T11:19:16Z

src/lock.js

@@ -19,7 +19,7 @@ exports.lock = (dir, callback) => {
  const file = path.join(dir, lockFile)
  log('locking %s', file)

-  lock(dir, {lockfilePath: file})
+  lock(dir, {lockfilePath: file, stale: 20000})


20000 needs to be pulled out as a const, with a comment explaining what it does and why it is set to 20000, and a link to this PR.
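
A minimal sketch of what that extraction could look like. The names `STALE_TIME` and `lockOptions` and the `'repo.lock'` value are assumptions for illustration; the actual follow-up commit may differ.

```javascript
const path = require('path')

// Assumed lock file name; the diff above only shows the `lockFile` variable.
const lockFile = 'repo.lock'

// Time (ms) after which a lock with no mtime update is considered stale.
// Raised from proper-lockfile's 10s default because a blocked event loop
// can delay the periodic mtime update and make the lock look compromised.
// See PR #182. STALE_TIME is a hypothetical name.
const STALE_TIME = 20000

// Hypothetical helper that builds the options passed to lock(dir, options).
function lockOptions (dir) {
  return { lockfilePath: path.join(dir, lockFile), stale: STALE_TIME }
}

console.log(lockOptions('/tmp/repo').stale) // 20000
```

Keeping the value in one named constant also gives a single place to revisit if the threshold ever needs to change again.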

hugomrdias · 2018-11-28T10:21:00Z

@alanshaw let's get this in?

jacobheun

Change looks good for now. Comments left for future improvements.

fix: fix staleness check

57b884a

hugomrdias self-assigned this Nov 21, 2018

hugomrdias requested review from alanshaw, jacobheun and achingbrain November 21, 2018 17:13

ghost added the status/in-progress In progress label Nov 21, 2018

alanshaw requested changes Nov 27, 2018

fix: extract stale value to const

0c510ca

jacobheun approved these changes Nov 28, 2018

jacobheun merged commit ede5dd6 into master Nov 28, 2018

ghost removed the status/in-progress In progress label Nov 28, 2018

jacobheun deleted the fix/lock-stale branch November 28, 2018 11:45
