Skip to content

fix: fix staleness check #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 28, 2018
Merged

fix: fix staleness check #182

merged 2 commits into from
Nov 28, 2018

Conversation

hugomrdias
Copy link
Member

This PR fixes this #181 (comment)

For context the problem isn't about that stack trace, the problem is about the staleness checks https://github.com/moxystudio/node-proper-lockfile#design.
The lockfile mtime is updated periodically (5s) to prevent staleness, and there's a threshold of up to 10s to mark a lock as stale.

What happens is that ipfs some how locks the event loop for so long that proper-lockfile marks the lock as compromised because it takes more than 5s to update mtime again and so we hit the limit of 10s. Hope this makes sense lol :)

This fix just increases the stale threshold to 20s and that should be enough, but its a naive fix and if we need to further increase this threshold we should really dig into it more.

To give an example of what this threshold means, if we have a long running ipfs daemon that critically crashes meaning no cleaning is made (lock file still there) we can't spawn another for the next 20s without manually deleting the lock file so any systemd like system that restart the process after a crash will not work!

@hugomrdias hugomrdias self-assigned this Nov 21, 2018
@ghost ghost added the status/in-progress In progress label Nov 21, 2018
@jacobheun
Copy link
Contributor

So, we create the lock and set a timeout for 5 seconds to update the mtime on the lock. By the time that update actually occurs, it's been more that 10 seconds? Is that correct? Either the fs is insanely slow and/or something bad is happening in ipfs.

@alanshaw is the original error consistent or intermittent?

@hugomrdias
Copy link
Member Author

Jacob that's all done by proper-lockfile, I checked and all seems fine on that side.
The error is irregular sometimes doesn't even happen, others times happens in different tests.

@hugomrdias
Copy link
Member Author

The flow is more like :

  • mtime updates
  • another is scheduled to run 5s later
  • 5s later loop is stuck
  • scheduled function finally runs checks current time and more than 10s passed from the previous mtime update
  • lock is marked as compromised
  • proper-lockfile throws error

We can make it not throw but that effectively removes some protection.

@achingbrain
Copy link
Member

This will help until something locks up the CPU for 20 seconds. Which is bad, obviously, but the app then crashing afterwards with a seemingly unrelated error masks the problem.

I wonder if we couldn't just get IPFS to offer the option of overriding the lock on startup instead?

src/lock.js Outdated
@@ -19,7 +19,7 @@ exports.lock = (dir, callback) => {
const file = path.join(dir, lockFile)
log('locking %s', file)

lock(dir, {lockfilePath: file})
lock(dir, {lockfilePath: file, stale: 20000})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20000 needs to be pulled out as a const with a comment explaining what it does and why it is set to 2000 with a link to this PR.

@hugomrdias
Copy link
Member Author

@alanshaw lets get this in ?

Copy link
Contributor

@jacobheun jacobheun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good for now. Comments left for future improvements.

@jacobheun jacobheun merged commit ede5dd6 into master Nov 28, 2018
@ghost ghost removed the status/in-progress In progress label Nov 28, 2018
@jacobheun jacobheun deleted the fix/lock-stale branch November 28, 2018 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants