-
Notifications
You must be signed in to change notification settings - Fork 51
fix: fix staleness check #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
So, we create the lock and set a timeout for 5 seconds to update the mtime on the lock. By the time that update actually occurs, it's been more that 10 seconds? Is that correct? Either the fs is insanely slow and/or something bad is happening in ipfs. @alanshaw is the original error consistent or intermittent? |
Jacob that's all done by proper-lockfile, I checked and all seems fine on that side. |
The flow is more like :
We can make it not throw but that effectively removes some protection. |
This will help until something locks up the CPU for 20 seconds. Which is bad, obviously, but the app then crashing afterwards with a seemingly unrelated error masks the problem. I wonder if we couldn't just get IPFS to offer the option of overriding the lock on startup instead? |
src/lock.js
Outdated
@@ -19,7 +19,7 @@ exports.lock = (dir, callback) => { | |||
const file = path.join(dir, lockFile) | |||
log('locking %s', file) | |||
|
|||
lock(dir, {lockfilePath: file}) | |||
lock(dir, {lockfilePath: file, stale: 20000}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
20000
needs to be pulled out as a const with a comment explaining what it does and why it is set to 2000 with a link to this PR.
@alanshaw lets get this in ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good for now. Comments left for future improvements.
This PR fixes this #181 (comment)
For context the problem isn't about that stack trace, the problem is about the staleness checks https://github.com/moxystudio/node-proper-lockfile#design.
The lockfile mtime is updated periodically (5s) to prevent staleness, and there's a threshold of up to 10s to mark a lock as stale.
What happens is that ipfs some how locks the event loop for so long that
proper-lockfile
marks the lock as compromised because it takes more than 5s to update mtime again and so we hit the limit of 10s. Hope this makes sense lol :)This fix just increases the stale threshold to 20s and that should be enough, but its a naive fix and if we need to further increase this threshold we should really dig into it more.
To give an example of what this threshold means, if we have a long running ipfs daemon that critically crashes meaning no cleaning is made (lock file still there) we can't spawn another for the next 20s without manually deleting the lock file so any systemd like system that restart the process after a crash will not work!