-
Notifications
You must be signed in to change notification settings - Fork 784
fix: time remaining should use better estimates #4196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>
Recommended by copilot, and cna't hurt to make this more bullet proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves ETA estimation accuracy during avalanchego bootstrapping by implementing a rolling window approach with adjustable confidence factors. The changes replace simple linear time estimates with a more sophisticated tracking system that accounts for rate variations and provides percentage completion feedback.
- Introduces a new EtaTracker with rolling window sampling and slowdown factor adjustment
- Updates bootstrapping components to use the improved ETA calculation with percentage completion
- Adds comprehensive test coverage for the ETA tracking functionality
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
utils/timer/eta.go | Implements new EtaTracker with rolling window sampling and deprecates old EstimateETA function |
utils/timer/eta_test.go | Adds comprehensive test coverage for EtaTracker functionality |
snow/engine/snowman/bootstrap/bootstrapper.go | Integrates EtaTracker into main bootstrapping logic with time-based logging controls |
snow/engine/snowman/bootstrap/storage.go | Updates block execution progress tracking to use new EtaTracker |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
utils/timer/eta.go
Outdated
percentComplete := float64(sample.completed) / float64(target) | ||
roundedPercentComplete := math.Round(percentComplete*10000) / 100 // Return percentage (0.0 to 100.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic number 10000 for percentage rounding should be defined as a constant. This calculation appears multiple times in the code with the same magic number.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment explains what this is doing. I suppose this could be moved into a function for clarity, but that seems a bit overkill for this simple calculation.
if timeSinceOldest == 0 { | ||
return nil, 0.0 | ||
} | ||
rate := float64(progressSinceOldest) / float64(timeSinceOldest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Division by timeSinceOldest as a time.Duration should convert to a numeric type first. The current code performs division between uint64 and time.Duration, which may not produce the expected rate calculation.
Copilot uses AI. Check for mistakes.
shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency && | ||
now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The compound condition combining progress-based and time-based logging logic is complex and could be split into separate boolean variables for better readability.
shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency && | |
now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval | |
progressThresholdCrossed := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency | |
timeThresholdCrossed := now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval | |
shouldLog := progressThresholdCrossed && timeThresholdCrossed |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete review, I'll take a look again later
numProcessed := totalNumberToProcess - tree.Len() | ||
|
||
// Use the tracked previous progress for accurate ETA calculation | ||
currentProgress := numProcessed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this? Won't these variables always be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just documentation. the parameter for etaTracker is currentProgress, and so I wanted to show why it was being used. If you think that just adds confusion, I can easily remove it.
return nil, 0.0 | ||
} | ||
rate := float64(progressSinceOldest) / float64(timeSinceOldest) | ||
if rate == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like really this should be progressSinceOldest == 0
. Is it possible for floating point operations to mess up the distinction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Zero divided by anything on any platform is still zero (except when the denominator is zero, that makes NaN still).
} | ||
|
||
// EstimateETA calculates ETA from start time and current progress. | ||
// Deprecated: use EtaTracker instead | ||
func EstimateETA(startTime time.Time, progress, end uint64) time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just delete this? I mean I guess if there's other users, but yours is better!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some references to the old method. I didn't change them because I'm not sure how to test them, and this code is focused on the bootstrapping which I can test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the last instance of this method from the bootstrapping logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do that in a followup issue/PR?
if numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency { | ||
// Check if it's time to log progress (both progress-based and time-based frequency) | ||
now := time.Now() | ||
shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just remove the statusUpdateFrequency
from here? It was definitely added just because we were lazy and didn't want to add another tracker (which we are doing now with lastProgressUpdateTime
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's possible, can we do this in a followup?
numFetched-b.initiallyFetched, // Number of blocks we have fetched during this run | ||
totalBlocksToFetch-b.initiallyFetched, // Number of blocks we expect to fetch during this run | ||
|
||
eta, progressPercentage := b.etaTracker.AddSample( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish the log rate didn't modify how the eta updates... I suppose it is just up to anyone modifying this to know that doubling the logging rate should also double the number of internally tracked samples in the eta tracker for the same eta volatility.
It might be possible to do some ewma instead to avoid this (like done here)... but it isn't immediately obvious to me how best to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, but that assumes that someone cares about the eta volatility more than this bug fix provides. Even with too few samples (at the minimum of 5), it should still do much better than the old code as well.
} | ||
|
||
// EstimateETA calculates ETA from start time and current progress. | ||
// Deprecated: use EtaTracker instead | ||
func EstimateETA(startTime time.Time, progress, end uint64) time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the last instance of this method from the bootstrapping logic?
It's desired, but not quite sure why.
golang seems to require this to actually indicate it's deprecated
Adds tests cases as well
Signed-off-by: Ron Kuris <[email protected]> Co-authored-by: Copilot <[email protected]>
Why this should be merged
This code does a much better job of estimating completion time, which improves my experience using avalanchego for bootstrapping.
How this works
Correctly computing an estimate involves:
How this was tested
Too many bootstraps. The 20% margin is a wild guess, but I wrote some code to show how much fuzz makes sense, and settled on 10 samples. It was not tested on a very slow machine.
Need to be documented in RELEASES.md?
Not a bad idea.