@rkuris (Member) commented Aug 21, 2025

Why this should be merged

This code does a much better job of estimating completion time, which improves my experience using avalanchego for bootstrapping.

How this works

Correctly computing an estimate involves:

  • Estimate high early on, then gradually remove the fuzz as the load progresses. This code uses a 20% margin for this.
  • Use a rolling window of samples to compute the rate. This prevents a few large or slow blocks from having a big impact on the estimate right away. This code uses a window size of 10.
  • Don't report estimates until there is a reasonable number of samples. With this code, no estimate is reported for about a minute (10 samples, taken at least 5s apart).
  • Include a percentage complete. I found myself running this calculation all the time, and it's nice to see how far along the bootstrap is as a percentage.

How this was tested

Too many bootstraps. The 20% margin is a wild guess. I wrote some code to show how much fuzz makes sense, and based on that settled on a window of 10 samples. This was not tested on a very slow machine.

Need to be documented in RELEASES.md?

Not a bad idea.

@rkuris rkuris self-assigned this Aug 21, 2025
@rkuris rkuris added enhancement New feature or request storage This involves storage primitives labels Aug 21, 2025
@rkuris rkuris moved this to In Progress 🏗️ in avalanchego Aug 21, 2025
@rkuris rkuris marked this pull request as ready for review August 21, 2025 22:08
@Copilot Copilot AI review requested due to automatic review settings August 21, 2025 22:08
Copilot

This comment was marked as outdated.

@rkuris rkuris requested review from joshua-kim, aaronbuchwald and a team August 22, 2025 17:48
@joshua-kim joshua-kim requested review from a team and removed request for a team, alarso16, joshua-kim and aaronbuchwald August 22, 2025 19:30
Recommended by Copilot, and can't hurt to make this more bulletproof.
@rkuris rkuris requested a review from Copilot August 30, 2025 00:27
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR improves ETA estimation accuracy during avalanchego bootstrapping by implementing a rolling window approach with adjustable confidence factors. The changes replace simple linear time estimates with a more sophisticated tracking system that accounts for rate variations and provides percentage completion feedback.

  • Introduces a new EtaTracker with rolling window sampling and slowdown factor adjustment
  • Updates bootstrapping components to use the improved ETA calculation with percentage completion
  • Adds comprehensive test coverage for the ETA tracking functionality

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File: Description
utils/timer/eta.go: Implements new EtaTracker with rolling window sampling and deprecates old EstimateETA function
utils/timer/eta_test.go: Adds comprehensive test coverage for EtaTracker functionality
snow/engine/snowman/bootstrap/bootstrapper.go: Integrates EtaTracker into main bootstrapping logic with time-based logging controls
snow/engine/snowman/bootstrap/storage.go: Updates block execution progress tracking to use new EtaTracker

Comment on lines 96 to 97
percentComplete := float64(sample.completed) / float64(target)
roundedPercentComplete := math.Round(percentComplete*10000) / 100 // Return percentage (0.0 to 100.0)

Copilot AI Aug 30, 2025

The magic number 10000 for percentage rounding should be defined as a constant. This calculation appears multiple times in the code with the same magic number.

Member Author

I think the comment explains what this is doing. I suppose this could be moved into a function for clarity, but that seems a bit overkill for this simple calculation.
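If the rounding were moved into a function, it might look like the hypothetical helper below (the names roundedPercent and percentScale are assumptions, not code from this PR). Multiplying the fraction by 10000, rounding, and dividing by 100 yields a percentage with two decimal places, so 1 of 3 becomes 33.33.

```go
package main

import "math"

// percentScale is a hypothetical named constant: 100 (to get a percentage)
// times 10^2 (to keep two decimal places through math.Round).
const percentScale = 10000

// roundedPercent returns completed/target as a percentage rounded to two
// decimal places (0.0 to 100.0 when completed <= target). A zero target
// returns 0 instead of NaN or +Inf.
func roundedPercent(completed, target uint64) float64 {
	if target == 0 {
		return 0
	}
	fraction := float64(completed) / float64(target)
	return math.Round(fraction*percentScale) / 100
}
```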

if timeSinceOldest == 0 {
return nil, 0.0
}
rate := float64(progressSinceOldest) / float64(timeSinceOldest)

Copilot AI Aug 30, 2025

Division by timeSinceOldest as a time.Duration should convert to a numeric type first. The current code performs division between uint64 and time.Duration, which may not produce the expected rate calculation.


Comment on lines +622 to +623
shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency &&
now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval

Copilot AI Aug 30, 2025

[nitpick] The compound condition combining progress-based and time-based logging logic is complex and could be split into separate boolean variables for better readability.

Suggested change, replacing:
shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency &&
now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval
with:
progressThresholdCrossed := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency
timeThresholdCrossed := now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval
shouldLog := progressThresholdCrossed && timeThresholdCrossed


Contributor

@alarso16 alarso16 left a comment

Incomplete review, I'll take a look again later

numProcessed := totalNumberToProcess - tree.Len()

// Use the tracked previous progress for accurate ETA calculation
currentProgress := numProcessed
Contributor

What's the point of this? Won't these variables always be the same?

Member Author

Just documentation. The parameter for etaTracker is currentProgress, so I wanted to show why this value was being passed. If you think that just adds confusion, I can easily remove it.

return nil, 0.0
}
rate := float64(progressSinceOldest) / float64(timeSinceOldest)
if rate == 0 {
Contributor

It seems like really this should be progressSinceOldest == 0. Is it possible for floating point operations to mess up the distinction?

Member Author

Nope. Zero divided by any nonzero value is still zero on any platform (only when the denominator is also zero do you get NaN).

}

// EstimateETA calculates ETA from start time and current progress.
// Deprecated: use EtaTracker instead
func EstimateETA(startTime time.Time, progress, end uint64) time.Duration {
Contributor

Can you just delete this? I mean I guess if there's other users, but yours is better!

Member Author

@rkuris rkuris Sep 7, 2025

There are still some references to the old method. I didn't change them because I'm not sure how to test them, and this code is focused on the bootstrapping which I can test.

Contributor

Should we update the last instance of this method from the bootstrapping logic?

Member Author

Can we do that in a followup issue/PR?

if numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency {
// Check if it's time to log progress (both progress-based and time-based frequency)
now := time.Now()
shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency &&
Contributor

Should we just remove the statusUpdateFrequency from here? It was definitely added just because we were lazy and didn't want to add another tracker (which we are doing now with lastProgressUpdateTime)

Member Author

That's possible, can we do this in a followup?

numFetched-b.initiallyFetched, // Number of blocks we have fetched during this run
totalBlocksToFetch-b.initiallyFetched, // Number of blocks we expect to fetch during this run

eta, progressPercentage := b.etaTracker.AddSample(
Contributor

I wish the log rate didn't modify how the eta updates... I suppose it is just up to anyone modifying this to know that doubling the logging rate should also double the number of internally tracked samples in the eta tracker for the same eta volatility.

It might be possible to do some ewma instead to avoid this (like done here)... but it isn't immediately obvious to me how best to do that.

Member Author

Fair point, but that assumes that someone cares about the eta volatility more than this bug fix provides. Even with too few samples (at the minimum of 5), it should still do much better than the old code as well.

@StephenButtolph StephenButtolph added this pull request to the merge queue Sep 11, 2025
Merged via the queue into master with commit 487fd9e Sep 11, 2025
35 checks passed
@StephenButtolph StephenButtolph deleted the rkuris/fix-time-estimates branch September 11, 2025 20:52
@github-project-automation github-project-automation bot moved this from In Review 🔎 to Done 🎉 in avalanchego Sep 11, 2025
felipemadero pushed a commit that referenced this pull request Sep 15, 2025
Labels
enhancement New feature or request storage This involves storage primitives
Projects
Status: Done 🎉
4 participants