fix: time remaining should use better estimates #4196

rkuris · 2025-08-21T18:36:58Z

Why this should be merged

This code does a much better job of estimating completion time, which improves my experience using avalanchego for bootstrapping.

How this works

Correctly computing an estimate involves:

Estimating high early on, then gradually removing the fuzz as the load progresses. This code uses a 20% margin.
Using a rolling window of samples to measure the rate. This keeps a few unusually large or slow blocks from skewing the estimate right away. This code uses a window size of 10.
Not reporting estimates until there is a reasonable number of samples. With this code, no estimate is reported for roughly the first minute (10 samples taken at least 5s apart).
Including the percentage complete. I found myself running this calculation by hand all the time, and it's nice to see how far along the load is as a percentage.

How this was tested

Too many bootstraps to count. The 20% margin is a wild guess, but I wrote some code to see how much fuzz made sense and settled on 10 samples. This has not been tested on a very slow machine.

Need to be documented in RELEASES.md?

Not a bad idea.

Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>

utils/timer/eta_test.go


Copilot

Pull Request Overview

This PR improves ETA estimation accuracy during avalanchego bootstrapping by implementing a rolling window approach with adjustable confidence factors. The changes replace simple linear time estimates with a more sophisticated tracking system that accounts for rate variations and provides percentage completion feedback.

Introduces a new EtaTracker with rolling window sampling and slowdown factor adjustment
Updates bootstrapping components to use the improved ETA calculation with percentage completion
Adds comprehensive test coverage for the ETA tracking functionality

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
utils/timer/eta.go	Implements new EtaTracker with rolling window sampling and deprecates old EstimateETA function
utils/timer/eta_test.go	Adds comprehensive test coverage for EtaTracker functionality
snow/engine/snowman/bootstrap/bootstrapper.go	Integrates EtaTracker into main bootstrapping logic with time-based logging controls
snow/engine/snowman/bootstrap/storage.go	Updates block execution progress tracking to use new EtaTracker


utils/timer/eta.go

Copilot · 2025-08-30T00:28:23Z

utils/timer/eta.go

+		percentComplete := float64(sample.completed) / float64(target)
+		roundedPercentComplete := math.Round(percentComplete*10000) / 100 // Return percentage (0.0 to 100.0)


The magic number 10000 for percentage rounding should be defined as a constant. This calculation appears multiple times in the code with the same magic number.

I think the comment explains what this is doing. I suppose this could be moved into a function for clarity, but that seems a bit overkill for this simple calculation.
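For comparison, Copilot's suggestion might look roughly like the sketch below. The constant `percentScale` and the helper `roundedPercent` are hypothetical names, not code from the PR; the arithmetic is the same as in the diff (scale the fraction by 100 × 100, round, then divide by 100 to keep two decimal places).

```go
package main

import (
	"fmt"
	"math"
)

// percentScale is a hypothetical named constant: multiplying a fraction by
// 100*100 and dividing the rounded result by 100 keeps two decimal places.
const percentScale = 100 * 100

// roundedPercent converts completed/target into a percentage rounded to two
// decimal places (0.00 to 100.00). It returns 0 when target is 0.
func roundedPercent(completed, target uint64) float64 {
	if target == 0 {
		return 0
	}
	fraction := float64(completed) / float64(target)
	return math.Round(fraction*percentScale) / 100
}

func main() {
	fmt.Println(roundedPercent(1, 3))
	fmt.Println(roundedPercent(2, 3))
	fmt.Println(roundedPercent(1000, 1000))
}
```

Whether the named constant reads better than the inline comment is a style call; the result is identical either way.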

Copilot · 2025-08-30T00:28:23Z

utils/timer/eta.go

+	if timeSinceOldest == 0 {
+		return nil, 0.0
+	}
+	rate := float64(progressSinceOldest) / float64(timeSinceOldest)


Division by timeSinceOldest as a time.Duration should convert to a numeric type first. The current code performs division between uint64 and time.Duration, which may not produce the expected rate calculation.

Copilot · 2025-08-30T00:28:24Z

snow/engine/snowman/bootstrap/bootstrapper.go

+		shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency &&
+			now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval


[nitpick] The compound condition combining progress-based and time-based logging logic is complex and could be split into separate boolean variables for better readability.

Suggested change

-		shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency &&
-			now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval
+		progressThresholdCrossed := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency
+		timeThresholdCrossed := now.Sub(b.lastProgressUpdateTime) >= minimumLogInterval
+		shouldLog := progressThresholdCrossed && timeThresholdCrossed

snow/engine/snowman/bootstrap/bootstrapper.go

alarso16

Incomplete review, I'll take a look again later

snow/engine/snowman/bootstrap/bootstrapper.go

alarso16 · 2025-08-25T20:19:11Z

snow/engine/snowman/bootstrap/storage.go

+			numProcessed := totalNumberToProcess - tree.Len()
+
+			// Use the tracked previous progress for accurate ETA calculation
+			currentProgress := numProcessed


What's the point of this? Won't these variables always be the same?

Just documentation. The parameter for etaTracker is currentProgress, so I wanted to show why it was being used. If you think that just adds confusion, I can easily remove it.

alarso16 · 2025-09-05T20:57:20Z

utils/timer/eta.go

+		return nil, 0.0
+	}
+	rate := float64(progressSinceOldest) / float64(timeSinceOldest)
+	if rate == 0 {


It seems like this should really check progressSinceOldest == 0. Is it possible for floating point operations to blur the distinction?

Nope. Zero divided by any nonzero number is exactly zero on every IEEE 754 platform; only a zero denominator produces NaN, and timeSinceOldest == 0 is already guarded above.

alarso16 · 2025-09-05T20:59:00Z

utils/timer/eta.go

+}
+
+// EstimateETA calculates ETA from start time and current progress.
+// Deprecated: use EtaTracker instead
 func EstimateETA(startTime time.Time, progress, end uint64) time.Duration {


Can you just delete this? I guess it depends on whether there are other users, but yours is better!

There are still some references to the old method. I didn't change them because I'm not sure how to test them, and this code is focused on the bootstrapping which I can test.

Should we update the last instance of this method from the bootstrapping logic?

Can we do that in a followup issue/PR?

snow/engine/snowman/bootstrap/bootstrapper.go

utils/timer/eta_test.go

utils/timer/eta.go

StephenButtolph · 2025-09-09T19:22:55Z

snow/engine/snowman/bootstrap/bootstrapper.go

-		if numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency {
+		// Check if it's time to log progress (both progress-based and time-based frequency)
+		now := time.Now()
+		shouldLog := numPreviouslyFetched/statusUpdateFrequency != numFetched/statusUpdateFrequency &&


Should we just remove the statusUpdateFrequency from here? It was definitely added just because we were lazy and didn't want to add another tracker (which we are doing now with lastProgressUpdateTime)

That's possible, can we do this in a followup?

snow/engine/snowman/bootstrap/bootstrapper.go

StephenButtolph · 2025-09-09T19:32:10Z

snow/engine/snowman/bootstrap/bootstrapper.go

-				numFetched-b.initiallyFetched,         // Number of blocks we have fetched during this run
-				totalBlocksToFetch-b.initiallyFetched, // Number of blocks we expect to fetch during this run
+
+			eta, progressPercentage := b.etaTracker.AddSample(


I wish the log rate didn't modify how the eta updates... I suppose it is just up to anyone modifying this to know that doubling the logging rate should also double the number of internally tracked samples in the eta tracker for the same eta volatility.

It might be possible to use an EWMA instead to avoid this (like done here)... but it isn't immediately obvious to me how best to do that.

Fair point, but that only matters if someone cares more about ETA volatility than about the improvement this fix provides. Even with too few samples (at the minimum of 5), it should still do much better than the old code.

snow/engine/snowman/bootstrap/bootstrapper.go

StephenButtolph · 2025-09-09T19:36:19Z

utils/timer/eta.go

+}
+
+// EstimateETA calculates ETA from start time and current progress.
+// Deprecated: use EtaTracker instead
 func EstimateETA(startTime time.Time, progress, end uint64) time.Duration {


Should we update the last instance of this method from the bootstrapping logic?


fix: time remaining should use better estimates

3395ea3

rkuris requested a review from alarso16 August 21, 2025 18:36

github-project-automation bot added this to avalanchego Aug 21, 2025

Typo

926f0db

rkuris self-assigned this Aug 21, 2025

rkuris added enhancement New feature or request storage This involves storage primitives labels Aug 21, 2025

rkuris moved this to In Progress 🏗️ in avalanchego Aug 21, 2025

rkuris added 2 commits August 21, 2025 14:49

Switch to require over assert

bc50cc8

Address lints

0a9c5ae

rkuris marked this pull request as ready for review August 21, 2025 22:08

Copilot AI review requested due to automatic review settings August 21, 2025 22:08

rkuris requested a review from StephenButtolph as a code owner August 21, 2025 22:08

This comment was marked as outdated.

rkuris and others added 6 commits August 21, 2025 15:09

Update utils/timer/eta.go

23807a0

Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>

Update snow/engine/snowman/bootstrap/bootstrapper.go

1fb25a1

Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>

Update utils/timer/eta.go

125f004

Co-authored-by: Copilot <[email protected]> Signed-off-by: Ron Kuris <[email protected]>

Use InDelta, not InEpsilon

ef8e4ee

Merge branch 'master' into rkuris/fix-time-estimates

60fea36

Moar lint

e5f6d39

rkuris commented Aug 21, 2025

utils/timer/eta_test.go (outdated, resolved)

rkuris requested review from joshua-kim, aaronbuchwald and a team August 22, 2025 17:48

joshua-kim requested review from a team and removed request for a team, alarso16, joshua-kim and aaronbuchwald August 22, 2025 19:30

Add some extra guards around division by zero

b7c0497

Recommended by Copilot, and it can't hurt to make this more bulletproof.

rkuris requested a review from Copilot August 30, 2025 00:27

Copilot AI reviewed Aug 30, 2025

alarso16 reviewed Sep 2, 2025

Merge branch 'master' into rkuris/fix-time-estimates

6d8c4aa

alarso16 reviewed Sep 5, 2025

alarso16 approved these changes Sep 5, 2025

StephenButtolph reviewed Sep 9, 2025

rkuris added 13 commits September 11, 2025 12:27

Reviewer comment: remove ref to lowestSample

261d2ca

Reviewer comment: debug only when restarting

83bc28f

It's desired, but not quite sure why.

Reviewer comment: remove t.Run calls

6f42598

Reviewer comment: add blank line

398ccf7

Go seems to require this for the function to actually be marked as deprecated.

Reviewer comment: remove named variables

a417341

Reviewer comment: remove maxSamples

b2380a8

Reviewer comment: improve comment

1e24439

Merge branch 'master' into rkuris/fix-time-estimates

d0d45be

Reviewer comment: simplify past the end cases

70fb25e

Adds test cases as well

Rename remainingProgress -> remainingWork

f2ebf65

Bail if timeSinceOldest <= 0

0db0466

Reviewer comment: rename eta to etaPtr

c4ed490

Reviewer comments: add some negative test cases

b341898

rkuris requested a review from StephenButtolph September 11, 2025 20:16

StephenButtolph approved these changes Sep 11, 2025

StephenButtolph added this pull request to the merge queue Sep 11, 2025

Merged via the queue into master with commit 487fd9e Sep 11, 2025
35 checks passed

StephenButtolph deleted the rkuris/fix-time-estimates branch September 11, 2025 20:52

github-project-automation bot moved this from In Review 🔎 to Done 🎉 in avalanchego Sep 11, 2025

rkuris mentioned this pull request Sep 12, 2025

chore(bootstrap): [tech debt] statusUpdateFrequency can be removed #4271

Open

felipemadero pushed a commit that referenced this pull request Sep 15, 2025

fix: time remaining should use better estimates (#4196)

a76b03d

Signed-off-by: Ron Kuris <[email protected]> Co-authored-by: Copilot <[email protected]>
