fix: randomise the initial grace period to avoid collisions #240

Merged: 3 commits merged on Aug 3, 2022

kruskall
Member

The previous algorithm used binary exponential backoff with
±10% jitter to calculate the grace period.
Because there can be multiple Lambda environments, we need to
mitigate collisions:

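We cannot use 0 as the first delay because functions that fail close to
each other would retry at almost the same moment and collide. Jitter does
not fix this: ±10% of a small delay is a narrow window (and zero for a
delay of 0), so the collision would carry over to the next attempts.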

This change adds an initial delay of n seconds to the first reconnection
attempt, where n is randomly generated within a closed interval. The
interval is chosen to spread out colliding environments while keeping
the wait short enough for a good user experience.

Closes #188

Potential follow-up issue: make the interval configurable through an environment variable.

github-actions bot added the aws-λ-extension (AWS Lambda Extension) label on Jul 17, 2022
apmmachine commented Jul 17, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-08-03T19:40:25.075+0000
  • Duration: 5 min 10 sec

Test stats 🧪

  • Failed: 0
  • Passed: 98
  • Skipped: 32
  • Total: 130
kruskall force-pushed the fix/backoff-collisions branch from 77bb6d2 to 33b2471 on August 1, 2022 22:48
kruskall merged commit 63d7186 into elastic:main on Aug 3, 2022
kruskall deleted the fix/backoff-collisions branch on August 3, 2022 20:04
Labels
aws-λ-extension AWS Lambda Extension
Development

Successfully merging this pull request may close these issues.

Revisit backoff grace period computation
4 participants