fix: randomise the initial grace period to avoid collisions #240

kruskall · 2022-07-17T18:19:57Z

The previous algorithm used binary exponential backoff with
±10% jitter to calculate the grace period.
Because multiple Lambda environments can be running at once, we need to
mitigate collisions:

We cannot use 0 as the first delay, because functions that fail close
together in time will retry together and collide. The small jitter applied
to the lower delays would then carry the collision into later attempts.

This change adds an initial delay of n seconds to the first reconnection
attempt.
n is randomly generated in a closed interval to reduce collisions
while keeping usability and user experience in mind.

Closes #188

Potential followup issue: make the interval configurable with an environment variable

apmmachine · 2022-07-17T18:29:51Z

💚 Build Succeeded

Build stats

Start Time: 2022-08-03T19:40:25.075+0000
Duration: 5 min 10 sec

Test stats 🧪

Test	Results
Failed	0
Passed	98
Skipped	32
Total	130

github-actions bot added the aws-λ-extension AWS Lambda Extension label Jul 17, 2022

marclop reviewed Jul 18, 2022

apm-lambda-extension/extension/apm_server_transport.go (outdated, resolved)

kruskall force-pushed the fix/backoff-collisions branch from 77bb6d2 to 33b2471 August 1, 2022 22:48

simitt approved these changes Aug 2, 2022

kruskall added 2 commits August 3, 2022 10:29

Merge branch 'main' into fix/backoff-collisions

817cf1f

changelog: add changelog entry

a34a8ee

kruskall merged commit 63d7186 into elastic:main Aug 3, 2022

kruskall deleted the fix/backoff-collisions branch August 3, 2022 20:04
