Skip to content

Conversation

stevesg
Copy link
Contributor

@stevesg stevesg commented Apr 14, 2021

The test had two issues:

  1. The most common issue was that replicating the silence sometimes
    took a small amount of time, as it happens asynchronously. Replacing
    Equal with Eventually when checking the replication metrics is enough
    to make the tests reliable.

  2. The second issue was more rare: When a particular shard does not have
    any users assigned to it, and the test randomly picks that shard
    to write the silence to, the test would panic (in rand.Intn). Due to
    the alertmanagersMtx being held, the test actually just hung up while
    trying to shutdown the alertmanagers.

Signed-off-by: Steve Simpson [email protected]

The test had two issues:

1. The most common issue was that replicating the silence sometimes
   took a small amount of time, as it happens asynchronously. Replacing
   Equal with Eventually when checking the replication metrics is enough
   to make the tests reliable.

2. The second issue was more rare: When a particular shard does not have
   any users assigned to it, _and_ the test randomly picks _that_ shard
   to write the silence to, the test would panic (in rand.Intn). Due to
   the alertmanagersMtx being held, the test actually just hung up while
   trying to shutdown the alertmanagers.

Signed-off-by: Steve Simpson <[email protected]>
Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on case 1, for case 2 I'd appreciate a set of eyes from other maintainers as I'm not entirely sure I understand why this doesn't work as-is and we might be missing something from the core logic.

Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing it!

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks!

Signed-off-by: Steve Simpson <[email protected]>
@pracucci pracucci merged commit cf724d7 into cortexproject:master Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants