Skip to content

Added UserReplicaGroupMetrics #6463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 7, 2025
Merged

Conversation

7h3-3mp7y-m4n
Copy link
Contributor

What this PR does:
The main purpose of this pr is to add a goroutine to update tenant HA replica groups to metrics periodically.
Which issue(s) this PR fixes:
Fixes #6397

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: 7h3-3mp7y-m4n <[email protected]>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. It looks great so far. Some suggestions

userReplicaGroupCount: promauto.With(reg).NewGaugeVec(prometheus.GaugeOpts{
Name: "ha_tracker_user_replica_group_count",
Help: "Number of HA replica groups tracked for each user.",
}, []string{"user"}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to clean up the metric in CleanupHATrackerMetricsForUser so that when a user is not active anymore the corresponding user dimension is removed from the metric

time.Sleep(10 * time.Second)
t.updateUserReplicaGroupCount()
}
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to the loop method?
You can follow cleanupOldReplicasLoop and do it similarly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay :)

@@ -212,6 +218,12 @@ func NewHATracker(cfg HATrackerConfig, limits HATrackerLimits, trackerStatusConf
}
t.client = client
}
go func() {
for {
time.Sleep(10 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do 30s interval and make the interval a const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure! :)

`
require.NoError(t, testutil.GatherAndCompare(reg, strings.NewReader(expectedMetrics), "ha_tracker_user_replica_group_count"))
delete(tracker.replicaGroups, "user1")
tracker.userReplicaGroupCount.WithLabelValues("user1").Set(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need set this manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay!

ha_tracker_user_replica_group_count{user="user2"} 3
ha_tracker_user_replica_group_count{user="user1"} 0
`
require.NoError(t, testutil.GatherAndCompare(reg, strings.NewReader(expectedMetricsAfterCleanup), "ha_tracker_user_replica_group_count"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add CleanupHATrackerMetricsForUser for one test user and make sure ha_tracker_user_replica_group_count is cleaned up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah why not :)

@7h3-3mp7y-m4n
Copy link
Contributor Author

hey @yeya24 , I’m currently working on the tests and ran into an issue. Whenever I use CleanupHATrackerMetricsForUse for cleanup, I get a nil pointer error. This might take me some time to resolve, so I’d really appreciate any thoughts or insights you might have on this!

@yeya24
Copy link
Contributor

yeya24 commented Jan 3, 2025

Hey @7h3-3mp7y-m4n, can you please share the error you got? And your test code if possible.

I checked out your code locally and tried to update the metrics clean up test. yeya24@9a35c7d
I didn't see any null pointer issue so maybe I missed something.

@7h3-3mp7y-m4n
Copy link
Contributor Author

Hey @7h3-3mp7y-m4n, can you please share the error you got? And your test code if possible.

I checked out your code locally and tried to update the metrics clean up test. yeya24@9a35c7d I didn't see any null pointer issue so maybe I missed something.

Hi @yeya24 , I'm sorry for the previous version of the test case, I was writing it manually, and it turned out a bit rough. I’ve now implemented the changes you suggested, and it's ready for your review. Thanks for your patience!

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR still in progress or ready for review? I see it is still draft status

Signed-off-by: 7h3-3mp7y-m4n <[email protected]>
@7h3-3mp7y-m4n
Copy link
Contributor Author

Is this PR still in progress or ready for review? I see it is still draft status

Well, I made this as a draft so that I could review it and update you with my current progress. I think it's ready to review!

@7h3-3mp7y-m4n 7h3-3mp7y-m4n marked this pull request as ready for review January 7, 2025 13:23
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@yeya24 yeya24 merged commit 7ec57ac into cortexproject:master Jan 7, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HA Tracker: Expose per tenant metric of number of HA replica groups
2 participants