Skip to content

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does:
Adds some metrics for monitoring delete requests. It also eliminates duplicate entries of chunks which overlap multiple plans/days.
purger_delete_requests_received_total: Number of delete requests received per user.
purger_delete_requests_processed_total: Number of delete requests processed per user.
purger_delete_requests_chunks_selected_total: Number of chunks selected while building delete plans per user.
purger_delete_requests_processing_failure: Delete requests processing failure for each user.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you mention the new metrics in the CHANGELOG, please?

}, []string{"user"})
m.deleteRequestsProcessingFailures = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: "cortex",
Name: "purger_delete_requests_processing_failure",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think to add an s to failure?

Suggested change
Name: "purger_delete_requests_processing_failure",
Name: "purger_delete_requests_processing_failures",

func newPurgerMetrics(r prometheus.Registerer) *purgerMetrics {
m := purgerMetrics{}

m.deleteRequestsProcessedTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling r.MustRegister() below, could you create the metrics with promauto.With(r).NewCounterVec() (and same for NewGaugeVec())? It's the pattern we're heading to.

func newDeleteRequestHandlerMetrics(r prometheus.Registerer) *deleteRequestHandlerMetrics {
m := deleteRequestHandlerMetrics{}

m.deleteRequestsReceivedTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promauto.With(r) here as well, please.

Name: "purger_delete_requests_chunks_selected_total",
Help: "Number of chunks selected while building delete plans per user",
}, []string{"user"})
m.deleteRequestsProcessingFailures = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, this should be a counter.

Signed-off-by: Sandeep Sukhani <[email protected]>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with some nitpicking comments.

"strings"

"github.com/prometheus/client_golang/prometheus"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove new line please, and merge these two import groups.

}, []string{"user"})
m.deleteRequestsProcessingFailures = promauto.With(r).NewCounterVec(prometheus.CounterOpts{
Namespace: "cortex",
Name: "purger_delete_requests_processing_failures",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing _total suffix for counter. I don't like it, but that's what the convention is.

// groups chunks together by unique label sets i.e all the chunks with same labels would be stored in a group
// chunk details are stored in groups for each unique label set to avoid storing them repetitively for each chunk
func groupChunks(chunks []chunk.Chunk, deleteFrom, deleteThrough model.Time) []ChunksGroup {
func groupChunks(chunks []chunk.Chunk, deleteFrom, deleteThrough model.Time, includedChunkIDs map[string]struct{}) ([]ChunksGroup, map[string]struct{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more like processedChunkIDs?

Also there is no need to return it back from the function – modifications are already visible to the caller. Function comment should however mention this fact (that passed map is modified).

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sandeepsukhani ! Let's fix the CHANGELOG and then we can merge it 🎉

CHANGELOG.md Outdated
Comment on lines 25 to 28
- `purger_delete_requests_received_total`: Number of delete requests received per user.
- `purger_delete_requests_processed_total`: Number of delete requests processed per user.
- `purger_delete_requests_chunks_selected_total`: Number of chunks selected while building delete plans per user.
- `purger_delete_requests_processing_failures_total`: Number of delete requests processing failures per user.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `purger_delete_requests_received_total`: Number of delete requests received per user.
- `purger_delete_requests_processed_total`: Number of delete requests processed per user.
- `purger_delete_requests_chunks_selected_total`: Number of chunks selected while building delete plans per user.
- `purger_delete_requests_processing_failures_total`: Number of delete requests processing failures per user.
- `cortex_purger_delete_requests_received_total`: Number of delete requests received per user.
- `cortex_purger_delete_requests_processed_total`: Number of delete requests processed per user.
- `cortex_purger_delete_requests_chunks_selected_total`: Number of chunks selected while building delete plans per user.
- `cortex_purger_delete_requests_processing_failures_total`: Number of delete requests processing failures per user.

Signed-off-by: Sandeep Sukhani <[email protected]>
@sandeepsukhani sandeepsukhani force-pushed the delete-series-metrics branch from 4603c91 to f1a4dce Compare April 17, 2020 14:12
@pracucci pracucci merged commit 9245c31 into cortexproject:master Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants