Generic KV Client Instrumentation #2648
de55827
to
170eb06
Compare
I'm fine with having a generic metrics client only, although I've found the ability to see individual Consul "CAS" calls useful in the past.
After some recent changes related to Cortex/Thanos metrics, I now think that passing `name` to individual components, where it is only used for ConstLabels, is an anti-pattern:
- It gives the component more responsibility than it needs
- The name is useless if the registerer is nil
- The name is not used by sub-components (e.g. if the Consul client itself wanted to register its own metrics, that would create a conflict)
A better way of handling multiple components with the same metrics is to wrap the registerer so that it adds a constant label to all metrics registered by the component: `prometheus.WrapRegistererWith(prometheus.Labels{"name": "component"}, reg)` (if `reg` is not nil; `extprom.WrapRegistererWith()` from Thanos handles the nil case as well), and pass the wrapped registerer to the component. This solves all the problems mentioned above.
/cc @pracucci to hear your opinion on this as well
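To make the argument above concrete, here is a minimal, stdlib-only sketch of the idea. It is a toy model, not the Prometheus API: `Registerer`, `registry`, `wrapWith` and `newKVClient` are hypothetical names invented for illustration, and `wrapWith` only imitates what `prometheus.WrapRegistererWith` is described as doing in this thread (adding a constant label), plus the nil handling attributed to the Thanos helper.

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

// Registerer is a hypothetical, stripped-down stand-in for
// prometheus.Registerer: it only tracks which series identities exist.
type Registerer interface {
	Register(metric string, labels map[string]string) error
}

// registry rejects a second registration of the same metric name with
// the same label set, which is the conflict described in the review.
type registry struct {
	seen map[string]bool
}

func newRegistry() *registry { return &registry{seen: map[string]bool{}} }

func (r *registry) Register(metric string, labels map[string]string) error {
	id := identity(metric, labels)
	if r.seen[id] {
		return errors.New("duplicate registration: " + id)
	}
	r.seen[id] = true
	return nil
}

// identity builds a stable key from the metric name and sorted labels.
func identity(metric string, labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return metric + "{" + strings.Join(pairs, ",") + "}"
}

// labelled adds constant labels to everything registered through it.
type labelled struct {
	constant map[string]string
	next     Registerer
}

func (l labelled) Register(metric string, labels map[string]string) error {
	merged := make(map[string]string, len(labels)+len(l.constant))
	for k, v := range labels {
		merged[k] = v
	}
	for k, v := range l.constant {
		merged[k] = v
	}
	return l.next.Register(metric, merged)
}

// wrapWith returns nil for a nil Registerer so callers can pass the
// result straight to a component that treats nil as "no metrics".
func wrapWith(constant map[string]string, reg Registerer) Registerer {
	if reg == nil {
		return nil
	}
	return labelled{constant: constant, next: reg}
}

// newKVClient stands in for a component constructor: it registers a
// fixed metric name and never needs to know its own "name".
func newKVClient(reg Registerer) error {
	if reg == nil {
		return nil
	}
	return reg.Register("kv_request_duration_seconds", nil)
}
```

In this model, calling `newKVClient(reg)` twice on the same raw registry fails on the second call, while `newKVClient(wrapWith(map[string]string{"name": "ingester"}, reg))` followed by the same call with `"ruler"` succeeds, and the component itself never sees the name.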
pkg/ring/kv/metrics.go
Outdated
Let's make sure to do that before merging please.
Agree. This TODO should be removed (and the default register never used).
pkg/ring/kv/client.go
Outdated
I think passing `name` to components, when it is used just for metrics registration, is an anti-pattern. The caller can easily wrap `reg` into `prometheus.WrapRegistererWith(prometheus.Labels{"name": "some name here"}, reg)` if it wishes to do so. I would prefer to use this pattern instead. The KV store client doesn't need a name for anything else, and it also doesn't need it if the registerer is `nil`.
pkg/ring/lifecycler.go
Outdated
This is a great opportunity to pass a registerer to the lifecycler and move the lifecycler metrics away from global variables.
But maybe in another PR?
I will pass the Registerer but keep the lifecycler metrics global for now.
pkg/ring/kv/consul/client.go
Outdated
Previously we could see individual `CAS` calls to Consul. Now we will only see "CAS loop".
Agree. It's more flexible.
Good job @jtlisi! I left a few comments as well.
pkg/ring/kv/metrics.go
Outdated
The generated name here looks a bit weird to me. Maybe `querier-store-gateway` would be clearer?
Sounds good, I wasn't sure how best to name some of these clients.
pkg/ring/kv/consul/client.go
Outdated
Now that I look at it, I think the previous version was even wrong. We tracked "Put" twice for one Put call: once here, and once in `consul/metrics.go`. Am I missing anything?
After investigating further, the consul metrics probably should remain as is. They provide lower level metrics for the client interacting with consul itself: https://github.com/hashicorp/consul/blob/master/api/kv.go
In our KV implementation, CAS is retried multiple times, and by using a generic metrics client we miss out on instrumenting some of these calls. One option to solve this is to move the retries into their own KV implementation and wrap clients with it.
I am going to re-add the separate consul instrumentation and will add an issue to add a generic retry implementation for the KV store interface. WDYT?
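The trade-off discussed in this thread (seeing every CAS attempt versus only the "CAS loop") can be sketched as follows. This is a rough illustration under stated assumptions: `Store`, `memStore`, `countingStore` and `retrying` are hypothetical types invented here, and the single-shot `CompareAndSwap` signature is a simplification, not the Cortex `kv.Client` interface or the Consul API.

```go
package main

import "errors"

// Store is a hypothetical minimal KV interface with a single-shot
// compare-and-swap; it is not the Cortex kv.Client interface.
type Store interface {
	Get(key string) (value string, version int)
	CompareAndSwap(key string, version int, value string) bool
}

// memStore is an in-memory Store. failNext makes the next N swaps fail,
// simulating another writer winning the race.
type memStore struct {
	values   map[string]string
	versions map[string]int
	failNext int
}

func newMemStore(failNext int) *memStore {
	return &memStore{
		values:   map[string]string{},
		versions: map[string]int{},
		failNext: failNext,
	}
}

func (m *memStore) Get(key string) (string, int) {
	return m.values[key], m.versions[key]
}

func (m *memStore) CompareAndSwap(key string, version int, value string) bool {
	if m.failNext > 0 {
		m.failNext--
		m.versions[key]++ // a concurrent writer bumped the version
		return false
	}
	if version != m.versions[key] {
		return false
	}
	m.values[key] = value
	m.versions[key]++
	return true
}

// countingStore decorates a Store and counts individual swap calls.
// Placed below the retry layer, it sees every attempt, not just the loop.
type countingStore struct {
	Store
	casCalls int
}

func (c *countingStore) CompareAndSwap(key string, version int, value string) bool {
	c.casCalls++
	return c.Store.CompareAndSwap(key, version, value)
}

var errTooManyRetries = errors.New("cas: too many retries")

// retrying owns the retry policy, so backends only need single-shot CAS.
type retrying struct {
	store      Store
	maxRetries int
	loops      int // number of CAS loops started (the "generic" view)
}

// CAS re-reads the value and retries the swap until it succeeds or the
// retry budget is exhausted.
func (r *retrying) CAS(key string, update func(old string) string) error {
	r.loops++
	for i := 0; i < r.maxRetries; i++ {
		old, version := r.store.Get(key)
		if r.store.CompareAndSwap(key, version, update(old)) {
			return nil
		}
	}
	return errTooManyRetries
}
```

With `newMemStore(2)` wrapped in a `countingStore` and then in `retrying{maxRetries: 5}`, one call to `CAS` is a single loop from the caller's point of view, while the counting layer underneath records three individual swap attempts. That ordering of wrappers is the design choice here: instrumentation placed under the retry layer keeps per-attempt visibility without backend-specific metrics code.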
pkg/ring/kv/metrics.go
Outdated
Please mention this change in the CHANGELOG. You should also check if it has an impact on the cortex mixin's dashboards (depends on how we group series, I haven't checked).
pkg/ring/kv/consul/metrics.go
Outdated
Please remember to mention in the CHANGELOG that the metric `cortex_consul_request_duration_seconds` has been removed.
I re-added this metric since it fits a different use case.
Signed-off-by: Jacob Lisi <[email protected]>
ff4541e
to
40c71c7
Compare
I updated this PR per your comments. Changes include:
I will test this PR with the cortex-mixin this week and ensure it operates correctly with the new metrics. If you have any more feedback about the naming of the KV clients, let me know. I wasn't sure if some of the names/suffixes I chose made sense.
I have a couple of nits, but LGTM overall. Thanks!
pkg/distributor/distributor_test.go
Outdated
UpdateTimeout: 100 * time.Millisecond,
FailoverTimeout: time.Second,
})
}, prometheus.NewRegistry())
Suggested change:
- }, prometheus.NewRegistry())
+ }, nil)
Since we don't verify metrics.
pkg/ring/kv/metrics.go
Outdated
// If no Registerer is provided return the raw client
if reg == nil {
return c
}
Nit: I'd move this check into the `createClient` function.
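As a rough sketch of what this nit could look like, the following stdlib-only example keeps the nil check in one constructor and leaves the decorator free of it. All names are assumptions made for illustration: `Client`, `Observer`, `instrumented`, `mapClient` and `callLog` are hypothetical, and this `createClient` has a simplified signature that is not the one in `pkg/ring/kv`. A real implementation would presumably feed a Prometheus histogram rather than an in-memory log.

```go
package main

import "time"

// Client is a hypothetical two-method KV interface used only for this sketch.
type Client interface {
	Get(key string) (string, error)
	Put(key, value string) error
}

// Observer receives one observation per call; it stands in for a
// metrics backend such as a duration histogram.
type Observer interface {
	Observe(operation string, d time.Duration, err error)
}

// instrumented decorates any Client, so backends need no metrics code.
type instrumented struct {
	next Client
	obs  Observer
}

func (c instrumented) Get(key string) (string, error) {
	start := time.Now()
	v, err := c.next.Get(key)
	c.obs.Observe("Get", time.Since(start), err)
	return v, err
}

func (c instrumented) Put(key, value string) error {
	start := time.Now()
	err := c.next.Put(key, value)
	c.obs.Observe("Put", time.Since(start), err)
	return err
}

// createClient decides in one place whether to wrap the backend; with a
// nil Observer the raw client is returned untouched.
func createClient(backend Client, obs Observer) Client {
	if obs == nil {
		return backend
	}
	return instrumented{next: backend, obs: obs}
}

// mapClient is a tiny in-memory backend for trying the sketch.
type mapClient map[string]string

func (m mapClient) Get(key string) (string, error) { return m[key], nil }
func (m mapClient) Put(key, value string) error    { m[key] = value; return nil }

// callLog records the operation names it observes, in order.
type callLog struct{ ops []string }

func (l *callLog) Observe(operation string, _ time.Duration, _ error) {
	l.ops = append(l.ops, operation)
}
```

Calling `createClient(mapClient{}, &callLog{})` yields a client whose `Put` and `Get` calls each produce one observation, while `createClient(mapClient{}, nil)` hands back the `mapClient` itself.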
pkg/ring/kv/metrics.go
Outdated
return nil
}

return prometheus.WrapRegistererWith(prometheus.Labels{"name": name}, reg)
Is it worth using something more unique, e.g. `kv-name` here?
That makes sense. I was just basing it off the label we use for the ring.
pkg/ring/kv/metrics.go
Outdated
primaryLabel = prometheus.Labels{"role": "primary"}
secondaryLabel = prometheus.Labels{"role": "secondary"}
Only used from client.go, I would move it into that file.
CHANGELOG.md
Outdated

## master / unreleased

* [CHANGE] Metric `cortex_kv_request_duration_seconds` now includes `name` label to denote which client is being used as well as the `backend` label to denote the KV backend implemnetation in use. #2648
Suggested change:
- * [CHANGE] Metric `cortex_kv_request_duration_seconds` now includes `name` label to denote which client is being used as well as the `backend` label to denote the KV backend implemnetation in use. #2648
+ * [CHANGE] Metric `cortex_kv_request_duration_seconds` now includes `name` label to denote which client is being used as well as the `backend` label to denote the KV backend implementation in use. #2648
pkg/distributor/distributor_ring.go
Outdated
var (
RingNameForClient = "distributor"
)
Suggested change:
- var (
- RingNameForClient = "distributor"
- )
+ const (
+ ringNameForClient = "distributor"
+ )
And move to `distributor_ring.go` where it is used.
(Alternatively, just inline it.)
pkg/ruler/ruler.go
Outdated
ringStore, err := kv.NewClient(
cfg.Ring.KVStore,
ring.GetCodec(),
kv.RegistererWithKVName(reg, ring.RulerRingKey),
Suggested change:
- kv.RegistererWithKVName(reg, ring.RulerRingKey),
+ kv.RegistererWithKVName(reg, "ruler"),
`ring.RulerRingKey` is actually `ring`.
pkg/storegateway/gateway.go
Outdated
ringStore, err = kv.NewClient(
gatewayCfg.ShardingRing.KVStore,
ring.GetCodec(),
kv.RegistererWithKVName(reg, RingNameForClient),
Suggested change:
- kv.RegistererWithKVName(reg, RingNameForClient),
+ kv.RegistererWithKVName(reg, "store-gateway"),
Signed-off-by: Jacob Lisi <[email protected]>
LGTM, thanks for patiently addressing my comments!
What this PR does:
Which issue(s) this PR fixes:
Fixes #2484
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`