Allow filtering via query parameters #135
Comments
This has to be sync'd with other client libraries, obviously.
Seems very hard to get adopted. Per-exporter parameters to enable certain … So far, the number of metrics in a collection didn't seem to be a bottleneck, no?
Using flags to control active collectors will probably stay the way to go for one-purpose exporters. Where this feature becomes interesting is when someone wants to scrape different metrics at different scrape intervals and the other collectors incur a substantial overhead. The MySQL exporter would be a good example of that.
Yeah, I want to be able to do that, and I know @SuperQ wants it too. The main question for me is whether these parameters would be the names of collectors or of time series.
I think the easiest thing to do is simply pass URL params as key/value pairs to the collectors.
If we pass URL parameters to collectors directly, there will be little standardisation. My thinking is more that it'd work via Descs, so that only collectors that match the given time series names get called.
The implementation details will be fairly easy to figure out once we have settled on the HTTP API (which should be the same across client libraries). Since the choice there is not obvious, we need to find a common way first.
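As a purely hypothetical illustration of the first option ("pass URL params as key/value pairs to the collectors"), it could take the shape of an optional interface that parameter-aware collectors implement. Nothing like this exists in client_golang; the interface and function names below are made up:

```go
package example

import "github.com/prometheus/client_golang/prometheus"

// ParamCollector is a hypothetical optional interface for collectors
// that can restrict or adjust their output based on URL query parameters.
type ParamCollector interface {
	prometheus.Collector
	// CollectWithParams behaves like Collect but may limit its output
	// according to the given key/value pairs.
	CollectWithParams(params map[string][]string, ch chan<- prometheus.Metric)
}

// collectFrom sketches how a scrape handler could dispatch: collectors
// that understand parameters receive them, all others are collected as usual.
func collectFrom(c prometheus.Collector, params map[string][]string, ch chan<- prometheus.Metric) {
	if pc, ok := c.(ParamCollector); ok {
		pc.CollectWithParams(params, ch)
		return
	}
	c.Collect(ch)
}
```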
We already lack standards for URL params. snmp_exporter uses target, for example.
I agree with that. Whatever I'm monitoring, I'll have to look up valid values for the parameters anyway, so I might just as well use the parameters given in the docs. "Collectors" are not a first-class concept in Prometheus, and shooting for more than best practices and guidance doesn't seem like well-invested effort.

Selecting by time series is another approach. But at the instrumentation level, efforts are rather separated along collectors, and inferring these from time series Descs seems rather fragile, as metrics can be added and dropped.
Selecting based on what amounts to collector-side …
I think standards should go beyond that. Saying that things are already slightly non-standard for a similar use case, and that therefore we can't have enforced standards, would prevent us from ever having hard standards.

I think we should start from how this should work for direct instrumentation (as that's what most instrumentation will be), and then extend that to custom collectors. And then extend that to custom collectors that don't know what time series they'll be exporting, as they're rare and we shouldn't have the tail wagging the dog.

I'm against arbitrary data being passed to custom collectors; there's too much scope for it being abused.
It's an actual problem. We've had several users looking for it.
I just had a thought while pondering the WMI exporter. What if we made Desc automatic? I see three classes of collector: …

For the first two classes, why not call Collect and derive the Descs from the metrics it returns? For the 3rd case, we're unlikely to ever get query filtering working in a generic fashion, and these collectors may be expensive, so we'd need to make this opt-out/opt-in/overridable. There'd also be cases where …

What all this gives us is the information we need inside the registry to call exactly the collectors we require, and then filter down to just the time series the user wants. It also gives us a simpler API for users to write custom collectors, as in most cases they never need to worry about Descs.
I have been pondering the idea of automatic "description by collection" for quite some time (since December 2013 or so). Sometimes it will work great, but sometimes it would defeat the whole idea of the Describe method. What I came up with is a helper function for that.

All of this is, however, not directly related to the issue at hand here. Unfortunately, we didn't get to talk about it during the Saturday symposium. I could imagine both ways: either specify the names of the metrics to retrieve in some pattern-matching way (globs or regexps; full PromQL matchers would be nice but too much to ask from client libraries), or pass through some opaque URL parameters (which would show up at some point in the API of the client libraries; those would be quite specific to each exporter etc.). Are there any use cases that could not be solved with the former approach?
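For reference, client_golang nowadays ships a helper along these lines, `prometheus.DescribeByCollect`, which implements `Describe` by running `Collect` once and forwarding the Descs of the collected metrics. A minimal sketch of a custom collector using it; the collector and the metric are made up for illustration, and, as noted below, this defeats the point whenever collection itself is the expensive part:

```go
package example

import "github.com/prometheus/client_golang/prometheus"

// queueCollector is an illustrative custom collector whose metrics are
// only known at collection time.
type queueCollector struct{}

// Describe delegates to DescribeByCollect, which calls Collect and
// forwards the Descs of whatever metrics come back.
func (c queueCollector) Describe(ch chan<- *prometheus.Desc) {
	prometheus.DescribeByCollect(c, ch)
}

// Collect emits a single const metric; a real collector would query
// the underlying system here.
func (c queueCollector) Collect(ch chan<- prometheus.Metric) {
	ch <- prometheus.MustNewConstMetric(
		prometheus.NewDesc("example_queue_length", "Current length of the example queue.", nil, nil),
		prometheus.GaugeValue,
		42,
	)
}
```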
We just need something that always works for direct instrumentation and covers the majority of custom collector use cases out of the box. For example, with the standard use of Python-style ConstMetrics, you'll still get an empty metric when you call … I think we should do this by passing in time series names/regexes. I think labels would be a bit much in practice.
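To make the "time series names/regexes" idea concrete, here is a rough sketch of a Gatherer wrapper that drops non-matching metric families after gathering. The type is hypothetical, not part of client_golang, and note that it only trims the exposition: every collector still runs, which is exactly the limitation discussed later in this thread.

```go
package example

import (
	"regexp"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// filteredGatherer wraps a Gatherer and keeps only metric families
// whose names match the given pattern.
type filteredGatherer struct {
	gatherer prometheus.Gatherer
	pattern  *regexp.Regexp
}

// Gather gathers from the wrapped Gatherer and filters by name.
func (g filteredGatherer) Gather() ([]*dto.MetricFamily, error) {
	mfs, err := g.gatherer.Gather()
	if err != nil {
		return nil, err
	}
	var out []*dto.MetricFamily
	for _, mf := range mfs {
		if g.pattern.MatchString(mf.GetName()) {
			out = append(out, mf)
		}
	}
	return out, nil
}
```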
It's a key point, as without collector information we can't know which collectors to use.
Yes, and I believe we shouldn't support them. Anything that would require opaque information should be handled by the exporter itself, with no assistance from the client library. Having users implement snmp_exporter-like mechanisms via parameters meant for filtering collectors doesn't seem like it would lead to a sane ecosystem.
Yes, that's what I would go for. Just fishing for contrary opinions before investing any effort in implementations.
But that's already solved. You were just talking about how to make it more convenient to implement.
Only for the 1st class of exporters I listed above. That doesn't help the MySQL exporter as-is. The Go Desc seems to be missing the metric type, too.
"Describe by collect" might simplify things (but not if metrics show up at runtime). As said, there will be tooling for that.
That will be fixed in v0.10.0, see #222. But the type wouldn't help in a decision based on the metric name anyway.
I think we can ignore the case of metrics that are fully dynamic at runtime. Considering the systems we know with those semantics (JMX being the main one), I'm not sure how we'd handle this feature for them, let alone do it in a sane, generic fashion that'd be suitable for adding to the APIs of all clients.
In the fully dynamic case (or, in other words: no Desc, or at least no complete Desc), the client library can still collect all the metrics but then not send them out via HTTP.
For the JMX exporter that'd defeat the purpose, as it's the running of the collector that's the expensive bit. You'd need something quite custom (filtering MBeans? additional config like snmp/blackbox?) to make it actually useful, as opposed to just using a metric_relabel_config.
This has been discussed and implemented in the Python client library, see prometheus/client_python#110. The bottom line is that this feature will be pretty specific to certain implementations, and we don't really aim for a strictly standardized interface across all client libs and use cases. Nevertheless, the implementation here should be as similar as possible. To avoid confusion with details of the Desc, let's do this only in v0.10 or later.
For the record, this has been implemented in node_exporter, see prometheus/node_exporter#699.
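The node_exporter approach boils down to building a fresh registry per request from `collect[]` query parameters. A simplified sketch of that pattern; the handler and the collectors map are illustrative, not node_exporter's actual code:

```go
package example

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// filterHandler serves /metrics, registering only the collectors named
// in collect[] query parameters; with no parameters, all are served.
func filterHandler(collectors map[string]prometheus.Collector) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		names := r.URL.Query()["collect[]"]
		if len(names) == 0 {
			for name := range collectors {
				names = append(names, name)
			}
		}
		reg := prometheus.NewRegistry()
		for _, name := range names {
			c, ok := collectors[name]
			if !ok {
				http.Error(w, "unknown collector: "+name, http.StatusBadRequest)
				return
			}
			reg.MustRegister(c)
		}
		promhttp.HandlerFor(reg, promhttp.HandlerOpts{}).ServeHTTP(w, r)
	}
}
```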
Yes, that's kind of the point of this issue: can the node_exporter approach be generalized? node_exporter uses custom collector names, while this is more fishing for filtering by metric prefix or some regexp (which in turn has raised the concern that we are starting to implement a metrics query engine within client_golang).
This has been a feature of client_python and client_java for many years via the name[] query parameter.
Hm, why is there no connection? I thought OM is a protocol that standardises scraping logic. If we standardise the path parameters and the logic used for filtering, we could allow tools like Prometheus (or any other scrapers) to improve scraping of huge targets (e.g. kube-state-metrics), no? 🤔
OM is primarily a metrics format; as long as you return something reasonable and compliant in response to a simple HTTP request, it is good. What HTTP paths and query options you might also support, or what effects they may have, is not specified, and indeed many different exposers of the existing formats do different things in accordance with their needs, and that is fine.
For future reference, we touched on this during today's dev summit (2022-04-28): https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit#
This remains a problem for exporters that want to support any sort of selective scrape on a metrics endpoint. For example, I want to be able to support scraping … I'd like to be able to query metrics for one tenant's workloads to forward to them, for example. It would be immensely helpful if … I worked around it, but the workaround is ugly and has to duplicate more code from …
This implies that you are doing different work than a normal registry scrape. I don't think there's a lot that can be changed in client_golang to improve on that. Take a look at how exporters like the node_exporter or mysqld_exporter support query parameters.
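For the tenant use case above, one possible workaround along those lines is to filter the gathered metric families by a label per request and then hand the result to promhttp, so the exposition code itself does not have to be duplicated. A rough sketch, assuming a hypothetical tenant label and query parameter:

```go
package example

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
)

// tenantHandler exposes only metrics carrying a tenant label equal to
// the "tenant" query parameter. Label name and parameter are made up.
func tenantHandler(reg prometheus.Gatherer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		tenant := r.URL.Query().Get("tenant")
		g := prometheus.GathererFunc(func() ([]*dto.MetricFamily, error) {
			mfs, err := reg.Gather()
			if err != nil {
				return nil, err
			}
			var out []*dto.MetricFamily
			for _, mf := range mfs {
				var kept []*dto.Metric
				for _, m := range mf.GetMetric() {
					for _, lp := range m.GetLabel() {
						if lp.GetName() == "tenant" && lp.GetValue() == tenant {
							kept = append(kept, m)
							break
						}
					}
				}
				if len(kept) > 0 {
					mf.Metric = kept
					out = append(out, mf)
				}
			}
			return out, nil
		})
		promhttp.HandlerFor(g, promhttp.HandlerOpts{}).ServeHTTP(w, r)
	}
}
```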
For federation we can provide match[] query parameters with metric selectors to ask for specific metrics. By default, none are shown. Regular scrape targets should provide the same options but show all metrics by default.
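A sketch of what that could look like for a regular scrape target, treating match[] values as plain metric names for simplicity (real federation accepts full PromQL selectors) and showing everything when no selector is given. The handler is illustrative only, not an API of client_golang:

```go
package example

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
)

// matchHandler serves the full exposition by default and restricts it
// to the metric families named in match[] parameters when given.
func matchHandler(reg prometheus.Gatherer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		names := r.URL.Query()["match[]"]
		if len(names) == 0 {
			// Regular scrape target: show all metrics by default.
			promhttp.HandlerFor(reg, promhttp.HandlerOpts{}).ServeHTTP(w, r)
			return
		}
		wanted := make(map[string]bool, len(names))
		for _, n := range names {
			wanted[n] = true
		}
		g := prometheus.GathererFunc(func() ([]*dto.MetricFamily, error) {
			mfs, err := reg.Gather()
			if err != nil {
				return nil, err
			}
			var out []*dto.MetricFamily
			for _, mf := range mfs {
				if wanted[mf.GetName()] {
					out = append(out, mf)
				}
			}
			return out, nil
		})
		promhttp.HandlerFor(g, promhttp.HandlerOpts{}).ServeHTTP(w, r)
	}
}
```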