Executing child queries on queriers #1730
pkg/chunk/series_store.go
Outdated
@@ -265,6 +265,7 @@ func (c *seriesStore) lookupSeriesByMetricNameMatchers(ctx context.Context, from
	// Just get series for metric if there are no matchers
	if len(matchers) == 0 {
		level.Debug(log).Log("msg", "lookupSeriesByMetricNameMatchers: empty matcher")
probably unnecessary to redeclare lookupSeriesByMetricNameMatchers
as a spanlogger with this name is created above
pkg/chunk/series_store.go
Outdated
if shard != nil {
	matchers = append(matchers[:shardLabelIndex], matchers[shardLabelIndex+1:]...)
// Just get series for metric if there are no matchers
we'd still need to filter the resulting seriesIDs here. I think it can be more succinctly expressed by calculating the shard/splicing out the shard labels, then running the len(matchers) == 0 logic once (a rough sketch of that restructuring follows).
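A hedged Go sketch of the restructuring this comment seems to suggest; the helper below and its name (spliceOutShard) are hypothetical illustrations, not code from series_store.go:

```go
package chunk

import "github.com/prometheus/prometheus/pkg/labels"

// spliceOutShard is a hypothetical helper: it removes the shard matcher
// (e.g. on a __cortex_shard__ label) from the matcher list and returns it
// separately, so the len(matchers) == 0 branch only needs to run once and
// the resulting seriesIDs can be filtered by shard afterwards.
func spliceOutShard(matchers []*labels.Matcher, shardLabel string) (*labels.Matcher, []*labels.Matcher) {
	for i, m := range matchers {
		if m.Name == shardLabel {
			return m, append(matchers[:i], matchers[i+1:]...)
		}
	}
	return nil, matchers
}
```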
Signed-off-by: Owen Diehl <[email protected]>
deprecated in favor of #1878
Disclaimer
This is not finished, but I'd like feedback on the design.
What
These changes aim to introduce a path towards further distributing queries. Currently Cortex dispatches queries to backend queriers, but as the throughput of metrics increases, running entire queries on a single querier can become a bottleneck.
Prior Art
Sharding
The v10 schema introduced a shard factor for data which spreads series across n shards. This is a prerequisite for allowing us to query data from these shards in parallel.
Problem
Although the v10 schema change introduces a shard factor, all shards must still be processed on a single querier. This is compounded by the fact that query ASTs are not broken down before execution. Therefore, while sharding lets us split up data at its source location, we still re-aggregate it and process it all in one place. The two goals of this PR are to fix these by breaking query ASTs into parallelizable child queries and by executing those child queries on downstream queriers.
Details
Mapping ASTs
Firstly, we introduce a package called astmapper, whose interface is used to map ASTs into different ASTs (a rough sketch of the interface follows below). We use this to turn, e.g.,
sum by (foo) (rate(bar{baz="blip"}[1m]))
into a union of sub-sums, one per shard (a reconstructed example also follows). This works largely because summing the sub-sums is equivalent to summing the whole. The principle should hold true for any merge operation that is associative, but sums are the initial focus.
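A minimal sketch of what the astmapper interface might look like, reconstructed rather than copied from the PR; it assumes the Prometheus promql package's Node type:

```go
package astmapper

import "github.com/prometheus/prometheus/promql"

// ASTMapper rewrites one PromQL AST into another, e.g. replacing a
// parallelizable sum with a union of per-shard sub-sums.
type ASTMapper interface {
	Map(node promql.Node) (promql.Node, error)
}
```

And a hedged reconstruction of the mapped query, assuming a shard factor of 3 and a __cortex_shard__ label of the form <shard>_of_<total> (the exact label format may differ):

sum by (foo) (
  sum by (foo) (rate(bar{baz="blip",__cortex_shard__="0_of_3"}[1m])) or
  sum by (foo) (rate(bar{baz="blip",__cortex_shard__="1_of_3"}[1m])) or
  sum by (foo) (rate(bar{baz="blip",__cortex_shard__="2_of_3"}[1m]))
)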
Hijacking Queryable + embedding queries
Queries are executed in tandem by a promql.Engine and a storage.Queryable. Since the Engine is a concrete type, its implementation is locked. This means that the remaining option is to hijack the queryable to dispatch child queries. However, the queryable is only called to retrieve vector and matrix selector nodes -- all aggregations, functions, etc. are handled by the Engine itself. Therefore, in order to regain control of these entire subtrees, we must encode them in vector/matrix selectors. This is done by stringifying an entire subtree and replacing the node with a vector or matrix selector. Currently queries are hex-encoded, but this could be something more human-friendly.
Using our previous example with a shard factor of 3,
sum by(foo) (rate(bar1{baz="blip"}[1m]))
is turned into a set of selectors carrying the encoded child queries (a rough sketch of the shape follows).
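A hedged illustration of the rewritten query's shape; the exact selector layout and label names come from the PR, and the placeholder values below are illustrative only, not real encodings:

sum by (foo) (
  {__cortex_query__="...", __embedded_query__="<hex-encoded shard 0 sub-sum>"} or
  {__cortex_query__="...", __embedded_query__="<hex-encoded shard 1 sub-sum>"} or
  {__cortex_query__="...", __embedded_query__="<hex-encoded shard 2 sub-sum>"}
)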
The queryable implementation will look for these __embedded_query__ and __cortex_query__ labels and, upon finding them, shell out to a downstream querier with the encoded query. The Engine will then reassemble the resulting SeriesSet, applying parent operations and merging multiple child queries as necessary.
Remaining Work
label grouping should include shard labels
sum by (foo)
when sharded will return a vector with labels that only include foo=<value>. Due to the merge behavior of the union operator (or), this would result in discarding data in later vectors which have the same label value for foo. Therefore, we need to turn these into inner aggregations that also keep the shard label (a hedged example follows).
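A hedged reconstruction of the intended form, assuming the inner sums keep the __cortex_shard__ label so the or-merge does not collapse series from different shards, while the outer sum still drops it:

sum by (foo) (
  sum by (foo, __cortex_shard__) (rate(bar{baz="blip",__cortex_shard__="0_of_3"}[1m])) or
  sum by (foo, __cortex_shard__) (rate(bar{baz="blip",__cortex_shard__="1_of_3"}[1m])) or
  sum by (foo, __cortex_shard__) (rate(bar{baz="blip",__cortex_shard__="2_of_3"}[1m]))
)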
Improve AST mapping
Splitting non-sum queries
We need a way to handle shard splitting for non-parallelized queries. In the sum example, we introduce
__cortex_shard__
labels in the AST and parallelize them. We need to ensure we're querying the right shards for non-sum queries as well. This may be handled either in the AST (ideal, as it isolates logic) or in the backend (i.e. a backend could detect which queries do not have __cortex_shard__ labels and fan-out/collect them at that level).
Better logic for determining which subtrees to execute
implemented (was: this optimization will be left for a later PR)
Currently, parallelizable sums will be executed on a downstream querier, but non-sum subtrees will not. In the example
rate(<selector>[5m])
, the selector matrix would be dispatched to a querier, but the rate would be computed over the entire series by the frontend. This is certainly suboptimal and I'll be adding more logic to correct this.
Custom impls for specific functions
implemented (was: this optimization will be left for a later PR)
As an example, the average function is not associative and is thus difficult to combine across shards without knowing the number of data points the average was calculated over. However, an average may be remapped to a sum and a count (avg = sum/count), which are associative and parallelize nicely (a hedged example follows).
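A hedged illustration of this remapping, reusing the earlier example rather than anything taken verbatim from the PR; each side of the division can then be sharded in the same way as a plain sum:

avg by (foo) (rate(bar{baz="blip"}[1m]))
->
sum by (foo) (rate(bar{baz="blip"}[1m])) / count by (foo) (rate(bar{baz="blip"}[1m]))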
Injecting shard configurations into the frontend
This could be done similarly to how
PeriodConfigs
are used to create composite stores (i.e. as a configuration in a yaml file) or some other method.
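A purely hypothetical sketch of what such a configuration might look like on the frontend side; the ShardingConfig name and fields are invented for illustration and are not taken from the PR:

```go
package frontend

import "time"

// ShardingConfig is a hypothetical per-period shard configuration the query
// frontend could consume, analogous to how PeriodConfigs drive the composite
// chunk store.
type ShardingConfig struct {
	From        time.Time // start of the period this config applies to
	ShardFactor int       // shard factor in effect for the period (e.g. from the v10 schema)
}
```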
Nice to haves/Ideal Solution/etc
Ideally this would be supported by the promql.Engine itself, so that evaluation wouldn't need to be handled in the storage.Queryable interface. It would remove the need to splice in selectors with encoded queries and allow for a cleaner, less obfuscated implementation. This may be a significant lift, though. As an example, the interface may look something like the sketch below.
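The PR does not spell out the proposed interface at this point, so the following is only a guess at its shape: a hypothetical hook (the SubtreeEvaluator name and method signature are invented) that would let the engine hand whole AST subtrees to a downstream evaluator instead of smuggling them through vector/matrix selectors. It assumes the Prometheus promql package's Node and Value types:

```go
package promqlext

import (
	"context"
	"time"

	"github.com/prometheus/prometheus/promql"
)

// SubtreeEvaluator is a hypothetical hook the engine could call to evaluate
// an entire AST subtree remotely (e.g. on a downstream querier), rather than
// only fetching raw series through storage.Queryable.
type SubtreeEvaluator interface {
	EvaluateSubtree(ctx context.Context, node promql.Node, start, end time.Time, step time.Duration) (promql.Value, error)
}
```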