Skip to content

Query Federation Incorrectly Returns Series for Wrong Tenant with __tenant_id__ Label Matcher #5941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CharlieTLe opened this issue May 10, 2024 · 3 comments · Fixed by #5943

Comments

@CharlieTLe
Copy link
Member

Describe the bug
Query federation returns series for the incorrect tenant when using the __tenant_id__ label matcher.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (SHA or version) v1.17.0
  2. Setup two tenants: tenant-a and tenant-b
  3. Send metric with value to tenant-a: metric_name{foo="bar"} 1
  4. Send metric with value to tenant-b: metric_name{foo="baz"} 2
  5. Query with X-Scope-OrgID: tenant-a|tenant-b for metric_name{__tenant_id__="tenant-b"}
  6. Response is metric_name{__tenant_id__="tenant-b", foo="bar"} 1

Expected behavior
Expected metric_name{__tenant_id__="tenant-b", foo="baz"} 2

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context
I was able to reproduce this by updating the existing tests to also check the labels from the series that were queried instead of only checking the number of series returned:

diff --git a/pkg/querier/tenantfederation/merge_queryable_test.go b/pkg/querier/tenantfederation/merge_queryable_test.go
index e8aa04ea2..b3aa318d4 100644
--- a/pkg/querier/tenantfederation/merge_queryable_test.go
+++ b/pkg/querier/tenantfederation/merge_queryable_test.go
@@ -451,6 +451,18 @@ func TestMergeQueryable_Select(t *testing.T) {
                                        name:                "should return only series for team-b when there is an equals matcher for the team-b tenant",
                                        matchers:            []*labels.Matcher{{Name: defaultTenantLabel, Value: "team-b", Type: labels.MatchEqual}},
                                        expectedSeriesCount: 2,
+                                       expectedLabels: []labels.Labels{
+                                               {
+                                                       {Name: "__tenant_id__", Value: "team-b"},
+                                                       {Name: "instance", Value: "host1"},
+                                                       {Name: "tenant-team-b", Value: "static"},
+                                               },
+                                               {
+                                                       {Name: "__tenant_id__", Value: "team-b"},
+                                                       {Name: "instance", Value: "host2.team-b"},
+                                                       {Name: "original___tenant_id__", Value: "original-value"},
+                                               },
+                                       },
                                },
                                {
                                        name:                "should return one series for each tenant when there is an equals matcher for the host1 instance",

When the test runs, the following failure output occurs:

        --- FAIL: TestMergeQueryable_Select/three_tenants/should_return_only_series_for_team-b_when_there_is_an_equals_matcher_for_the_team-b_tenant (0.00s)

Expected :labels.Labels{labels.Label{Name:"__tenant_id__", Value:"team-b"}, labels.Label{Name:"instance", Value:"host1"}, labels.Label{Name:"tenant-team-b", Value:"static"}}
Actual   :labels.Labels{labels.Label{Name:"__tenant_id__", Value:"team-b"}, labels.Label{Name:"instance", Value:"host1"}, labels.Label{Name:"tenant-team-a", Value:"static"}}
@alanprot
Copy link
Member

Thanks for reporting it.

I will try to take a look ASAP. Meanwhile if feel free to open a PR with a fix if you want.

@CharlieTLe
Copy link
Member Author

Seems to be a regression. The updated tests succeeds on v1.15.3, so some something must've changed between then and v1.16.1 for it to stop working.

@CharlieTLe
Copy link
Member Author

CharlieTLe commented May 10, 2024

Was able to narrow it down to this commit e81ee1d in #5593 as the regression.

CharlieTLe added a commit to CharlieTLe/cortex that referenced this issue May 10, 2024
We are creating a job which has the tenant name for each tenant to
query, but we use the incorrect index in the `ids` slice that contains
the list of tenants. This causes the querier to select results for the
first tenant in the list of tenants to match with instead of the actual
tenant the querier was intended for.

I updated the existing tests to check the labels of the results as well
instead of only relying on the number of series found.

Fixes cortexproject#5941

Signed-off-by: Charlie Le <[email protected]>
yeya24 pushed a commit that referenced this issue May 20, 2024
We are creating a job which has the tenant name for each tenant to
query, but we use the incorrect index in the `ids` slice that contains
the list of tenants. This causes the querier to select results for the
first tenant in the list of tenants to match with instead of the actual
tenant the querier was intended for.

I updated the existing tests to check the labels of the results as well
instead of only relying on the number of series found.

Fixes #5941

Signed-off-by: Charlie Le <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants