Extend ShuffleSharding on READONLY ingesters #6517
If we extend the replicas when the current instance is in READONLY status, does that work?
It does, @yeya24.
Hello @danielblando, thank you for opening this PR. There is a release in progress. As such, please rebase your CHANGELOG entry on top of the master branch and move the CHANGELOG entry to the top under Thanks,
@@ -1005,6 +1005,12 @@ func (i *Lifecycler) changeState(ctx context.Context, state InstanceState) error

level.Info(i.logger).Log("msg", "changing instance state from", "old_state", currState, "new_state", state, "ring", i.RingName)
i.setState(state)

//The instances is rejoining the ring. It should reset its registered time.
Question, what happens if we don't reset the registered time?
This is to catch cases where the ingester goes to READONLY and back to ACTIVE.
The query still needs to extend in these cases. The change to registeredTimestamp ensures that queries continue to extend requests to these ingesters.
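To make the answer above concrete, here is a minimal Go sketch of the idea, not the actual Lifecycler code. The `instance` type, the `changeState` method signature, and `registeredWithinLookback` are hypothetical simplifications; timestamps are assumed to be Unix seconds.

```go
package main

import "fmt"

// InstanceState is a simplified stand-in for the ring instance state.
type InstanceState int

const (
	ACTIVE InstanceState = iota
	READONLY
)

// instance is a hypothetical, trimmed-down ring entry.
type instance struct {
	State               InstanceState
	RegisteredTimestamp int64 // Unix seconds (assumption)
}

// changeState resets the registered timestamp when an instance goes from
// READONLY back to ACTIVE, so it looks like it has just joined the ring.
func (in *instance) changeState(next InstanceState, now int64) {
	if in.State == READONLY && next == ACTIVE {
		in.RegisteredTimestamp = now
	}
	in.State = next
}

// registeredWithinLookback reports whether the instance registered inside
// the lookback window. In this sketch, that is the condition under which
// the read path keeps extending the shard over the instance.
func registeredWithinLookback(in instance, lookbackSeconds, now int64) bool {
	return in.RegisteredTimestamp > now-lookbackSeconds
}

func main() {
	const now, lookback = int64(100000), int64(3600)

	ing := instance{State: ACTIVE, RegisteredTimestamp: 1000}
	fmt.Println(registeredWithinLookback(ing, lookback, now)) // false: registered long ago

	ing.changeState(READONLY, 99000)
	ing.changeState(ACTIVE, 99500)
	fmt.Println(registeredWithinLookback(ing, lookback, now)) // true: timestamp was reset
}
```

Without the reset, the instance in this sketch would fall outside the lookback window immediately after returning to ACTIVE, and queries would stop extending over it.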
"same number of instances, prioritize readOnly than timestamp changes": {
r1: &Desc{Ingesters: map[string]InstanceDesc{"ing1": {Addr: "addr1", State: ACTIVE, Timestamp: 123456}}},
r2: &Desc{Ingesters: map[string]InstanceDesc{"ing1": {Addr: "addr1", State: READONLY, Timestamp: 789012}}},
expected: EqualButReadOnly,
I am a bit confused about the name. I understand the prioritization, but it is weird to call it equal when the timestamps are different.
Hm, I get what you mean, but I don't have a better name. I think this is OK, as ReadOnly is less restrictive than EqualButTimestamporState.
Thanks. LGTM
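The prioritization discussed in this thread can be sketched as follows. This is an illustrative comparison function, not the ring's real implementation: the `compare` function, the `LEAVING` state, and every result name other than `EqualButReadOnly` are assumptions made for the example.

```go
package main

import "fmt"

type InstanceState int

const (
	ACTIVE InstanceState = iota
	LEAVING
	READONLY
)

// InstanceDesc is a trimmed-down copy of the fields used in the test case.
type InstanceDesc struct {
	Addr      string
	State     InstanceState
	Timestamp int64
}

type CompareResult int

const (
	Equal CompareResult = iota
	EqualButStatesAndTimestamps // hypothetical name for "only state/timestamp changed"
	EqualButReadOnly
	Different
)

// compare reports how two ring descriptions differ. A READONLY transition
// takes priority over plain state or timestamp changes.
func compare(a, b map[string]InstanceDesc) CompareResult {
	if len(a) != len(b) {
		return Different
	}
	readOnlyChanged, otherChanged := false, false
	for id, x := range a {
		y, ok := b[id]
		if !ok || x.Addr != y.Addr {
			return Different
		}
		if x.State != y.State {
			if x.State == READONLY || y.State == READONLY {
				readOnlyChanged = true
			} else {
				otherChanged = true
			}
		}
		if x.Timestamp != y.Timestamp {
			otherChanged = true
		}
	}
	switch {
	case readOnlyChanged:
		return EqualButReadOnly
	case otherChanged:
		return EqualButStatesAndTimestamps
	default:
		return Equal
	}
}

func main() {
	r1 := map[string]InstanceDesc{"ing1": {Addr: "addr1", State: ACTIVE, Timestamp: 123456}}
	r2 := map[string]InstanceDesc{"ing1": {Addr: "addr1", State: READONLY, Timestamp: 789012}}
	fmt.Println(compare(r1, r2) == EqualButReadOnly) // true
}
```

In this sketch the timestamp difference is still detected, but the READONLY transition wins when both are present, which matches the test case quoted above.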
Signed-off-by: Daniel Deluiggi <[email protected]>
* Filter readOnly ingesters when sharding
* Extend shard on READONLY
* Remove old code
* Fix test
* update changelog

Signed-off-by: Daniel Deluiggi <[email protected]>
Signed-off-by: Alex Le <[email protected]>
* Purge expired postings cache items due inactivity (#6502) * Purge expired postings cache items due inactivity Signed-off-by: alanprot <[email protected]> * Fix comments Signed-off-by: alanprot <[email protected]> --------- Signed-off-by: alanprot <[email protected]> Signed-off-by: Alex Le <[email protected]> * Update thanos to 4ba0ba403896 (#6503) * Update thanos to 4ba0ba403896 Signed-off-by: Daniel Sabsay <[email protected]> * run go mod vendor Signed-off-by: Daniel Sabsay <[email protected]> --------- Signed-off-by: Daniel Sabsay <[email protected]> Co-authored-by: Daniel Sabsay <[email protected]> Signed-off-by: Alex Le <[email protected]> * Bump the actions-dependencies group across 1 directory with 2 updates (#6505) Bumps the actions-dependencies group with 2 updates in the / directory: [actions/upload-artifact](https://github.com/actions/upload-artifact) and [github/codeql-action](https://github.com/github/codeql-action). Updates `actions/upload-artifact` from 4.5.0 to 4.6.0 - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@6f51ac0...65c4c4a) Updates `github/codeql-action` from 3.28.0 to 3.28.1 - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@48ab28a...b6a472f) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions-dependencies - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions-dependencies ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Alex Le <[email protected]> * calculate # of concurrency only once at the runner (#6506) Signed-off-by: SungJin1212 <[email protected]> Signed-off-by: Alex Le <[email protected]> * Implement partition compaction planner (#6469) * Implement partition compaction grouper Signed-off-by: Alex Le <[email protected]> * fix comment Signed-off-by: Alex Le <[email protected]> * replace level 1 compaction limits with ingestion replication factor Signed-off-by: Alex Le <[email protected]> * fix doc Signed-off-by: Alex Le <[email protected]> * update compaction_visit_marker_timeout default value Signed-off-by: Alex Le <[email protected]> * update default value for compactor_partition_index_size_limit_in_bytes Signed-off-by: Alex Le <[email protected]> * refactor code Signed-off-by: Alex Le <[email protected]> * address comments and refactor Signed-off-by: Alex Le <[email protected]> * address comment Signed-off-by: Alex Le <[email protected]> * address comment Signed-off-by: Alex Le <[email protected]> * update config name Signed-off-by: Alex Le <[email protected]> * Implement partition compaction planner Signed-off-by: Alex Le <[email protected]> * fix after rebase Signed-off-by: Alex Le <[email protected]> * addressed comments Signed-off-by: Alex Le <[email protected]> * updated doc and refactored metric Signed-off-by: Alex Le <[email protected]> * fix test Signed-off-by: Alex Le <[email protected]> --------- Signed-off-by: Alex Le <[email protected]> * Add max tenant config to tenant federation (#6493) Signed-off-by: SungJin1212 <[email protected]> Signed-off-by: Alex Le <[email protected]> * Add cleaner logic to clean partition compaction blocks and related files (#6507) * Add cleaner logic to clean partition compaction blocks and related files Signed-off-by: Alex Le <[email protected]> * refactored metrics Signed-off-by: Alex 
Le <[email protected]> * refactor Signed-off-by: Alex Le <[email protected]> * update logs Signed-off-by: Alex Le <[email protected]> --------- Signed-off-by: Alex Le <[email protected]> * Update RELEASE.md (#6511) Maintainers would like an additional week to get the partition compactor changes in before the first release candidate for 1.19. Signed-off-by: Charlie Le <[email protected]> Signed-off-by: Alex Le <[email protected]> * update thanos version to 236777732278c64ca01c1c09d726f0f712c87164 (#6514) Signed-off-by: yeya24 <[email protected]> Signed-off-by: Alex Le <[email protected]> * Fix race that can cause nil reference when using expanded postings (#6518) Signed-off-by: alanprot <[email protected]> Signed-off-by: Alex Le <[email protected]> * Add more op label values to cortex_query_frontend_queries_total metric (#6519) Signed-off-by: Alex Le <[email protected]> * Allow use of non-dualstack endpoints for S3 blocks storage (#6522) Signed-off-by: Alex Le <[email protected]> * Expose grpc client connect timeout config and default to 5s (#6523) * expose grpc client connect timeout config Signed-off-by: yeya24 <[email protected]> * changelog Signed-off-by: yeya24 <[email protected]> --------- Signed-off-by: yeya24 <[email protected]> Signed-off-by: Alex Le <[email protected]> * Hook up partition compaction end to end implementation (#6510) * Implemented partition compaction end to end with custom compaction lifecycle Signed-off-by: Alex Le <[email protected]> * removed unused variable Signed-off-by: Alex Le <[email protected]> * tweak test Signed-off-by: Alex Le <[email protected]> * tweak test Signed-off-by: Alex Le <[email protected]> * refactor according to comments Signed-off-by: Alex Le <[email protected]> * tweak test Signed-off-by: Alex Le <[email protected]> * check context error inside sharded posting Signed-off-by: Alex Le <[email protected]> * fix lint Signed-off-by: Alex Le <[email protected]> * fix integration test for memberlist Signed-off-by: Alex 
Le <[email protected]> * make compactor initial wait cancellable Signed-off-by: Alex Le <[email protected]> --------- Signed-off-by: Alex Le <[email protected]> * Test for nil on expire expanded postings (#6521) * Test for nil on expire expanded postings Signed-off-by: alanprot <[email protected]> * stopping ingester Signed-off-by: alanprot <[email protected]> * refactor the test to not timeout Signed-off-by: alanprot <[email protected]> --------- Signed-off-by: alanprot <[email protected]> Signed-off-by: Alex Le <[email protected]> * log when a request starts running in querier (#6525) * log when a request starts running in querier Signed-off-by: Ahmed Hassan <[email protected]> * log when a request starts running in querier for frontend processor Signed-off-by: Ahmed Hassan <[email protected]> --------- Signed-off-by: Ahmed Hassan <[email protected]> Signed-off-by: Alex Le <[email protected]> * Update build image according to 03a8f8c (#6508) Signed-off-by: Friedrich Gonzalez <[email protected]> Signed-off-by: Alex Le <[email protected]> * Deprecate -blocks-storage.tsdb.wal-compression-enabled flag Signed-off-by: SungJin1212 <[email protected]> Signed-off-by: Alex Le <[email protected]> * Fix test (#6537) Signed-off-by: Daniel Deluiggi <[email protected]> Signed-off-by: Alex Le <[email protected]> * Mark 1.19 release in progress https://github.com/cortexproject/cortex/blob/master/RELEASE.md#show-that-a-release-is-in-progress Signed-off-by: Charlie Le <[email protected]> Signed-off-by: Alex Le <[email protected]> * Prepare 1.19.0-rc.0 Signed-off-by: Charlie Le <[email protected]> Signed-off-by: Alex Le <[email protected]> * Revert "Prepare 1.19.0-rc.0" Signed-off-by: Alex Le <[email protected]> * Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup (#6547) * Fixed blocksGroupWithPartition unable to reuse functions from blocksGroup Signed-off-by: Alex Le <[email protected]> * update tests Signed-off-by: Alex Le <[email protected]> --------- 
Signed-off-by: Alex Le <[email protected]> * Remove TransferChunks gRPC method (#6543) Signed-off-by: SungJin1212 <[email protected]> Signed-off-by: Alex Le <[email protected]> * Uupdate Ppromqlsmith (#6557) Signed-off-by: alanprot <[email protected]> Signed-off-by: Alex Le <[email protected]> * Query Partial Data (#6526) * Create partial_data Signed-off-by: Justin Jung <[email protected]> * Fix lazyquery so that warning message is returned Signed-off-by: Justin Jung <[email protected]> * Add QueryPartialData limit Signed-off-by: Justin Jung <[email protected]> * Fix broken mock Signed-off-by: Justin Jung <[email protected]> * Make response with warnings to be not cached Signed-off-by: Justin Jung <[email protected]> * Updated streamingSelect in distributor_queryable Signed-off-by: Justin Jung <[email protected]> * Update query.go Signed-off-by: Justin Jung <[email protected]> * Update replication_set Signed-off-by: Justin Jung <[email protected]> * Lint Signed-off-by: Justin Jung <[email protected]> * Lint again Signed-off-by: Justin Jung <[email protected]> * Generated doc Signed-off-by: Justin Jung <[email protected]> * Changelog Signed-off-by: Justin Jung <[email protected]> * Update config description Signed-off-by: Justin Jung <[email protected]> * Do not remove warnings from seriesSet Signed-off-by: Justin Jung <[email protected]> * Avoid cache only if the warning message contains partial data error Signed-off-by: Justin Jung <[email protected]> * Remove context usage for partial data Signed-off-by: Justin Jung <[email protected]> * Refactor how partial data info is passed + apply to series and label methods as well Signed-off-by: Justin Jung <[email protected]> * Lint + fix tests Signed-off-by: Justin Jung <[email protected]> * Fix build Signed-off-by: Justin Jung <[email protected]> * Create separate config for ruler partial data Signed-off-by: Justin Jung <[email protected]> * Genereta doc Signed-off-by: Justin Jung <[email protected]> * Add more tests 
Signed-off-by: Justin Jung <[email protected]> * Change error Signed-off-by: Justin Jung <[email protected]> * Fix test Signed-off-by: Justin Jung <[email protected]> * Update changelog Signed-off-by: Justin Jung <[email protected]> * Update changelog Signed-off-by: Justin Jung <[email protected]> * Nit Signed-off-by: Justin Jung <[email protected]> * Nit Signed-off-by: Justin Jung <[email protected]> --------- Signed-off-by: Justin Jung <[email protected]> Signed-off-by: Alex Le <[email protected]> * Add timeout for dynamodb ring kv (#6544) * add dynamodb kv with timeout enforced Signed-off-by: yeya24 <[email protected]> * add tests Signed-off-by: yeya24 <[email protected]> * docs Signed-off-by: Ben Ye <[email protected]> * update changelog Signed-off-by: Ben Ye <[email protected]> --------- Signed-off-by: yeya24 <[email protected]> Signed-off-by: Ben Ye <[email protected]> Signed-off-by: Alex Le <[email protected]> * Bump the actions-dependencies group across 1 directory with 2 updates (#6564) Bumps the actions-dependencies group with 2 updates in the / directory: [github/codeql-action](https://github.com/github/codeql-action) and [actions/setup-go](https://github.com/actions/setup-go). Updates `github/codeql-action` from 3.28.1 to 3.28.7 - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@b6a472f...6e54559) Updates `actions/setup-go` from 5.2.0 to 5.3.0 - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](actions/setup-go@3041bf5...f111f33) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions-dependencies - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions-dependencies ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Alex Le <[email protected]> * Fix: expanded postings can cache wrong data when queries are issued "in the future" (#6562) * improve fuzz test for expanded postings cache Signed-off-by: alanprot <[email protected]> * create more tests on the expanded postings cache Signed-off-by: alanprot <[email protected]> * adding get series call on the test Signed-off-by: alanprot <[email protected]> * no use CachedBlockChunkQuerier when query time range is completely after the last sample added in the head Signed-off-by: alanprot <[email protected]> * adding comments Signed-off-by: alanprot <[email protected]> * increase the number of fuzz test from 100 to 300 Signed-off-by: alanprot <[email protected]> * add get series fuzzy testing Signed-off-by: alanprot <[email protected]> --------- Signed-off-by: alanprot <[email protected]> Signed-off-by: Alex Le <[email protected]> * Extend ShuffleSharding on READONLY ingesters (#6517) * Filter readOnly ingesters when sharding Signed-off-by: Daniel Deluiggi <[email protected]> * Extend shard on READONLY Signed-off-by: Daniel Deluiggi <[email protected]> * Remove old code Signed-off-by: Daniel Deluiggi <[email protected]> * Fix test Signed-off-by: Daniel Deluiggi <[email protected]> * update changelog Signed-off-by: Daniel Deluiggi <[email protected]> --------- Signed-off-by: Daniel Deluiggi <[email protected]> Signed-off-by: Alex Le <[email protected]> * Create guide doc for partition compaction Signed-off-by: Alex Le <[email protected]> * Update docs/guides/partitioning-compactor.md Co-authored-by: Charlie Le <[email protected]> Signed-off-by: Alex Le <[email protected]> Signed-off-by: Alex Le <[email protected]> * updated doc Signed-off-by: Alex Le <[email protected]> * clean white space Signed-off-by: Alex Le <[email protected]> --------- Signed-off-by: alanprot <[email protected]> 
Signed-off-by: Alex Le <[email protected]> Signed-off-by: Daniel Sabsay <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: SungJin1212 <[email protected]> Signed-off-by: Charlie Le <[email protected]> Signed-off-by: yeya24 <[email protected]> Signed-off-by: Ahmed Hassan <[email protected]> Signed-off-by: Friedrich Gonzalez <[email protected]> Signed-off-by: Daniel Deluiggi <[email protected]> Signed-off-by: Justin Jung <[email protected]> Signed-off-by: Ben Ye <[email protected]> Signed-off-by: Alex Le <[email protected]> Co-authored-by: Alan Protasio <[email protected]> Co-authored-by: Daniel Sabsay <[email protected]> Co-authored-by: Daniel Sabsay <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: SungJin1212 <[email protected]> Co-authored-by: Charlie Le <[email protected]> Co-authored-by: Ben Ye <[email protected]> Co-authored-by: Sam McBroom <[email protected]> Co-authored-by: Ahmed Hassan <[email protected]> Co-authored-by: Friedrich Gonzalez <[email protected]> Co-authored-by: Daniel Blando <[email protected]> Co-authored-by: Justin Jung <[email protected]>
What this PR does:
While testing the new READONLY status, we found an issue that occurs when a tenant has ingestion_tenant_shard_size lower than the number of ACTIVE ingesters, or when the ring has a high number of READONLY ingesters. E.g.:
Failed Push
Let's assume we have a ring with:
10 ACTIVE ingesters
50 READONLY ingesters
tenantA ingestion_tenant_shard_size of 20
The current subRing of this tenant can be created with only READONLY ingesters. In this case, DoBatch will fail, as there will be no healthy ingesters to send data to.
Early throttle
Let's assume we have a ring with:
80 ACTIVE ingesters
20 READONLY ingesters
tenantA ingestion_tenant_shard_size of 20
The current subRing can be created as a mix of ACTIVE and READONLY ingesters. This results in a subRing of size 20 that contains, say, only 15 ACTIVE ingesters. The localLimit for each ingester is calculated using 20 as the shard size, but only 15 ingesters receive all the data, so the tenant is throttled early. With this change, the subRing is created over just the 80 ACTIVE ingesters.
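The early-throttle effect can be shown with simple arithmetic. The sketch below assumes a simplified per-ingester limit of global limit divided by shard size, ignoring the replication factor; the function names and the global limit of 1,500,000 are made up for illustration.

```go
package main

import "fmt"

// localLimit is the simplified per-ingester limit: the tenant's global
// limit split evenly across the shard size (replication factor ignored).
func localLimit(globalLimit, shardSize int) int {
	return globalLimit / shardSize
}

// effectiveLimit is what the tenant can actually reach when only
// activeInShard ingesters of the shard receive writes.
func effectiveLimit(globalLimit, shardSize, activeInShard int) int {
	return localLimit(globalLimit, shardSize) * activeInShard
}

func main() {
	const globalLimit = 1_500_000 // hypothetical tenant limit
	const shardSize = 20

	fmt.Println(localLimit(globalLimit, shardSize))         // 75000
	fmt.Println(effectiveLimit(globalLimit, shardSize, 15)) // 1125000
	fmt.Println(effectiveLimit(globalLimit, shardSize, 20)) // 1500000
}
```

With 15 of 20 shard members ACTIVE, the tenant in this example hits its limit at 75% of the configured value. When the subRing is built only from ACTIVE ingesters, all 20 members receive data and the full limit is reachable.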
This PR introduces an extension over READONLY instances in ShuffleShard. It works similarly to lookback. On the write path, READONLY instances are not used and the shard is extended instead, so RemoteWrite requests are not sent to READONLY ingesters. On the read path, it returns a shard larger than expected, as happens with lookback.
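A rough sketch of the extension behavior is shown below. It assumes the candidates are already in the tenant's deterministic selection order (the real ring derives that order from hashed tokens); `Instance` and `selectShard` are hypothetical names, not the ring's API.

```go
package main

import "fmt"

type InstanceState int

const (
	ACTIVE InstanceState = iota
	READONLY
)

type Instance struct {
	ID    string
	State InstanceState
}

// selectShard walks the candidates in order until shardSize non-READONLY
// instances have been picked. READONLY instances met on the way do not
// count toward the shard size: they are skipped for writes but kept in the
// read set, so the read shard can be larger than shardSize.
func selectShard(candidates []Instance, shardSize int) (write, read []Instance) {
	for _, in := range candidates {
		if len(write) >= shardSize {
			break
		}
		read = append(read, in)
		if in.State == READONLY {
			continue // extend: keep looking for another writable instance
		}
		write = append(write, in)
	}
	return write, read
}

func main() {
	candidates := []Instance{
		{"r1", READONLY}, {"a1", ACTIVE}, {"r2", READONLY}, {"a2", ACTIVE}, {"a3", ACTIVE},
	}
	write, read := selectShard(candidates, 2)
	fmt.Println(len(write), len(read)) // 2 4
}
```

Applied to the Failed Push scenario, the write set in this sketch always ends up with ACTIVE instances as long as enough exist in the ring, instead of a subRing made only of READONLY ingesters.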
We are also changing the registered timestamp of ingesters that return from the READONLY state. From a write perspective, these ingesters are treated as if they had entered the ring again. This makes the lookback on the read path extend over them.
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]