Fix race on chunks multilevel cache + Optimize to avoid refetching already found keys. #6312
Signed-off-by: alanprot <[email protected]>
Test failing before the fix (first commit)
Makes sense. Thanks a lot for fixing this issue!
How can we catch this kind of issue earlier? Maybe we shouldn't use mocks in our existing tests.
I think the only way is integration tests, because the root cause was how the results of the cache interact with the rest of the code. Catching that would need a very specific test, I guess. I don't know if we will ever be able to test all combinations of configs and edge cases, to be honest :(
…ready found keys. (cortexproject#6312)
* Creating a test to show the race on the multilevel cache
  Signed-off-by: alanprot <[email protected]>
* fix the race problem
* Only fetch keys that were not found on the previous cache
  Signed-off-by: alanprot <[email protected]>
---------
Signed-off-by: alanprot <[email protected]>
* Implementing Expanded Postings Cache
  Signed-off-by: alanprot <[email protected]>
* small nit
  Signed-off-by: alanprot <[email protected]>
* refactoring the cache so we dont need to call expire on every request
  Signed-off-by: alanprot <[email protected]>
* Update total cache size when updating the item
  Signed-off-by: alanprot <[email protected]>
* Fix fuzzy test after change the flag name
  Signed-off-by: alanprot <[email protected]>
* remove max item config + create a new test case with only head cache enabled
  Signed-off-by: alanprot <[email protected]>
* Documenting enabled as first field on the config
  Signed-off-by: alanprot <[email protected]>
* Fix race on chunks multilevel cache + Optimize to avoid refetching already found keys. (#6312)
  * Creating a test to show the race on the multilevel cache
    Signed-off-by: alanprot <[email protected]>
  * fix the race problem
  * Only fetch keys that were not found on the previous cache
    Signed-off-by: alanprot <[email protected]>
  ---------
  Signed-off-by: alanprot <[email protected]>
* Improve Doc
  Signed-off-by: alanprot <[email protected]>
* create new cortex_ingester_expanded_postings_non_cacheable_queries metric
  Signed-off-by: alanprot <[email protected]>
---------
Signed-off-by: alanprot <[email protected]>
What this PR does:
We can hit a data race when data is only partially fetched from the multilevel cache. The reason is that the object store internally modifies the map returned by the cache, and this races with the cache's internal implementation, which is still using the same map.
Ex: https://github.com/thanos-io/thanos/blob/d6d19c568f8dc6005a9184d3a5a994da4f0d8c75/pkg/store/cache/caching_bucket.go#L469
Right now this PR only has the unit test demonstrating the issue.
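To illustrate the race described above, here is a minimal sketch of a two-level cache that backfills the first level in the background. It is not the Cortex or Thanos code: `level`, `cloneHits`, and `fetchWithBackfill` are hypothetical names, and the fix shown (giving the background goroutine its own copy of the map) is one assumed way to avoid sharing the map with the caller.

```go
package main

import (
	"fmt"
	"sync"
)

// level is a hypothetical in-memory cache level guarded by a mutex.
type level struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newLevel() *level { return &level{data: map[string][]byte{}} }

// store copies every item into the level.
func (l *level) store(items map[string][]byte) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for k, v := range items {
		l.data[k] = v
	}
}

// fetch returns a fresh map holding the keys that are present.
func (l *level) fetch(keys []string) map[string][]byte {
	l.mu.Lock()
	defer l.mu.Unlock()
	hits := make(map[string][]byte, len(keys))
	for _, k := range keys {
		if v, ok := l.data[k]; ok {
			hits[k] = v
		}
	}
	return hits
}

// cloneHits makes a shallow copy of the map (the byte slices are shared).
func cloneHits(in map[string][]byte) map[string][]byte {
	out := make(map[string][]byte, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// fetchWithBackfill reads from l2 and backfills l1 in the background.
// If the goroutine ranged over hits directly, it would race with any caller
// that mutates the returned map. Handing it a copy removes the shared state.
func fetchWithBackfill(l1, l2 *level, keys []string) (map[string][]byte, *sync.WaitGroup) {
	hits := l2.fetch(keys)
	toBackfill := cloneHits(hits)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		l1.store(toBackfill)
	}()
	return hits, &wg
}

func main() {
	l1, l2 := newLevel(), newLevel()
	l2.store(map[string][]byte{"a": []byte("1"), "b": []byte("2")})

	hits, wg := fetchWithBackfill(l1, l2, []string{"a", "b"})
	delete(hits, "a") // the caller mutates the result, as the object store layer does
	wg.Wait()

	fmt.Println(len(hits), len(l1.fetch([]string{"a", "b"})))
}
```

Running a variant without the copy under `go test -race` is the kind of check that would flag the shared map; with the copy, the caller's `delete` and the background `store` touch different maps.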
PS: I also optimized the cache so that it does not refetch keys that were already found in the previous cache.
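The optimization can be sketched as follows: each level is asked only for the keys that earlier levels did not return. This is an illustration under assumptions, not the PR's actual code; `fetcher`, `recordingLevel`, and `fetchMultiLevel` are hypothetical names.

```go
package main

import "fmt"

// fetcher is a hypothetical stand-in for one cache level's fetch call.
type fetcher func(keys []string) map[string][]byte

// recordingLevel builds a fetcher backed by a map and records every key it is asked for.
func recordingLevel(data map[string][]byte, requested *[]string) fetcher {
	return func(keys []string) map[string][]byte {
		*requested = append(*requested, keys...)
		hits := make(map[string][]byte, len(keys))
		for _, k := range keys {
			if v, ok := data[k]; ok {
				hits[k] = v
			}
		}
		return hits
	}
}

// fetchMultiLevel queries the levels in order and forwards only the keys
// that are still missing to the next level.
func fetchMultiLevel(levels []fetcher, keys []string) map[string][]byte {
	hits := make(map[string][]byte, len(keys))
	missing := keys
	for _, fetch := range levels {
		if len(missing) == 0 {
			break // everything was found; later levels are not queried
		}
		found := fetch(missing)
		next := make([]string, 0, len(missing))
		for _, k := range missing {
			if v, ok := found[k]; ok {
				hits[k] = v
			} else {
				next = append(next, k)
			}
		}
		missing = next
	}
	return hits
}

func main() {
	var l1Req, l2Req []string
	l1 := recordingLevel(map[string][]byte{"a": []byte("1")}, &l1Req)
	l2 := recordingLevel(map[string][]byte{"a": []byte("1"), "b": []byte("2")}, &l2Req)

	hits := fetchMultiLevel([]fetcher{l1, l2}, []string{"a", "b"})
	fmt.Println(len(hits), l1Req, l2Req) // prints "2 [a b] [b]"
}
```

In this sketch the second level is asked only for `b`, because `a` was already served by the first level.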
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]