Adds v11 schema #1538
@cyriltovena Can you make the design doc accessible for the public? |
Oops, I thought it already was, sorry! It should be good now. |
This would also help advance #416! |
Is this covered by any end-to-end test? |
It is covered by unit tests. Could you point me at the end-to-end test? |
@gouthamve I've added some improvements:
|
Rebased and calculated the overhead. This adds 7% to the index size: https://docs.google.com/spreadsheets/d/1admjIY_SyBN8q7FJdhumNtsM85yOWxTgfCUinRtYLwY/edit Calculated using v10 as base: Lines 642 to 673 in 98db0a4
|
TIL that Google Sheets doesn't colour in the cells in a formula when you don't have edit permission. Came here to say that DynamoDB charges 100 bytes overhead per record, so the storage cost in both cases is quite a bit higher than the raw byte count, and the relative cost of 130 extra bytes in one record is much less. |
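The overhead argument above can be sketched as a small calculation. This is only an illustration: the 100-byte per-record overhead and the 130 extra bytes come from the comment, while the base size (1860 raw bytes across 20 records) is a hypothetical figure, and the sketch assumes the extra bytes are counted as payload only. If the new entry is billed its own overhead as well, pass 1 for `extraRecords`.

```go
package main

import "fmt"

// perRecordOverhead is the fixed per-record charge quoted in the comment above.
const perRecordOverhead = 100.0

// relativeIncrease returns the fractional growth in stored bytes when extraRaw
// bytes in extraRecords new records are added to a base of baseRaw bytes held
// in baseRecords records, with a fixed overhead charged per record.
func relativeIncrease(baseRaw, baseRecords, extraRaw, extraRecords, overhead float64) float64 {
	return (extraRaw + extraRecords*overhead) / (baseRaw + baseRecords*overhead)
}

func main() {
	// Hypothetical base: 1860 raw index bytes spread across 20 records.
	const baseRaw, baseRecords = 1860.0, 20.0

	// Raw byte count only (no per-record overhead).
	raw := relativeIncrease(baseRaw, baseRecords, 130, 0, 0)
	// Same extra bytes, but every existing record also carries the fixed overhead.
	billed := relativeIncrease(baseRaw, baseRecords, 130, 0, perRecordOverhead)

	fmt.Printf("raw: %.1f%%, with overhead: %.1f%%\n", raw*100, billed*100)
}
```

Because the fixed overhead enlarges the denominator, the same extra payload is a smaller fraction of what is actually stored.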
I've made the sheet editable, but given that it's even better for Dynamo, can this get an LGTM? I wanna first roll this out and then experiment with a v12 that doesn't require a |
Should we have something in the docs saying which schemas are solid and which are experimental / not recommended? |
I have a couple of comments. I would like to see the blind alleys removed from the commits before approving.
Also, am I right in thinking the labels data is never cached? Is that OK?
No, it should be cached as far as I can see. At least the index lookups will be cached and that should be enough, no? |
I think index cache is enough, since we store the data in the index only. |
OK, I think I understand how the caching works now. |
34a8909 to 2b30729
Stores only label names and not the entire metric. Storing entire metric will bloat the index by 30% and it doesn't really make sense to do it right now. Adding just label names adds a tolerable 7% to the index. Also, in Prometheus, we don't treat __name__ as a special label. We always return it when calling /labels API and we should do the same here. Signed-off-by: Cyril Tovena <[email protected]> Signed-off-by: Goutham Veeramachaneni <[email protected]>
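A minimal sketch of what "storing only label names" could look like, not the code from this PR. The `IndexEntry` struct here is a simplified, hypothetical stand-in (only the `Value` field is named in the discussion), and the `"labelNames"` range value, the JSON encoding, and the helper names are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// IndexEntry is a simplified, hypothetical stand-in for the store's index
// entry type; only the Value field is taken from the discussion.
type IndexEntry struct {
	HashValue  string
	RangeValue []byte
	Value      []byte
}

// labelNamesEntry builds a single index entry for a series whose Value holds
// the sorted label names only, not the label values.
func labelNamesEntry(seriesID string, labels map[string]string) (IndexEntry, error) {
	names := make([]string, 0, len(labels))
	for name := range labels {
		// __name__ is kept like any other label name.
		names = append(names, name)
	}
	sort.Strings(names)

	value, err := json.Marshal(names)
	if err != nil {
		return IndexEntry{}, err
	}
	return IndexEntry{
		HashValue:  seriesID,
		RangeValue: []byte("labelNames"), // hypothetical range key
		Value:      value,
	}, nil
}

// labelNamesFromEntry decodes the label names back out of an entry's Value.
func labelNamesFromEntry(e IndexEntry) ([]string, error) {
	var names []string
	err := json.Unmarshal(e.Value, &names)
	return names, err
}

func main() {
	entry, err := labelNamesEntry("series-1", map[string]string{
		"__name__": "logs",
		"job":      "app",
		"instance": "a",
	})
	if err != nil {
		panic(err)
	}
	names, err := labelNamesFromEntry(entry)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(entry.Value), names)
}
```

With an entry like this, a label-names lookup can be answered from the index alone, which is also why the index cache discussed above would cover it.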
fix lint issue
Signed-off-by: Cyril Tovena <[email protected]>

removes useless loop
Signed-off-by: Cyril Tovena <[email protected]>

This should be on v11 not v10.
Signed-off-by: Cyril Tovena <[email protected]>

s/metricConstRangeKeyV1/labelNamesRangeKeyV1/
The code was first written to store the entire series, but now changed to do just labelNames.
Signed-off-by: Goutham Veeramachaneni <[email protected]>

Add note about v11 being experimental.
Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Cyril Tovena [email protected]
We ran Loki for a while using the new `LabelNamesForMetricName` to retrieve all possible labels. It turned out that it pulls too many chunks (~12k) for a single query spanning 6h.

This introduces a new schema where we store the metric (label set) within `IndexEntry.Value`. Other schemas keep the same implementation.

Design doc & reason for this change: https://docs.google.com/document/d/1sVZHtQACLQZfiKnnFhXSiqyRx6YSX945s9jPjc2js6o/edit?usp=sharing
/cc @gouthamve