Skip to content

Adds v11 schema #1538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 25, 2019
Merged

Adds v11 schema #1538

merged 2 commits into from
Oct 25, 2019

Conversation

cyriltovena
Copy link
Contributor

@cyriltovena cyriltovena commented Jul 29, 2019

Signed-off-by: Cyril Tovena [email protected]

We ran Loki for a while using the new LabelNamesForMetricName to retrieve all possibles labels. Turned out that it does pull too many chunks for a single query spanning over 6h (12k~)

This introduces a new schema where we store the metric (labels set) within the IndexEntry.Value. Other schema keep the same implementation.

Design doc & reason for this change: https://docs.google.com/document/d/1sVZHtQACLQZfiKnnFhXSiqyRx6YSX945s9jPjc2js6o/edit?usp=sharing

/cc @gouthamve

@weeco
Copy link
Contributor

weeco commented Jul 29, 2019

@cyriltovena Can you make the design doc accessible for the public?

@cyriltovena
Copy link
Contributor Author

cyriltovena commented Jul 29, 2019

Oups I thought it was already sorry ! should be good now.

@cyriltovena
Copy link
Contributor Author

This would also help advance #416 !

@tomwilkie
Copy link
Contributor

Is this covered by any end-to-end test?

@cyriltovena
Copy link
Contributor Author

It is covered by unit tests, could you point me at the end-to-end test ?

@cyriltovena
Copy link
Contributor Author

@gouthamve I've added some improvements:

  • The index entry is now only returned via GetLabelWriteEntries which means the entry is created only for a unique label set.
  • I'm storing now only label names (not values) and skipping name as it will always be there.
  • deduping using a map.

@gouthamve
Copy link
Contributor

Rebased and calculated the overhead. This adds 7% to the index size: https://docs.google.com/spreadsheets/d/1admjIY_SyBN8q7FJdhumNtsM85yOWxTgfCUinRtYLwY/edit

Calculated using v10 as base:

cortex/pkg/chunk/schema.go

Lines 642 to 673 in 98db0a4

func (s v10Entries) GetLabelWriteEntries(bucket Bucket, metricName string, labels labels.Labels, chunkID string) ([]IndexEntry, error) {
seriesID := labelsSeriesID(labels)
// read first 32 bits of the hash and use this to calculate the shard
shard := binary.BigEndian.Uint32(seriesID) % s.rowShards
entries := []IndexEntry{
// Entry for metricName -> seriesID
{
TableName: bucket.tableName,
HashValue: fmt.Sprintf("%02d:%s:%s", shard, bucket.hashKey, metricName),
RangeValue: encodeRangeKey(seriesID, nil, nil, seriesRangeKeyV1),
},
}
// Entries for metricName:labelName -> hash(value):seriesID
// We use a hash of the value to limit its length.
for _, v := range labels {
if v.Name == model.MetricNameLabel {
continue
}
valueHash := sha256bytes(v.Value)
entries = append(entries, IndexEntry{
TableName: bucket.tableName,
HashValue: fmt.Sprintf("%02d:%s:%s:%s", shard, bucket.hashKey, metricName, v.Name),
RangeValue: encodeRangeKey(valueHash, seriesID, nil, labelSeriesRangeKeyV1),
Value: []byte(v.Value),
})
}
return entries, nil
}

@bboreham
Copy link
Contributor

TIL that Google Sheets doesn't colour in the cells in a formula when you don't have edit permission.
So it's a bit hard to follow.

Came here to say that DynamoDB charges 100 bytes overhead per record, so the storage cost in both cases is quite a bit higher than the raw byte count, and the relative cost of 130 extra bytes in one record is much less.

@gouthamve
Copy link
Contributor

I've made the sheet editable, but given that it's even better for Dynamo, can this get an LGTM? I wanna first roll this out and then experiment with a v12 that doesn't require a __name__.

@bboreham
Copy link
Contributor

Should we have something in the docs saying which schemas are solid and which are experimental / not recommended ?

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments. I would like to see the blind alleys removed from the commits before approving.

Also, am I right in thinking the labels data is never cached? Is that OK?

@gouthamve
Copy link
Contributor

Also, am I right in thinking the labels data is never cached?

No, it should be cached as far as I can see. Atleast the index lookups will be cached and that should be enough, no?

@cyriltovena
Copy link
Contributor Author

Also, am I right in thinking the labels data is never cached?

No, it should be cached as far as I can see. Atleast the index lookups will be cached and that should be enough, no?

I think index cache is enough, since we store the data in the index only.

@bboreham
Copy link
Contributor

OK, I think I understand how the caching works now.
I would still like to see the blind alleys removed from the commits.

@gouthamve gouthamve force-pushed the v11schema branch 3 times, most recently from 34a8909 to 2b30729 Compare October 25, 2019 13:27
Stores only label names and not the entire metric. Storing entire metric
will bloat the index by 30% and it doesn't really make sense to do it
right now. Adding just label names adds a tolerable 7% to the index.

Also, in Prometheus, we don't treat __name__ as a special label. We
always return it when calling /labels API and we should do the same
here.

Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
fix lint issue

Signed-off-by: Cyril Tovena <[email protected]>

removes useless loop

Signed-off-by: Cyril Tovena <[email protected]>

This should be on v11 not v10.

Signed-off-by: Cyril Tovena <[email protected]>

s/metricConstRangeKeyV1/labelNamesRangeKeyV1/

The code was first written to store the entire series, but now changed
to do just labelNames.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

Add note about v11 being experimental.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
@gouthamve gouthamve merged commit 77a09cc into cortexproject:master Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants