Hotspotting on keys #733

@bboreham

Description

Similar to #254, we often see periods where the achieved throughput is much lower than provisioned capacity on DynamoDB. This issue is a bit of an umbrella / brain-dump.

We could use some better tools to investigate this, e.g. logging of keys that suffer multiple retries. The retry histogram tells me 99.9% of operations need 0 retries, which is nice but not very helpful for finding hot keys.
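To make that concrete, here is a minimal sketch of what such a tool could look like: a counter of retries per hash key that flags keys crossing a threshold. The names `RetryTracker` and `Observe` are hypothetical, not existing Cortex code.

```go
package main

import "fmt"

// RetryTracker counts retries per hash key so hot keys can be logged.
// (Hypothetical helper, not existing Cortex code.)
type RetryTracker struct {
	threshold int
	counts    map[string]int
}

func NewRetryTracker(threshold int) *RetryTracker {
	return &RetryTracker{threshold: threshold, counts: map[string]int{}}
}

// Observe records one retry of hashKey and reports whether the key has
// just crossed the logging threshold (reported once, not on every retry).
func (t *RetryTracker) Observe(hashKey string) bool {
	t.counts[hashKey]++
	return t.counts[hashKey] == t.threshold
}

func main() {
	t := NewRetryTracker(3)
	for i := 0; i < 3; i++ {
		if t.Observe("userA/metricB") {
			fmt.Println("hot key: userA/metricB crossed retry threshold")
		}
	}
}
```

This would surface exactly the keys the 99.9th percentile hides.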

I wonder if the "series index" added in #442 is causing trouble - the hash (partition) key is the same for every chunk for a particular user (instance). [EDIT] This index is only used to iterate through timeseries for queries that don't have a metric name. It's unusably slow.

Maybe add some more diversity to the hash key, e.g. append a hex digit derived from the sha; then you have to do 16 reads instead of 1 to scan the whole row, but those 16 will go much faster.
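A sketch of that key-diversification idea, assuming one hex digit is taken from a SHA-256 of the chunk ID. The key layout and function names here are illustrative, not Cortex's actual schema:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// diversifiedHashKey spreads one user's chunks across 16 DynamoDB
// partitions by appending a hex digit derived from the chunk's SHA.
// (Illustrative sketch, not Cortex's actual key schema.)
func diversifiedHashKey(userID, chunkID string) string {
	sum := sha256.Sum256([]byte(chunkID))
	return fmt.Sprintf("%s/%x", userID, sum[0]>>4) // one hex digit, 0-f
}

// allHashKeys returns the 16 hash keys a reader must query to scan the
// whole logical row for one user.
func allHashKeys(userID string) []string {
	keys := make([]string, 16)
	for i := 0; i < 16; i++ {
		keys[i] = fmt.Sprintf("%s/%x", userID, i)
	}
	return keys
}

func main() {
	fmt.Println(diversifiedHashKey("user1", "chunk-abc"))
	fmt.Println(len(allHashKeys("user1"))) // 16 reads to scan the row
}
```

The trade-off is exactly as stated above: reads fan out 16 ways, but each partition sees roughly 1/16th of the write load.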

It looks like writes from ingester_flush.go to the chunk store do exponential back-off up to the timeout (1 minute), then error out and go back onto the flush queue, whereupon the exponential back-off restarts at 100ms. And when we restart, we re-write all the keys even though just one was outstanding. So it would be better to keep trying for longer.

Related: #724

Metadata

Labels: postmortem (An issue arising out of a serious production issue)
