Ingester flush should balance across tables and hashes #724

bboreham · 2018-02-26T19:03:48Z

Currently flushing is driven off a priority queue in order of the start time of the first unflushed chunk in a series.

This ordering is especially bad when we’ve scaled down write capacity on the previous table and someone sends some samples from that time, since those series have the oldest start times and sort to the front of the queue.

It’s also bad if there are a lot of timeseries that map to the same index hash (i.e. same metric name for same user on same day), because that creates a hot-spot in the data store.

It would be better to have separate queues for each table, and to send a spread of data which is known to have different hash keys. Even better if you can batch it up to reduce network overhead to the back end.
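A minimal sketch of the idea above, not the ingester's actual code: one queue per table, with items grouped by hash key and drained round-robin so that a batch spreads across as many distinct hash keys as possible. All names here (`flushItem`, `tableQueue`, `balancer`, `NextBatch`) are hypothetical, and the sketch ignores the existing start-time priority for simplicity.

```go
package main

// flushItem is a hypothetical unit of flush work for one series.
type flushItem struct {
	Table   string // destination table
	HashKey string // index hash key, e.g. user + day + metric name
	Series  string // series identifier
}

// tableQueue holds pending items for one table, grouped by hash key.
type tableQueue struct {
	byHash map[string][]flushItem // FIFO of items per hash key
	keys   []string               // hash keys with pending items, first-seen order
	next   int                    // round-robin cursor into keys
}

// balancer keeps a separate queue for each table (an assumed design).
type balancer struct {
	tables map[string]*tableQueue
}

func newBalancer() *balancer {
	return &balancer{tables: map[string]*tableQueue{}}
}

// Enqueue adds an item to the queue for its table.
func (b *balancer) Enqueue(it flushItem) {
	q, ok := b.tables[it.Table]
	if !ok {
		q = &tableQueue{byHash: map[string][]flushItem{}}
		b.tables[it.Table] = q
	}
	if _, seen := q.byHash[it.HashKey]; !seen {
		q.keys = append(q.keys, it.HashKey)
	}
	q.byHash[it.HashKey] = append(q.byHash[it.HashKey], it)
}

// NextBatch returns up to max items for one table. It takes one item per
// hash key per pass, so a hash key only repeats within a batch once every
// other pending key has contributed an item.
func (b *balancer) NextBatch(table string, max int) []flushItem {
	q, ok := b.tables[table]
	if !ok {
		return nil
	}
	var batch []flushItem
	for len(batch) < max && len(q.keys) > 0 {
		if q.next >= len(q.keys) {
			q.next = 0 // wrap around and start another pass
		}
		k := q.keys[q.next]
		items := q.byHash[k]
		batch = append(batch, items[0])
		if len(items) == 1 {
			// Key is drained: drop it. The cursor now points at the next key.
			delete(q.byHash, k)
			q.keys = append(q.keys[:q.next], q.keys[q.next+1:]...)
		} else {
			q.byHash[k] = items[1:]
			q.next++
		}
	}
	return batch
}
```

Each table could then be drained by its own flush loop, sized to that table's provisioned write capacity, with each returned batch sent to the back end as a single batched write.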

gouthamve · 2020-05-07T14:56:10Z

related #684

alanprot · 2022-08-09T20:49:29Z

Closing, as chunks storage is deprecated.

bboreham mentioned this issue Mar 4, 2018: Hotspotting on keys #733 (Closed)

bboreham added the component/ingester label Mar 26, 2018

tomwilkie added the type/performance label Aug 13, 2018

bboreham added the storage/chunks Chunks storage engine label Jun 10, 2021

alanprot closed this as completed Aug 9, 2022
