
Compactors can't keep up with the load #3753

@agardiman

Describe the bug
The number of blocks per tenant increases over time instead of decreasing.
At any given time some compactors are idle, and they stay idle until they are eventually restarted, even though many compactions are still pending for tenants that are not being compacted at that moment.

To Reproduce
Steps to reproduce the behavior:

  1. 9 tenants, about 42M active time series per tenant
  2. 12 compactors
  3. In the compactor v1.6 dashboard, both the number of blocks per tenant and the average number of blocks increase over time

Expected behavior

  • the number of blocks for every tenant, and the overall average, should decrease over time
  • if there are X tenants and X compactors, all X compactors should be busy compacting

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: jsonnet
  • AWS, S3
  • 12 compactors with 2 CPUs and 5 GB of RAM (50 GB limit)
  • 9 tenants with a total of 381M active time series and 6M reqps. The metrics are evenly split between tenants.

Storage Engine

  • [X] Blocks
  • [ ] Chunks

Additional Context
There are 3 issues:

  1. A single compactor cannot keep up with the load of one tenant if the tenant is big enough. I initially tried 7 tenants with 55M active series each, but even while a tenant was being compacted by a compactor, its block count kept increasing. I then split the 381M active time series across 9 tenants, reducing the number of active time series per tenant to 42M. The number of blocks per tenant is still increasing over time.
  2. With only a few tenants and a few compactors, there is a high chance that one compactor is not responsible for any tenant while another is responsible for more than one, probably because the hashing distribution does not spread tenants evenly when the number of tenants is relatively low.
  3. It is not clear from the logs or dashboards how to find the bottleneck, or whether anything is wrong at all.
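
To put a rough number on issue 2, here is a minimal sketch that assumes each tenant is assigned to a compactor independently and uniformly at random. That is a simplification: the real assignment goes through the compactor's hash ring, so actual figures depend on token placement. The function names are made up for illustration.

```python
from math import factorial


def expected_idle(tenants: int, compactors: int) -> float:
    """Expected number of compactors that own no tenant.

    Assumption: every tenant picks one compactor uniformly at random,
    independently of the others. A given compactor is skipped by all
    tenants with probability (1 - 1/compactors) ** tenants.
    """
    return compactors * (1 - 1 / compactors) ** tenants


def prob_all_busy(n: int) -> float:
    """Probability that n tenants land on n distinct compactors,
    under the same uniform-assignment assumption (n! / n**n)."""
    return factorial(n) / n ** n


if __name__ == "__main__":
    # Setup from this issue: 9 tenants, 12 compactors.
    print(f"expected idle compactors: {expected_idle(9, 12):.2f}")
    # "X tenants and X compactors" case from the expected behavior, X = 9.
    print(f"P(all 9 of 9 busy): {prob_all_busy(9):.6f}")
```

Under that assumption, roughly 5.5 of the 12 compactors would be idle on average (at least 3 are idle in any case, since there are only 9 tenants), and with 9 tenants on 9 compactors the chance that every compactor gets exactly one tenant is below 0.1%.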

