
Failure to compact due to maximum index size 64 GiB #4705

@thejosephstevens

Description


Describe the bug
Over the weekend we significantly expanded one of our clusters, pushing ~153M timeseries to our Cortex 1.11.1 cluster in a day.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex 1.11.1 in single-tenant mode as microservices
  2. Push ~153M timeseries to it
  3. See the compactor get to stage 3 compaction and then fail with an error like this:
      level=error ts=2022-04-04T16:11:41.167458322Z caller=compactor.go:537 component=compactor msg="failed to compact user blocks" user=fake err="compaction: group 0@5679675083797525161: compact blocks [data/compact/0@5679675083797525161/01FZNK9ZNWHJWRXB2YNHZ9W8H9 data/compact/0@5679675083797525161/01FZPV9DZ1PGV0HHV9007P8ZWM]: \"data/compact/0@5679675083797525161/01FZTCHWRJHFN419FJPDM0MQNW.tmp-for-creation/index\" exceeding max size of 64GiB"

The two blocks referenced by this error are 12-hour blocks at level-3 compaction, each with an index of ~38 GB; together (~76 GB) they exceed the 64 GiB limit.
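The limit in the error is expressed in GiB, while the index sizes above are quoted in GB, so it is worth confirming the comparison holds across units. A minimal Go sketch, assuming the reported sizes are decimal gigabytes and treating the naive sum only as a rough estimate (the real merged index stores each series once, so its actual size can differ from the sum); `estimatedMergedIndex` is a hypothetical helper:

```go
package main

import "fmt"

const (
	gb  = 1_000_000_000 // decimal gigabyte
	gib = 1 << 30       // binary gibibyte
	// maxIndexSize mirrors the 64 GiB limit quoted in the compactor error.
	maxIndexSize = 64 * gib
)

// estimatedMergedIndex naively sums the input index sizes (in bytes).
// This is only a rough estimate: the real merged index deduplicates
// series and symbols shared by the input blocks.
func estimatedMergedIndex(sizes ...int64) int64 {
	var total int64
	for _, s := range sizes {
		total += s
	}
	return total
}

func main() {
	total := estimatedMergedIndex(38*gb, 38*gb)
	fmt.Printf("limit: %.2f GB\n", float64(maxIndexSize)/gb)
	fmt.Printf("estimate: %.2f GB, exceeds limit: %v\n", float64(total)/gb, total > maxIndexSize)
}
```

This prints a limit of 68.72 GB and an estimate of 76.00 GB, so the two blocks exceed the limit even after accounting for the GB/GiB difference.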

Expected behavior
There should be a way past this: skipping the compaction, forcing it, sharding the output, or something similar.

There's an upstream Thanos patch that allows skipping compaction, but it does not appear to be used by Cortex today (found by @alvinlin123 in Cortex Slack). There's also a Thanos change that would automatically skip compaction if the block is too large.
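The automatic-skip idea can be illustrated with a small filter that, before compacting a group, estimates the output index size and returns the blocks to leave alone instead of failing the whole group. This is a hypothetical Go sketch: the `blockMeta` type, the `skipOversized` function, and the "take blocks in time order while they fit" policy are illustrative assumptions, not the Thanos or Cortex API.

```go
package main

import (
	"fmt"
	"sort"
)

// blockMeta is a hypothetical, minimal stand-in for block metadata.
type blockMeta struct {
	ID        string
	MinTime   int64 // block start time in milliseconds
	IndexSize int64 // size of the block's index file in bytes
}

// maxIndexSize mirrors the 64 GiB limit from the compactor error.
const maxIndexSize int64 = 64 << 30

// skipOversized splits a compaction group into blocks to compact and blocks
// to skip (for example, to mark as not-to-be-compacted). Blocks are taken in
// time order while the naive sum of index sizes stays within the limit; the
// rest are skipped. If fewer than two blocks fit, the whole group is skipped,
// since there is nothing left to merge.
func skipOversized(group []blockMeta) (compact, skip []blockMeta) {
	sorted := append([]blockMeta(nil), group...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].MinTime < sorted[j].MinTime })

	cut := len(sorted)
	var total int64
	for i, b := range sorted {
		if total+b.IndexSize > maxIndexSize {
			cut = i
			break
		}
		total += b.IndexSize
	}
	if cut < 2 {
		return nil, sorted
	}
	return sorted[:cut], sorted[cut:]
}

func main() {
	// Index sizes approximate the two ~38 GB blocks from the error above;
	// the MinTime values are illustrative, 12 hours apart.
	group := []blockMeta{
		{ID: "01FZNK9ZNWHJWRXB2YNHZ9W8H9", MinTime: 0, IndexSize: 38_000_000_000},
		{ID: "01FZPV9DZ1PGV0HHV9007P8ZWM", MinTime: 12 * 60 * 60 * 1000, IndexSize: 38_000_000_000},
	}
	compact, skip := skipOversized(group)
	fmt.Println("compact:", len(compact), "skip:", len(skip)) // compact: 0 skip: 2
}
```

Skipping keeps the compactor making progress on other groups, at the cost of leaving the oversized blocks uncompacted.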

Mimir appears to get past this by sharding during compaction, so that multiple blocks, each with a smaller index, are produced per day.
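Sharding helps because each output block carries only a subset of the series, so its index shrinks roughly in proportion to the shard count. A minimal Go sketch, assuming series are assigned to shards by an FNV-1a hash of their label set and that index size scales linearly with series count (both simplifications; this is not Mimir's actual implementation, and `shardCount` and `shardFor` are hypothetical helpers):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// maxIndexSize mirrors the 64 GiB limit from the compactor error.
const maxIndexSize int64 = 64 << 30

// shardCount returns the smallest number of output shards such that each
// shard's estimated index stays at or below the limit.
func shardCount(estimatedIndex int64) int {
	n := (estimatedIndex + maxIndexSize - 1) / maxIndexSize // ceiling division
	if n < 1 {
		n = 1
	}
	return int(n)
}

// shardFor deterministically maps a series (its label set) to a shard by
// hashing the sorted name/value pairs with FNV-1a.
func shardFor(labels map[string]string, shards int) int {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stability
	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
		h.Write([]byte(labels[k]))
		h.Write([]byte{0})
	}
	return int(h.Sum64() % uint64(shards))
}

func main() {
	n := shardCount(76_000_000_000) // ~76 GB estimate from the two blocks
	fmt.Println("shards needed:", n) // shards needed: 2
	s := shardFor(map[string]string{"__name__": "up", "job": "node"}, n)
	fmt.Println("series goes to shard", s)
}
```

Because the hash depends only on the labels, the same series from both input blocks always lands in the same shard, so it can still be merged into a single output block.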

Environment:

  • Infrastructure: Kubernetes (GKE)
  • Deployment tool: Jinja (derived from the cortex-jsonnet manifest)

Storage Engine

  • Blocks
  • Chunks


Labels: keepalive (Skipped by stale bot)