Global marker missing for block deletion #4872

Closed
@danielblando

Description

Describe the bug
After Cortex finishes compacting a block, it marks the block for deletion by adding a deletion marker at the storage level. Currently, Cortex uploads two markers when marking a block for deletion: the first at the block level and the second in the bucket's global markers folder.

    if err := b.parent.Upload(ctx, name, bytes.NewBuffer(body)); err != nil {
        return err
    }
    // Upload it to the global markers location too.
    return b.parent.Upload(ctx, globalMarkPath, bytes.NewBuffer(body))

The code uploads to the block level first and then to the global level.

From what I could infer from the blocks cleaner, which deletes all blocks marked for deletion, it uses the global markers folder to decide which blocks to delete. It does not check the block's own deletion marker.

Cleaner
https://github.com/cortexproject/cortex/blob/master/pkg/compactor/blocks_cleaner.go#L340

Index update
https://github.com/cortexproject/cortex/blob/master/pkg/storage/tsdb/bucketindex/updater.go#L172

If the markers_bucket_client uploads the marker at the block level but fails to upload it to the global markers folder, the block would not be deleted.

There is a second issue: when a block is marked for deletion again, Thanos checks whether the marker already exists and, if it does, does not re-upload it:
https://github.com/thanos-io/thanos/blob/d1405e4a2ec2e7bb47cd46ce5648b5809fa77579/pkg/block/block.go#L178-L187
This leaves no chance to re-upload the global marker.
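
To make the failure mode concrete, here is a minimal, self-contained sketch of the sequence described above. It is not Cortex or Thanos code: `memBucket`, `markForDeletion`, `blocksSeenByCleaner`, and the marker paths are hypothetical stand-ins that only model the behaviour as I understand it (block-level upload first, global upload second, skip if the block-level marker already exists).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// memBucket is a hypothetical in-memory stand-in for an object store.
type memBucket struct {
	objects map[string][]byte
	failOn  string // uploads to this exact key fail, simulating an outage
}

func newMemBucket() *memBucket {
	return &memBucket{objects: map[string][]byte{}}
}

func (b *memBucket) Upload(key string, body []byte) error {
	if key == b.failOn {
		return errors.New("simulated upload failure: " + key)
	}
	b.objects[key] = body
	return nil
}

func (b *memBucket) Exists(key string) bool {
	_, ok := b.objects[key]
	return ok
}

// Marker paths are assumed for illustration only.
func blockMarkPath(blockID string) string  { return blockID + "/deletion-mark.json" }
func globalMarkPath(blockID string) string { return "markers/" + blockID + "-deletion-mark.json" }

// markForDeletion models the described behaviour: skip when the block-level
// marker already exists, otherwise upload block-level first, global second.
func markForDeletion(b *memBucket, blockID string) error {
	if b.Exists(blockMarkPath(blockID)) {
		return nil // already marked, so nothing is re-uploaded
	}
	body := []byte(`{"id":"` + blockID + `"}`)
	if err := b.Upload(blockMarkPath(blockID), body); err != nil {
		return err
	}
	return b.Upload(globalMarkPath(blockID), body)
}

// blocksSeenByCleaner returns the block IDs that have a global marker, the
// only location the cleaner is described as reading.
func blocksSeenByCleaner(b *memBucket) []string {
	var ids []string
	for key := range b.objects {
		if strings.HasPrefix(key, "markers/") && strings.HasSuffix(key, "-deletion-mark.json") {
			id := strings.TrimSuffix(strings.TrimPrefix(key, "markers/"), "-deletion-mark.json")
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	b := newMemBucket()
	b.failOn = globalMarkPath("block-1") // the global upload fails once
	fmt.Println("first attempt error:", markForDeletion(b, "block-1"))

	b.failOn = "" // storage recovers
	fmt.Println("retry error:", markForDeletion(b, "block-1"))
	fmt.Println("block marker exists:", b.Exists(blockMarkPath("block-1")))
	fmt.Println("blocks seen by cleaner:", len(blocksSeenByCleaner(b)))
}
```

In this model the retry returns no error because the block-level marker is already present, yet the cleaner's view stays empty, so the block is never picked up.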

Is there a retry I am missing that solves this issue?
Should we reverse the order of the marker uploads?
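
As a rough illustration of the second question, the sketch below uploads the global marker first and the block-level marker last, while keeping the "skip if the block-level marker exists" check. Everything here (`store`, `markGlobalFirst`, the marker paths) is hypothetical and only models the idea. It does not show whether other components tolerate a global marker without a block-level one.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory object store; failOn simulates an
// outage for a single key.
type store struct {
	objects map[string]bool
	failOn  string
}

func (s *store) upload(key string) error {
	if key == s.failOn {
		return errors.New("simulated upload failure: " + key)
	}
	s.objects[key] = true
	return nil
}

// markGlobalFirst uploads the global marker first and the block-level marker
// last. Because the existence check looks at the marker written last, finding
// it implies both uploads succeeded; otherwise a retry writes both again.
func markGlobalFirst(s *store, blockID string) error {
	blockKey := blockID + "/deletion-mark.json"                // assumed path
	globalKey := "markers/" + blockID + "-deletion-mark.json" // assumed path
	if s.objects[blockKey] {
		return nil
	}
	if err := s.upload(globalKey); err != nil {
		return err
	}
	return s.upload(blockKey)
}

func main() {
	s := &store{objects: map[string]bool{}}

	// The second upload (block-level) fails on the first attempt.
	s.failOn = "block-1/deletion-mark.json"
	fmt.Println("first attempt error:", markGlobalFirst(s, "block-1"))
	fmt.Println("global marker present:", s.objects["markers/block-1-deletion-mark.json"])

	// After recovery, the retry is not skipped and completes both uploads.
	s.failOn = ""
	fmt.Println("retry error:", markGlobalFirst(s, "block-1"))
	fmt.Println("block marker present:", s.objects["block-1/deletion-mark.json"])
}
```

With this ordering, a failure between the two uploads leaves the global marker in place, so the cleaner in this model still sees the block, and a later retry can repair the missing block-level marker.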

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (SHA or version)
  2. Perform Operations(Read/Write/Others)

This is hard to reproduce because it requires simulating a storage-layer failure at exactly the right moment. One possibility is to run Cortex with the global marker upload line removed, purely as a test.
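
Another way to simulate the failure, at least in a unit test, would be to wrap the upload call and inject an error for keys under the global markers prefix. The sketch below is a generic illustration, not an existing Cortex or Thanos helper: `uploadFunc`, `failMatching`, and the `markers/` prefix are assumptions, and the wrapper is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// uploadFunc is a hypothetical signature standing in for a bucket upload call.
type uploadFunc func(key string, body []byte) error

// failMatching wraps next so that the first n uploads whose key starts with
// prefix fail with an injected error; all other uploads pass through.
func failMatching(next uploadFunc, prefix string, n int) uploadFunc {
	remaining := n
	return func(key string, body []byte) error {
		if remaining > 0 && strings.HasPrefix(key, prefix) {
			remaining--
			return errors.New("injected failure for " + key)
		}
		return next(key, body)
	}
}

func main() {
	stored := map[string][]byte{}
	store := func(key string, body []byte) error {
		stored[key] = body
		return nil
	}

	// Fail the first upload that targets the (assumed) global markers prefix.
	upload := failMatching(store, "markers/", 1)

	fmt.Println("block marker:", upload("block-1/deletion-mark.json", []byte("{}")))
	fmt.Println("global marker, 1st try:", upload("markers/block-1-deletion-mark.json", []byte("{}")))
	fmt.Println("global marker, 2nd try:", upload("markers/block-1-deletion-mark.json", []byte("{}")))
	fmt.Println("objects stored:", len(stored))
}
```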

Expected behavior
The block should be deleted.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm
