Design Proposal: Use multiple buckets for s3 chunk storage for manual cleanup #1594

# Abstract

Add a feature that stores chunks across multiple S3 buckets, with each bucket holding the data for a fixed span of time.
Modify the table manager to issue `s3.DeleteBucket` requests during cleanup once a bucket has aged out.

# Reason(s)

Presently, S3 chunk cleanup is expected to be performed by the S3 storage provider's retention policy. Some S3 implementations (e.g. DigitalOcean Spaces) do not provide retention policies. This design keeps Cortex data more self-contained, so that no external dependency is required to manage the life-cycle of the data. It is meant to address issue #1591.

# Goals

- Self-contained data life-cycle for Cortex chunks.
- Efficient delete processing (as few API requests as possible).
- Support for custom retention periods.

# Implementation

A new set of flags would be added:
- `--s3.retain-by-bucket-period` sets the time span covered by each bucket.
- `--s3.bucket-prefix` sets the prefix of the buckets that will be created.
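As a rough illustration of how the two flags might be registered, here is a minimal sketch using Go's standard `flag` package. The struct name `BucketRetentionConfig`, the helper `parseArgs`, the zero defaults, and the rule that a period requires a prefix are all assumptions made for this example, not part of the proposal.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// BucketRetentionConfig is a hypothetical name for the proposed settings.
type BucketRetentionConfig struct {
	Period       time.Duration // time span covered by each bucket
	BucketPrefix string        // prefix of the buckets that will be created
}

// RegisterFlags registers the two proposed flags on the given flag set.
// The defaults (0 and "") are assumptions; here a zero period means the
// feature is disabled.
func (c *BucketRetentionConfig) RegisterFlags(f *flag.FlagSet) {
	f.DurationVar(&c.Period, "s3.retain-by-bucket-period", 0,
		"Time span covered by each bucket (0 disables per-period buckets).")
	f.StringVar(&c.BucketPrefix, "s3.bucket-prefix", "",
		"Prefix of the buckets that will be created.")
}

// parseArgs is a hypothetical helper that parses args and applies one
// assumed validation rule: a non-zero period needs a bucket prefix.
func parseArgs(args []string) (BucketRetentionConfig, error) {
	var cfg BucketRetentionConfig
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.Period > 0 && cfg.BucketPrefix == "" {
		return cfg, fmt.Errorf("s3.bucket-prefix must be set when s3.retain-by-bucket-period is non-zero")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs([]string{
		"--s3.retain-by-bucket-period=168h",
		"--s3.bucket-prefix=cortex-chunks",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("period=%s prefix=%s\n", cfg.Period, cfg.BucketPrefix)
}
```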

**Ingester** will upload each chunk to the S3 bucket whose time window contains the chunk's `Through` timestamp. A chunk that spans a bucket boundary is therefore placed in the newer bucket, so it is still retained for as long as its most recent data requires.
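The mapping from a chunk's `Through` timestamp to a bucket window could look roughly like the sketch below. It assumes timestamps are milliseconds since the Unix epoch, that windows are aligned to the epoch, and that each window is half-open (`[start, end)`); the function name `bucketWindow` is hypothetical. A querier could reuse the same function to find the bucket for a chunk.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// bucketWindow returns the half-open window [startMs, endMs) that contains
// throughMs. Windows are aligned to the Unix epoch (an assumption).
// It assumes throughMs >= 0 and period >= 1ms.
func bucketWindow(throughMs int64, period time.Duration) (startMs, endMs int64) {
	p := period.Milliseconds()
	startMs = (throughMs / p) * p
	return startMs, startMs + p
}

func main() {
	period := 7 * day
	// A chunk that starts in the first week but ends one hour into the
	// second week is placed in the second week's bucket.
	through := (7*day + time.Hour).Milliseconds()
	start, end := bucketWindow(through, period)
	fmt.Printf("Through=%d falls in window [%d, %d)\n", through, start, end)
}
```

Because the window is chosen from `Through` alone, a chunk never lands in a bucket that expires before the chunk's newest sample does.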

**Querier** will select the bucket to retrieve a chunk from based on the chunk's `Through` timestamp, using the same mapping as the ingester.

**Table-Manager** will maintain its periodic calls to `DeleteChunksBefore`, which for S3 bucket clients will issue `s3.DeleteBucket` calls for buckets that exceed the retention policy set in the table manager.
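Deciding which buckets have aged out might be sketched as follows. The `bucket` type and `expiredBuckets` function are hypothetical, timestamps are assumed to be milliseconds since the Unix epoch, and each bucket is assumed to record the exclusive end of its window. The sketch only selects bucket names and makes no S3 calls. Note that on AWS S3 a `DeleteBucket` request succeeds only for an empty bucket, so a real implementation may first have to remove the remaining objects.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// bucket is a hypothetical record of one time-windowed bucket.
type bucket struct {
	Name      string
	ThroughMs int64 // exclusive end of the bucket's window, in ms
}

// expiredBuckets returns the names of buckets whose whole window lies at or
// before the retention cutoff (nowMs - retention). Every chunk in such a
// bucket has a Through timestamp older than the cutoff.
func expiredBuckets(buckets []bucket, nowMs int64, retention time.Duration) []string {
	cutoff := nowMs - retention.Milliseconds()
	var names []string
	for _, b := range buckets {
		if b.ThroughMs <= cutoff {
			names = append(names, b.Name)
		}
	}
	return names
}

func main() {
	buckets := []bucket{
		{Name: "week-1", ThroughMs: (7 * day).Milliseconds()},
		{Name: "week-2", ThroughMs: (14 * day).Milliseconds()},
		{Name: "week-3", ThroughMs: (21 * day).Milliseconds()},
	}
	now := (30 * day).Milliseconds()
	for _, name := range expiredBuckets(buckets, now, 14*day) {
		fmt.Println("would delete bucket:", name)
	}
}
```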

The complex part of this proposal is the creation life-cycle of the buckets. Since there are multiple ingesters, one of them would have to be the "leader" that creates buckets.

Instead, I believe the table manager should be responsible for creating the buckets as well, maintaining a rolling window of buckets to ensure there is always an available bucket to write to. Also, bucket names are often global with S3 providers, so a new bucket's name would likely best follow the format `{{bucket_prefix}}-{{uuid}}-{{through_timestamp}}`.
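The naming format and the rolling window could be sketched as below. Several details are assumptions: the timestamp is rendered as Unix seconds, the UUID is supplied by the caller (Go's standard library has no UUID generator), and the function names are hypothetical. The length check reflects the 63-character limit on AWS S3 bucket names. A 36-character UUID, a 10-digit timestamp and two hyphens already use 48 characters, which leaves at most 15 for the prefix.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// maxBucketNameLen is the AWS S3 limit on bucket name length.
const maxBucketNameLen = 63

// bucketName renders {{bucket_prefix}}-{{uuid}}-{{through_timestamp}}.
// The timestamp is written as Unix seconds (an assumption).
func bucketName(prefix, id string, throughMs int64) (string, error) {
	name := fmt.Sprintf("%s-%s-%d", prefix, id, throughMs/1000)
	if len(name) > maxBucketNameLen {
		return "", fmt.Errorf("bucket name %q has %d characters, limit is %d",
			name, len(name), maxBucketNameLen)
	}
	return name, nil
}

// upcomingWindowEnds returns the end timestamps (ms) of the window that
// contains nowMs and of the next `ahead` windows, so that buckets can be
// created before ingesters need them. Assumes nowMs >= 0 and period >= 1ms.
func upcomingWindowEnds(nowMs int64, period time.Duration, ahead int) []int64 {
	p := period.Milliseconds()
	firstEnd := (nowMs/p + 1) * p
	ends := make([]int64, 0, ahead+1)
	for i := 0; i <= ahead; i++ {
		ends = append(ends, firstEnd+int64(i)*p)
	}
	return ends
}

func main() {
	now := (10 * day).Milliseconds()
	// Hypothetical fixed UUID; a real table manager would generate one.
	id := "0f8fad5b-d9cb-469f-a165-70867728950e"
	for _, end := range upcomingWindowEnds(now, 7*day, 1) {
		name, err := bucketName("cortex", id, end)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ensure bucket exists:", name)
	}
}
```

Since the UUID makes each name unpredictable, ingesters and queriers would need some way to look up the name for a given window. The proposal leaves that mechanism open, and so does this sketch.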
