Proposal: Remove support for TSDB blocks transfer between ingesters #2966

Closed
pstibrany opened this issue Jul 31, 2020 · 6 comments · Fixed by #2996
Labels
storage/blocks Blocks storage engine

Comments

pstibrany (Contributor) commented Jul 31, 2020

When ingesters start and stop, they support transferring data from the leaving ingester to the joining one. This mechanism allows for fast rollouts of new ingesters, because otherwise ingesters would need to flush data from memory to storage, which can take a long time (up to 1h). Transfers were the primary solution for dealing with slow rollouts until WAL support for chunks was introduced in ingesters, which made transfers obsolete. Ingesters with a WAL don't need to flush data to long-term storage, as they reload the same data into memory from the WAL after a restart.

When TSDB blocks support was introduced to ingesters, transfers were supported there as well.

In this issue, we propose removing support for transfers of TSDB blocks between ingesters. This doesn't affect the transfer of chunks.

Reasons for this proposal:

  • Transfers are rarely used with the blocks storage: blocks storage is meant to be used with persistent disks, and since each TSDB also has a WAL, an ingester restart doesn't require flushing blocks to storage. Restarts are already quite fast.
  • Transfers complicate the ingester logic when using blocks. Removing them removes extra complexity, both in the code and in operations.
  • Dropping transfer support for blocks would simplify the design of a solution for issue Blocks storage unable to ingest samples older than 1h after an outage #2366.
  • There is an integration test for blocks transfers, but apart from that, it's not clear whether anyone uses the feature.
  • Blocks storage is still an experimental feature, so deprecating blocks transfer now doesn't break any feature guarantees. Once blocks are marked production-ready, it will be more difficult to remove this support.
brancz (Contributor) commented Jul 31, 2020

Have you considered doing the same as thanos-receive and just getting rid of the state entirely? It flushes the head block to disk and uploads it. Yes, this can produce non-optimal blocks, but if we put some work into chunk merging at compaction time, this can be optimized away. Just a thought, but it could allow Thanos and Cortex to be even more similar. :)

pstibrany (Contributor, author) commented Jul 31, 2020

Cortex has an option for flushing all TSDBs on shutdown, but the problem is that an ingester may have many TSDBs, and flushing on shutdown may run out of time due to the termination grace period.

For scale-downs I prefer to use the /shutdown endpoint, which does the flush (without any termination timeout) and then stops the ingester component. This also doesn't rely on the flush-on-shutdown flag being set or not; /shutdown always flushes.
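As a concrete illustration, a scale-down step using the /shutdown endpoint could look like the sketch below. The address, port, and endpoint path are placeholder assumptions (not taken from this issue) and may differ between Cortex versions:

```shell
# Hedged sketch: flush-and-stop an ingester via its /shutdown endpoint
# before scaling down. INGESTER_ADDR is a hypothetical placeholder for
# the ingester's HTTP address.
INGESTER_ADDR="${INGESTER_ADDR:-127.0.0.1:8080}"

# As described in the comment above, POST /shutdown flushes all
# in-memory TSDB data to long-term storage and then stops the ingester,
# regardless of the flush-on-shutdown flag.
status=$(curl -s -o /dev/null -w '%{http_code}' -X POST "http://${INGESTER_ADDR}/shutdown" || true)
echo "ingester responded with HTTP ${status}"
```

Once the endpoint returns successfully, the pod can be deleted safely, since the flushed data is already in long-term storage.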

brancz (Contributor) commented Jul 31, 2020

We have a Kubernetes controller planned (the controller actually exists, but doesn't do this yet) that ensures eventual upload should it not succeed on a "normal" shutdown (essentially, it schedules jobs with the "previous" PVC until the upload succeeds, then cleans up the volume and recycles it). Practice will show whether this is "too eventually consistent", but if not, then I think more thought-through config rollout schemes can solve this as well.

pstibrany (Contributor, author) replied:

> We have a Kubernetes controller planned (the controller actually exists, but doesn't do this yet) that ensures eventual upload should it not succeed on a "normal" shutdown (essentially, it schedules jobs with the "previous" PVC until the upload succeeds, then cleans up the volume and recycles it). Practice will show whether this is "too eventually consistent", but if not, then I think more thought-through config rollout schemes can solve this as well.

Nice! Our flusher component can do the same (it's a job that flushes all TSDB data from a PVC), but we don't have a controller for it.

thorfour (Contributor) commented:
+1 for getting rid of transfers!

ranton256 (Contributor) commented:
Given that the WAL already implies the need for a StatefulSet with the blocks storage, removing transfer support between ingesters does not seem like it would cause any additional hurdles, and it would simplify things.
